# Python : Core : Files & Dirs
[Built-in Functions](https://docs.python.org/3/library/functions.html)
**Standard Library**
- [fcntl](https://docs.python.org/3/library/fcntl.html) — interface to `fcntl()` and `ioctl()` Unix functions (TODO)
- [fileinput](https://docs.python.org/3/library/fileinput.html) — *read all lines of all input files (Perl-style)*
- [fnmatch](https://docs.python.org/3/library/fnmatch.html) — *syntax and low-level functions for shell-style path wildcards*
- [glob](https://docs.python.org/3/library/glob.html) — *OBSOLETE; use `pathlib.Path.glob()`*
- [io](https://docs.python.org/3/library/io.html) — *file classes*
- [os](https://docs.python.org/3/library/os.html) — *low-level file I/O; partly replaced by pathlib*
- [os.path](https://docs.python.org/3/library/os.path.html) — *mostly obsolete by pathlib*
- [pathlib](https://docs.python.org/3/library/pathlib.html) — *`Path` classes*
- [shutil](https://docs.python.org/3/library/shutil.html) — *high-level file handling, like working with trees*
- [tempfile](https://docs.python.org/3/library/tempfile.html) — *generate temporary files and directories*
## Overview
Three categories or modes: **(1)** raw binary **(2)** buffered binary **(3)** text
**File Descriptor**
An integer identifier that represents open files in the OS kernel. Usually, `stdin`/`stdout`/`stderr` are 0/1/2.
**File Object (aka File-like Objects or Streams)**
An object exposing a file-oriented API (like read() or write()) to an underlying resource. ([glossary](https://docs.python.org/3/glossary.html#term-file-object))
#### Preview
```python
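import fileinput
import os

# managing the file object manually
f = open( "infile", encoding="utf-8" )
data = f.read()
f.close()
# using as context manager (file is closed even if an exception is raised)
with open( "outfile", "w", encoding="utf-8" ) as f:
    f.write( data )
# reading input files with fileinput module (files named in sys.argv[1:], else stdin)
for line in fileinput.input( encoding="utf-8" ):
    process( line )   # process() is a placeholder for your own code
# opening file descriptors (fds) and converting between fds and file objects
fd = os.open( "infile", os.O_RDONLY )   # open file and return fd
f = os.fdopen( fd, "rb" )               # get file object from fd (f now owns fd)
f.fileno()                              # get fd from file object
f.close()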
```
#### The `open` Built-in Function
```python
f = open( file, mode='r', buffering=-1, encoding=None,
          errors=None, newline=None, closefd=True, opener=None )
## Modes
#
# r, rb    -> read
# r+, r+b  -> read/write
# a, ab    -> append
# a+, a+b  -> append & read
# w, wb    -> truncate & write
# w+, w+b  -> truncate & read/write
# x, xb    -> exclusive create & write (FileExistsError if the file exists)
```
###### Inputs
- **file** — path-like object (str or bytes path or `os.PathLike`), or an integer file descriptor to wrap
- **encoding** — only specify for text mode
  - "utf-8" recommended unless you know you need a different one
  - default is platform-dependent (Ubuntu -> 'UTF-8')
  - [library/codecs.html#standard-encodings](https://docs.python.org/3/library/codecs.html#standard-encodings)
###### Outputs
- raises `OSError`
- returns an instance of...
  - raw binary (binary mode with `buffering=0`) -> `io.FileIO`
  - buffered binary -> `io.BufferedReader` (read) | `io.BufferedWriter` (write/append) | `io.BufferedRandom` (read/write)
  - text -> `io.TextIOWrapper`
- file object acts as iterator and context manager
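A quick way to see which class `open()` hands back for each mode is to inspect `type(f)`. The sketch below works on a throwaway file in a temporary directory (an assumption, so nothing outside it is touched) and also shows that mode `'x'` (exclusive creation) refuses to overwrite an existing file.

```python
import io
import tempfile
from pathlib import Path

x_refused = False
kinds = {}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"
    path.write_text("hi\n", encoding="utf-8")
    # (label, keyword arguments passed to open())
    for label, kwargs in [
        ("raw", {"mode": "rb", "buffering": 0}),        # unbuffered binary
        ("buffered", {"mode": "rb"}),                   # default buffering
        ("text", {"mode": "r", "encoding": "utf-8"}),
    ]:
        with open(path, **kwargs) as f:
            kinds[label] = type(f)
    try:
        open(path, "x", encoding="utf-8")               # exclusive creation
    except FileExistsError:
        x_refused = True

for label, cls in kinds.items():
    print(label, "->", cls.__name__)
# raw -> FileIO
# buffered -> BufferedReader
# text -> TextIOWrapper
print(issubclass(kinds["buffered"], io.BufferedIOBase))  # True
```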
#### Path Wildcards
```
* matches everything
? matches any single character
[seq] matches any character in seq
[!seq] matches any character not in seq
[?] matches literal '?' (likewise [*] matches literal '*')
```
See [fnmatch](https://docs.python.org/3/library/fnmatch.html) module for functions that work with patterns.
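A small sketch of the wildcard rules applied with `fnmatch.filter()` and `fnmatch.fnmatchcase()`; the file names are made up for illustration, and the results shown assume a POSIX system (on Windows `filter()` normalizes case).

```python
import fnmatch

names = ["main.py", "test_main.py", "notes.txt", "data?.csv", "data1.csv"]

py_files = fnmatch.filter(names, "*.py")              # ['main.py', 'test_main.py']
any_char = fnmatch.filter(names, "data?.csv")         # ['data?.csv', 'data1.csv']
literal_q = fnmatch.filter(names, "data[?].csv")      # ['data?.csv']
not_digit = fnmatch.filter(names, "data[!0-9].csv")   # ['data?.csv']
# fnmatchcase() never normalizes case, on any platform
case_sensitive = fnmatch.fnmatchcase("MAIN.PY", "*.py")  # False
regex = fnmatch.translate("*.py")   # the pattern as a regular-expression string
print(py_files, any_char, literal_q, not_digit, case_sensitive)
```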
## io
```python
from io import BytesIO, FileIO, StringIO
f = FileIO( "myfile.txt" )      # raw binary file object (read-only by default)
f = StringIO( "some text" )     # in-memory text stream
f = BytesIO( b"some data" )     # in-memory buffered binary stream
# exception io.UnsupportedOperation < OSError, ValueError
```
#### Class Hierarchy
```python
IOBase
    RawIOBase > FileIO
    BufferedIOBase > BytesIO, BufferedRandom, BufferedReader, BufferedRWPair, BufferedWriter
    TextIOBase > StringIO, TextIOWrapper
```
`BufferedRWPair` provides access to two separate non-seekable `RawIOBase` raw binary streams — one readable and one writeable. Do not pass it the same object for both — use `BufferedRandom` instead.
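A minimal sketch of `BufferedRWPair`, assuming an OS pipe supplies the two non-seekable raw streams: the read end becomes the reader and the write end the writer, so bytes written to the pair come back out of it.

```python
import io
import os

# A pipe gives one read-only fd and one write-only fd
read_fd, write_fd = os.pipe()
pair = io.BufferedRWPair(io.FileIO(read_fd, "r"), io.FileIO(write_fd, "w"))

pair.write(b"ping")    # buffered on the writer side
pair.flush()           # push it into the pipe
echoed = pair.read(4)  # read it back through the reader side: b'ping'
pair.close()           # closes both underlying FileIO objects (and their fds)
print(echoed)
```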
#### Method Matrix
| Class | Stub Methods | Mixin Methods & Properties |
| ---------------- | ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `IOBase` | `fileno`, `seek`, `truncate` | `close`, `closed`, `__enter__`, `__exit__`, `flush`, `isatty`, `__iter__`, `__next__`, `readable`, `readline`, `readlines`, `seekable`, `tell`, `writable`, `writelines` |
| `RawIOBase` | `readinto`, `write` | `read`, `readall` |
| `BufferedIOBase` | `detach`, `read`, `read1`, `write` | `readinto`, `readinto1` |
| `TextIOBase` | `detach`, `read`, `readline`, `write` | `encoding`, `errors`, `newlines` |
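To see the stub/mixin split in practice, here is a sketch of a hypothetical `MemoryRaw` class that implements only `readable()` and the `readinto()` stub; `read()` and `readall()` then come for free from the `RawIOBase` mixins, and the stream can be layered under `BufferedReader` and `TextIOWrapper`.

```python
import io


class MemoryRaw(io.RawIOBase):
    """Hypothetical raw stream over a bytes object; implements only the stubs it needs."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # Copy up to len(b) bytes into the caller's buffer; returning 0 signals EOF
        chunk = self._data[self._pos:self._pos + len(b)]
        n = len(chunk)
        b[:n] = chunk
        self._pos += n
        return n


# read() and readall() are RawIOBase mixins built on readinto()
whole = MemoryRaw(b"alpha\nbeta\n").readall()   # b'alpha\nbeta\n'
first3 = MemoryRaw(b"alpha\nbeta\n").read(3)    # b'alp'

# The same raw stream layered into a buffered stream, then a text stream
text = io.TextIOWrapper(io.BufferedReader(MemoryRaw(b"alpha\nbeta\n")), encoding="utf-8")
lines = text.readlines()                        # ['alpha\n', 'beta\n']
print(whole, first3, lines)
```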
#### APIs
###### IOBase
```python
f.close()
f.closed
f.fileno() # file descriptor
f.flush()
f.isatty() # True if the stream is interactive (connected to a terminal)
f.readable()
f.readline( size=-1 )
f.readlines( hint=-1 )
f.seek( offset, whence=os.SEEK_SET ) # whence...
# 0 == os.SEEK_SET == offset from beginning of file
# 1 == os.SEEK_CUR == offset from current position
# 2 == os.SEEK_END == offset from end of file
f.seekable()
f.tell() # text mode: return value is opaque
f.truncate( size=None )
f.writable()
f.writelines( lines )
```
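A short sketch of `seek()`/`tell()`/`truncate()` with the three `whence` values, using an in-memory `io.BytesIO` as a stand-in for any seekable binary stream.

```python
import io
import os

f = io.BytesIO(b"0123456789")     # any seekable binary stream behaves the same way

f.seek(-3, os.SEEK_END)           # 3 bytes before the end
tail = f.read()                   # b'789'
f.seek(2)                         # whence defaults to os.SEEK_SET
pos = f.tell()                    # 2
f.seek(2, os.SEEK_CUR)            # move forward 2 from the current position
middle = f.read(2)                # b'45'
f.truncate(5)                     # cut the stream to 5 bytes; position is unchanged
size = f.seek(0, os.SEEK_END)     # seek() returns the new absolute position: 5
print(tail, pos, middle, size, f.getvalue())
```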
###### RawIOBase + FileIO
```python
# RawIOBase
f.read( size=-1 )
f.readall()
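f.readinto( b ) # read into pre-allocated, writable bytes-like b; return number of bytes read (None if non-blocking and no data)
f.write( b ) # return number of bytes written (may be fewer than len(b); None if non-blocking and it would block)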
# FileIO < RawIOBase
f.mode
f.name
```
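A sketch of unbuffered I/O with `io.FileIO`, assuming a scratch file in a temporary directory: `readinto()` fills a caller-owned `bytearray` in place, and `readall()` drains the rest.

```python
import io
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "raw.bin")
    with io.FileIO(path, "w") as f:       # unbuffered: each call maps to a system call
        written = f.write(b"hello raw world")
        mode = f.mode                     # 'wb'
    buf = bytearray(5)                    # caller-owned, pre-allocated buffer
    with io.FileIO(path) as f:            # default mode is "r"
        n = f.readinto(buf)               # fills buf in place; returns the byte count
        rest = f.readall()                # everything from here to EOF
print(written, mode, n, bytes(buf), rest)
```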
###### Buffered* + BytesIO
```python
# BufferedIOBase
f.detach() # separate and return underlying raw stream (RawIOBase instance)
# BufferedReader
f.peek( size=0 ) # return buffered bytes without advancing position (may be more or fewer than size)
# BytesIO
BytesIO( initial_bytes=b'' )
f.getbuffer()
f.getvalue()
```
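A sketch of two buffered-stream features: `BytesIO.getbuffer()` as a zero-copy, writable view, and `BufferedReader.peek()` for look-ahead (the scratch file and its "MAGIC" header are assumptions for illustration).

```python
import io
import tempfile
from pathlib import Path

# BytesIO.getbuffer(): a writable view of the contents, without copying
b = io.BytesIO(b"abcdef")
view = b.getbuffer()
view[0:3] = b"XYZ"          # modifies the BytesIO in place
view.release()              # the BytesIO can't be resized or closed while a view exists
contents = b.getvalue()     # b'XYZdef'

# BufferedReader.peek(): look ahead without moving the position
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.bin"
    path.write_bytes(b"MAGIC rest of file")
    with open(path, "rb") as f:         # binary mode -> io.BufferedReader
        ahead = f.peek(5)[:5]           # peek may return more than asked; slice it
        pos_after_peek = f.tell()       # 0: peeking didn't consume anything
        first = f.read(5)               # b'MAGIC'
print(contents, ahead, pos_after_peek, first)
```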
###### Text* + StringIO
```python
# TextIOBase
f.encoding
f.errors
f.newlines
f.buffer
f.detach() # separate and return underlying binary buffer
# TextIOWrapper
f.line_buffering
f.write_through
f.seek( cookie, whence=os.SEEK_SET )
# whence=os.SEEK_SET: cookie must be 0 or an opaque value returned by tell()
# whence=os.SEEK_CUR or os.SEEK_END: offset must be 0
# StringIO
StringIO( initial_value='', newline='\n' )
f.getvalue()
```
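A sketch of the text layer: `TextIOWrapper` encodes `str` into an underlying binary buffer (here an in-memory `BytesIO`), `detach()` hands that buffer back without closing it, and `StringIO` captures text such as `print()` output.

```python
import io

# TextIOWrapper: encodes str -> bytes into a binary buffer
raw = io.BytesIO()
text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")  # "\n": no newline translation
text.write("héllo\n")
text.flush()                     # push the encoded bytes down into raw
encoded = raw.getvalue()         # b'h\xc3\xa9llo\n'
buffer = text.detach()           # text is now unusable, but raw stays open
same = buffer is raw             # True

# StringIO: in-memory text stream, handy for capturing print() output
s = io.StringIO()
print("a", "b", file=s)
captured = s.getvalue()          # 'a b\n'
print(encoded, same, repr(captured))
```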
## pathlib
```python
p = Path( "." ) # returns PosixPath instance (WindowsPath on Windows)
p = Path( "foo", "bar" ) # "foo/bar"
p = Path( "/home/ofer/bin" ) # "/home/ofer/bin"
q = p / "foo.py" # "/home/ofer/bin/foo.py"
```
- `Path` objects are immutable, hashable, comparable, orderable, and satisfy the `os.PathLike` interface
- *pure* paths provide path computation without I/O
- *concrete* paths reference the filesystem and support most single-file operations
- the final component without its last extension is called the *stem*
- file extensions are called *suffixes*
- supports both absolute and relative paths
#### Class Hierarchy
```python
PurePath -> Path
         -> PurePosixPath   -> PosixPath   <- Path
         -> PureWindowsPath -> WindowsPath <- Path
```
#### Pure Operations
```python
p.as_uri() # "file:///..."; raises ValueError for relative paths
p.is_absolute()
p.is_relative_to( other ) # i.e., is "below" or "descended from"
p.match( pattern ) # True if path matches pattern
# abs patterns only work with abs paths
p.name # "bin" (filename with extension)
p.parent # returns Path of parent; if root, returns self
p.parents # immutable sequence (class:_PathParents)
# p.parents[ 0 ] -> Path( "/home/ofer" )
# p.parents[ 1 ] -> Path( "/home" )
# p.parents[ 2 ] -> Path( "/" )
p.parts # ( "/", "home", "ofer", "bin" )
p.relative_to( other ) # similar to str.removeprefix( other ), but raises ValueError if p is not under other
p.stem # name without its final suffix ("foo.tar.gz" -> "foo.tar")
p.suffix # like ".py", or "" if no suffix
p.suffixes # "foo.tar.gz" -> [ ".tar", ".gz" ]
p.with_name( name ) # new path with name replaced; like: p.parent / name
p.with_stem( stem ) # new path with stem replaced (ex: foo/bar.txt -> foo/baz.txt)
p.with_suffix( suffix ) # new path with suffix ('.x') replaced (ex: foo/bar.txt -> foo/bar.md)
```
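A sketch of the pure operations on the same example directory as above; `PurePosixPath` is used (an assumption for portability) because pure paths never touch the disk, so this runs identically on any OS.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/home/ofer/bin/archive.tar.gz")

print(p.name)                            # archive.tar.gz
print(p.stem)                            # archive.tar  (only the last suffix is dropped)
print(p.suffix, p.suffixes)              # .gz ['.tar', '.gz']
print(p.parent, p.parents[1])            # /home/ofer/bin /home/ofer
print(p.parts)                           # ('/', 'home', 'ofer', 'bin', 'archive.tar.gz')
print(p.relative_to("/home/ofer"))       # bin/archive.tar.gz
print(p.with_suffix(".zip"))             # /home/ofer/bin/archive.tar.zip
print(p.with_name("notes.md"))           # /home/ofer/bin/notes.md
print(p.is_relative_to("/home"))         # True
print(p.match("*.gz"), p.match("/bin/*.gz"))  # True False
print(p.as_uri())                        # file:///home/ofer/bin/archive.tar.gz
```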
#### Concrete Operations
###### Class Methods
```python
Path.cwd()
Path.home()
```
###### Single-File Operations
```python
p.absolute() # return new absolute version of path
# (without normalization or resolution)
p.chmod( mode, follow_symlinks=True ) # ex: chmod( 0o444 )
p.exists( follow_symlinks=True )
p.expanduser() # expand ~ like os.path.expanduser()
p.group() # returns group name
p.hardlink_to( target )
p.is_block_device()
p.is_char_device()
p.is_dir() # also True if p is a symlink to a dir
p.is_fifo()
p.is_file() # also True if p is a symlink to a file
p.is_mount() # basically checks if p and p.parent are on different devices
p.is_socket()
p.is_symlink()
p.lchmod( mode ) # like chmod(), but for symlink instead of target
p.lstat() # like stat(), but for symlink instead of target
p.mkdir( mode=0o777, parents=False, exist_ok=False )
p.open( mode='r', buffering=-1, encoding=None, errors=None, newline=None )
p.owner() # returns user name
p.read_bytes() # -> bytes
p.read_text( encoding=None, errors=None ) # -> string
p.readlink() # -> Path( link_target )
p.rename( target ) # returns new Path for target
# on Unix, silently replaces an existing file target; on Windows, raises FileExistsError
p.replace( target ) # like rename(), but unconditionally replaces an existing file or empty dir on all platforms
p.resolve( strict=False ) # return new normalized and resolved (symlinks) absolute path
# if strict, path must exist
p.rmdir() # dir must be empty
p.samefile( other ) # like os.path.samefile()/samestat()
p.stat( follow_symlinks=True ) # returns os.stat_result, like os.stat()
p.symlink_to( target )
p.touch( mode=0o666, exist_ok=True )
p.unlink( missing_ok=False )
p.write_bytes( data ) # returns num bytes written
p.write_text( data, encoding=None, errors=None, newline=None ) # returns num chars written
```
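A sketch of common single-file operations on a scratch tree inside a temporary directory (the file names are hypothetical); sizes assume POSIX, where `write_text()` doesn't translate `"\n"`.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    notes = root / "a" / "b" / "notes.txt"
    notes.parent.mkdir(parents=True, exist_ok=True)         # creates a/ and a/b/
    chars = notes.write_text("hello\n", encoding="utf-8")   # chars written: 6
    size = notes.stat().st_size                             # bytes on disk: 6 on POSIX
    moved = notes.replace(root / "notes.md")                # returns the target Path
    text = moved.read_text(encoding="utf-8")                # 'hello\n'
    old_exists = notes.exists()                             # False: it was moved
    moved.unlink()
    new_exists = moved.exists()                             # False: it was deleted
    (root / "a" / "b").rmdir()                              # rmdir() needs an empty dir
print(chars, size, repr(text), old_exists, new_exists)
```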
###### Multi-File Operations
```python
p.glob( pattern ) # returns generator that yields path objects
# pattern = same syntax as fnmatch module with the addition of "**" which means
# "this directory and all subdirectories, recursively" (ex: "**/*.py")
p.rglob( pattern ) # glob pattern recursively -- like glob() but with "**/" prepended to pattern
p.iterdir() # returns generator that yields path objects
# doesn't include '.' and '..'
# uses os.listdir()
p.walk( top_down=True, on_error=None, follow_symlinks=False ) # Python 3.12+
# returns generator that yields tuples for each dir: ( dirpath, dirnames, filenames )
# if top_down is True, you can modify dirnames and walk() will only recurse
# into the subdirectories that remain in dirnames
# WARNING: if you follow symlinks, you could recurse infinitely
# uses os.scandir()
```
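A sketch comparing `iterdir()`, `glob()`, and `rglob()` on a small hypothetical tree; results are sorted because these generators yield entries in arbitrary order.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["a.py", "pkg/b.py", "pkg/sub/c.py", "pkg/readme.txt"]:
        f = root / rel
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch()

    top = sorted(p.name for p in root.iterdir())          # ['a.py', 'pkg']
    shallow = sorted(p.name for p in root.glob("*.py"))   # ['a.py']
    deep = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))
    # ['a.py', 'pkg/b.py', 'pkg/sub/c.py']
    # rglob(pattern) is equivalent to glob("**/" + pattern)
    same = deep == sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.py"))
print(top, shallow, deep, same)
```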
## os
#### Structures & Interfaces
###### os.PathLike
ABC w/abstract methods: `__fspath__()`
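A sketch of making your own class path-like: a hypothetical `ProjectFile` only needs `__fspath__()`, after which `os.fspath()`, `open()`, and `isinstance(..., os.PathLike)` all accept it. The joined path shown assumes POSIX separators.

```python
import os


class ProjectFile:
    """Hypothetical class that identifies a file inside a project root."""

    def __init__(self, root: str, name: str):
        self.root = root
        self.name = name

    def __fspath__(self) -> str:
        # The only method os.PathLike requires: return a str (or bytes) path
        return os.path.join(self.root, self.name)


pf = ProjectFile("/srv/app", "config.toml")
as_str = os.fspath(pf)                      # '/srv/app/config.toml' on POSIX
is_pathlike = isinstance(pf, os.PathLike)   # True: PathLike checks for __fspath__
print(as_str, is_pathlike)
```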
###### os.DirEntry
Implements `os.PathLike`.
```python
.name
.path
.inode()
.is_dir( follow_symlinks=True )
.is_file( follow_symlinks=True )
.is_symlink()
.stat( follow_symlinks=True )
```
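A sketch of `os.scandir()` yielding `os.DirEntry` objects; the scratch directory layout is an assumption for illustration. `is_dir()`/`is_file()` can often answer from data gathered during the scan, which is why scandir is cheaper than listing names and calling `os.stat()` on each.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "data.txt"), "w", encoding="utf-8") as f:
        f.write("12345")

    summary = {}
    with os.scandir(tmp) as it:             # iterator of os.DirEntry
        for entry in it:
            if entry.is_dir():
                summary[entry.name] = ("dir", None)
            elif entry.is_file():
                summary[entry.name] = ("file", entry.stat().st_size)
            else:
                summary[entry.name] = ("other", None)
print(sorted(summary.items()))
# [('data.txt', ('file', 5)), ('sub', ('dir', None))]
```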
###### os.stat_result
```python
st_mode # mode (file type and permissions)
st_ino # inode (unix) or file index (Win)
st_dev # ID of device storing the file
st_nlink # hard link count
st_uid # owner uid
st_gid # owner gid
st_size # in bytes
st_atime # accessed (s)
st_mtime # content modified (s)
st_ctime # metadata modified (s)
st_atime_ns # accessed (ns)
st_mtime_ns # content modified (ns)
st_ctime_ns # metadata modified (ns)
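st_birthtime # created (s); not available on Linux
st_birthtime_ns # created (ns); not available on Linux
# some Unix systems (e.g., Linux); may be missing elsewhere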
st_blocks # count of allocated 512-byte blocks
st_blksize # "preferred" blocksize for efficient file system I/O
st_rdev # inode device type
st_flags # user-defined flags
```
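`st_mode` packs the file type and the permission bits together; the standard `stat` module (not covered above) decodes it. A POSIX-only sketch, since it relies on `chmod()` permission bits:

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "f.bin")
    with open(path, "wb") as f:
        f.write(b"\x00" * 100)
    os.chmod(path, 0o640)                   # rw- r-- ---
    st = os.stat(path)                      # -> os.stat_result

size = st.st_size                           # 100
is_regular = stat.S_ISREG(st.st_mode)       # True: file-type bits say "regular file"
perms = stat.S_IMODE(st.st_mode)            # 0o640: permission bits only
listing = stat.filemode(st.st_mode)         # '-rw-r-----', as in `ls -l`
print(size, is_regular, oct(perms), listing)
```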
###### os.statvfs_result
```python
f_bsize # 4096 fs block size
f_frsize # 4096 fragment size
f_blocks # 960081726 (fs size = frsize * blocks)
f_bfree # 543404462 num free blocks
f_bavail # 494616290 num free blocks for unprivileged users
f_files # 243924992 num inodes
f_ffree # 243305121 num free inodes
f_favail # 243305121 num free inodes for unprivileged users
f_flag # 4096 mount flags
f_namemax # 255 max filename length
```
[statvfs() Man Page](https://manpages.ubuntu.com/manpages/lunar/man3/statvfs.3.html)
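A Unix-only sketch turning `statvfs_result` counts into bytes: block counts are in units of `f_frsize`, which mirrors how `shutil.disk_usage()` computes its numbers on Unix. `fs_space` is a hypothetical helper name.

```python
import os


def fs_space(path="/"):
    """Return (total, free, available) bytes for the filesystem holding `path`."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize   # block counts are in units of f_frsize
    free = st.f_bfree * st.f_frsize     # free space, including root-reserved blocks
    avail = st.f_bavail * st.f_frsize   # free space an unprivileged user can use
    return total, free, avail


total, free, avail = fs_space("/")
print(f"{avail / 2**30:.1f} GiB available of {total / 2**30:.1f} GiB")
```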
#### File Operations
Only the ones not made obsolete by [pathlib](https://docs.python.org/3/library/pathlib.html).
```python
# opening
os.fdopen( fd, *args, **kwargs ) # wrap an fd in a file object
# other than fd arg, acts like built-in open:
# takes same args and returns file object
os.open( path, flags, mode=0o777, dir_fd=None ) # returns fd
# blocking modes
os.get_blocking( fd ) # False if O_NONBLOCK flag is set, True otherwise
os.set_blocking( fd, blocking ) # set O_NONBLOCK flag if blocking is False, else clear it
# file-locking
os.lockf( ... )
# efficient in-kernel copying (data doesn't pass through user space)
os.copy_file_range( ... ) # copies bytes between two file descriptors
os.splice( ... ) # move data between a pipe and a file descriptor
# modifying
os.truncate( path, length )
os.removedirs( name ) # remove dir plus every empty dir above it
os.utime( ... ) # set access & modified times
# syncing data to disk
os.fsync( fd ) # hint: do `os.fsync( f.fileno() )` *after* `f.flush()`
os.sync() # force write of everything to disk
# misc
os.chdir( path )
os.closerange( fd_low, fd_high ) # close fds from fd_low (inclusive) to fd_high (exclusive), ignoring errors
os.scandir( path='.' ) # returns context manager/iterator of os.DirEntry objects
# faster than os.listdir()/path.iterdir() if you need metadata
os.statvfs( path ) # -> statvfs_result (about the filesystem path lives on)
os.startfile( path[, operation][, arguments][, cwd][, show_cmd] ) # start with associated application (Windows only)
```
See more: [os.copy_file_range()](https://docs.python.org/3/library/os.html#os.copy_file_range), [os.lockf()](https://docs.python.org/3/library/os.html#os.lockf), [os.open()](https://docs.python.org/3/library/os.html#os.open), [os.splice()](https://docs.python.org/3/library/os.html#os.splice), [os.utime()](https://docs.python.org/3/library/os.html#os.utime)
Also: [open() Man Page](https://manpages.ubuntu.com/manpages/lunar/en/man3/open.3posix.html) (list of flags)
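A sketch combining `os.open()` flags, `os.fdopen()`, and the flush-then-fsync pattern. `write_durably` is a hypothetical helper; `O_CREAT | O_EXCL` makes creation fail with `FileExistsError` instead of clobbering an existing file.

```python
import os
import tempfile


def write_durably(path, data: bytes):
    """Hypothetical helper: create `path` exclusively and flush its bytes to stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:   # the file object now owns fd and closes it
        f.write(data)
        f.flush()                    # move Python's buffer into the OS
        os.fsync(f.fileno())         # ask the OS to commit it to the device


overwrite_refused = False
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "state.bin")
    write_durably(target, b"v1")
    try:
        write_durably(target, b"v2")
    except FileExistsError:
        overwrite_refused = True
    with open(target, "rb") as f:
        content = f.read()           # b'v1'
print(overwrite_refused, content)
```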
#### os.path
```python
os.path.commonpath( paths ) # returns longest common sub-path
os.path.commonprefix( paths ) # returns longest common string prefix (char-by-char; may not be a valid path)
os.path.expandvars( path ) # return path with env vars ($name or ${name}) expanded
os.path.sameopenfile( fp1, fp2 ) # True if both file descriptors refer to the same file
```
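A sketch contrasting `commonpath()` (whole components) with `commonprefix()` (characters), plus `expandvars()`. `PROJ_ROOT` and `SURELY_UNDEFINED_VAR` are hypothetical names used only for the demo; results assume POSIX separators.

```python
import os
import os.path

paths = ["/usr/lib/python3", "/usr/local/lib"]
common_path = os.path.commonpath(paths)      # '/usr'   (whole components only)
common_prefix = os.path.commonprefix(paths)  # '/usr/l' (character-by-character)

os.environ["PROJ_ROOT"] = "/srv/proj"        # hypothetical variable, set for this demo
expanded = os.path.expandvars("$PROJ_ROOT/src:${PROJ_ROOT}/lib")  # '/srv/proj/src:/srv/proj/lib'
untouched = os.path.expandvars("$SURELY_UNDEFINED_VAR/x")         # unknown vars are left as-is
print(common_path, common_prefix, expanded, untouched)
```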
## fileinput
```python
import fileinput

with fileinput.input( encoding="utf-8" ) as f:
    for line in f:
        process( line )   # placeholder for your own per-line handling
input( files=None, inplace=False, backup='', *, mode='r', openhook=None, encoding=None, errors=None )
# files -> path or list of paths; default is sys.argv[1:] (stdin if empty; '-' also means stdin)
# the returned FileInput instance is stored as global state used by the module-level functions below
filename() # name of file currently being read or None before first line is read
fileno() # int file descriptor or -1
lineno() # cumulative line number
filelineno() # line number in current file
isfirstline() # relative to current file
isstdin()
nextfile() # like `continue` - skip rest of current file and move to next file
close()
```
Input lines include the newline.
## shutil
#### File Operations
```python
copyfileobj( fsrc, fdst[, length] ) # copy from one file-like object to another
copyfile( src, dst, follow_symlinks=True ) # copy a file's contents
copymode( src, dst, follow_symlinks=True ) # copy a file's permission bits
copystat( src, dst, follow_symlinks=True ) # copy a file's full metadata (bits, times, flags)
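copy( src, dst, follow_symlinks=True ) # copy contents + permission bits; dst may be a file or dir
copy2( src, dst, follow_symlinks=True ) # like copy(), but also tries to preserve all metadata
move( src, dst, copy_function=copy2 ) # like `mv`; uses os.rename() if possible, else copies then removes src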
# multi-file operations
copytree( src, dst, symlinks=False, ignore=None, copy_function=copy2, ignore_dangling_symlinks=False, dirs_exist_ok=False )
# recursively copy src dirtree (with metadata) to dst dir, creating intermediate dirs if necessary
rmtree( path, ignore_errors=False, onerror=None, *, onexc=None, dir_fd=None )
Error # collects exceptions during multi-file operations
# arg is list of tuples: [ ( srcname, dstname, exception ) ]
# misc
chown( path, user=None, group=None ) # names or ids
disk_usage( path ) # -> usage( total, used, free )
which( cmd, mode=os.F_OK|os.X_OK, path=None ) # like /usr/bin/which
```
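A sketch of tree-level operations on a hypothetical project layout: `copytree()` with an `ignore` filter built by `shutil.ignore_patterns()`, a metadata-preserving `copy2()`, and `rmtree()`.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    src = base / "src"
    (src / "pkg").mkdir(parents=True)
    (src / "pkg" / "mod.py").write_text("x = 1\n", encoding="utf-8")
    (src / "pkg" / "mod.pyc").write_bytes(b"\x00")

    dst = base / "dst"
    # ignore_patterns() builds an `ignore` callable from glob-style patterns
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("*.pyc", "__pycache__"))
    copied = sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*"))
    # ['pkg', 'pkg/mod.py']

    backup = shutil.copy2(dst / "pkg" / "mod.py", base / "mod.py.bak")  # returns the destination
    backup_text = Path(backup).read_text(encoding="utf-8")              # 'x = 1\n'

    shutil.rmtree(dst)            # delete the whole copied tree
    dst_exists = dst.exists()     # False
print(copied, repr(backup_text), dst_exists)
```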
#### Archives
Builds on [zipfile](https://docs.python.org/3/library/zipfile.html)/[tarfile](https://docs.python.org/3/library/tarfile.html) modules.
```python
get_archive_formats()
make_archive( base_name, format[, root_dir[, base_dir[, verbose[, dry_run[, owner[, group[, logger]]]]]]] )
register_archive_format( name, function[, extra_args[, description]] )
unregister_archive_format( name )
get_unpack_formats()
unpack_archive( filename[, extract_dir[, format[, filter]]] )
register_unpack_format( name, extensions, function[, extra_args[, description]] )
unregister_unpack_format( name )
```
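A round-trip sketch with `make_archive()` and `unpack_archive()` on a hypothetical `site/` directory. Zip is used here (a choice, not a requirement) to sidestep the tar extraction-filter warnings on newer Pythons; the archive extension is appended to `base_name` automatically.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    site = base / "site"
    (site / "css").mkdir(parents=True)
    (site / "index.html").write_text("<h1>hi</h1>", encoding="utf-8")
    (site / "css" / "main.css").write_text("h1 {}", encoding="utf-8")

    # root_dir's contents become the top level of the archive
    archive = shutil.make_archive(str(base / "site-backup"), "zip", root_dir=str(site))
    archive_name = Path(archive).name                       # 'site-backup.zip'

    shutil.unpack_archive(archive, str(base / "restored"))  # format inferred from ".zip"
    restored = sorted(
        p.relative_to(base / "restored").as_posix()
        for p in (base / "restored").rglob("*")
        if p.is_file()
    )
    # ['css/main.css', 'index.html']
print(archive_name, restored)
```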
## tempfile
#### Low-Level
```python
mkstemp( suffix=None, prefix=None, dir=None, text=False ) # securely create temp file
# no race conditions; only readable/writeable by current user; not inherited by child processes
# returns: ( fd, path ); caller must close fd and delete the file
mkdtemp( suffix=None, prefix=None, dir=None ) # securely create temp dir
# no race conditions; only readable/writeable/searchable by current user
# returns: path; caller must delete the dir when done
```
- `suffix` is not automatically joined with `"."`; pass `".txt"`, not `"txt"`
- `prefix` defaults to `gettempprefix()` (usually `"tmp"`)
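A sketch of the low-level `mkstemp()` workflow: wrap the returned fd with `os.fdopen()`, and clean up yourself, since `mkstemp()` never deletes anything. The `report-` prefix is a made-up example; the `0o600` permissions apply on POSIX.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".json", prefix="report-")   # note the explicit "."
try:
    with os.fdopen(fd, "w", encoding="utf-8") as f:   # wrap the fd; closing f closes fd
        f.write('{"ok": true}')
    name = os.path.basename(path)                     # 'report-<random>.json'
    perms = os.stat(path).st_mode & 0o777             # 0o600 on POSIX: owner-only
    with open(path, encoding="utf-8") as f:
        content = f.read()
finally:
    os.remove(path)                                   # mkstemp() never cleans up for you
print(name, oct(perms), content)
```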
#### High-Level
```python
TemporaryFile( mode='w+b', buffering=-1, encoding=None, newline=None, suffix=None, prefix=None, dir=None, *, errors=None )
# wrapper around mkstemp(); is file-like object; is context manager
# on POSIX, the file has no visible directory entry
NamedTemporaryFile( mode='w+b', buffering=-1, encoding=None, newline=None, suffix=None, prefix=None, dir=None, delete=True, *, errors=None, delete_on_close=True )
# guaranteed to have a visible name (.name); automatic deletion can be disabled (delete=False)
# delete_on_close requires Python 3.12+
SpooledTemporaryFile( max_size=0, mode='w+b', buffering=-1, encoding=None, newline=None, suffix=None, prefix=None, dir=None, *, errors=None )
# keeps data in memory until it exceeds max_size or fileno() is called,
# then rolls over to a real temporary file
TemporaryDirectory( suffix=None, prefix=None, dir=None, ignore_cleanup_errors=False, *, delete=True )
# wrapper around mkdtemp(); is context manager
# `with TemporaryDirectory() as d:` binds d to the path string (same as .name)
gettempdir() # "/tmp"
gettempdirb() # b"/tmp"
gettempprefix() # "tmp"
gettempprefixb() # b"tmp"
tempdir # None until set or until gettempdir() is first called (then "/tmp")
```
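A sketch of the high-level helpers working together: a `TemporaryDirectory` (the `build-` prefix is a made-up example) holding a `NamedTemporaryFile` that is deleted on close, plus a `SpooledTemporaryFile` used like an ordinary file object. The directory is removed when its `with` block exits.

```python
import os
import tempfile

with tempfile.TemporaryDirectory(prefix="build-") as d:      # d is the path string
    with tempfile.NamedTemporaryFile("w+", suffix=".txt", dir=d, encoding="utf-8") as f:
        f.write("scratch")
        f.seek(0)                        # the same handle is readable ("w+")
        data = f.read()                  # 'scratch'
        had_name = os.path.exists(f.name)
    deleted_on_close = not os.path.exists(f.name)

    # Spooled file: behaves like a file, stays in memory while it is small
    with tempfile.SpooledTemporaryFile(max_size=1024, mode="w+b") as s:
        s.write(b"small payload")
        s.seek(0)
        spooled = s.read()

dir_prefix_ok = os.path.basename(d).startswith("build-")
dir_removed = not os.path.isdir(d)
print(data, had_name, deleted_on_close, spooled, dir_removed)
```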