# Python : Core : Files & Dirs

[Built-in Functions](https://docs.python.org/3/library/functions.html)

**Standard Library**

- [fcntl](https://docs.python.org/3/library/fcntl.html) — interface to `fcntl()` and `ioctl()` Unix functions (TODO)
- [fileinput](https://docs.python.org/3/library/fileinput.html) — *read all lines of all input files (Perl-style)*
- [fnmatch](https://docs.python.org/3/library/fnmatch.html) — *syntax and low-level functions for shell-style path wildcards*
- [glob](https://docs.python.org/3/library/glob.html) — *not deprecated, but `pathlib.Path.glob()` covers most uses*
- [io](https://docs.python.org/3/library/io.html) — *file classes*
- [os](https://docs.python.org/3/library/os.html) — *low-level file I/O; partly replaced by pathlib*
- [os.path](https://docs.python.org/3/library/os.path.html) — *mostly superseded by pathlib*
- [pathlib](https://docs.python.org/3/library/pathlib.html) — *`Path` classes*
- [shutil](https://docs.python.org/3/library/shutil.html) — *high-level file handling, like working with trees*
- [tempfile](https://docs.python.org/3/library/tempfile.html)

## Overview

Three categories or modes: **(1)** raw binary **(2)** buffered binary **(3)** text

**File Descriptor**

An integer that identifies an open file in the OS kernel. Usually, `stdin`/`stdout`/`stderr` are 0/1/2.

**File Object (aka File-like Object or Stream)**

An object exposing a file-oriented API (like `read()` or `write()`) to an underlying resource. ([glossary](https://docs.python.org/3/glossary.html#term-file-object))

#### Preview

```python
import fileinput
import os

# managing the file object manually
f = open( "infile" )
data = f.read()
f.close()

# using as context manager
with open( "outfile", "w", encoding="utf-8" ) as f:
    f.write( data )

# reading input files with fileinput module
for line in fileinput.input():
    process( line )             # process() is a placeholder

# opening file descriptors (fds) and converting between fds and file objects
f.fileno()                      # get fd from file object
os.fdopen( fd, ... )            # get file object from fd
fd = os.open( path, ... )       # open file and return fd
```

#### The `open` Built-in Function

```python
f = open( file, mode='r', buffering=-1, encoding=None, errors=None,
          newline=None, closefd=True, opener=None )

## Modes
#
# r,  rb   -> read
# r+, r+b  -> read/write
# a,  ab   -> append
# w,  wb   -> truncate & write
# w+, w+b  -> truncate & read/write
```

###### Inputs

- **file** — path-like object (str or bytes path or `os.PathLike`)
- **encoding** — only specify for text mode
  - "utf-8" recommended unless you know you need a different one
  - default is platform-dependent (Ubuntu -> 'UTF-8')
  - [library/codecs.html#standard-encodings](https://docs.python.org/3/library/codecs.html#standard-encodings)

###### Outputs

- raises `OSError` on failure
- returns an instance of...
  - raw binary -> `io.FileIO`
  - buffered binary -> `io.BufferedReader` | `io.BufferedWriter` | `io.BufferedRandom`
  - text -> `io.TextIOWrapper`
- file object acts as iterator and context manager

#### Path Wildcards

```
*       matches everything
?       matches any single character
[seq]   matches any character in seq
[!seq]  matches any character not in seq
[?]     matches literal '?' (same with *)
```

See [fnmatch](https://docs.python.org/3/library/fnmatch.html) module for functions that work with patterns.

## io

```python
from io import BytesIO, FileIO, StringIO

f = FileIO( "myfile.txt" )      # raw binary file object
f = StringIO( "some text" )     # in-memory text stream
f = BytesIO( b"some data" )     # in-memory buffered binary stream

# exception UnsupportedOperation < OSError, ValueError
```

#### Class Hierarchy

```python
IOBase
    RawIOBase       > FileIO
    BufferedIOBase  > BytesIO, BufferedRandom, BufferedReader, BufferedRWPair, BufferedWriter
    TextIOBase      > StringIO, TextIOWrapper
```

`BufferedRWPair` provides access to two separate non-seekable `RawIOBase` raw binary streams: one readable and one writable. Do not pass it the same object for both; use `BufferedRandom` instead.
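As a rough illustration of the mode-to-class mapping and the in-memory streams described above, the sketch below opens one scratch file in several modes and records the class `open()` hands back. The helper name `stream_types` and the scratch file are made up for this example; unbuffered raw access is requested with `buffering=0`.

```python
import io
import os
import tempfile

def stream_types(path):
    """Return the io class name that open() produces for a few mode choices."""
    kinds = {}
    with open(path, "rb", buffering=0) as f:        # raw binary (unbuffered)
        kinds["raw"] = type(f).__name__
    with open(path, "rb") as f:                     # buffered binary, read-only
        kinds["rb"] = type(f).__name__
    with open(path, "ab") as f:                     # buffered binary, write-only
        kinds["ab"] = type(f).__name__
    with open(path, "r+b") as f:                    # buffered binary, read/write
        kinds["r+b"] = type(f).__name__
    with open(path, "r", encoding="utf-8") as f:    # text
        kinds["text"] = type(f).__name__
    return kinds

# scratch file used only for this demonstration
fd, demo_path = tempfile.mkstemp()
os.close(fd)
try:
    kinds = stream_types(demo_path)
finally:
    os.remove(demo_path)

# in-memory streams behave like files but never touch the filesystem
text_stream = io.StringIO("line one\nline two\n")
lines = list(text_stream)               # iterating yields lines, newline included

byte_stream = io.BytesIO(b"abc")
byte_stream.seek(0, os.SEEK_END)        # move to the end before appending
byte_stream.write(b"def")
payload = byte_stream.getvalue()
```

Printing `kinds` is a quick way to check which layer of the class hierarchy a given `open()` call puts you on.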
#### Method Matrix

| Class            | Stub Methods                          | Mixin Methods & Properties |
| ---------------- | ------------------------------------- | -------------------------- |
| `IOBase`         | `fileno`, `seek`, `truncate`          | `close`, `closed`, `__enter__`, `__exit__`, `flush`, `isatty`, `__iter__`, `__next__`, `readable`, `readline`, `readlines`, `seekable`, `tell`, `writable`, `writelines` |
| `RawIOBase`      | `readinto`, `write`                   | `read`, `readall` |
| `BufferedIOBase` | `detach`, `read`, `read1`, `write`    | `readinto`, `readinto1` |
| `TextIOBase`     | `detach`, `read`, `readline`, `write` | `encoding`, `errors`, `newlines` |

#### APIs

###### IOBase

```python
f.close()
f.closed
f.fileno()      # file descriptor
f.flush()
f.isatty()      # True if interactive
f.readable()
f.readline( size=-1 )
f.readlines( hint=-1 )
f.seek( offset, whence=os.SEEK_SET )
    # whence...
    #   0 == os.SEEK_SET == offset from beginning of file
    #   1 == os.SEEK_CUR == offset from current position
    #   2 == os.SEEK_END == offset from end of file
f.seekable()
f.tell()        # text mode: return value is opaque
f.truncate( size=None )
f.writable()
f.writelines( lines )
```

###### RawIOBase + FileIO

```python
# RawIOBase
f.read( size=-1 )
f.readall()
f.readinto( b )     # returns number of bytes read (or None)
f.write( b )        # returns number of bytes written (or None)

# FileIO < RawIOBase
f.mode
f.name
```

###### Buffered* + BytesIO

```python
# BufferedIOBase
f.detach()          # separate and return underlying raw stream (RawIOBase instance)

# BufferedReader
f.peek( size=0 )    # return bytes without advancing position

# BytesIO
BytesIO( initial_bytes=b'' )
f.getbuffer()
f.getvalue()
```

###### Text* + StringIO

```python
# TextIOBase
f.encoding
f.errors
f.newlines
f.buffer
f.detach()          # separate and return underlying binary buffer

# TextIOWrapper
f.line_buffering
f.write_through
f.seek( cookie, whence=os.SEEK_SET )
    # if whence=os.SEEK_SET, cookie must be 0 or an opaque value returned by tell()
    # else, cookie must be 0

# StringIO
StringIO( initial_value='', newline='\n' )
f.getvalue()
```

## pathlib

```python
from pathlib import Path

p = Path( "." )                 # returns PosixPath instance (WindowsPath on Windows)
p = Path( "foo", "bar" )        # "foo/bar"
p = Path( "/home/ofer/bin" )    # "/home/ofer/bin"
q = p / "foo.py"                # "/home/ofer/bin/foo.py"
```

- `Path` objects are immutable, hashable, comparable, orderable, and satisfy the `os.PathLike` interface
- *pure* paths provide path computation without I/O
- *concrete* paths reference the filesystem and support most single-file operations
- file names (without extensions) are called *stems*
- file extensions are called *suffixes*
- supports both absolute and relative paths

#### Class Hierarchy

```python
PurePath -> Path
         -> PurePosixPath   -> PosixPath   <- Path
         -> PureWindowsPath -> WindowsPath <- Path
```

#### Pure Operations

```python
p.as_uri()                  # "file:///..."
p.is_absolute()
p.is_relative_to( other )   # i.e., is "below" or "descended from"
p.match( pattern )          # True if path matches pattern
                            # abs patterns only work with abs paths
p.name                      # "bin" (filename with extension)
p.parent                    # returns Path of parent; if root, returns self
p.parents                   # immutable sequence (class:_PathParents)
                            #   p.parents[ 0 ] -> Path( "/home/ofer" )
                            #   p.parents[ 1 ] -> Path( "/home" )
                            #   p.parents[ 2 ] -> Path( "/" )
p.parts                     # ( "/", "home", "ofer", "bin" )
p.relative_to( other )      # similar to str.removeprefix( other )
p.stem                      # name without suffix
p.suffix                    # like ".py", or "" if no suffix
p.suffixes                  # "foo.tar.gz" -> [ ".tar", ".gz" ]
p.with_name( name )         # new path with name replaced; like: p.parent / name
p.with_stem( stem )         # new path with stem replaced (ex: foo/bar.txt -> foo/baz.txt)
p.with_suffix( suffix )     # new path with suffix ('.x') replaced (ex: foo/bar.txt -> foo/bar.md)
```

#### Concrete Operations

###### Class Methods

```python
Path.cwd()
Path.home()
```

###### Single-File Operations

```python
p.absolute()                # return new absolute version of path
                            # (without normalization or resolution)
p.chmod( mode, follow_symlinks=True )   # ex: chmod( 0o444 )
p.exists( follow_symlinks=True )
p.expanduser()              # expand ~ like os.path.expanduser()
p.group()                   # returns group name
p.hardlink_to( target )
p.is_block_device()
p.is_char_device()
p.is_dir()                  # also True if p is a symlink to a dir
p.is_fifo()
p.is_file()                 # also True if p is a symlink to a file
p.is_mount()                # basically checks if p and p.parent are on different devices
p.is_socket()
p.is_symlink()
p.lchmod( mode )            # like chmod(), but for symlink instead of target
p.lstat()                   # like stat(), but for symlink instead of target
p.mkdir( mode=0o777, parents=False, exist_ok=False )
p.open( mode='r', buffering=-1, encoding=None, errors=None, newline=None )
p.owner()                   # returns user name
p.read_bytes()              # -> bytes
p.read_text( encoding=None, errors=None )   # -> string
p.readlink()                # -> Path( link_target )
p.rename( target )          # returns new Path for target
                            # on Windows, fails if target exists instead of overwriting it
p.replace( target )         # like rename(), but overwrites target on all platforms
p.resolve( strict=False )   # return new normalized and resolved (symlinks) absolute path
                            # if strict, path must exist
p.rmdir()                   # dir must be empty
p.samefile( other )         # like os.path.samefile()/samestat()
p.stat( follow_symlinks=True )      # returns os.stat_result, like os.stat()
p.symlink_to( target )
p.touch( mode=0o666, exist_ok=True )
p.unlink( missing_ok=False )
p.write_bytes( data )       # returns num bytes written
p.write_text( data, encoding=None, errors=None, newline=None )  # returns num chars written
```

###### Multi-File Operations

```python
p.glob( pattern )
    # returns generator that yields path objects
    # pattern = same syntax as fnmatch module with the addition of "**" which means
    #           "this directory and all subdirectories, recursively" (ex: "**/*.py")

p.rglob( pattern )
    # glob pattern recursively -- like glob() but with "**/" prepended to pattern

p.iterdir()
    # returns generator that yields path objects
    # doesn't include '.' and '..'
    # uses os.listdir()

p.walk( top_down=True, on_error=None, follow_symlinks=False )
    # returns generator that yields tuples for each dir: ( dirpath, dirnames, filenames )
    # if top_down is True, you can modify dirnames and walk() will only recurse
    #   into the subdirectories that remain in dirnames
    # WARNING: if you follow symlinks, you could recurse infinitely
    # uses os.scandir()
```

## os

#### Structures & Interfaces

###### os.PathLike

ABC with one abstract method: `__fspath__()`

###### os.DirEntry

Implements `os.PathLike`.

```python
.name
.path
.inode()
.is_dir( follow_symlinks=True )
.is_file( follow_symlinks=True )
.is_symlink()
.stat( follow_symlinks=True )
```

###### os.stat_result

```python
st_mode             # mode (file type and permissions)
st_ino              # inode (unix) or file index (Win)
st_dev              # ID of device storing the file
st_nlink            # hard link count
st_uid              # owner uid
st_gid              # owner gid
st_size             # in bytes

st_atime            # accessed (s)
st_mtime            # content modified (s)
st_ctime            # metadata modified (s)
st_atime_ns         # accessed (ns)
st_mtime_ns         # content modified (ns)
st_ctime_ns         # metadata modified (ns)
st_birthtime        # created (s)
st_birthtime_ns     # created (ns)

# some Unix systems (such as Linux)
st_blocks           # count of allocated 512-byte blocks
st_blksize          # "preferred" blocksize for efficient file system I/O
st_rdev             # inode device type
st_flags            # user-defined flags
```

###### os.statvfs_result

```python
# field     sample value    meaning
f_bsize     # 4096          fs block size
f_frsize    # 4096          fragment size
f_blocks    # 960081726     (fs size = frsize * blocks)
f_bfree     # 543404462     num free blocks
f_bavail    # 494616290     num free blocks for unprivileged users
f_files     # 243924992     num inodes
f_ffree     # 243305121     num free inodes
f_favail    # 243305121     num free inodes for unprivileged users
f_flag      # 4096          mount flags
f_namemax   # 255           max filename length
```

[statvfs() Man Page](https://manpages.ubuntu.com/manpages/lunar/man3/statvfs.3.html)

#### File Operations

Only the ones not made obsolete by [pathlib](https://docs.python.org/3/library/pathlib.html).
```python
# opening
os.fdopen( fd, *args, **kwargs )    # turn fd into file object
                                    # other than fd arg, acts like built-in open:
                                    # takes same args and returns file object
os.open( path, flags, mode=0o777, dir_fd=None )     # returns fd

# blocking modes
os.get_blocking( fd )               # False if O_NONBLOCK flag is set, True otherwise
os.set_blocking( fd, blocking )     # set O_NONBLOCK flag if blocking is False, else clear it

# file-locking
os.lockf( ... )

# efficient in-kernel copying (data never passes through user space)
os.copy_file_range( ... )           # copies bytes between two file descriptors
os.splice( ... )                    # move data between a pipe and a file descriptor

# modifying
os.truncate( path, length )
os.removedirs( name )               # remove dir plus every empty dir above it
os.utime( ... )                     # set access & modified times

# syncing data to disk
os.fsync( fd )                      # hint: do `os.fsync( f.fileno() )` *after* `f.flush()`
os.sync()                           # force write of everything to disk

# misc
os.chdir( path )
os.closerange( fd_low, fd_high )    # close every fd from fd_low (inclusive) to fd_high (exclusive)
os.scandir( path='.' )              # returns context manager/iterator of os.DirEntry objects
                                    # faster than os.listdir()/path.iterdir() if you need metadata
os.statvfs( path )                  # -> statvfs_result (about the filesystem path lives on)
os.startfile( path[, operation][, arguments][, cwd][, show_cmd] )
                                    # start with associated application (Windows only)
```

See more: [os.copy_file_range()](https://docs.python.org/3/library/os.html#os.copy_file_range), [os.lockf()](https://docs.python.org/3/library/os.html#os.lockf), [os.open()](https://docs.python.org/3/library/os.html#os.open), [os.splice()](https://docs.python.org/3/library/os.html#os.splice), [os.utime()](https://docs.python.org/3/library/os.html#os.utime)

Also: [open() Man Page](https://manpages.ubuntu.com/manpages/lunar/en/man3/open.3posix.html) (list of flags)

#### os.path

```python
os.path.commonpath( paths )         # returns longest common sub-path
os.path.commonprefix( paths )       # returns longest common string prefix
                                    # (compared character by character; may not be a valid path)
os.path.expandvars( path )          # return path with env vars ($name or ${name}) expanded
os.path.sameopenfile( fp1, fp2 )    # True if both file descriptors refer to the same file
```

## fileinput

```python
import fileinput

with fileinput.input( encoding="utf-8" ) as f:
    for line in f:
        process( line )     # process() is a placeholder

input( files=None, inplace=False, backup='', *, mode='r', openhook=None,
       encoding=None, errors=None )
    # files -> path or list of paths; overrides default (sys.argv[1:], or stdin if that is empty)
    # returned instance is also used as global state for remaining API

filename()      # name of file currently being read or None before first line is read
fileno()        # int file descriptor or -1
lineno()        # cumulative line number
filelineno()    # line number in current file
isfirstline()   # relative to current file
isstdin()
nextfile()      # like `continue` - skip rest of current file and move to next file
close()
```

Input lines include the newline.
## shutil

#### File Operations

```python
copyfileobj( fsrc, fdst[, length] )             # copy from one file-like object to another
copyfile( src, dst, follow_symlinks=True )      # copy a file's contents
copymode( src, dst, follow_symlinks=True )      # copy a file's permission bits
copystat( src, dst, follow_symlinks=True )      # copy a file's full metadata (bits, times, flags)
copy( src, dst, follow_symlinks=True )          # copy a file to another place (file or dir)
copy2( src, dst, follow_symlinks=True )         # like copy(), but preserves metadata
move( src, dst, copy_function=copy2 )           # move a file or dir tree

# multi-file operations
copytree( src, dst, symlinks=False, ignore=None, copy_function=copy2,
          ignore_dangling_symlinks=False, dirs_exist_ok=False )
    # recursively copy src dirtree (with metadata) to dst dir, creating intermediate dirs if necessary
rmtree( path, ignore_errors=False, onerror=None, *, onexc=None, dir_fd=None )
    # delete an entire dir tree
Error
    # collects exceptions during multi-file operations
    # arg is list of tuples: [ ( srcname, dstname, exception ) ]

# misc
chown( path, user=None, group=None )            # names or ids
disk_usage( path )                              # -> usage( total, used, free )
which( cmd, mode=os.F_OK|os.X_OK, path=None )   # like /usr/bin/which
```

#### Archives

Builds on [zipfile](https://docs.python.org/3/library/zipfile.html)/[tarfile](https://docs.python.org/3/library/tarfile.html) modules.
```python
get_archive_formats()
make_archive( base_name, format[, root_dir[, base_dir[, verbose[, dry_run[, owner[, group[, logger]]]]]]] )
register_archive_format( name, function[, extra_args[, description]] )
unregister_archive_format( name )

get_unpack_formats()
unpack_archive( filename[, extract_dir[, format[, filter]]] )
register_unpack_format( name, extensions, function[, extra_args[, description]] )
unregister_unpack_format( name )
```

## tempfile

#### Low-Level

```python
mkstemp( suffix=None, prefix=None, dir=None, text=False )
    # securely create temp file
    # no race conditions; only readable/writable by current user; not inherited by child processes
    # caller is responsible for closing the fd and deleting the file
    # returns: ( fd, path )

mkdtemp( suffix=None, prefix=None, dir=None )
    # securely create temp dir
    # no race conditions; only readable/writable/searchable by current user
    # caller is responsible for deleting the dir
    # returns: path
```

- `suffix` is not automatically joined with `"."`; include the dot yourself
- `prefix` defaults to `gettempprefix()`

#### High-Level

```python
TemporaryFile( mode='w+b', buffering=-1, encoding=None, newline=None,
               suffix=None, prefix=None, dir=None, *, errors=None )
    # wrapper around mkstemp(); is file-like object; is context manager

NamedTemporaryFile( mode='w+b', buffering=-1, encoding=None, newline=None,
                    suffix=None, prefix=None, dir=None, delete=True, *,
                    errors=None, delete_on_close=True )
    # guaranteed to have a visible name; automatic deletion can be disabled

SpooledTemporaryFile( max_size=0, mode='w+b', buffering=-1, encoding=None, newline=None,
                      suffix=None, prefix=None, dir=None, *, errors=None )
    # keeps data in memory until it exceeds max_size (or fileno() is called),
    # then rolls over to a real temp file

TemporaryDirectory( suffix=None, prefix=None, dir=None, ignore_cleanup_errors=False, *, delete=True )
    # wrapper around mkdtemp(); is context manager
    # has .name attribute

gettempdir()        # "/tmp"
gettempdirb()       # b"/tmp"
gettempprefix()     # "tmp"
gettempprefixb()    # b"tmp"
tempdir             # "/tmp"
```
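The low-level and high-level helpers can be combined as in the following sketch. The `demo-` prefix, the suffixes, and the helper `scratch_roundtrip` are arbitrary choices for illustration. The `mkstemp()` descriptor is wrapped with `os.fdopen()` so that closing the file object also closes the descriptor.

```python
import os
import tempfile

def scratch_roundtrip(payload):
    """Write payload to a mkstemp() file inside a TemporaryDirectory and read it back."""
    with tempfile.TemporaryDirectory(prefix="demo-") as tmp_dir:
        # mkstemp() returns an already-open OS-level descriptor plus the path
        fd, path = tempfile.mkstemp(suffix=".bin", dir=tmp_dir)
        with os.fdopen(fd, "wb") as f:      # closing f also closes fd
            f.write(payload)
        with open(path, "rb") as f:
            read_back = f.read()
        kept_suffix = path.endswith(".bin")
    # the directory and everything in it are removed when the with-block exits
    return read_back, kept_suffix, os.path.exists(tmp_dir)

read_back, kept_suffix, dir_still_exists = scratch_roundtrip(b"scratch data")

# NamedTemporaryFile has a visible path while open and is deleted on close by default
with tempfile.NamedTemporaryFile("w+", encoding="utf-8", suffix=".txt") as named:
    named.write("hello")
    named.flush()
    visible_while_open = os.path.exists(named.name)
deleted_after_close = not os.path.exists(named.name)
```

Putting the `mkstemp()` file inside a `TemporaryDirectory` is one way to avoid deleting it by hand, since the directory cleanup removes it too.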