Strings - Ofer Nave

# Python : Core : Strings

- [reference/lexical_analysis](https://docs.python.org/3/reference/lexical_analysis.html) *(string and bytes literals, including f-strings)*
- [reference/datamodel](https://docs.python.org/3/reference/datamodel.html) *(`__bytes__`, `__format__`, `__repr__`, `__str__`)*
- [library/functions](https://docs.python.org/3/library/functions.html) *(`bytearray()`, `bytes()`, `format()`, `repr()`, `str()`)*
- [library/stdtypes](https://docs.python.org/3/library/stdtypes.html) *(`str`/`bytearray`/`bytes` classes, printf-style reference for `%` formatting operator)*
- [library/string](https://docs.python.org/3/library/string.html) *(module API, Format String syntax, Format Spec syntax, `Template` strings)*

## String/Bytearray/Bytes API

```python
# Both types (str and bytes/bytearray)
str.capitalize()                           # "foo bar" -> "Foo bar"
str.center( width[, fillchar] )
str.count( substr[, start[, stop]] )
str.endswith( suffix[, start[, stop]] )    # suffix can also be a tuple of suffixes
str.expandtabs( tabsize=8 )
str.find( substr[, start[, stop]] )        # returns index of first occurrence, or -1 if not found
str.index( substr[, start[, stop]] )       # like find, but raises ValueError if not found
str.isalnum()                              # empty = False
str.isalpha()                              # empty = False
str.isascii()                              # empty = True
str.isdigit()                              # empty = False
str.islower()                              # empty = False
str.isspace()                              # empty = False
str.istitle()                              # empty = False
str.isupper()                              # empty = False
str.join( iterable )                       # str is the separator
str.ljust( width[, fillchar] )
str.lower()
str.lstrip( [chars] )                      # strips leading whitespace (or the characters in chars)
str.partition( sep )                       # -> ( first, sep, second ) if found else ( str, "", "" )
str.removeprefix( prefix )
str.removesuffix( suffix )
str.replace( old, new[, count] )
str.rfind( substr[, start[, stop]] )       # like find, but last occurrence
str.rindex( substr[, start[, stop]] )      # like index, but last occurrence
str.rjust( width[, fillchar] )
str.rpartition( sep )                      # like partition, but last occurrence; ( "", "", str ) if not found
str.rsplit( sep=None, maxsplit=-1 )        # like split, but from the right
str.rstrip( [chars] )                      # strips trailing whitespace (or chars)
str.split( sep=None, maxsplit=-1 )         # if no sep, splits on whitespace
str.splitlines( keepends=False )
str.startswith( prefix[, start[, stop]] )  # prefix can also be a tuple of prefixes
str.strip( [chars] )                       # strips leading & trailing whitespace (or chars)
str.swapcase()
str.title()                                # "foo bar" -> "Foo Bar"
str.upper()
str.zfill( width )                         # "42".zfill( 5 )  -> "00042"
                                           # "-42".zfill( 5 ) -> "-0042"

# ONLY binary types:
bin.decode( encoding='utf-8', errors='strict' )  # opposite of str.encode()

# ONLY strings:
str.encode( encoding='utf-8', errors='strict' )  # opposite of bin.decode()
str.format( *args, **kwargs )              # str can contain literal text + replacement fields: {...}
                                           # ex: "The sum of 1 + 2 is {0}".format( 1+2 )
str.format_map( mapping )                  # like str.format( **mapping ), but mapping is used directly
str.isdecimal()                            # empty = False
str.isidentifier()                         # see also: keyword.iskeyword()
str.isnumeric()                            # empty = False
str.isprintable()                          # empty = True
```

## String Module

```python
ascii_letters    ascii_lowercase + ascii_uppercase
ascii_lowercase  'abcdefghijklmnopqrstuvwxyz'
ascii_uppercase  'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
digits           '0123456789'
hexdigits        '0123456789abcdefABCDEF'
octdigits        '01234567'
punctuation      '!"#$%&\'()*+,-./:;<=>?@[\]^_`{|}~'
whitespace       ' \t\n\r\x0b\x0c'
printable        digits + ascii_letters + punctuation + whitespace
```

## Formatting

#### History

1. Originally, there was the `%` formatting operator, which acted on a format string (based on *printf* syntax) and accepted a value (scalar/tuple/mapping) to populate it:

   ```python
   # Field Syntax: %[(key)][flags][width][.precision][length][type]
   days = 1
   greeting, name = "Hello", "Dave"
   "Day: %03d" % days                    # -> "Day: 001"
   "%s, %s" % ( greeting, name )         # -> "Hello, Dave"
   "%(b)d %(a)d" % dict( a=42, b=57 )    # -> "57 42"
   ```

   However, its conversions were fixed to built-in behavior (a type could not define its own formatting), and it was fragile and unpythonic.

2. Then came the `format()` syntax...
   ```python
   # Field Syntax: {[field_name][!conversion][:format_spec]}
   # Spec Syntax:  [[fill]align][sign]["z"]["#"]["0"][width][grouping_option]["." precision][type]
   "...{}...", "...{0}...", "...{foo}..."
   "...{foo:>03}...".format( foo=42 )    # -> "...042..."
   ```

   ...and family of functions...

   ```python
   # formatting a single value
   format( value, format_spec='' )       # built-in function
   object.__format__( format_spec )      # object method (overridden by many built-in types)

   # formatting an entire string (format string with replacement fields)
   str.format( *args, **kwargs )
   ```

   ...which delegated the formatting of individual values to each value's type (more flexible), but still required the fragile and awkward matching of fields by position and name.

3. Finally, f-strings (literals)...

   ```python
   # Field Syntax: {exp[=][!conversion][:format_spec]}
   foo = 42
   f"...{foo:>03}..."        # -> "...042..."
   f"...{foo=:>03}..."       # -> "...foo=042..."
   f"...{foo + 1:>03}..."    # -> "...043..."
   ```

   ...which supported arbitrary Python expressions embedded directly at their positions within the string, while also supporting the per-value spec syntax of the `format()` family, plus the convenient new `=` modifier.

4. The `string` module also includes a `Template` class that offers very simple field replacement functionality which is safe to use with user-provided templates.

#### 1. `%` Formatting Operator

```python
# Field Syntax: %[(key)][flags][width][.precision][length][type]
#
#   %     -> the string formatting or interpolation operator
#   flags -> #      "alternate form"
#            0      zero-pad numerics
#            -      left-adjust
#            ' '    space before positive number
#            +      include sign
#   type  -> d/i/u  signed integer decimal
#            o      signed octal
#            x/X    signed hex
#            e/E    float w/exp format
#            f/F    float w/dec format
#            g/G    float (uses dec or exp format as appropriate)
#            c      single char
#            r/s/a  string via repr()/str()/ascii()
#            %      literal %
```

#### 2. `format()` Family

Each class can override `__format__` to control formatting, and even define its own spec mini-language.

###### Format API

```python
format( value, format_spec='' )       # built-in function; used by f-strings and str.format();
                                      # calls type( value ).__format__( value, format_spec )

object.__format__( format_spec )      # object method (overridden by many built-in types)
                                      # default implementation:
                                      #   equiv to str( self ) if format_spec == ''
                                      #   raises TypeError if format_spec != ''

str.format( *args, **kwargs )         # called on format string with replacement fields
                                      # which are populated by *args and **kwargs;
                                      # calls format() for each field (spec defaults to '')
```

###### Format Field Syntax

```python
# {[field_name][!conversion][:format_spec]}
#
#   field_name  -> empty means next value in *args
#                  numeric means position in *args
#                  identifier means key in **kwargs
#                  also: foo.bar, foo[47]
#   conversion  -> r,s,a == repr/str/ascii
#   format_spec -> see below

"hello {}".format( "Bob" )                        # -> "hello Bob"
"hello {name}".format( name="Bob" )               # -> "hello Bob"
"{errno:#x} error".format( errno=50159747054 )    # -> "0xbadc0ffee error"
```

###### Format Spec Syntax

```python
# [[fill]align][sign]["z"]["#"]["0"][width][grouping_option]["." precision][type]
#
#   fill            -> any char; defaults to ' '
#   align           -> <, >, =, ^ (left is default except for numbers) (^ -> centered)
#   sign            -> +, -, ' ' (+ -> always, - -> only for negative)
#   z, #, 0
#   width           -> digit+
#   grouping_option -> _, ','
#   precision       -> digit+
#   type            -> str   -> s (default)
#                      int   -> b, c, d (default), o, x/X, n (n -> d + locale)
#                      float -> e/E, f/F, g/G (default), n, % (n -> g + locale)
```

#### 3. f-strings

Cannot be used as docstrings.

```python
# {exp[=][!conversion][:format_spec]}
#
#   =   self-documenting exp (includes surrounding whitespace)
#   !   !s -> str(), !r -> repr(), and !a -> ascii()
#   :   calls format() with format_spec
```

Order of operations:

1. Start with *obj*, which is the result of *exp*.
2. If `=` is specified with neither *conversion* nor *format_spec*, *conversion* defaults to `!r`.
3. If *conversion*, then `obj = conversion( obj )`.
4. Finally, `obj = format( obj, format_spec )`, where *format_spec* defaults to `''` (for most types, equivalent to `str( obj )`).

#### 4. Template Strings

```python
from string import Template

t = Template( "Hey, $name!" )
t.substitute( name="Bob" )    # -> "Hey, Bob!"
```

## Regex

#### Summary

```python
re.fullmatch( A, B )    # Matches A to entirety of B and returns match object or None.
re.match( A, B )        # Matches A to beginning of B and returns match object or None.
re.search( A, B )       # Matches first instance of A in B and returns match object or None.
re.findall( A, B )      # Matches all instances of A in B and returns list.
re.split( A, B )        # Splits B into list using delimiter A.
re.sub( A, B, C )       # Replaces A with B in C.

re.compile( A ) -> pattern
pattern.fullmatch/match/search/findall/split/sub( ... )
```

#### API

```python
re.compile( pattern, flags=0 )                      -> Pattern
re.search( pattern, string, flags=0 )               -> Match | None
re.match( pattern, string, flags=0 )                -> Match | None
re.fullmatch( pattern, string, flags=0 )            -> Match | None
re.split( pattern, string, maxsplit=0, flags=0 )    -> [ ... ]
re.findall( pattern, string, flags=0 )              -> [ ... ]      # every non-overlapping instance
    # If there are no groups, return a list of strings matching the whole pattern.
    # If there is exactly one group, return a list of strings matching that group.
    # If multiple groups are present, return a list of tuples of strings matching the groups.
re.finditer( pattern, string, flags=0 )             -> iter:Match   # every non-overlapping instance
re.sub( pattern, repl, string, count=0, flags=0 )

# Pattern: same methods as the re functions above, minus the pattern argument
.pattern    -> original pattern string
.groups     -> number of groups in pattern

# Match: match objects always evaluate to True
match = re.match( r"(\w+) (\w+)", "Isaac Newton, physicist" )
match.group()          # 'Isaac Newton'
match.group( 0 )       # 'Isaac Newton'
match.group( 1 )       # 'Isaac'
match.group( 1, 2 )    # ( 'Isaac', 'Newton' )
match.groups()         # ( 'Isaac', 'Newton' )
match[ ... ]           # same as match.group( ... )
```

#### Examples

```python
re.findall( r'(\w+)=(\d+)', 'set width=20 and height=10' )
# -> [ ( 'width', '20' ), ( 'height', '10' ) ]
```
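
To go with the `findall` example, here is a minimal sketch of two other calls from the API listing: `finditer` (which yields `Match` objects, so named groups can be read) and `sub` with a function as `repl`. The group names `key`/`value` and the helper `double_value` are assumptions made up for illustration; they are not part of the original notes.

```python
import re

text = 'set width=20 and height=10'    # same sample string as the findall example
pattern = re.compile( r'(?P<key>\w+)=(?P<value>\d+)' )

# finditer yields one Match per non-overlapping hit; match[ name ] reads a named group
settings = { m[ 'key' ]: int( m[ 'value' ] ) for m in pattern.finditer( text ) }

# sub accepts a callable that receives each Match and returns the replacement text
def double_value( m ):    # hypothetical helper
    return f"{m[ 'key' ]}={int( m[ 'value' ] ) * 2}"

doubled = pattern.sub( double_value, text )

print( settings )    # -> {'width': 20, 'height': 10}
print( doubled )     # -> set width=40 and height=20
```

Compiling the pattern once and reusing it for both calls is just a convenience here; `re.finditer( ... )` and `re.sub( ... )` with the pattern string would behave the same way.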