# Python : Core : Strings
- [reference/lexical_analysis](https://docs.python.org/3/reference/lexical_analysis.html) *(string and byte literals, including f-strings)*
- [reference/datamodel](https://docs.python.org/3/reference/datamodel.html) *(`__bytes__`, `__format__`, `__repr__`, `__str__`)*
- [library/functions](https://docs.python.org/3/library/functions.html) *(`bytearray()`, `bytes()`, `format()`, `repr()`, `str()`)*
- [library/stdtypes](https://docs.python.org/3/library/stdtypes.html) *(`str`/`bytearray`/`bytes` classes, style reference for `%` formatting operator)*
- [library/string](https://docs.python.org/3/library/string.html) *(module API, Format String syntax, Format Spec syntax, `Template` strings)*
## String/Bytearray/Bytes API
```python
# Both str and binary (bytes/bytearray) types:
str.capitalize() # "foo bar" -> "Foo bar"
str.center( width[, fillchar] )
str.count( substr[, start[, stop]] )
str.endswith( suffix[, start[, stop]] ) # suffix can also be a tuple of suffixes
str.expandtabs( tabsize=8 )
str.find( substr[, start[, stop]] ) # returns index of first occurrence, or -1 if not found
str.index( substr[, start[, stop]] ) # like find, but raises ValueError if not found
str.isalnum() # empty = False
str.isalpha() # empty = False
str.isascii() # empty = True
str.isdigit() # empty = False
str.islower() # empty = False
str.isspace() # empty = False
str.istitle() # empty = False
str.isupper() # empty = False
str.join( iterable ) # str is the separator
str.ljust( width[, fillchar] )
str.lower()
str.lstrip( [chars] ) # strips leading whitespace (or the characters in chars)
str.partition( sep ) # -> ( first, sep, second ) if found else ( str, "", "" )
str.removeprefix( prefix )
str.removesuffix( suffix )
str.replace( old, new[, count] )
str.rfind( substr[, start[, stop]] ) # like find, but last occurrence
str.rindex( substr[, start[, stop]] ) # like index, but last occurrence
str.rjust( width[, fillchar] )
str.rpartition( sep ) # like partition, but last occurrence
str.rsplit( sep=None, maxsplit=-1 ) # like split, but from the right
str.rstrip( [chars] ) # strips trailing whitespace (or chars)
str.split( sep=None, maxsplit=-1 ) # if no sep, splits on whitespace
str.splitlines( keepends=False )
str.startswith( prefix[, start[, stop]] ) # prefix can also be a tuple of prefixes
str.strip( [chars] ) # strips leading & trailing whitespace (or chars)
str.swapcase()
str.title() # "foo bar" -> "Foo Bar"
str.upper()
str.zfill( width ) # "42".zfill( 5 ) -> "00042"
# "-42".zfill( 5 ) -> "-0042"
# ONLY binary types:
bin.decode( encoding='utf-8', errors='strict' ) # opposite of str.encode()
# ONLY strings:
str.encode( encoding='utf-8', errors='strict' ) # opposite of bin.decode()
str.format( *args, **kwargs )
# str can contain literal text + replacement fields: {...}
# ex: "The sum of 1 + 2 is {0}".format( 1+2 )
str.format_map( mapping ) # like str.format( **mapping ), but object is used directly
str.isdecimal() # empty = False
str.isidentifier() # see also: keyword.iskeyword()
str.isnumeric() # empty = False
str.isprintable() # empty = True
```
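A minimal sketch of a few of the methods above whose behavior is easy to mix up. The sample strings are made up for illustration, and `removeprefix()` assumes Python 3.9+.

```python
s = "key=value=more"

# find() signals a miss with -1, while index() raises ValueError.
found = s.find("=")          # 3
missing = s.find("?")        # -1
try:
    s.index("?")
    index_raised = False
except ValueError:
    index_raised = True

# partition() cuts at the first separator, rpartition() at the last.
head = s.partition("=")      # ('key', '=', 'value=more')
tail = s.rpartition("=")     # ('key=value', '=', 'more')
no_sep = s.partition("?")    # ('key=value=more', '', '')

# lstrip() treats its argument as a set of characters to drop;
# removeprefix() removes one exact leading substring.
as_set = "abcabc-data".lstrip("abc")           # '-data'
as_prefix = "abcabc-data".removeprefix("abc")  # 'abc-data'

# split() with no separator collapses runs of whitespace.
words = "  a  b \t c ".split()                 # ['a', 'b', 'c']
joined = ", ".join(words)                      # 'a, b, c'

# encode() and decode() are inverses for a given encoding.
raw = "héllo".encode("utf-8")                  # b'h\xc3\xa9llo'
round_trip = raw.decode("utf-8")               # 'héllo'
```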
## String Module
```python
ascii_letters ascii_lowercase + ascii_uppercase
ascii_lowercase 'abcdefghijklmnopqrstuvwxyz'
ascii_uppercase 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
digits '0123456789'
hexdigits '0123456789abcdefABCDEF'
octdigits '01234567'
punctuation '!"#$%&\'()*+,-./:;<=>?@[\]^_`{|}~'
whitespace ' \t\n\r\x0b\x0c'
printable digits + ascii_letters + punctuation + whitespace
```
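One possible use of these constants, sketched below: membership tests and building a `str.translate()` table. `is_hex` is a hypothetical helper written for this example, not part of the `string` module.

```python
import string

def is_hex(text: str) -> bool:
    """Hypothetical helper: True if text is non-empty and all hex digits."""
    return bool(text) and all(ch in string.hexdigits for ch in text)

# A translation table whose third argument lists characters to delete.
strip_punct = str.maketrans("", "", string.punctuation)
no_punct = "Hello, world!".translate(strip_punct)   # 'Hello world'

# The constants are plain strings, so they compose as expected.
letters_ok = string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase
```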
## Formatting
#### History
1. Originally, there was the `%` formatting operator that acted on a formatting string (based on *printf* syntax) and accepted a value (scalar/tuple/mapping) to populate it:
```python
# Field Syntax: %[(key)][flags][width][.precision][length][type]
"Day: %03d" % days # -> "Day: 001"
"%s, %s" % ( greeting, name ) # -> "Hello, Dave"
"%(b)d %(a)d" % dict( a=42, b=57 ) # -> "57 42"
```
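However, it offered only a fixed set of conversion types, so user-defined classes could not customize their formatting beyond `str()`/`repr()`, and it was fragile: a tuple on the right-hand side is always unpacked into separate values.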
2. Then came the `format()` syntax...
```python
# Field Syntax: {[field_name][!conversion][:format_spec]}
# Spec Syntax: [[fill]align][sign]["z"]["#"]["0"][width][grouping_option]["." precision][type]
"...{}...", "...{0}...", "...{foo}..."
"...{foo:>03}...".format( foo=42 ) # -> "...042..."
```
...and a family of functions...
```python
# formatting a single value
format( value, format_spec='' ) # built-in function
object.__format__( format_spec ) # object method (overridden by many built-in types)
# formatting an entire string (format string with replacement fields)
str.format( *args, **kwargs )
```
...which delegated the formatting of individual values to each value's type (more flexible), but still required the fragile and awkward matching of fields by position and name.
3. Finally, f-strings (literals)...
```python
# Field Syntax: {exp[=][!conversion][:format]}
foo = 42
f"...{foo:>03}..." # -> "...042..."
f"...{foo=:>03}..." # -> "...foo=042..."
f"...{foo + 1:>03}..." # -> "...043..."
```
...which supported arbitrary Python expressions directly embedded in their respective positions within the string while also supporting the per-value spec syntax of the `format()` family, plus the convenient new `=` modifier.
4. The `string` module also includes a `Template` class that offers very simple field replacement functionality which is safe to use with user-provided templates.
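As a rough side-by-side, the four approaches can produce the same string. The variable names and values here are illustrative only.

```python
from string import Template

name, count = "Dave", 3

a = "%s has %d items" % (name, count)            # 1. % operator
b = "{} has {} items".format(name, count)        # 2. str.format()
c = f"{name} has {count} items"                  # 3. f-string
d = Template("$name has $count items").substitute(name=name, count=count)  # 4. Template

print(a)  # all four are the same string
```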
#### 1. `%` Formatting Operator
```python
# Field Syntax: %[(key)][flags][width][.precision][length][type]
#
# % -> the string formatting or interpolation operator
# flags -> # "alternate form"
# 0 zero-pad numerics
# - left-adjust
# ' ' space before positive number
# + include sign
# type -> d/i/u signed integer decimal
# o signed octal
# x/X signed hex
# e/E float w/exp format
# f/F float w/dec format
# g/G float (uses dec or exp format as appropriate)
# c single char
# r/s/a string via repr()/str()/ascii()
# % literal "%" (written as %%)
```
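A few worked cases for the flags and types above, as a sketch. The last part shows the tuple pitfall: a tuple on the right-hand side is unpacked, so a tuple *value* has to be wrapped in a one-element tuple.

```python
zero_pad = "%05d" % 42          # '00042'
left_adj = "%-5d|" % 42         # '42   |'
signed = "%+d" % 42             # '+42'
spaced = "% d" % 42             # ' 42'
alt_hex = "%#x" % 255           # '0xff'
alt_oct = "%#o" % 8             # '0o10'
fixed = "%.2f" % 3.14159        # '3.14'
expo = "%e" % 12345.678         # '1.234568e+04'
char = "%c" % 65                # 'A'
conv = "%r %s" % ("a", "a")     # "'a' a"
percent = "%d%%" % 50           # '50%'

# Tuple pitfall: (1, 2) is unpacked as two values for a single %s.
try:
    "%s" % (1, 2)
    tuple_raised = False
except TypeError:
    tuple_raised = True
wrapped = "%s" % ((1, 2),)      # '(1, 2)'
```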
#### 2. `format()` Family
Each class can override `__format__` to control formatting, and even define its own spec mini-language.
###### Format API
```python
format( value, format_spec='' ) # built-in function; used by f-strings and str.format();
# calls type( value ).__format__( value, format_spec )
object.__format__( format_spec ) # object method (overridden by many built-in types)
# equiv to __str__ if format_spec == ''
# raises TypeError if format_spec != ''
str.format( *args, **kwargs ) # called on format string with replacement fields
# which are populated by *args and **kwargs;
# calls format( value, spec ) for each field (spec defaults to '')
```
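A minimal sketch of a class defining its own spec mini-language. `Temperature` and its spec convention (an optional float spec followed by a unit letter `C` or `K`) are invented for this example.

```python
class Temperature:
    """Hypothetical class; spec is '<float spec><unit>' with unit C or K."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __str__(self):
        return f"{self.celsius} C"

    def __format__(self, format_spec):
        if not format_spec:
            return str(self)                  # empty spec: same as str()
        unit = format_spec[-1]
        if unit not in ("C", "K"):
            raise ValueError(f"unknown unit in format spec: {format_spec!r}")
        value = self.celsius if unit == "C" else self.celsius + 273.15
        number_spec = format_spec[:-1] or ".1f"   # default float spec (assumption)
        return f"{format(value, number_spec)} {unit}"

t = Temperature(21.5)
plain = format(t)                 # '21.5 C'
celsius = f"{t:C}"                # '21.5 C'
kelvin = "{:.2fK}".format(t)      # '294.65 K'

# The base object.__format__ rejects any non-empty spec.
try:
    format(object(), ">10")
    base_raised = False
except TypeError:
    base_raised = True
```

The same `__format__` method serves `format()`, `str.format()`, and f-strings, since all three route through it.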
###### Format Field Syntax
```python
# {[field_name][!conversion][:format_spec]}
#
# field_name -> empty means next value in *args
# numeric means position in *args
# identifier means key in **kwargs
# also: foo.bar, foo[47]
# conversion -> r,s,a == repr/str/ascii
# format_spec -> see below
"hello {}".format( "Bob" ) # -> "hello Bob"
"hello {name}".format( name="Bob") # -> "hello Bob"
"{errno:#x} error".format( errno=50159747054 ) # -> "0xbadc0ffee error"
```
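A short sketch of the less common field-name forms (attribute access, indexing, repeated positions, brace escaping). `Default` is a small `dict` subclass made up here to show how `format_map()` uses the mapping directly.

```python
from types import SimpleNamespace

point = SimpleNamespace(x=3, y=4)

attr = "{p.x}, {p.y}".format(p=point)            # '3, 4'
index = "{0[1]} {0[0]}".format(["a", "b"])       # 'b a'
key = "{d[key]}".format(d={"key": "v"})          # 'v' (no quotes around key)
repeat = "{0} {0} {1}".format("a", "b")          # 'a a b'
conv = "{!r}".format("hi")                       # "'hi'"
braces = "{{literal}} {}".format(1)              # '{literal} 1'

class Default(dict):
    """Hypothetical mapping that supplies a placeholder for missing keys."""
    def __missing__(self, key):
        return f"<{key}>"

# format_map() does not copy the mapping, so __missing__ is honored.
partial = "{a} {b}".format_map(Default(a=1))     # '1 <b>'
```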
###### Format Spec Syntax
```python
# [[fill]align][sign]["z"]["#"]["0"][width][grouping_option]["." precision][type]
#
# fill -> any char; defaults to ' '
# align -> <, >, =, ^ (left is default except for numbers) (^ -> centered)
# sign -> +, -, ' ' (+ -> always, - -> only for negative)
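# z -> coerce negative zero to positive (floats, Python 3.11+)
# # -> alternate form (e.g. 0b/0o/0x prefix for ints)
# 0 -> sign-aware zero padding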
# width -> digit+
# grouping_option -> _, ','
# precision -> digit+
# type -> str -> s (default)
# int -> b, c, d (default), o, x/X, n (n -> d + locale)
# float -> e/E, f/F, g/G (default), n, % (n -> g + locale)
```
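Some worked specs, using the built-in `format()` so each spec stands alone. The values are arbitrary; the last line sketches a nested replacement field supplying width and precision.

```python
right = format(42, ">6")          # '    42'
left = format(42, "<6")           # '42    '
center = format(42, "^6")         # '  42  '
filled = format(42, "*^6")        # '**42**'
pad_after_sign = format(-42, "=6")  # '-   42'
zero_pad = format(-42, "06")      # '-00042'
plus = format(42, "+")            # '+42'
hex_alt = format(255, "#x")       # '0xff'
bin_alt = format(255, "#b")       # '0b11111111'
comma = format(1234567, ",")      # '1,234,567'
under = format(1234567, "_")      # '1_234_567'
fixed = format(3.14159, ".2f")    # '3.14'
pct = format(0.256, ".1%")        # '25.6%'
expo = format(1234.5, "e")        # '1.234500e+03'
trunc = format("abc", ".2")       # 'ab' (precision truncates strings)
char = format(65, "c")            # 'A'

# Nested fields: width and precision come from keyword arguments.
nested = "{:{w}.{p}f}".format(3.14159, w=8, p=3)   # '   3.142'
```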
#### 3. f-strings
Cannot be used as docstrings.
```python
# {exp[=][!conversion][:format_spec]}
#
# = self-documenting exp (includes surrounding whitespace)
# ! !s -> str(), !r -> repr(), and !a -> ascii()
# : calls format() with format_spec
```
Order of operations:
1. starting with *obj*, which is the result of *exp*:
2. if *conversion*, then `obj = conversion( obj )`
3. if *format_spec*, then `obj = format( obj, format_spec )`
4. if *obj* still isn't a `str`:
- if specified `=` but not format_spec, then `obj = repr( obj )`
- else `obj = str( obj )`
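The steps above can be sketched with a few f-strings. This assumes Python 3.8+ for the `=` modifier; the variable names are illustrative.

```python
from datetime import date

value = "hi"

plain = f"{value}"            # 'hi'          (no conversion, no spec)
conv = f"{value!r}"           # "'hi'"        (conversion only)
conv_spec = f"{value!r:>6}"   # "  'hi'"      (conversion first, then the spec)
debug = f"{value=}"           # "value='hi'"  (= with no spec uses repr)
debug_ws = f"{value = }"      # "value = 'hi'" (whitespace around = is kept)
debug_s = f"{value=!s}"       # 'value=hi'    (explicit conversion wins)
debug_spec = f"{value=:>4}"   # 'value=  hi'  (= with a spec goes through format())

# The spec is handed to the value's own __format__, here date's strftime codes.
d = date(2024, 1, 31)
dated = f"{d:%Y/%m/%d}"       # '2024/01/31'
dated_s = f"{d!s:>12}"        # '  2024-01-31' (!s first, so the spec is a str spec)
```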
#### 4. Template Strings
```python
from string import Template
t = Template( "Hey, $name!" )
t.substitute( name="Bob" ) # -> "Hey, Bob!"
```
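A slightly fuller sketch covering braced placeholders, the `$$` escape, and the difference between `substitute()` and `safe_substitute()` when a key is missing. The template text is made up.

```python
from string import Template

t = Template("Hey, $name! You owe ${amount}USD ($$ only).")

# ${amount} needs braces because it is followed directly by letters.
full = t.substitute(name="Bob", amount=5)
# -> 'Hey, Bob! You owe 5USD ($ only).'

# substitute() raises KeyError for a missing placeholder...
try:
    t.substitute(name="Bob")
    key_raised = False
except KeyError:
    key_raised = True

# ...while safe_substitute() leaves the placeholder in place.
partial = t.safe_substitute(name="Bob")
# -> 'Hey, Bob! You owe ${amount}USD ($ only).'
```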
## Regex
#### Summary
```python
re.fullmatch( A, B ) Matches A to entirety of B and returns match object or None.
re.match( A, B ) Matches A to beginning of B and returns match object or None.
re.search( A, B ) Matches first instance of A in B and returns match object or None.
re.findall( A, B ) Matches all instances of A in B and returns list.
re.split( A, B ) Split B into list using delimiter A.
re.sub( A, B, C ) Replace A with B in C.
re.compile( A ) -> pattern
pattern.fullmatch/match/search/findall/split/sub( ... )
```
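A quick sketch of how the functions in the summary differ on the same input. The sample text is arbitrary.

```python
import re

text = "abc123"

# match() anchors at the start, search() scans, fullmatch() needs the whole string.
m_start = re.match(r"\d+", text)                    # None (digits are not at the start)
m_scan = re.search(r"\d+", text).group()            # '123'
m_head = re.match(r"[a-z]+", text).group()          # 'abc'
m_full_miss = re.fullmatch(r"[a-z]+", text)         # None (trailing digits remain)
m_full = re.fullmatch(r"[a-z]+\d+", text).group()   # 'abc123'

digits = re.findall(r"\d", text)                    # ['1', '2', '3']
parts = re.split(r"\s*,\s*", "a , b,c")             # ['a', 'b', 'c']
masked = re.sub(r"\d", "#", text)                   # 'abc###'
```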
#### API
```python
re.compile( pattern, flags=0 ) -> Pattern
re.search( pattern, string, flags=0 ) -> Match | None
re.match( pattern, string, flags=0 ) -> Match | None
re.fullmatch( pattern, string, flags=0 ) -> Match | None
re.split( pattern, string, maxsplit=0, flags=0 ) -> [ ... ]
re.findall( pattern, string, flags=0 ) -> [ ... ] # every non-overlapping instance
# If there are no groups, return a list of strings matching the whole pattern.
# If there is exactly one group, return a list of strings matching that group.
# If multiple groups are present, return a list of tuples of strings matching the groups.
re.finditer( pattern, string, flags=0 ) -> iter:Match # every non-overlapping instance
re.sub( pattern, repl, string, count=0, flags=0 )
# Pattern
same functions as the re module, minus the pattern and flags arguments
.pattern -> original pattern string
.groups -> number of groups in pattern
# Match
match objects always evaluate to True
match = re.match( r"(\w+) (\w+)", "Isaac Newton, physicist" )
match.group() # 'Isaac Newton'
match.group( 0 ) # 'Isaac Newton'
match.group( 1 ) # 'Isaac'
match.group( 1, 2 ) # ( 'Isaac', 'Newton' )
match.groups() # ( 'Isaac', 'Newton' )
match[ ... ] # same as match.group( ... )
```
#### Examples
```python
re.findall( r'(\w+)=(\d+)', 'set width=20 and height=10' ) # -> [ ( 'width', '20' ), ( 'height', '10' ) ]
```
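A further sketch on the same input, using a compiled pattern with named groups, `finditer()`, and `sub()` with both a function and a template replacement. The group names `key` and `value` are chosen for this example.

```python
import re

pattern = re.compile(r"(?P<key>\w+)=(?P<value>\d+)")
text = "set width=20 and height=10"

# finditer() yields Match objects; named groups are indexable by name.
pairs = {m["key"]: int(m["value"]) for m in pattern.finditer(text)}
# -> {'width': 20, 'height': 10}

# sub() accepts a function that receives each Match and returns the replacement.
doubled = pattern.sub(lambda m: f"{m['key']}={int(m['value']) * 2}", text)
# -> 'set width=40 and height=20'

# In a replacement string, \g<name> refers back to a named group.
swapped = pattern.sub(r"\g<value>=\g<key>", text)
# -> 'set 20=width and 10=height'

first = pattern.search(text)
span = first.span()            # (4, 12)
as_dict = first.groupdict()    # {'key': 'width', 'value': '20'}
```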