# Data : YAML

\[ [spec](https://yaml.org/spec/1.2.2/) | [refcard](https://yaml.org/refcard.html) ]

Human-friendly data serialization intended to be read and written in streams.

###### Basics

- latest version: `1.2.2` (superset of JSON)
- extension: `.yaml` (preferred), `.yml`
- media type: `application/yaml`
- space-based indentation is significant; tabs are illegal for indentation; blank lines are ignored
- in plain (unquoted) mappings, the colon (`:`) after a key must be followed by whitespace; commas (`,`) in inline collections need no trailing space, though one is conventional
- string quoting is optional
    - single-quoted strings only support one escape sequence: `''` -> `'`
    - double-quoted strings support backslash escapes
- document header (`---`) is optional for single-document files/streams, but necessary to separate multiple documents
- document terminator (`...`) signals end-of-document without starting a new one
- comments are supported anywhere, but `#` must be preceded by whitespace (or start the line)

###### Multiline Strings

```yaml
# |  -> preserve newlines and whitespace (ignoring the indent)
# >  -> fold single newlines into spaces; blank lines become newlines (Markdown-style)

foo: |
  Text will be
  preserved exactly
  as written.
bar: >
  Text will be folded
  into a single paragraph.

  Blank lines denote
  paragraph breaks.
```

###### Scalar Types

- null: `~`, `null`, `Null`, `NULL`, or just leave value blank
- boolean: `true`, `True`, `TRUE`, `false`, `False`, `FALSE`
- integer & float
    - supports E-notation (`1e+2`)
    - supports octal (`0o7` in the 1.2 core schema; `07` in YAML 1.1 parsers such as PyYAML) and hex (`0x0F`)
    - supports separators (`123_456`; YAML 1.1 only)
    - supports `.inf/.Inf/.INF/-.inf/-.Inf/-.INF` and `.nan/.NaN/.NAN`
- timestamps (a YAML 1.1 type, not part of the 1.2 core schema): `1970-01-01`, `1970-01-01T12:34:56.1Z`, `1970-01-01T12:34:56.1-05:00`, `1970-01-01 12:34:56.1 -5`

###### Complex Types

```yaml
# list - multiline
- item1
- item2

# list - inline
[ item1, item2 ]

# hash - multiline
key1: val1
key2: val2

# hash - inline
{ key1: val1, key2: val2 }

# multiline list of multiline hashes
- key1: val1
  key2: val2
- key3: val3
  key4: val4

# multiline list of inline hashes
- { key1: val1, key2: val2 }
- { key3: val3, key4: val4 }

# inline list of inline hashes
[ { key1: val1, key2: val2 }, { key3: val3, key4: val4 } ]

# multiline hash of multiline lists
key1:
  - item1
  - item2
key2:
  - item3
  - item4

# multiline hash of inline lists
key1: [ item1, item2 ]
key2: [ item3, item4 ]

# inline hash of inline lists
{ key1: [ item1, item2 ], key2: [ item3, item4 ] }
```

###### Anchors & References

Repeated nodes are first marked with an anchor (`&`), and thereafter referenced with an alias (`*`):

```yaml
foo: &myid <val>    # "myid" is the anchor label
bar: *myid          # <val> will be injected here
```

You can use an anchor reference to merge keys from one hash into another:

```yaml
foo: &myid { key1: val1 }
bar:
  <<: *myid         # << is called the "merge key"
  key2: val2
```

> [!warning]
> Technically, "merge keys" were an extension to YAML 1.1 and are not part of the 1.2 specification. However, both PyYAML and Docker Compose continue to support them.
> \[ [spec](https://yaml.org/type/merge.html) | [discussion](https://stackoverflow.com/a/47203224/210867) ]

## Python

#### PyYAML

\[ [pypi](https://pypi.org/project/PyYAML/) | [src](https://github.com/yaml/pyyaml/) | [docs](https://pyyaml.org/wiki/PyYAMLDocumentation) ]

```python
from yaml import load, load_all, safe_load, safe_load_all
from yaml import dump, dump_all, safe_dump, safe_dump_all

try:
    # C-based versions (requires libyaml, installed by default on Ubuntu 22.04)
    from yaml import CLoader as Loader, CSafeLoader as SafeLoader
    from yaml import CDumper as Dumper, CSafeDumper as SafeDumper
except ImportError:
    # pure-Python fallbacks
    from yaml import Loader, SafeLoader, Dumper, SafeDumper

# stream can be bytes, string, or open file
data = safe_load( stream )  # expects a single-document stream, else raises an error
for data in safe_load_all( stream ): ...

# many formatting options
# explicit_start=True -> include document header (---)
string = safe_dump( data )
safe_dump( data, open_file )  # write to open_file instead of returning a string
string = safe_dump_all( list_or_generator )
safe_dump_all( list_or_generator, open_file )

# equivalencies (note the capitalized keyword arguments)
# safe_load( ... )     == load( ..., Loader=SafeLoader )
# safe_load_all( ... ) == load_all( ..., Loader=SafeLoader )
# safe_dump( ... )     == dump( ..., Dumper=SafeDumper )
# safe_dump_all( ... ) == dump_all( ..., Dumper=SafeDumper )
```

> [!warning]
> Do not call `load` on untrusted data with a non-safe loader: YAML tags can construct arbitrary Python objects. Plain `dump` can likewise emit Python-specific tags that `safe_load` will refuse.
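
To make the warning above concrete, here is a minimal sketch (assuming PyYAML is installed) that loads a multi-document stream with the safe API and then shows the safe loader rejecting a Python-specific tag. The `STREAM` and `PAYLOAD` strings and all key names in them are made up for illustration.

```python
import yaml

# Hypothetical two-document stream: anchors + merge key, then block scalars.
STREAM = """\
---
defaults: &defaults
  retries: 3
  verbose: false
service:
  <<: *defaults
  verbose: true
---
literal: |
  line one
  line two
folded: >
  line one
  line two
"""

# safe_load_all yields one Python object per document in the stream.
first, second = yaml.safe_load_all(STREAM)

# Merge key: "service" takes the keys of "defaults"; its own keys win.
print(first["service"])

# Literal style keeps newlines; folded style joins the lines with a space.
print(repr(second["literal"]))
print(repr(second["folded"]))

# Dump one mapping back out, with an explicit document header (---).
text = yaml.safe_dump(first["service"], explicit_start=True, default_flow_style=False)
print(text)

# Illustrative payload using a Python-specific tag; the safe loader
# refuses to construct it instead of calling os.system.
PAYLOAD = "!!python/object/apply:os.system ['echo unsafe']"
try:
    yaml.safe_load(PAYLOAD)
    rejected = False
except yaml.YAMLError as exc:
    rejected = True
    print("rejected:", type(exc).__name__)
```

Sticking to the `safe_*` functions for anything that crosses a trust boundary keeps the result limited to plain Python types (dicts, lists, strings, numbers, and so on).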