# Data : YAML
\[ [spec](https://yaml.org/spec/1.2.2/) | [refcard](https://yaml.org/refcard.html) ]
Human-friendly data serialization intended to be read and written in streams.
###### Basics
- latest version: `1.2.2` (superset of JSON)
- extension: `.yaml` (preferred), `.yml`
- media type: `application/yaml`
- space-based indentation is significant — tabs are illegal — blank lines are ignored
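- colons (`:`) separating a key from its value must be followed by whitespace; commas (`,`) in inline collections need not be, though a trailing space is conventional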
- string quoting is optional
- single-quoted strings only support one escape sequence: `''` -> `'`
- double-quoted strings support backslash escapes
- document header (`---`) is optional for single-document files/streams, but necessary to separate multiple documents
- document terminator (`...`) signals end-of-document without starting a new one
- comments are supported anywhere outside of scalars, but `#` must start the line or be preceded by whitespace
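A minimal sketch of the quoting, comment, and document-separator rules above, assuming PyYAML is installed. The sample keys and values are made up for illustration.

```python
import yaml  # PyYAML

# Raw string so the backslashes reach the YAML parser untouched.
doc = r"""
single: 'it''s'
double: "tab\there"
raw: 'tab\there'
url: http://example.com/#frag
note: value # trailing comment
"""
basics = yaml.safe_load(doc)

# '---' separates documents; '...' ends one without starting another.
stream = "a: 1\n---\nb: 2\n...\n"
docs = list(yaml.safe_load_all(stream))

print(basics)
print(docs)
```

Only the double-quoted string interprets `\t` as a tab, and the `#` in the URL stays part of the value because no whitespace precedes it.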
###### Multiline Strings
```yaml
# | -> preserve newlines and whitespace (ignoring the indent)
# > -> fold single newlines into spaces; blank lines become line breaks (Markdown-style)
foo: |
  Text will be
  preserved
  exactly as written.
bar: >
  Text will be folded
  into a single paragraph.

  Blank lines denote paragraph breaks.
```
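To see what the two block styles produce, here is a small sketch that parses a literal and a folded scalar with PyYAML (assumed installed). The sample text is illustrative.

```python
import yaml

text = (
    "foo: |\n"
    "  Text will be\n"
    "  preserved\n"
    "bar: >\n"
    "  Text will be folded\n"
    "  into one line.\n"
    "\n"
    "  Next paragraph.\n"
)
blocks = yaml.safe_load(text)

print(repr(blocks["foo"]))  # literal style keeps each newline
print(repr(blocks["bar"]))  # folded style joins lines with spaces
```

Both styles keep a single trailing newline by default.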
###### Scalar Types
- null — `~`, `null`, `Null`, `NULL`, or just leave value blank
- boolean — `true`, `True`, `TRUE`, `false`, `False`, `FALSE`
- integer & float
    - supports E-notation (`1e+2`)
    - supports octal (`0o7` in YAML 1.2; `07` in YAML 1.1, which PyYAML implements) and hex (`0x0F`)
    - supports separators (`123_456`) in YAML 1.1 only
    - supports `.inf/.Inf/.INF` (optionally signed) and `.nan/.NaN/.NAN`
- timestamps — `1970-01-01`, `1970-01-01T12:34:56.1Z`, `1970-01-01T12:34:56.1-05:00`, `1970-01-01 12:34:56.1 -5`
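The sketch below shows how PyYAML (assumed installed) resolves some of these scalars to Python types. PyYAML follows YAML 1.1 rules, so other parsers may resolve `07`, `123_456`, and timestamps differently. The key names are made up.

```python
import datetime
import math
import yaml

scalars = yaml.safe_load("""
nothing: ~
empty:
flag: true
hexa: 0x0F
octal: 07
grouped: 123_456
sci: 1.5e+2
infinite: .inf
notnum: .nan
day: 1970-01-01
""")

for key, value in scalars.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```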
###### Complex Types
```yaml
# list - multiline
- item1
- item2
# list - inline
[ item1, item2 ]
# hash - multiline
key1: val1
key2: val2
# hash - inline
{ key1: val1, key2: val2 }
# multiline list of multiline hashes
- key1: val1
  key2: val2
- key3: val3
  key4: val4
# multiline list of inline hashes
- { key1: val1, key2: val2 }
- { key3: val3, key4: val4 }
# inline list of inline hashes
[ { key1: val1, key2: val2 }, { key3: val3, key4: val4 } ]
# multiline hash of multiline lists
key1:
  - item1
  - item2
key2:
  - item3
  - item4
# multiline hash of inline lists
key1: [ item1, item2 ]
key2: [ item3, item4 ]
# inline hash of inline lists
{ key1: [ item1, item2 ], key2: [ item3, item4 ] }
```
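The inline and multiline forms are interchangeable. A quick check with PyYAML (assumed installed) parses both spellings of the same structure and compares the results:

```python
import yaml

multiline = yaml.safe_load(
    "key1:\n"
    "  - item1\n"
    "  - item2\n"
    "key2:\n"
    "  - item3\n"
    "  - item4\n"
)
inline = yaml.safe_load("{ key1: [ item1, item2 ], key2: [ item3, item4 ] }")

print(multiline == inline)
```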
###### Anchors & References
Repeated nodes are initially denoted by `&`, and thereafter referenced with `*`:
```yaml
foo: &myid <val> # "myid" is the anchor label
bar: *myid # <val> will be injected here
```
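A small sketch of how an alias behaves once loaded, assuming PyYAML. With this library the alias resolves to the very same Python object as the anchored node, so mutating one mutates the other; other parsers may copy instead.

```python
import yaml

shared = yaml.safe_load(
    "foo: &myid [1, 2]\n"
    "bar: *myid\n"
)

print(shared["bar"])                   # same content as foo
print(shared["bar"] is shared["foo"])  # identity, not just equality
```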
You can use an anchor reference to merge keys from one hash into another:
```yaml
foo: &myid { key1: val1 }
bar:
  <<: *myid # << is called the "merge key"
  key2: val2
```
> [!warning]
> Technically, "merge keys" were an extension to YAML 1.1, and are thus deprecated by 1.2. However, both PyYAML and Docker Compose continue to support them. \[ [spec](https://yaml.org/type/merge.html) | [discussion](https://stackoverflow.com/a/47203224/210867) ]
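
As a rough illustration of the merge key, this sketch loads the example above with PyYAML (assumed installed) and inspects the result:

```python
import yaml

merged = yaml.safe_load(
    "foo: &myid { key1: val1 }\n"
    "bar:\n"
    "  <<: *myid\n"
    "  key2: val2\n"
)

print(merged["bar"])  # keys from foo plus bar's own key
print(merged["foo"])  # the anchored hash is unchanged
```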
## Python
#### PyYAML
\[ [pypi](https://pypi.org/project/PyYAML/) | [src](https://github.com/yaml/pyyaml/) | [docs](https://pyyaml.org/wiki/PyYAMLDocumentation) ]
```python
from yaml import load, load_all, safe_load, safe_load_all
from yaml import dump, dump_all, safe_dump, safe_dump_all
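try:
    # C-based versions (requires libyaml, installed by default on Ubuntu 22.04)
    from yaml import CLoader as Loader, CSafeLoader as SafeLoader
    from yaml import CDumper as Dumper, CSafeDumper as SafeDumper
except ImportError:
    # pure-Python fallbacks, imported under the same names
    from yaml import Loader, SafeLoader, Dumper, SafeDumper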
# stream can be bytes, string, or open file
data = safe_load( stream ) # expects single-document stream, else raises error
for data in safe_load_all( stream ):
    ...
# many formatting options
# explicit_start=True -> include document header (---)
string = safe_dump( data )
safe_dump( data, open_file ) # write string to open_file
string = safe_dump_all( list_or_generator )
safe_dump_all( list_or_generator, open_file )
# equivalencies
safe_load( ... ) == load( ..., Loader=SafeLoader )
safe_load_all( ... ) == load_all( ..., Loader=SafeLoader )
safe_dump( ... ) == dump( ..., Dumper=SafeDumper )
safe_dump_all( ... ) == dump_all( ..., Dumper=SafeDumper )
```
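For a runnable round trip, here is a minimal sketch assuming PyYAML is installed. The sample documents and the in-memory `buffer` (standing in for an open file) are illustrative.

```python
import io
import yaml

docs = [{"name": "a", "tags": ["x", "y"]}, {"name": "b"}]

# Dump several documents into one string, each introduced by '---'.
text = yaml.safe_dump_all(docs, explicit_start=True)

# Read them back lazily; list() drains the generator.
round_tripped = list(yaml.safe_load_all(text))

# Passing a file-like object writes to it instead of returning a string.
buffer = io.StringIO()
yaml.safe_dump(docs[0], buffer)

print(text)
print(round_tripped == docs)
```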
> [!warning]
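> Not safe to call `load` with untrusted data: tags in the input can specify arbitrary Python object types. Likewise, `dump` can emit Python-specific tags that only the unsafe loaders read back.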
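
To make the warning concrete, this sketch feeds the safe functions a Python-specific tag and an arbitrary object, assuming PyYAML. The `hostile` payload is a made-up example; the safe loader refuses it without executing anything.

```python
import yaml

hostile = "!!python/object/apply:os.system ['echo pwned']"

try:
    yaml.safe_load(hostile)
    load_rejected = False
except yaml.YAMLError:
    # The safe loader has no constructor for python/* tags.
    load_rejected = True

try:
    yaml.safe_dump(object())
    dump_rejected = False
except yaml.YAMLError:
    # The safe dumper only represents plain data types.
    dump_rejected = True

print(load_rejected, dump_rejected)
```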