YAML Explained: Syntax, Use Cases and How It Differs from JSON
You cannot avoid YAML if you work with Kubernetes, GitHub Actions, or Docker Compose. It looks simple until whitespace breaks your pipeline at 2 AM. Here is how it actually works.
What YAML is and why it exists
YAML (the name is a recursive joke: YAML Ain't Markup Language) is a data serialisation format designed for files that humans write and maintain. It trades JSON's strict, unambiguous syntax for something that is significantly easier to type and read: no braces, no brackets, no mandatory quotes around strings.
That trade-off shows up clearly in the DevOps world. Kubernetes manifests, GitHub Actions workflows, Docker Compose files, CircleCI configs, and Ansible playbooks are all YAML. The reason is simple: these files get edited by humans dozens of times, committed to source control, and reviewed in pull requests. The readability advantage matters.
Where JSON wins is machine-generated data and API payloads. If your code is serialising an object to a file, JSON is harder to produce incorrectly. YAML's flexibility is its strength for authoring and its weakness for generation.
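As a small illustration of the generation side, here is a sketch using Python's standard json module. The settings dictionary is hypothetical; the point is only that the serialiser takes care of quoting and escaping, so the output cannot be broken by indentation or type inference.

```python
import json

# Hypothetical settings object that a program might write to disk.
settings = {
    "country": "NO",
    "version": "1.0",
    "title": "Error: connection refused",
    "replicas": 3,
}

# json.dumps quotes every string, so "NO" and "1.0" stay strings.
serialised = json.dumps(settings, indent=2)
print(serialised)

# Reading the text back yields an equal object.
restored = json.loads(serialised)
```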
The building blocks
Mappings (key-value pairs)
A YAML mapping is the equivalent of a JSON object. Each key-value pair lives on its own line, separated by a colon and a space. That space after the colon is not optional.
server:
  host: localhost
  port: 5432
  ssl: true
Nested structures use indentation; two spaces per level is the common convention. Tabs are forbidden for indentation in YAML. This is the number-one reason YAML files break in editors that auto-insert tabs, and it is the reason YAML linting in CI pipelines is worth the setup time.
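A real linter does far more, but the tab rule on its own is easy to check. The following is a minimal sketch, not a replacement for a proper YAML linter; the function name find_tab_indented_lines and the sample text are made up for illustration.

```python
def find_tab_indented_lines(text: str) -> list[int]:
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad_lines = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace is everything before the first non-blank character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


# Hypothetical file content: line 2 is indented with a tab.
sample = "server:\n\thost: localhost\n  port: 5432\n"
print(find_tab_indented_lines(sample))  # [2]
```

A check like this could run as a pre-commit step, so the tab never reaches the pipeline.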
Sequences (lists)
A sequence is a JSON array written with dashes:
tags:
  - dotnet
  - blazor
  - csharp
You can also write sequences inline using JSON-style brackets, as in [dotnet, blazor, csharp], which some people find cleaner for short lists. Both are valid YAML.
Types and the Norway problem
YAML infers types automatically, which is mostly convenient and occasionally a nightmare. The classic example is the two-letter country code NO. In YAML 1.1 (which PyYAML and many other parsers use by default), NO is parsed as the boolean false. The same inference applies to yes, on, and off, which become true, true, and false. This has broken real systems: code shipped with country codes stored in YAML, only for Norway to go missing from every dropdown.
# These will bite you in YAML 1.1 parsers
country: NO     # parsed as false: WRONG
enabled: yes    # parsed as true: maybe fine, but fragile
version: 1.0    # parsed as float, not string

# Quote values when type inference could be wrong
country: "NO"
version: "1.0"
YAML 1.2 restricts booleans strictly to true and false, but many tools still use 1.1 parsing. When in doubt, quote strings that could be misread.
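If YAML is being written by a script, the "when in doubt, quote" rule can be approximated in code. This is a heuristic sketch only: should_quote is a hypothetical helper, the word list covers the YAML 1.1 boolean spellings discussed above (the 1.1 spec also lists y and n, which not every parser honours), and the number pattern is deliberately simple.

```python
import re

# Boolean words from YAML 1.1; each is accepted as lower, Capitalised, or UPPER.
_BOOL_WORDS = ("yes", "no", "on", "off", "true", "false")

# Simple decimal numbers such as 5432, 1.0, or -2.5e3 (not every YAML number form).
_NUMBER = re.compile(r"[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?")


def should_quote(value: str) -> bool:
    """Heuristic: True if an unquoted scalar may not load as a string."""
    for word in _BOOL_WORDS:
        if value in (word, word.capitalize(), word.upper()):
            return True
    return _NUMBER.fullmatch(value) is not None


for candidate in ("NO", "yes", "1.0", "Norway", "localhost"):
    print(candidate, should_quote(candidate))
```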
Comments: the thing JSON should have
YAML supports # comments. This sounds minor until you spend time maintaining a complex Kubernetes deployment without them. Being able to write "# this annotation is required by the cert-manager webhook" next to a mysterious field is genuinely valuable:
database:
  host: db.prod.internal  # private VPC, not exposed externally
  port: 5432
  # pool size tuned for the t3.medium instance; revisit if instance type changes
  max_connections: 20
Multi-line strings
Two special notations handle text that spans multiple lines:
# Literal block (|): preserves every newline exactly
error_message: |
  Something went wrong.
  Please try again.
  If the problem persists, contact support.

# Folded block (>): joins lines with spaces, useful for long config values
description: >
  This service handles payment processing and order
  management for the checkout flow.
The folded block is particularly useful for long strings that would otherwise scroll off the right edge of the screen in your editor. The parser joins the lines with spaces, so the value is a single line (followed by one trailing newline by default).
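To make the difference concrete, the sketch below approximates in plain Python what a parser would hand back for each style. Both helpers are hypothetical and simplified: they ignore blank lines, more-indented lines, and chomping indicators, all of which real YAML folding handles.

```python
import textwrap


def literal_simple(block: str) -> str:
    """Approximate '|': keep line breaks, drop the common indentation."""
    return textwrap.dedent(block).strip("\n") + "\n"


def fold_simple(block: str) -> str:
    """Approximate '>' for text without blank or extra-indented lines."""
    lines = [line.strip() for line in block.strip().splitlines()]
    # Single line breaks become spaces; one final newline is kept.
    return " ".join(lines) + "\n"


error_block = """
  Something went wrong.
  Please try again.
"""
description_block = """
  This service handles payment processing and order
  management for the checkout flow.
"""

literal = literal_simple(error_block)
folded = fold_simple(description_block)
print(repr(literal))
print(repr(folded))
```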
Anchors and aliases: stop repeating yourself
One YAML feature that JSON completely lacks: you can define a block once with an anchor (&) and reuse it anywhere with an alias (*). This is especially useful in CI pipeline configs where multiple jobs share the same environment variables:
shared_env: &shared_env
  NODE_ENV: production
  API_URL: https://api.example.com
  LOG_LEVEL: info

deploy_staging:
  <<: *shared_env
  environment: staging
  replicas: 1

deploy_production:
  <<: *shared_env
  environment: production
  replicas: 3
The << key is a merge key: it pulls all the anchored fields into the current mapping. Keys you define explicitly in that mapping override the anchor's values. Without anchors you would be copying those three environment variables into every job definition and then updating them in multiple places whenever they change.
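One way to picture the merge is as a dictionary copy followed by overrides. The sketch below mirrors the example in plain Python; the merge helper is hypothetical, and the LOG_LEVEL override in production is an addition made here purely to show an explicit key winning.

```python
shared_env = {
    "NODE_ENV": "production",
    "API_URL": "https://api.example.com",
    "LOG_LEVEL": "info",
}


def merge(base: dict, overrides: dict) -> dict:
    """Copy the shared keys, then let explicitly set keys win."""
    return {**base, **overrides}


deploy_staging = merge(shared_env, {"environment": "staging", "replicas": 1})

# Hypothetical override: production sets its own LOG_LEVEL.
deploy_production = merge(
    shared_env,
    {"environment": "production", "replicas": 3, "LOG_LEVEL": "warn"},
)

print(deploy_staging)
print(deploy_production)
```

The shared mapping itself is left untouched, which matches how an anchor can be reused by any number of aliases.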
The mistakes that actually happen
After reading enough YAML-related pull request comments, a few mistakes come up repeatedly:
Mixing two-space and four-space indentation. Most YAML parsers require consistent indentation within a block, not necessarily consistent across the whole file, but editors often mix them when you copy-paste from somewhere else. YAML linters catch this; manual review often misses it.
Unquoted strings with colons. Any string that contains a colon followed by a space will be parsed as a key-value pair. This breaks in surprising ways:
# This will fail: the parser sees a nested mapping
title: Error: connection refused

# Quote it
title: "Error: connection refused"
Trusting that your YAML version is 1.2. PyYAML (the most common Python YAML library), the Ruby YAML module, and several Go parsers default to YAML 1.1 behaviour. Check your parser version and whether it offers a 1.2 mode before building systems that depend on strict boolean behaviour.
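The colon mistake in particular is easy to guard against when a script emits YAML values. This is a rough sketch with hypothetical helper names; it only covers the colon case described above and relies on the fact that a JSON double-quoted string is also a valid YAML scalar.

```python
import json


def has_unsafe_colon(value: str) -> bool:
    """True if an unquoted scalar contains ': ' or ends with ':'."""
    return ": " in value or value.endswith(":")


def quote_if_needed(value: str) -> str:
    # json.dumps produces a double-quoted, escaped string.
    return json.dumps(value) if has_unsafe_colon(value) else value


print(quote_if_needed("Error: connection refused"))
print(quote_if_needed("https://api.example.com"))
```

A URL passes through unquoted because its colon is not followed by a space, which is why the API_URL value in the anchor example needs no quotes.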
YAML vs JSON: the honest comparison
Use YAML for configuration files that developers maintain by hand. Use JSON for API responses, machine-generated data, and anywhere you want strict, unambiguous parsing. The choice is usually obvious from context: Kubernetes expects YAML, REST APIs return JSON. When you actually get to choose (say, a config file for your application), YAML wins for readability when comments and multi-line strings matter; JSON wins when you want a format that is harder to accidentally break.
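If you need to move between them, going from JSON to YAML is the easy direction: YAML 1.2 was designed as a superset of JSON, so a valid JSON document is, barring rare edge cases, also valid YAML. Going the other way, comments and anchors have no JSON equivalent: comments are dropped, and anchors and aliases must be resolved into plain copies before conversion.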