To make your intended result clear to Kubernetes, you need a language in which you can specify your resources declaratively.
Kubernetes uses the YAML markup language for this purpose. In software development, you have certainly already become familiar with markup languages such as XML or JSON, and perhaps you have even used YAML in a different context. We want to take this opportunity to go into more detail about YAML, because Kubernetes uses YAML to describe all resources and states of the cluster.
YAML is a recursive acronym that stands for YAML Ain't Markup Language. It is currently very popular alongside JSON and stands out for being significantly easier for humans to read.
YAML was originally intended to be only a simple markup language, which is why the acronym first stood for Yet Another Markup Language. YAML has grown considerably since then and is, despite its name, still a markup language. But what are markup languages actually used for?
If you search for markup languages, you will find different types of them. The best known is HTML, which allows you to structure and format text in such a way that a machine can read and interpret it. YAML provides a format to put data into a structure that is easy to read for both machines and humans.
YAML files have the extension .yaml, and sometimes you will also see .yml. Both are fine, but according to the documentation the .yaml extension should be used.
Basics of YAML Syntax
If you look at YAML files, you can break each of them down to three basic elements:
- Key-value pairs
- Lists
- Nested structures
Key-value pairs are the simplest form of data organization. Each pair consists of a key and an associated value. In the following example, you can map a person's data in this way:
name: "Kevin Welter"
company: "HumanITy GmbH."
You can use lists to define collections of elements. These are then grouped under a key. Each list element is indicated by a - sign. The following example shows a list of customer names.
customers:
  - "Kevin Welter"
  - "Sean Smith"
  - "John Doe"
Sometimes you have more complex structures where individual lists and key-value pairs are not enough. You now have a list of names, but there is much more information about a customer. To map this information, you can use nested structures to define entire objects. For this purpose, you use key-value pairs and lists that are arranged in hierarchies.
You can create multiple YAML documents in one YAML file. These are separated by three dashes (---):
---
name: Kevin Welter
---
name: Sean Smith
In the first line of a YAML file, the dashes are optional, but they explicitly indicate that a new YAML document is starting.
In the following example, you have a list of customers who in turn have a company assigned to them. Both the customer and the company have a name, and there is further information about the company that can be entered in the substructure:
customers:
  - name: "Kevin Welter"
    company:
      name: "HumanITy GmbH."
      city: "Tucson"
      zip: "85706"
  - name: "Sean Smith"
    company:
      name: "Smith Inc."
      city: "Fort Worth"
      zip: "76040"
If you think in terms of objects, then you have the company object and you have the customer object. In this structure, the company belongs to the customer. In other cases, the customer could also be specified as a list of employees in the company:
company:
  name: "HumanITy GmbH."
  employees:
    - "Kevin Welter"
    - "Fabian Schaub"
As you can see, you have complete freedom to map your data as you or your system need it.
Indentations are of crucial importance in YAML. They define the hierarchy and structure of the data. In comparison, indentations are optional in JSON because the structure is defined by braces and brackets.
Indentations in YAML:
- must be consistent within a document,
- define the hierarchy of an element,
- often lead to errors or misinterpretations, and
- make the file easier to read.
In the examples, we have used an indentation of two spaces in each case. Most YAML parsers and editors support an indentation depth of two or four spaces by default. There is no right or wrong here, but you should remain consistent within a document. However, this is easier said than done with large YAML files. It has often happened to me that a key-value pair was not assigned to the correct object due to an incorrect indentation and we had to debug forever to find the error. An incorrect assignment is not a syntax error and therefore your editor will not directly point out the problem.
Note: Never use tabs to structure the indentations of a YAML file! YAML requires the use of spaces instead of tabs for indentation. The interpretation of tabs between different editors and environments can vary and therefore result in conflicts.
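To illustrate how an incorrect indentation can go unnoticed, here is a minimal sketch that reuses the customer example from above. Both documents are valid YAML, so a parser accepts both; only the position of zip differs.

```yaml
# Intended: zip is a property of the company
customers:
  - name: "Kevin Welter"
    company:
      name: "HumanITy GmbH."
      city: "Tucson"
      zip: "85706"
---
# Indented two spaces too few: zip is now a property of the customer.
# This is still valid YAML, so no syntax error is reported.
customers:
  - name: "Kevin Welter"
    company:
      name: "HumanITy GmbH."
      city: "Tucson"
    zip: "85706"
```

An application that expects company.zip would simply find no value in the second document, which is why such errors tend to surface late.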
Data Types in YAML
In YAML, you can find all the classic data types that you know from programming languages:
string: "This is a string"
number: 123
float: 12.34
boolean: true
null value: null
With strings, you have several options for defining them. You can typically write a string without quotation marks, because YAML tries to infer the type of each value. However, if you want to use special characters such as :, ", or ' in the string, which YAML also uses itself, then you need the quotation marks. You can use single (' ') or double (" ") quotation marks, with one difference: inside double quotation marks, escape sequences such as \n are interpreted, whereas inside single quotation marks every character is taken literally. If the quotation mark you use also has to appear in the string itself, you must escape it: as \" in a double-quoted string and as '' in a single-quoted string.
Note: You should follow a uniform convention within a YAML file. Always try to write a string in quotation marks because that makes it clearer.
YAML also provides the option of defining strings that run across several lines. By using the pipe (|) character, YAML retains the exact formatting, while > converts every line break into a space.
name: Kevin Welter
info: "Kevin says: 'Sometimes quotation marks are needed'"
simpleString: 'C:\Users\Kevin'
doubleString: "Line 1\nLine 2"
blockText: |
  Text in multiple lines
  Line 2
foldedText: >
  This is a long
  text broken across multiple lines for better
  legibility, but separated by spaces
Like Kubernetes, we use camel case as a convention for the keys in YAML. However, YAML does not make any specifications here. You can even use spaces in a key. We recommend that you use the programming language for which you are using YAML as a guide. For example, use camelCase for Kubernetes, snake_case for Python, and so on.
Anchors and Aliases
Imagine a YAML file in which you define data that is repeated frequently. Suddenly your file has more than 1,000 lines. No matter how well structured YAML is, the file becomes more unreadable the larger it gets. For this purpose, YAML provides anchors and aliases that allow you to define objects or parameters once and use them again within the file according to the don't repeat yourself (DRY) principle.
An anchor is set using &anchorName, and an alias references the anchor using *anchorName. You can see a simple example below.
favoriteNumber: &number 42
myFavoriteNumber: *number
You can also anchor the key-value pairs of an entire object and include them in another object. To do this, you need to use the syntax <<: *anchor, as shown below. The data from basicAuthor is transferred to specificAuthor. In this case, the name is taken over from basicAuthor, while the specialty that specificAuthor defines itself, Kubernetes, takes precedence over the anchored value.
basicAuthor: &author
  name: "Kevin Welter"
  specialty: YAML
specificAuthor:
  <<: *author
  specialty: Kubernetes
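As a sketch of what a parser that supports the merge key hands to the application, the specificAuthor object from the listing above would be equivalent to writing the following by hand:

```yaml
# Result of the merge, written out explicitly
specificAuthor:
  name: "Kevin Welter"     # taken over from the &author anchor
  specialty: Kubernetes    # the locally defined value wins over "YAML"
```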
A useful real-life example where we use anchors again and again is in the GitLab pipeline tool. GitLab CI uses YAML to define pipelines, and there are many lines that are repeated over and over again. In the code below you can see a manifest as an example. The &script anchor is set here on the .launch key. In the devJob and prdJob objects, the anchor is referenced by <<: *script, and all key-value pairs are inserted at this point.
This has the following advantage: you only need to define the script once, and the environments are differentiated by parameterization.
.launch: &script
  stage: launch
  script:
    - ./deploy.sh $ENV
devJob:
  <<: *script
  variables:
    ENV: dev
prdJob:
  <<: *script
  variables:
    ENV: prd
  when: manual
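For comparison, here is a sketch of the same two jobs without anchors, assuming the merge key is resolved as described. The repeated stage and script lines are exactly what the anchor saves you from writing:

```yaml
# Equivalent pipeline with the anchor written out by hand
devJob:
  stage: launch
  script:
    - ./deploy.sh $ENV
  variables:
    ENV: dev
prdJob:
  stage: launch
  script:
    - ./deploy.sh $ENV
  variables:
    ENV: prd
  when: manual
```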
Single-Line YAML Notation in Documentation
If you work with the Kubernetes documentation, you will come across a YAML notation from time to time that we would like to briefly introduce here: the single-line YAML notation. It is used not only in the Kubernetes documentation but in other documentation as well.
You have already come across several manifests. The structure with lines and indentations makes a document easy to read, but if you want to refer to a specific key-value pair and include the complete hierarchy, you need a solution that saves space. An example of this is spec.containers[].resources.limits.cpu to reference the CPU limit listed in the code below.
Each . separates the levels of the hierarchy. This is similar to accessing nested object properties in many programming languages. The square brackets ([]) after containers indicate that it is a list. If you want to reference a specific entry in the list, you could also add an index inside the brackets.
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: my-container
      image: my-image
      resources:
        limits:
          cpu: "1"
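To make the mapping between the two notations concrete, the following sketch annotates a manifest with the single-line path of each value. The second container, my-sidecar, is a hypothetical addition that only serves to show how an index selects a list entry, assuming counting starts at 0.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod                  # metadata.name
spec:
  containers:
    - name: my-container        # spec.containers[0].name
      image: my-image           # spec.containers[0].image
      resources:
        limits:
          cpu: "1"              # spec.containers[0].resources.limits.cpu
    - name: my-sidecar          # spec.containers[1].name (hypothetical)
      image: my-sidecar-image   # spec.containers[1].image (hypothetical)
```

Without an index, spec.containers[].name refers to the name field of every entry in the list.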
Weaknesses of YAML
One of the main criticisms of YAML is the extensive specification, which covers a wide range of data types. This is very convenient in some situations because you don't necessarily have to put strings in quotation marks, for example, but it can lead to incorrect interpretations.
A well-known example of this is the Norway problem. The Norway problem is caused by a type inference weakness in YAML when processing character strings: the country code for Norway, NO, is incorrectly interpreted as a Boolean value. An example of this is shown below. If the country code NO is written in YAML without quotation marks, YAML will interpret it as False instead of the intended string, “NO”.
countries:
  Sweden: SE
  Norway: NO # This is interpreted as Boolean false
  Finland: FI
  Germany: DE
In the next listing, you will find further values that are interpreted by YAML as Booleans. The interesting thing is that in the latest YAML specification, 1.2, which was published in 2009, the Boolean values have been restricted to True and False. Nevertheless, the old specification remains in the libraries, and the Norway problem persists.
yes_value: yes # Is interpreted as True
no_value: no # Is interpreted as False
on_value: on # Is interpreted as True
off_value: off # Is interpreted as False
yes: y # Is interpreted as True
no: n # Is interpreted as False
Warning: Although the new YAML specification 1.2 only interprets the True or False values as Boolean, it can happen that libraries still use the old specification for parsing. For example, Kubernetes uses the go-yaml library to parse YAML manifests.
You will find an issue posting in which this topic has been discussed for years at the following address: http://s-prs.co/v596427.
To avoid this problem, you can always enclose strings in quotation marks. This means that there is no room for interpretation.
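Applied to the country list, a quoted version could look like this sketch. Quoted values are always strings, regardless of which version of the specification the parser follows.

```yaml
countries:
  Sweden: "SE"
  Norway: "NO"   # quoted, so it stays the string "NO"
  Finland: "FI"
  Germany: "DE"
```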
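In addition to the Boolean problem, there are also other misinterpretations. You can see two examples of this below. The first is about port forwarding. For example, if you use the SSH port, a parser that follows YAML 1.1 will read the value 22:22 as a base-60 number, the notation used for times, and turn it into the integer 1342. It has no problem with 80:80, because 80 is not a valid value after the colon in this notation and there is therefore no corresponding time.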
port-forwarding-ssh: 22:22 # Incorrectly interpreted as time
port-forwarding-nginx: 80:80 # Correctly interpreted as a character string
software-version: 1.1.0 # Correctly interpreted as a character string
database-version: 2.1 # Incorrectly interpreted as a floating point number
Version numbers can also cause problems. If you stick to semantic versioning and use three numbers in each case, you won't have a problem. However, it’s different if you only use two numbers, as with the database version. In that case, the number is interpreted as a float.
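A sketch of the same four values with quotation marks shows the fix. All of them are now read as strings, including the two that were misinterpreted before:

```yaml
port-forwarding-ssh: "22:22"     # string, no longer read as a time
port-forwarding-nginx: "80:80"   # string, as before
software-version: "1.1.0"        # string, as before
database-version: "2.1"          # string, no longer a float
```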
In general, it is best to write strings in quotation marks. This way you avoid any problems of misinterpretation. However, such a conflict occurs very rarely, and we have not yet had any critical issues because of it. If you do not import your YAML manifests directly in production, then in the worst case it could cost you some time in debugging. But as you have read this section, you will certainly remember the problem at this point.
Tips for Practical Use
We now want to give you a few useful tips that will hopefully make it easier for you to work with Kubernetes resources. In real life, you will be using YAML files all the time, so a good IDE or an editor with an appropriate add-on will save you a lot of headaches and time-consuming troubleshooting:
- Comments: The best thing about YAML compared to JSON is that you have the option to write comments. Especially with complex manifests, commentary is worth its weight in gold. You mark a comment using #, as in the following example:
name: "Kevin Welter" # Name of the author
- Linting tools: In addition to comments, we recommend using a linting tool that checks the syntax of the YAML manifest. It is best to check which one is recommended for your development environment, as there are several on the market, but in the end they all do what they are supposed to. The most important thing is that you don't have to search forever for an incorrect indentation as the linter points it out to you.
- Splitting files: As you know, you can integrate multiple documents into one YAML file. The recommendation is that you create one file per resource. For small applications, we sometimes use a single file. For larger applications, we always create a separate file for each Kubernetes object. There is no right or wrong here. Just see how it works best for you and how you can best keep an overview.
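For a small application, the single-file variant could look like the following sketch. The resource names and the ConfigMap contents are assumptions chosen for illustration; the three dashes separate the two Kubernetes objects into individual YAML documents.

```yaml
# app.yaml (hypothetical file name): two resources in one file
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config
data:
  ENV: dev
---
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: my-container
      image: my-image
```

With one file per resource, each of the two documents would instead live in its own file without the separator.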
When developing, you should always make sure that you use a uniform indentation, a consistent naming convention, and anchors, because then even larger manifests will remain readable and you will enjoy writing them.
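A linter can enforce some of these conventions for you. The following is a minimal sketch of a configuration, assuming you choose yamllint as your linting tool (the text above does not prescribe one). It enforces a two-space indentation and flags unquoted values such as yes or NO that could be read as Booleans.

```yaml
# .yamllint (assumes the yamllint tool; adapt to your own linter)
extends: default
rules:
  indentation:
    spaces: 2                         # uniform two-space indentation
  truthy:
    allowed-values: ["true", "false"] # warn about yes/no/on/off and similar
  line-length: disable                # manifests often contain long lines
```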
Editor’s note: This post has been adapted from a section of the book Kubernetes: Practical Guide for Developers and DevOps Teams by Kevin Welter.