Why is YAML used for configuration when it's so bad, and what can you do about it?

YAML replaced JSON, which replaced XML, as the format for configuration files. Let's look at those transitions to learn how to deal with YAML right now.

I've been programming long enough that I still remember programming in XML. It was everywhere.

XML #

XML was the default for sending data between the client and the server. AJAX stands for "Asynchronous JavaScript and XML" for a reason. It was everywhere, really: PHP projects, all kinds of Java projects. It was mostly fine for passing data, but people started to use it for configuration, and from there to write logic.
For example, Spring at the time used it for dependency injection. Hibernate used XML to specify the object-relational mapping between the database and Java objects. I've seen a bunch of internal projects encoding all kinds of logic, with ifs and loops, in long, complicated files thousands of lines long.
To some extent you could deal with it when the company had an appropriate XML Schema; your IDE could help you a bit. But people abused XML so much that I heard complaints about it all the time, and I was complaining as well. Let's be honest: working on an XML file so big that it sometimes crashes your IDE isn't fun. I still remember a couple of senior engineers pointing out JSON's faults and the benefits of XML, and neither I nor anyone else listened to them. It was the beginning of the JSON era.

JSON #

JSON is pretty good for what it was made for—it's a lightweight (compared to XML) format for exchanging data between the server and JavaScript client in the browser.
One good thing about JSON is that it's pretty bad for configuration files. Yes, people tried it, but fortunately it becomes painful so quickly that I haven't seen anything too terrible, except maybe Webpack configuration. It's not that common to see a big JSON configuration file. The biggest reason for that is the lack of comments in the JSON format. It's a restriction that prevents people from abusing it too much.
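
To make the "no comments" restriction concrete, here is a minimal sketch using Python's standard json module. The helper name parses_as_json and the sample documents are made up for illustration; the only thing it relies on is that a strict JSON parser rejects comments.

```python
import json


def parses_as_json(text: str) -> bool:
    """Return True if the standard library parser accepts the text as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Hypothetical config snippets, not taken from any real project.
plain = '{"port": 8080}'
commented = """{
  // the port the dev server listens on
  "port": 8080
}"""

print(parses_as_json(plain))      # True
print(parses_as_json(commented))  # False: the comment is a syntax error
```

Without a way to explain a setting next to the setting itself, a large JSON configuration file gets hard to maintain fast, which is probably why few people push it very far.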

But when people are already abusing JSON and having a hard time dealing with it, they try to fix the problem with code. Some started working on JSON Schema, others on new JSON-like formats with comments, e.g. HJSON.

It was obvious that JSON was not enough and that we needed something else for configuration.

YAML #

Various popular projects use YAML for configuration.

Now imagine you are looking at hundreds of lines of YAML, trying to figure out what is wrong. If it's really a declarative configuration then, while it's hard to read, you don't have to worry that much. Just find the setting that you need to change. The problem is when it's not a declarative configuration file, but a programming language pretending to be YAML. I start to think of something as a programming language when it gets if statements (conditionals). For example, Ansible tasks can carry a when: condition and GitHub Actions steps an if: expression, both written as ordinary YAML keys.

It's worse than your favorite programming language as well. Your IDE can't help you. Errors are cryptic. Often you won't get an error at all: you didn't get the indentation just right, and the result is valid YAML that happens to be an invalid configuration. It's a mess.
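
Here is a rough sketch of how "valid YAML, invalid configuration" plays out. To keep it free of third-party dependencies I wrote the parsed results as plain Python dicts, which is what a YAML parser would hand back for the two files shown in the comments. The keys, the ALLOWED schema, and both helper functions are hypothetical.

```python
# Intended file:
#
#   server:
#     port: 8080
#     debug: true
#
# Same file with "debug" accidentally dedented (still valid YAML):
#
#   server:
#     port: 8080
#   debug: true

intended = {"server": {"port": 8080, "debug": True}}
dedented = {"server": {"port": 8080}, "debug": True}

# Hypothetical schema: the keys each top-level section is allowed to have.
ALLOWED = {"server": {"port", "debug"}}


def debug_enabled(config: dict) -> bool:
    """What a lenient application does: silently fall back to a default."""
    return config.get("server", {}).get("debug", False)


def unknown_keys(config: dict) -> list:
    """Return dotted paths of keys the hypothetical schema does not know."""
    problems = []
    for section, value in config.items():
        if section not in ALLOWED:
            problems.append(section)
            continue
        for key in value:
            if key not in ALLOWED[section]:
                problems.append(f"{section}.{key}")
    return problems


print(debug_enabled(intended))  # True
print(debug_enabled(dedented))  # False, and nothing complained
print(unknown_keys(dedented))   # ['debug']
```

The lenient reader quietly ignores the misplaced key, while even a tiny strict check like unknown_keys turns the indentation slip into a visible error.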

It's even worse than what XML had years ago. At least at the time, most IDEs could help you when there was an XML Schema. Something similar is starting to show up for YAML. See 10 YAML tips for people who hate YAML for advice on editor setup, linting, and more.

Another problem I want to mention again is readability. If you ever try to read a longer (a hundred lines or more) YAML file, you will realize that it's pretty hard to scan and make sense of its structure. Python also uses significant whitespace, but unlike Python, YAML gives you no way to apply structured programming practices such as extracting code into smaller functions. Everything is one blob of text.

No silver bullet #

The most important thing I want to stress is that I don't believe we can solve the problem by introducing another format. XML, JSON, and YAML are fine as long as the file itself is small and doesn't contain logic. So what do you do when you need to add logic? Use a proper programming language! Please stop treating logic in configuration files as lesser code. If it's wrong, it's a serious problem. In a real language, at least you will be able to write tests for your configuration. Have you ever tried doing that for your YAML file? I don't think so.
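
As a sketch of what "use a proper programming language" could look like, here is the kind of conditional that often ends up in a YAML template, written as ordinary Python with tests next to it. DeployConfig, build_config, the environment names, and the replica counts are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployConfig:
    environment: str
    replicas: int
    debug: bool


def build_config(environment: str) -> DeployConfig:
    """Build the config in code instead of with template conditionals."""
    if environment not in {"dev", "staging", "prod"}:
        raise ValueError(f"unknown environment: {environment!r}")
    is_prod = environment == "prod"
    return DeployConfig(
        environment=environment,
        replicas=3 if is_prod else 1,
        debug=not is_prod,
    )


def test_prod_never_runs_in_debug():
    assert build_config("prod").debug is False


def test_unknown_environment_is_rejected():
    try:
        build_config("production")  # typo-like value that must not pass
    except ValueError:
        return
    raise AssertionError("expected ValueError")


test_prod_never_runs_in_debug()
test_unknown_environment_is_rejected()
```

The output of build_config can still be serialized to YAML or JSON for whatever tool consumes it. The point is that the logic lives somewhere your editor, type checker, and test runner can see it.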

