Why JSON isn’t a Good Configuration Language

Many projects use JSON for configuration files. Perhaps the most obvious example is the package.json file used by npm and yarn, but there are many others, including CloudFormation (originally JSON only, but now supports YAML as well) and composer (PHP).

However, JSON is actually a pretty terrible configuration language for a number of reasons. Don’t get me wrong — I like JSON. It is a flexible format that is relatively easy for both machines and humans to read, and it’s a pretty good data interchange and storage format. But as a configuration language, it falls short.

Why is JSON popular as a config language?

There are several reasons why JSON is used for configuration files. The biggest reason is probably that it is easy to implement. Many languages have JSON support in the standard library, and those that don’t almost certainly have an easy-to-use JSON package readily available. Then there is the fact that developers and users are probably already familiar with JSON and don’t need to learn a new configuration format to use the product. And that’s not to mention all the existing tooling for JSON, including syntax highlighting, auto-formatting, validation tools, etc.

These are actually all pretty good reasons. It’s too bad that this ubiquitous format is so ill-suited for configuration.

The problems with JSON

Lack of comments

One feature that is absolutely vital for a configuration language is comments. Comments are necessary to annotate what different options are for and why a particular value was chosen and, perhaps most importantly, to temporarily comment out parts of the config while using a different configuration for testing and debugging. JSON has no comment syntax. If you think of JSON as a data interchange format, that makes sense, but it is a serious gap in a configuration format.

There are, of course, workarounds for adding comments to JSON. One common workaround is to use a special key in an object for a comment, such as “//” or “__comment”. However, this syntax isn’t very readable, and in order to include more than one comment in a single object, you need to use unique keys for each. Douglas Crockford (the inventor of JSON) suggests using a preprocessor to remove comments. If you are using an application that requires JSON configuration, I recommend that you do just that, especially if you already have any kind of build step before the configuration is used. Of course, that adds some work to editing configuration, so if you are creating an application that parses a configuration file, don’t depend on your users being able to use that.
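The preprocessor approach can be sketched in a few lines of Python. This is only an illustration, not Crockford's tool: the function name `strip_json_comments` is made up for this example, and it assumes JavaScript-style `//` and `/* */` comments. It tracks whether it is inside a string so that a URL such as `https://example.com` is left alone.

```python
import json


def strip_json_comments(text: str) -> str:
    """Remove // and /* */ comments that sit outside of JSON strings."""
    out = []
    i = 0
    in_string = False
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Copy the escaped character too, so \" does not end the string.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line but keep the newline itself.
            end = text.find("\n", i)
            i = len(text) if end == -1 else end
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = len(text) if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


commented = """
{
  // where to report problems
  "bugs": {"url": "https://example.com/bugs"},
  /* pinned until the next release */
  "version": "0.0.1"
}
"""

config = json.loads(strip_json_comments(commented))
print(config["bugs"]["url"])
```

Running a step like this before the application reads the file keeps the file on disk commentable while the application still sees plain JSON.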

Some JSON libraries do allow comments as input. For example, Ruby’s JSON module and the Java Jackson library with the JsonParser.Feature.ALLOW_COMMENTS feature enabled will handle JavaScript-style comments just fine in JSON input. However, this is non-standard, and many editors don’t properly handle comments in JSON files, which makes editing them a little harder.

Overly strict

The JSON specification is pretty restrictive. Its restrictiveness is part of what makes it easy to implement a JSON parser, but in my opinion, it also hurts the readability and, to a lesser extent, writability by humans.

Low Signal to Noise

Compared to many other configuration languages, JSON is pretty noisy. There is a lot of punctuation that doesn’t aid human readability, although it does make it easier to write implementations for machines. In particular, for configuration files, the keys in objects are almost always identifiers, so the quotation marks around the keys are redundant.

Also, because a configuration file is a set of key-value pairs, JSON requires curly braces around the entire document. The braces are part of what makes JSON an (almost) subset of JavaScript, and they help delimit objects when multiple objects are sent over a stream. But for a configuration file, the outermost braces are just useless clutter. The commas between key-value pairs are also mostly unnecessary in config files. Generally, you will have a single key-value pair per line, so it would make sense to accept a newline as a delimiter.

Speaking of commas, JSON doesn’t accept trailing commas. If you need commas after each pair, it should at least accept trailing commas, since trailing commas make adding new entries to the end easier and lead to cleaner commit diffs.
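To see the trailing-comma restriction in practice, here is a small check using Python's standard `json` module as one example of a strict parser. The exact error message varies between Python versions, so the sketch only reports whatever the parser says.

```python
import json

# The comma after the last pair is what a reordered or appended entry often leaves behind.
with_trailing_comma = """
{
  "dep1": "^1.0.0",
  "dep2": "3.40",
}
"""

try:
    json.loads(with_trailing_comma)
    accepted = True
except json.JSONDecodeError as err:
    accepted = False
    print(f"rejected at line {err.lineno}: {err.msg}")
```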

Long Strings

Another problem with JSON as a configuration format is it doesn’t have any support for multi-line strings. If you want newlines in the string, you have to escape them with “\n”, and what’s worse, if you want a string that carries over onto another line of the file, you are just out of luck. If your configuration doesn’t have any strings that are too long to fit on a line, this isn’t a problem. However, if your configuration includes long strings, such as the description of a project or a GPG key, you probably don’t want to put it on a single line with “\n” escapes instead of actual newlines.
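The difference between an escaped newline and a real line break inside a string can be checked with Python's standard `json` module, used here as one example of a conforming parser:

```python
import json

# A JSON string that contains a literal line break is rejected...
raw_newline = '{"description": "first line\nsecond line"}'
# ...so the line break has to be written as the two-character escape \n.
escaped_newline = '{"description": "first line\\nsecond line"}'

try:
    json.loads(raw_newline)
    raw_ok = True
except json.JSONDecodeError:
    raw_ok = False

description = json.loads(escaped_newline)["description"]
print(raw_ok)                      # False
print(description.splitlines())    # ['first line', 'second line']
```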

Numbers

In addition, JSON’s definition of a number can be problematic in some scenarios. As defined in the JSON spec, numbers are arbitrary precision finite floating point numbers in decimal notation. For many applications, this is fine. But if you need to use hexadecimal notation or represent values like infinity or NaN, then TOML or YAML would be able to handle the input better.
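As a rough illustration, the sketch below feeds a few number forms to Python's standard `json` module. One caveat: that module accepts `NaN` and `Infinity` by default as an extension to the spec, so the example uses its `parse_constant` hook to reject them and behave like a strict parser. The helper names `load_strict` and `is_accepted` are invented for this example.

```python
import json


def reject_constant(name: str):
    # json.loads calls this hook for the tokens NaN, Infinity and -Infinity.
    raise ValueError(f"{name} is not a valid JSON number")


def load_strict(text: str):
    return json.loads(text, parse_constant=reject_constant)


def is_accepted(text: str) -> bool:
    try:
        load_strict(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


candidates = ('{"mask": 255}', '{"mask": 0xFF}', '{"ratio": NaN}', '{"limit": Infinity}')
for candidate in candidates:
    print(candidate, is_accepted(candidate))
```

Only the plain decimal form passes; the hexadecimal literal and the non-finite values are rejected.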


{
  "name": "example",
  "description": "A really long description that needs multiple lines.\nThis is a sample project to illustrate why JSON is not a good configuration format. This description is pretty long, but it doesn't have any way to go onto multiple lines.",
  "version": "0.0.1",
  "main": "index.js",
  "//": "This is as close to a comment as you are going to get",
  "keywords": ["example", "config"],
  "scripts": {
    "test": "./test.sh",
    "do_stuff": "./do_stuff.sh"
  },
  "bugs": {
    "url": "https://example.com/bugs"
  },
  "contributors": [{
    "name": "John Doe",
    "email": "johndoe@example.com"
  }, {
    "name": "Ivy Lane",
    "url": "https://example.com/ivylane"
  }],
  "dependencies": {
    "dep1": "^1.0.0",
    "dep2": "3.40",
    "dep3": "6.7"
  }
}

What you should use instead

The configuration language you choose will depend on your application. Each language has different pros and cons, but here are some choices to consider. They are all languages that are designed for configuration first and would each be a better choice than a data language like JSON.

TOML

TOML is an increasingly popular configuration language. It is used by Cargo (Rust build tool), pip (Python package manager), and dep (golang dependency manager). TOML is somewhat similar to the INI format, but unlike INI, it has a standard specification and well-defined syntax for nested structures. It is substantially simpler than YAML, which is attractive if your configuration is fairly simple. But if your configuration has a significant amount of nested structure, TOML can be a little verbose, and another format, such as YAML or HOCON, may be a better choice.

name = "example"
description = """
A really long description that needs multiple lines.
This is a sample project to illustrate why JSON is not a \
good configuration format. This description is pretty long, \
but it doesn't have any way to go onto multiple lines."""

version = "0.0.1"
main = "index.js"
# This is a comment
keywords = ["example", "config"]

[bugs]
url = "https://example.com/bugs"

[scripts]

test = "./test.sh"
do_stuff = "./do_stuff.sh"

[[contributors]]
name = "John Doe"
email = "johndoe@example.com"

[[contributors]]
name = "Ivy Lane"
url = "https://example.com/ivylane"

[dependencies]

dep1 = "^1.0.0"
# Why we depend on dep2
dep2 = "3.40"
dep3 = "6.7"

HJSON

HJSON is a format based on JSON but with greater flexibility to make it more readable. It adds support for comments, multi-line strings, unquoted keys and strings, and optional commas. If you want the simple structure of JSON but something more friendly for configuration files, HJSON is probably the way to go. There is also a command line tool that can convert HJSON to JSON, so if you are using a tool that requires plain JSON, you can write your configuration in HJSON and convert it to JSON as a build step. JSON5 is another option that is pretty similar to HJSON.

{
  name: example
  description: '''
  A really long description that needs multiple lines.
  
  This is a sample project to illustrate why JSON is 
  not a good configuration format.  This description 
  is pretty long, but it doesn't have any way to go 
  onto multiple lines.
  '''
  version: 0.0.1
  main: index.js
  # This is a comment
  keywords: ["example", "config"]
  scripts: {
    test: ./test.sh
    do_stuff: ./do_stuff.sh
  }
  bugs: {
    url: https://example.com/bugs
  }
  contributors: [{
    name: John Doe
    email: johndoe@example.com
  } {
    name: Ivy Lane
    url: https://example.com/ivylane
  }]
  dependencies: {
    dep1: ^1.0.0
    # Why we have this dependency
    dep2: "3.40"
    dep3: "6.7"
  }
}

HOCON

HOCON is a configuration format designed for the Play framework, and it is fairly popular among Scala projects. It is a superset of JSON, so existing JSON files can be used. Besides the standard features of comments, optional commas, and multi-line strings, HOCON supports importing from other files, referencing the values of other keys to avoid duplication, and using dot-delimited keys to specify paths to a value, so users do not have to put all values directly in a curly-brace object.

name = example
description = """
A really long description that needs multiple lines.

This is a sample project to illustrate why JSON is 
not a good configuration format.  This description 
is pretty long, but it doesn't have any way to go 
onto multiple lines.
"""
version = 0.0.1
main = index.js
# This is a comment
keywords = ["example", "config"]
scripts {
  test = ./test.sh
  do_stuff = ./do_stuff.sh
}
bugs.url = "https://example.com/bugs"
contributors = [
  {
    name = John Doe
    email = "johndoe@example.com"
  }
  {
    name = Ivy Lane
    url = "https://example.com/ivylane"
  }
]
dependencies {
  dep1 = "^1.0.0"
  # Why we have this dependency
  dep2 = "3.40"
  dep3 = "6.7"
}

YAML

YAML (YAML Ain’t Markup Language) is a very flexible format that is almost a superset of JSON and is used in several conspicuous projects such as Travis CI, Circle CI, and AWS CloudFormation. Libraries for YAML are almost as ubiquitous as those for JSON. In addition to support for comments, newline delimiting, multi-line strings, bare strings, and a more flexible type system, YAML also allows you to reference earlier structures in the file to avoid duplication.

The main downside to YAML is that the specification is pretty complicated, which results in inconsistencies between different implementations. It also treats indentation levels as syntactically significant (similar to Python), which some people like and others don’t. Significant indentation can also make copying and pasting tricky. See “YAML: probably not so great after all” for a more complete description of the downsides of using YAML.

name: example
description: >
  A really long description that needs multiple lines.
  
  This is a sample project to illustrate why JSON is not a good 
  configuration format. This description is pretty long, but it 
  doesn't have any way to go onto multiple lines.
version: 0.0.1
main: index.js
# this is a comment
keywords:
  - example
  - config
scripts: 
  test: ./test.sh
  do_stuff: ./do_stuff.sh
bugs: 
  url: "https://example.com/bugs"
contributors:
  - name: John Doe
    email: johndoe@example.com
  - name: Ivy Lane
    url: "https://example.com/ivylane"
dependencies:
  dep1: ^1.0.0
  # Why we depend on dep2
  dep2: "3.40"
  dep3: "6.7"

Scripting language

If your application is written in a scripting language such as Python or Ruby, and you know the configuration comes from a trusted source, the best option may be to simply use a file written in that language for your configuration. It’s also possible to embed a scripting language such as Lua in compiled languages if you need a truly flexible configuration option. Doing so gives you the full flexibility of the scripting language and can be simpler to implement than using a different configuration language. The downside to using a scripting language is it may be too powerful, and of course, if the source of the configuration is untrusted, it introduces serious security problems.
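For a Python application, a minimal sketch of this approach might look like the following. The file name `app_config.py`, its contents, and the helper `load_python_config` are all hypothetical; the example writes the file to a temporary directory only so that it is self-contained. It uses the standard library's `runpy.run_path`, which executes the file and returns its top-level names, so it should only ever be pointed at a trusted file.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical config file contents; a real application would read an existing file.
CONFIG_SOURCE = '''
# Comments, expressions, and reuse of earlier values all come for free.
name = "example"
version = "0.0.1"
scripts = {task: f"./{task}.sh" for task in ("test", "do_stuff")}
'''


def load_python_config(path):
    """Execute a trusted Python file and return its public top-level names."""
    namespace = runpy.run_path(str(path))
    return {key: value for key, value in namespace.items() if not key.startswith("_")}


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "app_config.py"
    config_path.write_text(CONFIG_SOURCE)
    config = load_python_config(config_path)

print(config["scripts"])
```

Because the file is executed, anything Python can do is available to whoever writes the configuration, which is exactly the power and the risk described above.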

Write your own

If for some reason a key-value configuration format doesn’t meet your needs, and you can’t use a scripting language due to performance or size constraints, then it might be appropriate to write your own configuration format. But if you find yourself in this scenario, think long and hard before making a choice that will not only require you to write and maintain a parser but also require your users to become familiar with yet another configuration format.

Conclusion

With so many better options for configuration languages, there’s no good reason to use JSON. If you are creating a new application, framework, or library that requires configuration, choose something other than JSON.

14 Comments

  1. There is a format called EDN, created by Clojure available in Java and other JVM languages but shamefully not broadly known.

  2. Richard S., July 17, 2018 at 1:21 am

    There is also the Groovy configscript format which consumes Groovy scripts as configurations, so you have all the features of the Groovy language, plus the fact that it’s in the Groovy standard library and that you can configure the compiler that reads the script deserves a mention in my opinion: http://mrhaki.blogspot.com/2009/10/groovy-goodness-using-configslurper.html

  3. Jim Williams, July 17, 2018 at 4:03 am

    XML is another choice for a configuration language. It shares many of the negative points of JSON like “signal to noise”. Also many of the positive points like available editors and parsers.

    Value is realized when XML is used with a data type document and a competent XML editor. In that case, it is a breeze to verify the file’s syntax. When a DTD is present many editors can prompt for elements and attributes, assisting the author of the config file.

  4. Jeff Groom, July 17, 2018 at 7:46 am

    What format would you recommend to use in a lucidchart diagram? Currently, there is key/value for each drawing object. It would be nice if a yaml, hocon, or json could be associated to store a more complex set of data.

  5. Tyler Davis, July 17, 2018 at 10:00 am

    Hey Jeff, I’m a developer on the data and automation team here at Lucidchart. I am interested in your use case and what type of data you are trying to visualize and attach to your shapes. Do you have time within the next week for a short phone call? You can grab some time on my calendar at https://calendly.com/lucidlaura/lucid-feature-dev. If you’re able to set aside some time, I’d love to say thank you by sending you a $25 Amazon gift card. Thanks for your help!

  6. […] JSON isn’t a good configuration language 6 by fanf2 | 1 comments on Hacker News. […]

  7. Dave Cunningham, July 19, 2018 at 3:40 pm

    Jsonnet (jsonnet.org) is another option. It’s designed for configuration but is a full-blown programming language with lots of constructs for generating / abstracting config to avoid duplication, etc. It generates JSON, so it can be used with existing tools that accept JSON or YAML, like CloudFormation, etc.

  8. One minor edit for correctness – the creator of JSON is Douglas Crockford not David.
    https://en.wikipedia.org/wiki/JSON

  9. Thayne McCombs, July 27, 2018 at 9:57 am

    Fixed

  10. Definitely agree that using a full scripting language is far better than using configuration languages. In the rare case the configuration files are coming from an external source, yeah, use something that isn’t executable, but for everything else a full programming language is better than some configuration language that constrains you and that will be unfamiliar to most people.

  11. […] Why JSON isn’t a Good Configuration Language […]

  12. […] >> Why Json Isn’t A Good Configuration Language [lucidchart.com] […]

  13. You can just include comments in a json file like
    {
    "comment":"This is express",
    "text":"tesr"
    }

  14. Thayne McCombs, November 28, 2018 at 2:11 pm

    That works sometimes. But it has a lot of limitations:

    • It only works inside of objects. You can’t use it inside an array.
    • The comment can only be one line
    • If you need multiple comments in the same object, you have to use unique keys for each
    • If the application requires all keys to be known keywords, it will reject json with this kind of comment
    • If the object is a mapping from arbitrary keys to values, then this kind of comment would end up in your comment getting used as a value (or causing an error), which could mean comments are valid in some places, but not others.
    • The comment looks just like normal content, which makes it not stand out when reading the code
