These articles are written by Codalogic empowerees as a way of sharing knowledge with the programming community. They do not necessarily reflect the opinions of Codalogic.

LOON – Line Oriented Object Notation

By: Pete, February 2023

A while back I needed a configuration file format that I could easily manually edit. I wanted it to be general purpose in case my requirements grew, so I considered the obvious candidates. My conclusion was that XML [XML] has too many angled brackets, JSON [JSON] has too many quotation marks and YAML [YAML] has too many ways of doing the same thing.

I therefore set about seeing if I could improve on these. My requirements were: easy for a human to type and easy for a scripting language to parse. The latter suggested the format should be line oriented and rely mostly on functions like trim() / strip(), split() and regular expressions.

My starting point was JSON. But pure JSON has a number of deficiencies in my opinion. In particular, beyond too many quotation marks, it doesn't allow comments and doesn't support multi-line text well.

The result is what I call Line Oriented Object Notation or LOON [LOON].

To a first approximation LOON looks like JSON with the quotation marks removed and the commas replaced by new lines.

For example, the following JSON:

{
    "name" : "Pete",
    "userid" : 12,
    "privileges" : [
        {
            "directory" : "/home/pete",
            "permissions" : "rwxd"
        },
        {
            "directory" : "/bin",
            "permissions" : "rx"
        }
    ]
}

Looks like this in LOON:

# Pete's user profile
name : Pete
userid: 12
privileges [
    {
        directory: /home/pete
        permissions: rwxd
    }
    {
        directory: /bin
        permissions: rx
    }
]

As with JSON, LOON consists of values. Values can be objects, arrays and primitive values. Arrays consist of values and objects consist of members, which are name / value pairs. But there are a number of further differences.

No more than one value can appear on a line, although some kinds of value, such as objects and arrays, can take up multiple lines. The name of an object member must be on the same line as a value's first (and possibly, only) line. (A line with only a name is equivalent to the JSON "name" : null.) It is these constraints that allow LOON to be processed a line at a time and thus make it line oriented.

As you've probably surmised, object values start with a { character and end with a } character. Arrays start with a [ character and end with a ] character. Within an object or array, an end character must be the only non-whitespace character on its line. (If there are other non-whitespace characters, the line can be treated as a string if in an array, as a member with a string value if in an object, or as a syntax error.) In an array, a start character must be the only non-whitespace character on its line; in an object, it is preceded only by the member name and whitespace.

This is illustrated in the example above. Notice that there is no : character between a member name and the object or array start characters. Unlike JSON, the : character separator is only used before string values. This disambiguates the grammar.

For example, the following is the start of an object member with an object value:

Policy {

Whereas the following is an object member with a string value consisting solely of the character {:

About : {
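The distinction can be made with a few regular expressions. The following Python is only a sketch: the helper name classify_member_line and the exact name pattern are assumptions for illustration, and it ignores name-only lines and multi-line strings.

```python
import re

# Assumed member-name pattern: identifier characters plus '-' and '.'.
NAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"

OBJECT_START = re.compile("^(" + NAME + r")\s*\{$")
ARRAY_START = re.compile("^(" + NAME + r")\s*\[$")
STRING_MEMBER = re.compile("^(" + NAME + r")\s*:\s*(.*)$")


def classify_member_line(line):
    """Classify one line found inside an object (hypothetical helper)."""
    text = line.strip()
    for kind, pattern in (("object", OBJECT_START), ("array", ARRAY_START)):
        match = pattern.match(text)
        if match:
            return kind, match.group(1), None
    match = STRING_MEMBER.match(text)
    if match:
        return "string", match.group(1), match.group(2)
    return "unknown", None, None


print(classify_member_line("Policy {"))   # ('object', 'Policy', None)
print(classify_member_line("About : {"))  # ('string', 'About', '{')
```

Because the : only ever precedes a string value, the two lines never match the same pattern.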

A member name is restricted to standard programming style identifier character sequences, plus the - and . characters (another difference to JSON). This allows names like com.codalogic.my-protocol.my-message. It's recommended that a name of this form is used as the first member name for LOON messages that are intended to be let loose on the Internet. E.g.:

com.codalogic.my-protocol.my-message {
    # Message content …
}
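A name check along these lines fits in a single regular expression. This Python sketch assumes that a name must start with a letter or underscore, which is a guess at what "programming style identifier" means here; is_member_name is a hypothetical helper.

```python
import re

# Assumption: first character is a letter or underscore; later characters
# may also be digits, '-' or '.'.
MEMBER_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_member_name(text):
    """True when the whole of text is an acceptable member name."""
    return MEMBER_NAME.fullmatch(text) is not None


print(is_member_name("com.codalogic.my-protocol.my-message"))  # True
print(is_member_name("my name"))                               # False
```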

LOON does not differentiate between strings and numbers. Instead, it relies on a variant of duck typing. "If you want it to be a duck and it looks like a duck then treat it as a duck." Some might think this is a deficiency as the format loses information. But I believe explicitly knowing a value is a number as opposed to a string of numbers at the protocol level buys you very little advantage. For a number to be useful you need to know what it represents. For example, is it an age, a distance or a speed? If it is a distance, are the units metres, nanometres, miles or light years? Without the consumer of the number knowing these things the number is useless. And if you know these things then knowing it is a number is implicit. Additionally, there are other useful types that JSON doesn't distinguish, such as various dates and times, geo-coordinates and bearings. Plus, XML survives perfectly well without the distinction.
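In application code this amounts to converting a value at the point where its meaning is known. A minimal Python sketch, assuming a parser has already produced a dict whose values are all strings; the helper as_int is hypothetical:

```python
def as_int(value, default=None):
    """Treat value as an integer if it looks like one, else return default."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


# Every parsed value arrives as a string; the application decides its type.
profile = {"name": "Pete", "userid": "12"}

user_id = as_int(profile["userid"])
print(user_id + 1)               # 13
print(as_int(profile["name"]))   # None
```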

To make up for this, LOON has 3 types of string! Plain strings, quoted strings and multi-line strings.

Plain strings are expected to be the most common type of string. They just have the raw contents of the string all on one line. No escape sequences and no enclosing quotation marks. During parsing any leading and trailing whitespace of the value field is removed. For example:

Name: A plain string value

However, sometimes leading and trailing whitespace might be significant. Or you might want some escaping. In this case quoted strings can be used. Quoted strings have the same format as regular JSON strings. Quoted strings are also needed when a plain string would be misinterpreted by the parser. For example, if you want to convey a string that starts with a quotation mark, you would need to wrap that string in a quoted string rather than a plain string. Quoted strings also need to be used in arrays to convey a single-character string that is one of {}[] or a string that might otherwise look like a comment or the start of a multi-line string. It's hoped that the latter disambiguation scenarios will rarely be needed in typical use. An example quoted string is:

Name: "  A quoted string value "
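Since quoted strings share the JSON string format, a parser can hand them to an existing JSON decoder. A minimal Python sketch, assuming the value field has already been separated from the member name; parse_string_value is a hypothetical helper:

```python
import json


def parse_string_value(raw):
    """Decode the value field of a member as a plain or quoted string."""
    text = raw.strip()
    if text.startswith('"'):
        # Quoted string: same format as a JSON string, so reuse json.loads.
        # Malformed input raises json.JSONDecodeError (a ValueError).
        return json.loads(text)
    # Plain string: raw contents with surrounding whitespace removed.
    return text


print(repr(parse_string_value("  A plain string value  ")))
# 'A plain string value'
print(repr(parse_string_value('"  A quoted string value "')))
# '  A quoted string value '
```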

Speaking of comments, a comment is a line whose first non-whitespace character is the # character. A comment cannot be on the same line as a value, so a # that follows a member name is simply part of the string value. For instance:

# A comment
Name: # A plain string value
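In a line-oriented parser this rule is a one-line test. A Python sketch, with hypothetical helper names, showing that only the first of the two lines above is a comment:

```python
def is_comment(line):
    """True when the first non-whitespace character is '#'."""
    return line.lstrip().startswith("#")


def split_member(line):
    """Split 'Name: value' at the first ':' (hypothetical helper)."""
    name, _, value = line.partition(":")
    return name.strip(), value.strip()


print(is_comment("   # A comment"))                  # True
print(is_comment("Name: # A plain string value"))    # False
print(split_member("Name: # A plain string value"))
# ('Name', '# A plain string value')
```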

Multi-line string values start with the << character sequence followed by an identifier and continue until the << character sequence and identifier are found again at the end of a line. An example of an object member with name History and a multi-line string value is as follows:

History <<END
    Born a long time ago
    in a galaxy far, far away.<<END

Notice again there is no : character between the name History and the << characters.
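Reading such a value a line at a time might look like the Python sketch below. It rests on several assumptions that this article does not settle: the text starts on the line after the opening marker, leading indentation on each line is stripped, and lines are joined with a newline. The helper read_multiline is hypothetical.

```python
import re

# Member name, then '<<' and an identifier, with nothing else on the line.
MULTILINE_START = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.\-]*)\s*<<(\w+)\s*$")


def read_multiline(lines, start):
    """Return (name, text, next_index) for the value starting at lines[start]."""
    match = MULTILINE_START.match(lines[start])
    if match is None:
        raise ValueError("not a multi-line string start")
    name, marker = match.group(1), "<<" + match.group(2)
    collected = []
    for index in range(start + 1, len(lines)):
        line = lines[index].rstrip()
        if line.endswith(marker):
            # Last line: drop the end marker and finish.
            collected.append(line[: -len(marker)].lstrip())
            return name, "\n".join(collected), index + 1
        collected.append(line.lstrip())  # assumption: indentation is dropped
    raise ValueError("unterminated multi-line string")


lines = ["Notes <<END", "    first line", "    second line<<END", "Name: Pete"]
print(read_multiline(lines, 0))  # ('Notes', 'first line\nsecond line', 3)
```

The returned index tells the caller where ordinary line-by-line parsing resumes.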

In JSON a message can be any JSON value. This is often an object or array but can also be a primitive type like a number or Boolean. A LOON message can also be an object or array as in JSON, but unlike JSON, it can't be a string. Instead it can be an object body. Hence in the first example above, the LOON message does not need to begin with the start object { character.
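To show how little machinery the line-oriented rules need, here is a minimal Python sketch of a parser for a message that is an object body. It covers only a subset: comments, plain string members, name-only members, and nested objects and arrays. Quoted and multi-line strings are left out, a stray end character is treated as a syntax error, and the function name parse_loon and the name pattern are assumptions for illustration, not the API of the parsers on GitHub.

```python
import re

NAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"
# name, then either a start character or ':' and a plain string value.
MEMBER = re.compile(r"^(" + NAME + r")\s*(?:(\{|\[)|:\s*(.*))?$")


def parse_loon(text):
    """Parse a small LOON subset into dicts, lists and strings."""
    root = {}
    stack = [root]  # containers currently open, innermost last
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        current = stack[-1]
        if line in ("}", "]"):
            expected = dict if line == "}" else list
            if len(stack) == 1 or not isinstance(current, expected):
                raise ValueError("unexpected " + line)
            stack.pop()
        elif isinstance(current, list):
            if line in ("{", "["):
                child = {} if line == "{" else []
                current.append(child)
                stack.append(child)
            else:
                current.append(line)  # plain string array item
        else:
            match = MEMBER.match(line)
            if match is None:
                raise ValueError("bad member line: " + line)
            name, opener, value = match.groups()
            if opener:
                child = {} if opener == "{" else []
                current[name] = child
                stack.append(child)
            else:
                current[name] = value  # None for a name-only line
    if len(stack) != 1:
        raise ValueError("unclosed object or array")
    return root
```

Each line is handled on its own, with a stack recording which objects and arrays are currently open.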

One criticism of LOON is that it could use a lot of vertical space for an array of simple numbers or word-like tokens. To counter this I suggest storing such tokens as a space-separated list in a single plain string value and then using a split() function to separate the string into tokens at the application level. As precedent, this technique is also used by XML Schema's [XSD] xs:list directive. For example, instead of:

Colours [
    Red
    Green
    Blue
]

Do:

Colours: Red Green Blue
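At the application level the separation is a single call. In Python, for instance, assuming the member value has already been parsed into a string (the variable names are illustrative):

```python
member_value = "Red Green Blue"

# split() with no arguments splits on any run of whitespace.
colours = member_value.split()
print(colours)  # ['Red', 'Green', 'Blue']
```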

In my opinion LOON has all the flexibility of JSON (and more) but it is stripped down and simplified so that there is not much more that can be taken away. It is also easy to implement both in scripting languages and lower-level languages (handling the quoted strings being the most intricate part).

This is a very quick run through of the LOON format. I have implemented Ruby and PHP parsers for my needs, which are available on GitHub [LOON], where you can also find additional information about LOON. At some point I hope to do a C# version. If you think LOON might be useful for you and you create your own LOON parser, please let me know and I can add a link to the GitHub page.

References

[JSON] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data Interchange Format", RFC 7159, March 2014, http://www.rfc-editor.org/info/rfc7159

[LOON] https://github.com/codalogic/LOON

[XML] World Wide Web Consortium. Extensible Markup Language (XML) 1.0 (Fifth Edition), ed. Tim Bray et al. W3C Recommendation 26 November 2008. http://www.w3.org/TR/xml/

[XSD] World Wide Web Consortium. XML Schema Version 1.1 Part 2: Datatypes, ed. Dave Peterson, Paul V. Biron, Ashok Malhotra and C. M. Sperberg-McQueen. W3C Recommendation 5 April 2012. http://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/datatypes.html

[YAML] https://yaml.org/
