diff --git a/src/index.md b/src/index.md index 680aa9c..b42e35b 100644 --- a/src/index.md +++ b/src/index.md @@ -57,16 +57,16 @@ For more details, see the [overview below](#overview). [Play with it in your browser!](./play/) There's a living -[specification](https://github.com/kdl-org/kdl/blob/main/SPEC.md), as well as +[specification](/spec), as well as various [implementations](#implementations). You can also check out the [FAQ](#faq) to answer all your burning questions! The current version of the KDL spec is -[KDL 2.0.0](https://github.com/kdl-org/kdl/blob/2.0.0/SPEC.md). For legacy KDL, +[KDL 2.0.0](/spec). For legacy KDL, please refer to the [KDL 1.0.0 -spec](https://github.com/kdl-org/kdl/blob/2.0.0/SPEC_v1.md). All users are +spec](/spec-v1). All users are encouraged to migrate. [Migration is forward-and-backward-compatible and -safe](https://github.com/kdl-org/kdl/blob/2.0.0/SPEC.md#compatibility), and can +safe](/spec#compatibility), and can be automated. In addition to a spec for KDL itself, there are specifications for [a KDL Query diff --git a/src/spec-v1.md b/src/spec-v1.md new file mode 100644 index 0000000..05baabf --- /dev/null +++ b/src/spec-v1.md @@ -0,0 +1,551 @@ +--- +layout: base.html +title: KDL v1 Specification +--- + +# KDL v1 Spec + +This is the semi-formal specification for the legacy version of KDL, including +the intended data model and the grammar. + +This document describes KDL version `1.0.0`. It was released on September 11, 2021. + +Information in this spec is intended as both an accessible historical record, +and a reference for KDL implementors who are interested in supporting both major +versions of the language. + +The v1 spec will not receive further updates outside of minor, inconsequential +rewordings or other superficial fixes and is considered a "legacy" version. 
+ +## Compatibility + +KDL v2 is designed such that for any given KDL document written as KDL +1.0 or [KDL 2.0](/spec), +the parse will either fail completely, or, if the +parse succeeds, the data represented by a v1 or v2 parser will be identical. +This means that it's safe to use a fallback parsing strategy in order to support +both v1 and v2 simultaneously. For example, `node "foo"` is a valid node in both +versions, and should be represented identically by parsers. + +A version marker `/- kdl-version 1` (or `2`) _MAY_ be added to the beginning of +a KDL document, optionally preceded by the BOM, and parsers _MAY_ use that as a +hint as to which version to parse the document as. + +## Introduction + +KDL is a node-oriented document language. Its niche and purpose overlap with +XML, as do many of its semantics. You can use KDL both as a configuration +language and as a data exchange or storage format, if you so choose. + +The bulk of this document is dedicated to a long-form description of all +[Components](#components) of a KDL document. There is also a much more terse +[Grammar](#full-grammar) at the end of the document that covers most of the +rules, with some semantic exceptions involving the data model. + +KDL is designed to be easy to read _and_ easy to implement. + +In this document, references to "left" or "right" refer to directions in the +*data stream* towards the beginning or end, respectively; in other words, +the directions if the data stream were only ASCII text. 
They do not refer +to the writing direction of text, which can flow in either direction, +depending on the characters used. + +## Components + +### Document + +The toplevel concept of KDL is a Document. A Document is composed of zero or +more [Nodes](#node), separated by newlines and whitespace, and eventually +terminated by an EOF. + +All KDL documents should be UTF-8 encoded and conform to the specifications in +this document. + +#### Example + +The following is a document composed of two toplevel nodes: + +```kdl +foo { + bar +} +baz +``` + +### Node + +Being a node-oriented language means that the real core component of any KDL +document is the "node". Every node must have a name, which is an +[Identifier](#identifier). + +The name may be preceded by a [Type Annotation](#type-annotation) to further +clarify its type, particularly in relation to its parent node. (For example, +clarifying that a particular `date` child node is for the _publication_ date, +rather than the last-modified date, with `(published)date`.) + +Following the name are zero or more [Arguments](#argument) or +[Properties](#property), separated by either [whitespace](#whitespace) or [a +slash-escaped line continuation](#line-continuation). Arguments and Properties +may be interspersed in any order, much like is common with positional +arguments vs options in command line tools. + +[Children](#children-block) can be placed after the name and the optional +Arguments and Properties, possibly separated by either whitespace or a +slash-escaped line continuation. + +Arguments are ordered relative to each other (but not relative to Properties) +and that order must be preserved in order to maintain the semantics. + +By contrast, Property order _SHOULD NOT_ matter to implementations. +[Children](#children-block) should be used if an order-sensitive key/value +data structure must be represented in KDL. 
+ +Nodes _MAY_ be prefixed with `/-` to "comment out" the entire node, including +its properties, arguments, and children, and make it act as plain whitespace, +even if it spreads across multiple lines. + +Finally, a node is terminated by either a [Newline](#newline), a semicolon (`;`) +or the end of the file/stream (an `EOF`). + +#### Example + +```kdl +foo 1 key="val" 3 { + bar + (role)baz 1 2 +} +``` + +### Identifier + +An Identifier is either a [Bare Identifier](#bare-identifier), which is an +unquoted string like `node` or `item`, or a [String](#string), which is quoted, +like `"node"` or `"two words"`. There's no semantic difference between the +kinds of identifier; this simply allows for the use of quotes to have unusual +identifiers that are inexpressible as bare identifiers. + +### Bare Identifier + +A Bare Identifier is composed of any Unicode codepoint other than [non-initial +characters](#non-initial-characters), followed by any number of Unicode +codepoints other than [non-identifier characters](#non-identifier-characters), +so long as this doesn't produce something confusable for a [Number](#number), +[Boolean](#boolean), or [Null](#null). For example, both a [Number](#number) +and an Identifier can start with `-`, but when an Identifier starts with `-` +the second character cannot be a digit. This is precisely specified in the +[Full Grammar](#full-grammar) below. + +Identifiers are terminated by [Whitespace](#whitespace) or +[Newlines](#newline). + +### Non-initial characters + +The following characters cannot be the first character in a +[Bare Identifier](#identifier): + +* Any decimal digit (0-9) +* Any [non-identifier characters](#non-identifier-characters) + +Be aware that the `-` character can only be used as an initial +character if the second character is not a digit. This allows +identifiers to look like `--this`, and removes the ambiguity +of having an identifier look like a negative number. 
+ +### Non-identifier characters + +The following characters cannot be used anywhere in a [Bare Identifier](#identifier): + +* Any codepoint with hexadecimal value `0x20` or below. +* Any codepoint with hexadecimal value higher than `0x10FFFF`. +* Any of `\/(){}<>;[]=,"` + +### Line Continuation + +Line continuations allow [Nodes](#node) to be spread across multiple lines. + +A line continuation is a `\` character followed by zero or more whitespace +characters and an optional single-line comment. It must be terminated by a +[Newline](#newline) (including the Newline that is part of single-line comments). + +Following a line continuation, processing of a Node can continue as usual. + +#### Example + +```kdl +my-node 1 2 \ // comments are ok after \ + 3 4 // This is the actual end of the Node. +``` + +### Property + +A Property is a key/value pair attached to a [Node](#node). A Property is +composed of an [Identifier](#identifier), followed immediately by a `=`, and then a [Value](#value). + +Properties should be interpreted left-to-right, with rightmost properties with +identical names overriding earlier properties. That is: + +```kdl +node a=1 a=2 +``` + +In this example, the node's `a` value must be `2`, not `1`. + +No other guarantees about order should be expected by implementers. +Deserialized representations may iterate over properties in any order and +still be spec-compliant. + +Properties _MAY_ be prefixed with `/-` to "comment out" the entire token and +make it act as plain whitespace, even if it spreads across multiple lines. + +### Argument + +An Argument is a bare [Value](#value) attached to a [Node](#node), with no +associated key. It shares the same space as [Properties](#property), and may be interleaved with them. + +A Node may have any number of Arguments, which should be evaluated left to +right. KDL implementations _MUST_ preserve the order of Arguments relative to +each other (not counting Properties). 
+ +Arguments _MAY_ be prefixed with `/-` to "comment out" the entire token and +make it act as plain whitespace, even if it spreads across multiple lines. + +#### Example + +```kdl +my-node 1 2 3 "a" "b" "c" +``` + +### Children Block + +A children block is a block of [Nodes](#node), surrounded by `{` and `}`. They +are an optional part of nodes, and create a hierarchy of KDL nodes. + +Regular node termination rules apply, which means multiple nodes can be +included in a single-line children block, as long as they're all terminated by +`;`. + +#### Example + +```kdl +parent { + child1 + child2 +} + +parent { child1; child2; } +``` + +### Value + +A value is either: a [String](#string), a [Number](#number), a +[Boolean](#boolean), or [Null](#null). + +Values _MUST_ be either [Arguments](#argument) or values of +[Properties](#property). + +Values (both as arguments and as properties) _MAY_ be prefixed by a single +[Type Annotation](#type-annotation). + +### Type Annotation + +A type annotation is a prefix to any [Node Name](#node) or [Value](#value) that +includes a _suggestion_ of what type the value is _intended_ to be treated as, +or as a _context-specific elaboration_ of the more generic type the node name +indicates. + +Type annotations are written as a set of `(` and `)` with an +[Identifier](#identifier) in it. Any valid identifier is considered a valid +type annotation. There must be no whitespace between a type annotation and its +associated Node Name or Value. + +KDL does not specify any restrictions on what implementations might do with +these annotations. They are free to ignore them, or use them to make decisions +about how to interpret a value. 
+ +Additionally, the following type annotations MAY be recognized by KDL parsers +and, if used, SHOULD interpret these types as follows: + +#### Reserved Type Annotations for Numbers Without Decimals: + +Signed integers of various sizes (the number is the bit size): + +* `i8` +* `i16` +* `i32` +* `i64` + +Unsigned integers of various sizes (the number is the bit size): + +* `u8` +* `u16` +* `u32` +* `u64` + +Platform-dependent integer types, both signed and unsigned: + +* `isize` +* `usize` + +#### Reserved Type Annotations for Numbers With Decimals: + +IEEE 754 floating point numbers, both single (32) and double (64) precision: + +* `f32` +* `f64` + +IEEE 754-2008 decimal floating point numbers + +* `decimal64` +* `decimal128` + +#### Reserved Type Annotations for Strings: + +* `date-time`: ISO8601 date/time format. +* `time`: "Time" section of ISO8601. +* `date`: "Date" section of ISO8601. +* `duration`: ISO8601 duration format. +* `decimal`: IEEE 754-2008 decimal string format. +* `currency`: ISO 4217 currency code. +* `country-2`: ISO 3166-1 alpha-2 country code. +* `country-3`: ISO 3166-1 alpha-3 country code. +* `country-subdivision`: ISO 3166-2 country subdivision code. +* `email`: RFC5322 email address. +* `idn-email`: RFC6531 internationalized email address. +* `hostname`: RFC1123 internet hostname (only ASCII segments) +* `idn-hostname`: RFC5890 internationalized internet hostname (only `xn--`-prefixed ASCII "punycode" segments, or non-ASCII segments) +* `ipv4`: RFC2673 dotted-quad IPv4 address. +* `ipv6`: RFC2373 IPv6 address. +* `url`: RFC3986 URI. +* `url-reference`: RFC3986 URI Reference. +* `irl`: RFC3987 Internationalized Resource Identifier. +* `irl-reference`: RFC3987 Internationalized Resource Identifier Reference. +* `url-template`: RFC6570 URI Template. +* `uuid`: RFC4122 UUID. +* `regex`: Regular expression. Specific patterns may be implementation-dependent. +* `base64`: A Base64-encoded string, denoting arbitrary binary data. 
+ +#### Examples + +```kdl +node (u8)123 +node prop=(regex)".*" +(published)date "1970-01-01" +(contributor)person name="Foo McBar" +``` + +### String + +Strings in KDL represent textual [Values](#value), or unusual identifiers. A +String is either a [Quoted String](#quoted-string) or a +[Raw String](#raw-string). Quoted Strings may include escaped characters, while +Raw Strings always contain only the literal characters that are present. + +### Quoted String + +A Quoted String is delimited by `"` on either side of any number of literal +string characters except unescaped `"` and `\`. This includes literal +[Newline](#newline) characters, which means a String Value can encompass +multiple lines without behaving like a Newline for [Node](#node) parsing +purposes. + +Strings _MUST_ be represented as UTF-8 values. + +In addition to literal code points, a number of "escapes" are supported. +"Escapes" are the character `\` followed by another character, and are +interpreted as described in the following table: + +| Name | Escape | Code Pt | +|-------------------------------|--------|----------| +| Line Feed | `\n` | `U+000A` | +| Carriage Return | `\r` | `U+000D` | +| Character Tabulation (Tab) | `\t` | `U+0009` | +| Reverse Solidus (Backslash) | `\\` | `U+005C` | +| Solidus (Forwardslash) | `\/` | `U+002F` | +| Quotation Mark (Double Quote) | `\"` | `U+0022` | +| Backspace | `\b` | `U+0008` | +| Form Feed | `\f` | `U+000C` | +| Unicode Escape | `\u{(1-6 hex chars)}` | Code point described by hex characters, up to `10FFFF` | + +### Raw String + +Raw Strings in KDL are much like [Quoted Strings](#quoted-string), except they +do not support `\`-escapes. They otherwise share the same properties as far as +literal [Newline](#newline) characters go, and the requirement of UTF-8 +representation. + +Raw String literals are represented as `r`, followed by zero or more `#` +characters, followed by `"`, followed by any number of UTF-8 literals. 
The +string is then closed by a `"` followed by a _matching_ number of `#` +characters. This allows them to contain raw `"` or `#` characters; only the +precise terminator (resembling `"##`, for example) ends the raw string. This +means that the string sequence `"` or `"#` and such must not match the closing +`"` with the same or more `#` characters as the opening `r`. + +#### Example + +```kdl +just-escapes r"\n will be literal" +quotes-and-escapes r#"hello\n\r\asd"world"# +``` + +### Number + +Numbers in KDL represent numerical [Values](#value). There is no logical distinction in KDL +between real numbers, integers, and floating point numbers. It's up to +individual implementations to determine how to represent KDL numbers. + +There are four syntaxes for Numbers: Decimal, Hexadecimal, Octal, and Binary. + +* All numbers may optionally start with one of `-` or `+`, which determine whether they'll be positive or negative. +* Binary numbers start with `0b` and only allow `0` and `1` as digits, which may be separated by `_`. They represent numbers in radix 2. +* Octal numbers start with `0o` and only allow digits between `0` and `7`, which may be separated by `_`. They represent numbers in radix 8. +* Hexadecimal numbers start with `0x` and allow digits between `0` and `9`, as well as letters `A` through `F`, in either lower or upper case, which may be separated by `_`. They represent numbers in radix 16. +* Decimal numbers are a bit more special: + * They have no radix prefix. + * They use digits `0` through `9`, which may be separated by `_`. + * They may optionally include a decimal separator `.`, followed by more digits, which may again be separated by `_`. + * They may optionally be followed by `E` or `e`, an optional `-` or `+`, and more digits, to represent an exponent value. + +### Boolean + +A boolean [Value](#value) is either the symbol `true` or `false`. 
These +_SHOULD_ be represented by implementation as boolean logical values, or some +approximation thereof. + +#### Example + +```kdl +my-node true value=false +``` + +### Null + +The symbol `null` represents a null [Value](#value). It's up to the +implementation to decide how to represent this, but it generally signals the +"absence" of a value. It is reasonable for an implementation to ignore null +values altogether when deserializing. + +#### Example + +```kdl +my-node null key=null +``` + +### Whitespace + +The following characters should be treated as non-[Newline](#newline) [white +space](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt): + +| Name | Code Pt | +|----------------------|---------| +| Character Tabulation | `U+0009` | +| Space | `U+0020` | +| No-Break Space | `U+00A0` | +| Ogham Space Mark | `U+1680` | +| En Quad | `U+2000` | +| Em Quad | `U+2001` | +| En Space | `U+2002` | +| Em Space | `U+2003` | +| Three-Per-Em Space | `U+2004` | +| Four-Per-Em Space | `U+2005` | +| Six-Per-Em Space | `U+2006` | +| Figure Space | `U+2007` | +| Punctuation Space | `U+2008` | +| Thin Space | `U+2009` | +| Hair Space | `U+200A` | +| Narrow No-Break Space| `U+202F` | +| Medium Mathematical Space | `U+205F` | +| Ideographic Space | `U+3000` | + +#### Multi-line comments + +In addition to single-line comments using `//`, comments can also be started +with `/*` and ended with `*/`. These comments can span multiple lines. They +are allowed in all positions where [Whitespace](#whitespace) is allowed and +can be nested. 
+ +### Newline + +The following characters [should be treated as new +lines](https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-5/#G41643): + +| Acronym | Name | Code Pt | +|---------|-----------------|---------| +| CRLF | Carriage Return and Line Feed | `U+000D` + `U+000A` | +| CR | Carriage Return | `U+000D` | +| LF | Line Feed | `U+000A` | +| NEL | Next Line | `U+0085` | +| FF | Form Feed | `U+000C` | +| LS | Line Separator | `U+2028` | +| PS | Paragraph Separator | `U+2029` | + +Note that for the purpose of new lines, CRLF is considered _a single newline_. `VT` `Vertical tab` `U+000B` was mistakenly excluded, but the v1 spec is frozen, so it's left unchanged. + +## Full Grammar + +``` +nodes := linespace* (node nodes?)? linespace* + +node := ('/-' node-space*)? type? identifier (node-space+ node-prop-or-arg)* (node-space* node-children ws*)? node-space* node-terminator +node-prop-or-arg := ('/-' node-space*)? (prop | value) +node-children := ('/-' node-space*)? '{' nodes '}' +node-space := ws* escline ws* | ws+ +node-terminator := single-line-comment | newline | ';' | eof + +identifier := string | bare-identifier +bare-identifier := ((identifier-char - digit - sign) identifier-char* | sign ((identifier-char - digit) identifier-char*)?) - keyword +identifier-char := unicode - linespace - [\/(){}<>;[]=,"] +keyword := boolean | 'null' +prop := identifier '=' value +value := type? (string | number | keyword) +type := '(' identifier ')' + +string := raw-string | escaped-string +escaped-string := '"' character* '"' +character := '\' escape | [^\"] +escape := ["\\/bfnrt] | 'u{' hex-digit{1, 6} '}' +hex-digit := [0-9a-fA-F] + +raw-string := 'r' raw-string-hash +raw-string-hash := '#' raw-string-hash '#' | raw-string-quotes +raw-string-quotes := '"' .* '"' + +number := hex | octal | binary | decimal + +decimal := sign? integer ('.' integer)? exponent? +exponent := ('e' | 'E') sign? 
integer +integer := digit (digit | '_')* +digit := [0-9] +sign := '+' | '-' + +hex := sign? '0x' hex-digit (hex-digit | '_')* +octal := sign? '0o' [0-7] [0-7_]* +binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')* + +boolean := 'true' | 'false' + +escline := '\\' ws* (single-line-comment | newline) + +linespace := newline | ws | single-line-comment + +newline := See Table (All line-break white_space) + +ws := bom | unicode-space | multi-line-comment + +bom := '\u{FEFF}' + +unicode-space := See Table (All White_Space unicode characters which are not `newline`) + +single-line-comment := '//' ^newline+ (newline | eof) +multi-line-comment := '/*' commented-block +commented-block := '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block +``` + + diff --git a/static/spec.html b/static/spec.html new file mode 100644 index 0000000..10331bb --- /dev/null +++ b/static/spec.html @@ -0,0 +1,3133 @@ + + + + + + +The KDL Document Language + + + + + + + + + + + + + + + + + + + + + + + + + + + +
KDL | January 2025
Marchán & KDL Contributors | Experimental
+
+
+
+
Workgroup:
+
KDL Community
+
Published:
+
+ +
+
Authors:
+
+
+
K. Marchán
+
Microsoft
+
+
+
KDL Contributors
+
+
+
+
+

The KDL Document Language

+
+

Abstract

+

KDL is a node-oriented document language. Its niche and purpose overlap with +XML, as do many of its semantics. You can use KDL both as a configuration +language and as a data exchange or storage format, if you so choose.

+

This is the formal specification for KDL, including the intended data model and +the grammar.

+

This document describes KDL version 2.0.0. It was released on 2024-12-21. It +is the latest stable version of the language, and will only be edited for minor +copyedits or major errata.

+
+
+

+About This Document +

+

This note is to be removed before publishing as an RFC.

+

+ Status information for this document may be found at https://datatracker.ietf.org/doc/draft-marchan-kdl2/.

+

+ information can be found at https://kdl.dev/.

+

Source for this draft and an issue tracker can be found at + https://github.com/kdl-org/kdl.

+
+
+

+License +

+

This work is licensed under Creative Commons Attribution-ShareAlike 4.0 +International. To view a copy of this license, visit +https://creativecommons.org/licenses/by-sa/4.0/

+
+
+
+

+
+
+
+

+1. Compatibility +

+

KDL 2.0 is designed such that for any given KDL document written as KDL +1.0 or KDL 2.0, the parse will either fail completely, or, if the +parse succeeds, the data represented by a v1 or v2 parser will be identical. +This means that it's safe to use a fallback parsing strategy in order to support +both v1 and v2 simultaneously. For example, node "foo" is a valid node in both +versions, and should be represented identically by parsers.

+

A version marker /- kdl-version 2 (or 1) MAY be added to the beginning of +a KDL document, optionally preceded by the BOM, and parsers MAY use that as a +hint as to which version to parse the document as.
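
The fallback strategy and the version marker described above can be sketched roughly as follows. This is a minimal illustration, not part of the specification: `parse_v1` and `parse_v2` are hypothetical callables assumed to raise `ValueError` on a failed parse, and the marker pattern is simplified to ASCII spaces and tabs, so a real parser would follow the full grammar instead.

```python
import re
from typing import Any, Callable, Optional

BOM = "\ufeff"

# Simplified marker pattern (assumption): only ASCII space/tab count as
# whitespace, and the marker must be followed by a line break or end of input.
_VERSION_MARKER = re.compile(r"/-[ \t]*kdl-version[ \t]+([12])[ \t]*(?=[\r\n]|$)")


def version_hint(text: str) -> Optional[int]:
    """Return 1 or 2 if the document starts with a version marker, else None."""
    if text.startswith(BOM):
        text = text[len(BOM):]  # the marker may be preceded by the BOM
    match = _VERSION_MARKER.match(text)
    return int(match.group(1)) if match else None


def parse_with_fallback(
    text: str,
    parse_v2: Callable[[str], Any],
    parse_v1: Callable[[str], Any],
) -> Any:
    """Try the hinted version first (v2 by default), then the other one."""
    order = [parse_v1, parse_v2] if version_hint(text) == 1 else [parse_v2, parse_v1]
    last_error: Optional[ValueError] = None
    for parse in order:
        try:
            return parse(text)
        except ValueError as err:  # hypothetical failure signal of the parsers
            last_error = err
    assert last_error is not None
    raise last_error
```

Because a document that parses under both versions yields the same data, the order of attempts only affects which error is reported when both parses fail.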

+
+
+
+
+

+2. Introduction +

+

KDL is a node-oriented document language. Its niche and purpose overlap with +XML, as do many of its semantics. You can use KDL both as a configuration +language and as a data exchange or storage format, if you so choose.

+

The bulk of this document is dedicated to a long-form description of all +Components (Section 3) of a KDL document. +There is also a much more terse +Grammar (Section 4) at the end of the document that covers most of the +rules, with some semantic exceptions involving the data model.

+

KDL is designed to be easy to read and easy to implement.

+

In this document, references to "left" or "right" refer to directions in the +data stream towards the beginning or end, respectively; in other words, +the directions if the data stream were only ASCII text. They do not refer +to the writing direction of text, which can flow in either direction, +depending on the characters used.

+
+
+
+
+

+3. Components +

+
+
+

+3.1. Document +

+

The toplevel concept of KDL is a Document. A Document is composed of zero or +more Nodes (Section 3.2), separated by newlines and whitespace, and eventually +terminated by an EOF.

+

All KDL documents should be UTF-8 encoded and conform to the specifications in +this document.

+
+
+

+3.1.1. Example +

+

The following is a document composed of two toplevel nodes:

+
+
+foo {
+    bar
+}
+baz
+
+
+
+
+
+
+
+
+

+3.2. Node +

+

Being a node-oriented language means that the real core component of any KDL +document is the "node". Every node must have a name, which must be a +String (Section 3.9).

+

The name may be preceded by a Type Annotation (Section 3.8) to further +clarify its type, particularly in relation to its parent node. (For example, +clarifying that a particular date child node is for the publication date, +rather than the last-modified date, with (published)date.)

+

Following the name are zero or more Arguments (Section 3.5) or +Properties (Section 3.4), separated by either whitespace (Section 3.17) or a +slash-escaped line continuation (Section 3.3). Arguments and Properties +may be interspersed in any order, much like is common with positional arguments +vs options in command line tools. Collectively, Arguments and Properties may be +referred to as "Entries".

+

Children (Section 3.6) can be placed after the name and the optional +Entries, possibly separated by either whitespace or a +slash-escaped line continuation.

+

Arguments are ordered relative to each other and that order must be preserved in +order to maintain the semantics. Properties between Arguments do not affect +Argument ordering.

+

By contrast, Properties SHOULD NOT be assumed to be presented in a given +order. Children (Section 3.6) should be used if an order-sensitive +key/value data structure must be represented in KDL. Cf. JSON objects +preserving key order.

+

Nodes MAY be prefixed with Slashdash (Section 3.17.3) to "comment out" +the entire node, including its properties, arguments, and children, and make +it act as plain whitespace, even if it spreads across multiple lines.

+

Finally, a node is terminated by either a Newline (Section 3.18), a semicolon +(;), the end of a child block (}) or the end of the file/stream (an EOF).

+
+
+

+3.2.1. Example +

+
+
+// `foo` will have an Argument value list like `[1, 3]`.
+foo 1 key=val 3 {
+    bar
+    (role)baz 1 2
+}
+
+
+
+
+
+
+
+
+

+3.3. Line Continuation +

+

Line continuations allow Nodes (Section 3.2) to be spread across multiple lines.

+

A line continuation is a \ character followed by zero or more whitespace +items (including multiline comments) and an optional single-line comment. It +must be terminated by a Newline (Section 3.18) (including the Newline that is +part of single-line comments).

+

Following a line continuation, processing of a Node can continue as usual.

+
+
+

+3.3.1. Example +

+
+
+my-node 1 2 \  // comments are ok after \
+        3 4    // This is the actual end of the Node.
+
+
+
+
+
+
+
+
+

+3.4. Property +

+

A Property is a key/value pair attached to a Node (Section 3.2). A Property is +composed of a String (Section 3.9), followed immediately by an equals sign (=, U+003D), +and then a Value (Section 3.7).

+

Properties should be interpreted left-to-right, with rightmost properties with +identical names overriding earlier properties. That is:

+
+
+node a=1 a=2
+
+
+

In this example, the node's a value must be 2, not 1.

+

No other guarantees about order should be expected by implementers. +Deserialized representations may iterate over properties in any order and +still be spec-compliant.
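
As a rough sketch of the override rule above, an implementation might fold properties into a mapping while reading left to right. The `(key, value)` pair representation and the name `fold_properties` are assumptions made for illustration only.

```python
from typing import Any, Dict, Iterable, Tuple


def fold_properties(pairs: Iterable[Tuple[str, Any]]) -> Dict[str, Any]:
    """Collapse (key, value) pairs read left to right; the rightmost duplicate wins."""
    props: Dict[str, Any] = {}
    for key, value in pairs:
        props[key] = value  # a later occurrence overrides an earlier one
    return props
```

For `node a=1 a=2`, folding `[("a", 1), ("a", 2)]` leaves `a` set to `2`. Consumers of the resulting mapping should not rely on its iteration order.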

+

Properties MAY be prefixed with /- to "comment out" the entire token and +make it act as plain whitespace, even if it spreads across multiple lines.

+
+
+
+
+

+3.5. Argument +

+

An Argument is a bare Value (Section 3.7) attached to a Node (Section 3.2), with no +associated key. It shares the same space as Properties (Section 3.4), and may be interleaved with them.

+

A Node may have any number of Arguments, which should be evaluated left to +right. KDL implementations MUST preserve the order of Arguments relative to +each other (not counting Properties).
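
One way to picture the ordering rule is to model a node's entries as a single left-to-right list and filter out the Properties. The `Entry` tuple shape below is a hypothetical representation, not something the specification prescribes.

```python
from typing import Any, List, Optional, Tuple

# Hypothetical entry shape: (None, value) for an Argument, (key, value) for a Property.
Entry = Tuple[Optional[str], Any]


def arguments_of(entries: List[Entry]) -> List[Any]:
    """Return Argument values in source order, skipping interleaved Properties."""
    return [value for key, value in entries if key is None]
```

With the entries of `foo 1 key=val 3` modelled as `[(None, 1), ("key", "val"), (None, 3)]`, the Arguments come out as `[1, 3]`.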

+

Arguments MAY be prefixed with /- to "comment out" the entire token and +make it act as plain whitespace, even if it spreads across multiple lines.

+
+
+

+3.5.1. Example +

+
+
+my-node 1 2 3 a b c
+
+
+
+
+
+
+
+
+

+3.6. Children Block +

+

A children block is a block of Nodes (Section 3.2), surrounded by { and }. They +are an optional part of nodes, and create a hierarchy of KDL nodes.

+

Regular node termination rules apply, which means multiple nodes can be +included in a single-line children block, as long as they're all terminated by +;.

+
+
+

+3.6.1. Example +

+
+
+parent {
+    child1
+    child2
+}
+
+parent { child1; child2; }
+
+
+
+
+
+
+
+
+

+3.7. Value +

+

A value is either: a String (Section 3.9), a Number (Section 3.14), a +Boolean (Section 3.15), or Null (Section 3.16).

+

Values MUST be either Arguments (Section 3.5) or values of +Properties (Section 3.4). Only String (Section 3.9) values may be used as +Node (Section 3.2) names or Property (Section 3.4) keys.

+

Values (both as arguments and in properties) MAY be prefixed by a single +Type Annotation (Section 3.8).

+
+
+
+
+

+3.8. Type Annotation +

+

A type annotation is a prefix to any Node Name (Section 3.2) or Value (Section 3.7) that +includes a suggestion of what type the value is intended to be treated as, +or as a context-specific elaboration of the more generic type the node name +indicates.

+

Type annotations are written as a set of ( and ) with a single +String (Section 3.9) in it. It may contain Whitespace after the ( and before +the ), and may be separated from its target by Whitespace.

+

KDL does not specify any restrictions on what implementations might do with +these annotations. They are free to ignore them, or use them to make decisions +about how to interpret a value.

+

Additionally, KDL parsers MAY recognize the following type annotations +and, if they do, SHOULD interpret them as follows:

+
+
+

+3.8.1. Reserved Type Annotations for Numbers Without Decimals: +

+

Signed integers of various sizes (the number is the bit size):

+
  • i8
  • i16
  • i32
  • i64
  • i128
+

Unsigned integers of various sizes (the number is the bit size):

+
  • u8
  • u16
  • u32
  • u64
  • u128
+

Platform-dependent integer types, both signed and unsigned:

+
  • isize
  • usize
+
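
A parser that chooses to recognize the fixed-width integer annotations could range-check values along these lines. This is only a sketch: the specification does not say what to do with out-of-range values, and the names `INT_RANGES` and `fits_annotation` are invented for illustration. The platform-dependent `isize` and `usize` are deliberately left unchecked.

```python
# Bit widths taken from the reserved annotation names (i8..i128, u8..u128).
_BITS = (8, 16, 32, 64, 128)

INT_RANGES = {}
for bits in _BITS:
    # Two's-complement range for signed types, zero-based range for unsigned.
    INT_RANGES[f"i{bits}"] = (-(1 << (bits - 1)), (1 << (bits - 1)) - 1)
    INT_RANGES[f"u{bits}"] = (0, (1 << bits) - 1)


def fits_annotation(annotation: str, value: int) -> bool:
    """True if value is in range for a fixed-width annotation.

    Annotations without a fixed width (including isize/usize) always pass.
    """
    if annotation not in INT_RANGES:
        return True
    low, high = INT_RANGES[annotation]
    return low <= value <= high
```

Under this sketch `(u8)255` passes and `(u8)256` does not; whether a failure becomes an error or a warning is left to the implementation.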
+
+
+
+

+3.8.2. Reserved Type Annotations for Numbers With Decimals: +

+

IEEE 754 floating point numbers, both single (32) and double (64) precision:

+
  • f32
  • f64
+

IEEE 754-2008 decimal floating point numbers:

+
  • decimal64
  • decimal128
+
+
+
+
+

+3.8.3. Reserved Type Annotations for Strings: +

+
  • date-time: ISO8601 date/time format.
  • time: "Time" section of ISO8601.
  • date: "Date" section of ISO8601.
  • duration: ISO8601 duration format.
  • decimal: IEEE 754-2008 decimal string format.
  • currency: ISO 4217 currency code.
  • country-2: ISO 3166-1 alpha-2 country code.
  • country-3: ISO 3166-1 alpha-3 country code.
  • country-subdivision: ISO 3166-2 country subdivision code.
  • email: RFC5322 email address.
  • idn-email: RFC6531 internationalized email address.
  • hostname: RFC1123 internet hostname (only ASCII segments)
  • idn-hostname: RFC5890 internationalized internet hostname (only xn---prefixed ASCII "punycode" segments, or non-ASCII segments)
  • ipv4: RFC2673 dotted-quad IPv4 address.
  • ipv6: RFC2373 IPv6 address.
  • url: RFC3986 URI.
  • url-reference: RFC3986 URI Reference.
  • irl: RFC3987 Internationalized Resource Identifier.
  • irl-reference: RFC3987 Internationalized Resource Identifier Reference.
  • url-template: RFC6570 URI Template.
  • uuid: RFC4122 UUID.
  • regex: Regular expression. Specific patterns may be implementation-dependent.
  • base64: A Base64-encoded string, denoting arbitrary binary data.
+
+
+
+
+

+3.8.4. Examples +

+
+
+node (u8)123
+node prop=(regex).*
+(published)date "1970-01-01"
+(contributor)person name="Foo McBar"
+
+
+
+
+
+
+
+
+

+3.9. String +

+

Strings in KDL represent textual UTF-8 Values (Section 3.7). A String is either an Identifier String (Section 3.10) (like foo), a Quoted String (Section 3.11) (like "foo") or a Multi-Line String (Section 3.12). Both Quoted and Multi-Line Strings come in normal and Raw String (Section 3.13) variants (like #"foo"#):

  • Identifier Strings let you write short, "single-word" strings with a minimum of syntax.
  • Quoted Strings let you write strings "like normal", with whitespace and escapes.
  • Multi-Line Strings let you write strings across multiple lines and with indentation that's not part of the string value.
  • Raw Strings don't allow any escapes, allowing you to not worry about the string's content containing anything that might look like an escape.

Strings MUST be represented as UTF-8 values.

+

Strings MUST NOT directly include any of the disallowed literal code points (Section 3.19). Quoted and Multi-Line Strings may include these code points as values by representing them with their corresponding \u{...} escape.

+
+
+
+
+

+3.10. Identifier String +

+

An Identifier String (sometimes referred to as just an "identifier") is composed of any Unicode Scalar Value other than non-initial characters (Section 3.10.1), followed by any number of Unicode Scalar Values other than non-identifier characters (Section 3.10.2).

+

A handful of patterns are disallowed, to avoid confusion with other values:

  • idents that appear to start with a Number (Section 3.14) (like 1.0v2 or -1em) or the "almost a number" pattern of a decimal point without a leading digit (like .1).
  • idents that are the language keywords (inf, -inf, nan, true, false, and null) without their leading #.

Identifiers that match these patterns MUST be treated as a syntax error; such values can only be written as quoted or raw strings. The precise details of the identifier syntax are specified in the Full Grammar in Section 4.

+
+
+

+3.10.1. Non-initial characters +

+

The following characters cannot be the first character in an Identifier String (Section 3.10):

  • Any decimal digit (0-9)
  • Any non-identifier characters (Section 3.10.2)

Additionally, the following initial characters impose limitations on subsequent characters:

  • the + and - characters can only be used as an initial character if the second character is not a digit. If the second character is ., then the third character must not be a digit.
  • the . character can only be used as an initial character if the second character is not a digit.

This allows identifiers to look like --this or .md, and removes the ambiguity of having an identifier look like a number.
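The identifier rules can be sketched as a small validity check. The following Python example is illustrative only: the function names are hypothetical, the punctuation set is taken from the `identifier-char` production in the Full Grammar (Section 4), and the whitespace, newline, and disallowed code point sets follow Sections 3.17, 3.18, and 3.19.

```python
# Non-newline whitespace (Section 3.17) and newline characters (Section 3.18).
WHITESPACE = {"\u0009", "\u0020", "\u00A0", "\u1680", "\u202F", "\u205F", "\u3000"}
WHITESPACE |= {chr(c) for c in range(0x2000, 0x200B)}  # U+2000 through U+200A
NEWLINES = {"\u000A", "\u000D", "\u0085", "\u000B", "\u000C", "\u2028", "\u2029"}
# Punctuation excluded by the identifier-char production in the grammar.
PUNCTUATION = set('\\/(){};[]"#=')
KEYWORDS = {"inf", "-inf", "nan", "true", "false", "null"}
DIGITS = set("0123456789")


def is_disallowed_code_point(ch):
    """Disallowed literal code points (Section 3.19)."""
    c = ord(ch)
    return (
        c <= 0x08 or 0x0E <= c <= 0x1F or c == 0x7F
        or 0xD800 <= c <= 0xDFFF
        or 0x200E <= c <= 0x200F or 0x202A <= c <= 0x202E
        or 0x2066 <= c <= 0x2069 or c == 0xFEFF
    )


def is_identifier_char(ch):
    return not (
        ch in WHITESPACE or ch in NEWLINES or ch in PUNCTUATION
        or is_disallowed_code_point(ch)
    )


def is_identifier_string(text):
    """True if text may be written as a bare Identifier String."""
    if not text or text in KEYWORDS:
        return False
    if not all(is_identifier_char(ch) for ch in text):
        return False
    # An optional sign is skipped; what follows must not look like a number.
    rest = text[1:] if text[0] in "+-" else text
    if rest and rest[0] in DIGITS:
        return False
    if rest.startswith(".") and len(rest) > 1 and rest[1] in DIGITS:
        return False
    return True
```

Strings rejected by this check can still be expressed as Quoted or Raw Strings.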

+
+
+
+
+

+3.10.2. Non-identifier characters +

+

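The following characters cannot be used anywhere in an Identifier String (Section 3.10), as given by the identifier-char production in the Full Grammar (Section 4):

  • Any of \ / ( ) { } ; [ ] " # =
  • Any Whitespace (Section 3.17) or Newline (Section 3.18)
  • Any disallowed literal code points (Section 3.19)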
+
+
+
+
+
+

+3.11. Quoted String +

+

A Quoted String is delimited by " on either side of any number of literal string characters except unescaped " and \.

+

Literal Newline (Section 3.18) characters can only be included if they are Escaped Whitespace (Section 3.11.1.1), which discards them from the string value. Actually including a newline in the value requires either a newline escape sequence, like \n, or a Multi-Line String (Section 3.12), which is designed for strings stretching across multiple lines.

+

Like Identifier Strings, Quoted Strings MUST NOT include any of the disallowed literal code-points (Section 3.19) as code points in their body.

+

Quoted Strings have a Raw String (Section 3.13) variant, which disallows escapes.

+
+
+

+3.11.1. Escapes +

+

In addition to literal code points, a number of "escapes" are supported in Quoted Strings. "Escapes" are the character \ followed by another character, and are interpreted as described in the following table:

Table 1

| Name | Escape | Code Pt |
|------|--------|---------|
| Line Feed | \n | U+000A |
| Carriage Return | \r | U+000D |
| Character Tabulation (Tab) | \t | U+0009 |
| Reverse Solidus (Backslash) | \\ | U+005C |
| Quotation Mark (Double Quote) | \" | U+0022 |
| Backspace | \b | U+0008 |
| Form Feed | \f | U+000C |
| Space | \s | U+0020 |
| Unicode Escape | \u{(1-6 hex chars)} | Code point described by hex characters, as long as it represents a Unicode Scalar Value |
| Whitespace Escape | See below | N/A |
+
+
+
+3.11.1.1. Escaped Whitespace +
+

In addition to escaping individual characters, \ can also escape whitespace. When a \ is followed by one or more literal whitespace characters, the \ and all of that whitespace are discarded. For example,

+
+
+"Hello World"
+
+
+

and

+
+
+"Hello \    World"
+
+
+

are semantically identical. See whitespace (Section 3.17) and newlines (Section 3.18) for how whitespace is defined.

+

Note that only literal whitespace is escaped; whitespace escapes (\n and such) are retained. For example, these strings are all semantically identical:

+
+
+"Hello\       \nWorld"
+
+    "Hello\n\
+    World"
+
+"Hello\nWorld"
+
+"""
+  Hello
+  World
+  """
+
+
+
+
+
+
+
+3.11.1.2. Invalid escapes +
+

Except as described in the escapes table, above, \ MUST NOT precede any other characters in a string.
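To make the escape rules concrete, here is a minimal Python sketch that resolves the escapes of Table 1, including whitespace escapes, in the body of a Quoted String (the text between the quotes). The function name `unescape` is hypothetical, and the sketch assumes the body has already been separated from its delimiters.

```python
import re

# Single-character escapes from Table 1.
SIMPLE_ESCAPES = {
    "n": "\n", "r": "\r", "t": "\t", "\\": "\\",
    '"': '"', "b": "\b", "f": "\f", "s": " ",
}
# Literal whitespace and newline characters (Sections 3.17 and 3.18).
LITERAL_SPACE = set("\t \u00A0\u1680\u202F\u205F\u3000\n\r\u0085\u000B\u000C\u2028\u2029")
LITERAL_SPACE |= {chr(c) for c in range(0x2000, 0x200B)}
UNICODE_ESCAPE = re.compile(r"u\{([0-9a-fA-F]{1,6})\}")


def unescape(body):
    """Resolve backslash escapes in a Quoted String body."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        i += 1  # move past the backslash
        if i >= len(body):
            raise ValueError("dangling backslash at end of string")
        nxt = body[i]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 1
        elif nxt in LITERAL_SPACE:
            # Whitespace escape: discard the backslash and all following literal whitespace.
            while i < len(body) and body[i] in LITERAL_SPACE:
                i += 1
        elif nxt == "u":
            match = UNICODE_ESCAPE.match(body, i)
            if match is None:
                raise ValueError("malformed unicode escape")
            code_point = int(match.group(1), 16)
            if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
                raise ValueError("escape is not a Unicode Scalar Value")
            out.append(chr(code_point))
            i = match.end()
        else:
            raise ValueError("invalid escape: \\" + nxt)
    return "".join(out)
```

Note that only literal whitespace is consumed by a whitespace escape; an escaped newline such as \n that follows it is still resolved into the value.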

+
+
+
+
+
+
+
+
+

+3.12. Multi-line String +

+

Multi-Line Strings support multiple lines with literal, non-escaped Newlines. They must use a special multi-line syntax, and they automatically "dedent" the string, allowing its value to be indented to a visually matching level as desired.

+

A Multi-Line String is opened and closed by three double-quote characters, like """. Its first line MUST immediately start with a Newline (Section 3.18) after its opening """. Its final line MUST contain only whitespace before the closing """. All in-between lines that contain non-newline, non-whitespace characters MUST start with at least the exact same whitespace as the final line (precisely matching codepoints, not merely counting characters or "size"); they may contain additional whitespace following this prefix. The lines in between may contain unescaped " (but no unescaped """ as this would close the string).

+

The value of the Multi-Line String omits the first and last Newline, the Whitespace of the last line, and the matching Whitespace prefix on all intermediate lines. The first and last Newline can be the same character (that is, empty multi-line strings are legal).

+

In other words, the final line specifies the whitespace prefix that will be removed from all other lines.

+

Multi-Line Strings that do not immediately start with a Newline, or whose final """ is not preceded by a Newline and optional whitespace, are illegal. This also means that """ may not be used for a single-line String (e.g. """foo""").
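The dedent rules above can be sketched in Python as follows. This is a simplified illustration, not a full implementation: the hypothetical function `dedent_multiline` takes the raw text between the opening and closing """ and does not process backslash escapes at all.

```python
import re

# Newline sequences from Section 3.18; CRLF is listed first so it counts as one newline.
NEWLINE_RE = re.compile("\r\n|[\r\n\u0085\u000B\u000C\u2028\u2029]")
# Non-newline whitespace from Section 3.17.
WHITESPACE = set("\t \u00A0\u1680\u202F\u205F\u3000")
WHITESPACE |= {chr(c) for c in range(0x2000, 0x200B)}


def dedent_multiline(raw):
    """Compute the value of a Multi-Line String from the text between its triple quotes."""
    lines = NEWLINE_RE.split(raw)
    if len(lines) < 2 or lines[0] != "":
        raise ValueError('opening """ must be followed immediately by a Newline')
    prefix = lines[-1]
    if any(ch not in WHITESPACE for ch in prefix):
        raise ValueError('closing """ may only be preceded by whitespace')
    body = []
    for line in lines[1:-1]:
        if all(ch in WHITESPACE for ch in line):
            body.append("")  # whitespace-only lines are reflected as empty
        elif line.startswith(prefix):
            body.append(line[len(prefix):])  # strip the exact prefix code points
        else:
            raise ValueError("line does not start with the final line's whitespace prefix")
    # Literal newlines are normalized to a single LF in the value.
    return "\n".join(body)
```

Comparing prefixes with `startswith` rather than counting characters reflects the requirement that the whitespace match code point for code point.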

+
+
+

+3.12.1. Newline Normalization +

+

Literal Newline sequences in Multi-Line Strings must be normalized to a single U+000A (LF) during deserialization. This means, for example, that CR LF becomes a single LF during parsing.

+

This normalization does not apply to non-literal Newlines entered using escape sequences. That is:

+
+
+multi-line """
+    \r\n[CRLF]
+    foo[CRLF]
+    """
+
+
+

becomes:

+
+
+single-line "\r\n\nfoo"
+
+
+

For clarity: this normalization applies to each individual Newline sequence. That is, the literal sequence CRLF CRLF becomes LF LF, not LF.

+
+
+
+
+

+3.12.2. Examples +

+
+
+
+3.12.2.1. Indented multi-line string +
+
+
+multi-line """
+        foo
+    This is the base indentation
+            bar
+    """
+
+
+

This example's string value will be:

+
+
+    foo
+This is the base indentation
+        bar
+
+
+

which is equivalent to

+
+
+"    foo\nThis is the base indentation\n        bar"
+
+
+

when written as a single-line string.

+
+
+
+
+
+3.12.2.2. Shorter last-line indent +
+

If the last line wasn't indented as far, it won't dedent the rest of the lines as much:

+
+
+multi-line """
+        foo
+    This is no longer on the left edge
+            bar
+  """
+
+
+

This example's string value will be:

+
+
+      foo
+  This is no longer on the left edge
+          bar
+
+
+

Equivalent to

+
+
+"      foo\n  This is no longer on the left edge\n          bar"
+
+
+
+
+
+
+
+3.12.2.3. Empty lines +
+

Empty lines can contain any whitespace, or none at all, and will be reflected as empty in the value:

+
+
+multi-line """
+    Indented a bit.
+
+    A second indented paragraph.
+    """
+
+
+

This example's string value will be:

+
+
+Indented a bit.
+
+A second indented paragraph.
+
+
+

Equivalent to

+
+
+"Indented a bit.\n\nA second indented paragraph."
+
+
+
+
+
+
+
+3.12.2.4. Syntax errors +
+

The following yield syntax errors:

+
+
+multi-line """can't be single line"""
+
+
+
+
+multi-line """
+  closing quote with non-whitespace prefix"""
+
+
+
+
+multi-line """stuff
+  """
+
+
+
+
+// Every line must share the exact same prefix as the closing line.
+multi-line """[\n]
+[tab]a[\n]
+[space][space]b[\n]
+[space][tab][\n]
+[tab]"""
+
+
+
+
+
+
+
+
+

+3.12.3. Interaction with Whitespace Escapes +

+

Multi-Line Strings support the same mechanism for escaping whitespace as Quoted Strings.

+

When processing a Multi-Line String, implementations MUST dedent the string after resolving all whitespace escapes, but before resolving other backslash escapes. This means a whitespace escape that attempts to escape the final line's newline and/or whitespace prefix can be invalid: if removing escaped whitespace places the closing """ on a line with non-whitespace characters, this escape is invalid.

+

For example, the following is illegal:

+
+
+  """
+  foo
+  bar\
+  """
+
+  // equivalent to
+  """
+  foo
+  bar"""
+
+
+

while the following is allowed:

+
+
+  """
+  foo \
+bar
+  baz
+  \   """
+
+  // equivalent to
+  """
+  foo bar
+  baz
+  """
+
+
+
+
+
+
+
+
+

+3.13. Raw String +

+

Both Quoted (Section 3.11) and Multi-Line Strings (Section 3.12) have Raw String variants, which are identical in syntax except they do not support \-escapes. This includes line-continuation escapes (\ + ws collapsing to nothing). They otherwise share the same properties as far as literal Newline (Section 3.18) characters go, multi-line rules, and the requirement of UTF-8 representation.

+

The Raw String variants are indicated by preceding the string's opening quotes with one or more # characters. The string is then closed by its normal closing quotes, followed by a matching number of # characters. This means that the string may contain any combination of " and # characters other than its closing delimiter (e.g., if a raw string starts with ##", it can contain " or "#, but not "## or "###).

+

Like other Strings, Raw Strings MUST NOT include any of the disallowed literal code-points (Section 3.19) as code points in their body. Unlike with Quoted Strings, these cannot simply be escaped, and are thus unrepresentable when using Raw Strings.
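The delimiter-matching rule can be illustrated with a short Python sketch. The function `read_raw_string` is a hypothetical helper that handles only the single-line Raw String form; the multi-line form and the check for disallowed literal code points are deliberately left out.

```python
NEWLINE_CHARS = set("\n\r\u0085\u000B\u000C\u2028\u2029")


def read_raw_string(src, start=0):
    """Read a single-line Raw String beginning at src[start].

    Returns (value, index just past the closing delimiter).
    """
    i = start
    while i < len(src) and src[i] == "#":
        i += 1
    hashes = i - start
    if hashes == 0 or not src.startswith('"', i):
        raise ValueError("not a raw string")
    if src.startswith('"""', i):
        raise NotImplementedError("multi-line raw strings are outside this sketch")
    # The string ends at the FIRST quote followed by the same number of # characters.
    closer = '"' + "#" * hashes
    end = src.find(closer, i + 1)
    if end == -1:
        raise ValueError("unterminated raw string")
    value = src[i + 1:end]
    if any(ch in NEWLINE_CHARS for ch in value):
        raise ValueError("single-line raw strings cannot contain literal newlines")
    return value, end + len(closer)
```

Because no escapes are resolved, the returned value is exactly the text between the delimiters.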

+
+
+

+3.13.1. Example +

+
+
+just-escapes #"\n will be literal"#
+
+
+

The string contains the literal characters \n will be literal.

+
+
+quotes-and-escapes ##"hello\n\r\asd"#world"##
+
+
+

The string contains the literal characters hello\n\r\asd"#world

+
+
+raw-multi-line #"""
+    Here's a """
+        multiline string
+        """
+    without escapes.
+    """#
+
+
+

The string contains the value

+
+
+Here's a """
+    multiline string
+    """
+without escapes.
+
+
+

or equivalently,

+
+
+"Here's a \"\"\"\n    multiline string\n    \"\"\"\nwithout escapes."
+
+
+

as a Quoted String.

+
+
+
+
+
+
+

+3.14. Number +

+

Numbers in KDL represent numerical Values (Section 3.7). There is no logical distinction in KDL between real numbers, integers, and floating point numbers. It's up to individual implementations to determine how to represent KDL numbers.

+

There are five syntaxes for Numbers: Keywords, Decimal, Hexadecimal, Octal, and Binary.

  • All non-Keyword (Section 3.14.1) numbers may optionally start with one of - or +, which determines whether the number is negative or positive.
  • Binary numbers start with 0b and only allow 0 and 1 as digits, which may be separated by _. They represent numbers in radix 2.
  • Octal numbers start with 0o and only allow digits between 0 and 7, which may be separated by _. They represent numbers in radix 8.
  • Hexadecimal numbers start with 0x and allow digits between 0 and 9, as well as letters A through F, in either lower or upper case, which may be separated by _. They represent numbers in radix 16.
  • Decimal numbers are a bit more special:
    • They have no radix prefix.
    • They use digits 0 through 9, which may be separated by _.
    • They may optionally include a decimal separator ., followed by more digits, which may again be separated by _.
    • They may optionally be followed by E or e, an optional - or +, and more digits, to represent an exponent value.

Note that, similar to JSON and some other languages, numbers without an integer digit (such as .1) are illegal. They must be written with at least one integer digit, like 0.1. (These patterns are also disallowed from Identifier Strings (Section 3.10), to avoid confusion.)
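As an illustration of the non-keyword number syntaxes, the following Python sketch parses a number token into an `int` or a `decimal.Decimal`. The function name `parse_number` and the choice of `Decimal` for fractional values are assumptions of this example; the spec leaves the numeric representation up to each implementation.

```python
import re
from decimal import Decimal

# Decimal form: sign? integer ('.' integer)? exponent?
DECIMAL_RE = re.compile(r"[+-]?[0-9][0-9_]*(\.[0-9][0-9_]*)?([eE][+-]?[0-9][0-9_]*)?")
# (prefix, radix, pattern for the digits after the prefix)
RADIX_FORMS = [
    ("0x", 16, r"[0-9a-fA-F][0-9a-fA-F_]*"),
    ("0o", 8, r"[0-7][0-7_]*"),
    ("0b", 2, r"[01][01_]*"),
]


def parse_number(text):
    """Parse a non-keyword KDL number token."""
    sign = -1 if text.startswith("-") else 1
    unsigned = text[1:] if text[:1] in ("+", "-") else text
    for prefix, radix, digits in RADIX_FORMS:
        if unsigned.startswith(prefix):
            body = unsigned[len(prefix):]
            if not re.fullmatch(digits, body):
                raise ValueError("invalid digits for radix " + str(radix))
            return sign * int(body.replace("_", ""), radix)
    if not DECIMAL_RE.fullmatch(text):
        raise ValueError("not a number: " + text)
    cleaned = text.replace("_", "")
    if any(ch in cleaned for ch in ".eE"):
        return Decimal(cleaned)  # fractional or exponent form
    return int(cleaned)
```

The first digit after a radix prefix, and the first digit of each decimal digit run, cannot be an underscore, which matches the grammar in Section 4.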

+
+
+

+3.14.1. Keyword Numbers +

+

There are three special "keyword" numbers included in KDL to accommodate the widespread use of IEEE 754 floats:

  • #inf - floating point positive infinity.
  • #-inf - floating point negative infinity.
  • #nan - floating point NaN/Not a Number.

To go along with this and prevent foot guns, the bare Identifier Strings (Section 3.10) inf, -inf, and nan are considered illegal identifiers and should yield a syntax error.

+

The existence of these keywords does not imply that any numbers must be represented as IEEE 754 floats. They exist simply for clarity and convenience for any implementation that chooses to represent its numbers in this way.

+
+
+
+
+
+
+

+3.15. Boolean +

+

A boolean Value (Section 3.7) is either the symbol #true or #false. These SHOULD be represented by implementations as boolean logical values, or some approximation thereof.

+
+
+

+3.15.1. Example +

+
+
+my-node #true value=#false
+
+
+
+
+
+
+
+
+

+3.16. Null +

+

The symbol #null represents a null Value (Section 3.7). It's up to the implementation to decide how to represent this, but it generally signals the "absence" of a value.

+
+
+

+3.16.1. Example +

+
+
+my-node #null key=#null
+
+
+
+
+
+
+
+
+

+3.17. Whitespace +

+

The following characters should be treated as non-Newline (Section 3.18) white space:

Table 2

| Name | Code Pt |
|------|---------|
| Character Tabulation | U+0009 |
| Space | U+0020 |
| No-Break Space | U+00A0 |
| Ogham Space Mark | U+1680 |
| En Quad | U+2000 |
| Em Quad | U+2001 |
| En Space | U+2002 |
| Em Space | U+2003 |
| Three-Per-Em Space | U+2004 |
| Four-Per-Em Space | U+2005 |
| Six-Per-Em Space | U+2006 |
| Figure Space | U+2007 |
| Punctuation Space | U+2008 |
| Thin Space | U+2009 |
| Hair Space | U+200A |
| Narrow No-Break Space | U+202F |
| Medium Mathematical Space | U+205F |
| Ideographic Space | U+3000 |
+
+
+

+3.17.1. Single-line comments +

+

Any text after //, until the next literal Newline (Section 3.18), is "commented out" and is considered to be Whitespace (Section 3.17).

+
+
+
+
+

+3.17.2. Multi-line comments +

+

In addition to single-line comments using //, comments can also be started with /* and ended with */. These comments can span multiple lines. They are allowed in all positions where Whitespace (Section 3.17) is allowed and can be nested.
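Because these comments nest, a scanner needs to track depth rather than stop at the first */. A minimal Python sketch, using a hypothetical helper named `skip_multiline_comment`:

```python
def skip_multiline_comment(src, start):
    """Return the index just past the multi-line comment that begins at src[start]."""
    if not src.startswith("/*", start):
        raise ValueError("no multi-line comment at this position")
    depth = 0
    i = start
    while i < len(src):
        if src.startswith("/*", i):
            depth += 1  # entering a (possibly nested) comment
            i += 2
        elif src.startswith("*/", i):
            depth -= 1  # leaving one nesting level
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated multi-line comment")
```

For example, in `/* a /* b */ c */ node` the comment ends only at the second */, so scanning resumes at ` node`.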

+
+
+
+
+

+3.17.3. Slashdash comments +

+

Finally, a special kind of comment called a "slashdash", denoted by /-, can be used to comment out entire components of a KDL document logically, and have those elements not be included as part of the parsed document data.

+

Slashdash comments can be used before the following, including before their type annotations, if present:

  • A Node (Section 3.2): the entire Node is treated as Whitespace, including all props, args, and children.
  • An Argument (Section 3.5): the Argument value is treated as Whitespace.
  • A Property (Section 3.4) key: the entire property, including both key and value, is treated as Whitespace. A slashdash of just the property value is not allowed.
  • A Children Block (Section 3.6): the entire block, including all children within, is treated as Whitespace. Only other children blocks, whether slashdashed or not, may follow a slashdashed children block.

A slashdash may be followed by any amount of whitespace, including newlines and comments (other than other slashdashes), before the element that it comments out.

+
+
+
+
+
+
+

+3.18. Newline +

+

The following character sequences should be treated as new lines:

Table 3

| Acronym | Name | Code Pt |
|---------|------|---------|
| CRLF | Carriage Return and Line Feed | U+000D + U+000A |
| CR | Carriage Return | U+000D |
| LF | Line Feed | U+000A |
| NEL | Next Line | U+0085 |
| VT | Vertical tab | U+000B |
| FF | Form Feed | U+000C |
| LS | Line Separator | U+2028 |
| PS | Paragraph Separator | U+2029 |
+

Note that for the purpose of new lines, the specific sequence CRLF is considered a single newline.
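One way to honor the CRLF rule is to list CRLF before the single-character alternatives in a regular expression, so that it is always matched as one unit. The Python sketch below uses hypothetical helper names and covers exactly the sequences in Table 3.

```python
import re

# CRLF comes first so that it matches as one newline rather than as CR followed by LF.
NEWLINE_RE = re.compile("\r\n|[\r\n\u0085\u000B\u000C\u2028\u2029]")


def split_lines(text):
    """Split text at every newline sequence from Table 3."""
    return NEWLINE_RE.split(text)


def normalize_newlines(text):
    """Replace each individual newline sequence with a single LF."""
    return NEWLINE_RE.sub("\n", text)
```

The second helper mirrors the per-sequence normalization that Multi-Line Strings require: CRLF CRLF becomes LF LF, not a single LF.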

+
+
+
+
+

+3.19. Disallowed Literal Code Points +

+

The following code points may not appear literally anywhere in the document. They may be represented in Strings (but not Raw Strings) using Unicode Escapes (Section 3.11.1) (\u{...}), except for code points that are not Unicode Scalar Values, which can't be represented even as escapes.

  • The codepoints U+0000-0008 or the codepoints U+000E-001F (various control characters).
  • U+007F (the Delete control character).
  • Any codepoint that is not a Unicode Scalar Value (U+D800-DFFF).
  • U+200E-200F, U+202A-202E, and U+2066-2069, the unicode "direction control" characters.
  • U+FEFF, aka Zero-width Non-breaking Space (ZWNBSP)/Byte Order Mark (BOM), except as the first code point in a document.
+
+
+
+
+
+

+4. Full Grammar +

+

This is the full official grammar for KDL and should be considered authoritative if something seems to disagree with the text above. The grammar language syntax is defined in Section 4.1.

+
+
+document := bom? version? nodes
+
+// Nodes
+nodes := (line-space* node)* line-space*
+
+base-node := slashdash? type? node-space* string
+    (node-space+ slashdash? node-prop-or-arg)*
+    // slashdashed node-children must always be after props and args.
+    (node-space+ slashdash node-children)*
+    (node-space+ node-children)?
+    (node-space+ slashdash node-children)*
+    node-space*
+node := base-node node-terminator
+final-node := base-node node-terminator?
+
+// Entries
+node-prop-or-arg := prop | value
+node-children := '{' nodes final-node? '}'
+node-terminator := single-line-comment | newline | ';' | eof
+
+prop := string node-space* '=' node-space* value
+value := type? node-space* (string | number | keyword)
+type := '(' node-space* string node-space* ')'
+
+// Strings
+string := identifier-string | quoted-string | raw-string ¶
+
+identifier-string := unambiguous-ident | signed-ident | dotted-ident
+unambiguous-ident :=
+    ((identifier-char - digit - sign - '.') identifier-char*)
+    - disallowed-keyword-identifiers
+signed-ident :=
+    sign ((identifier-char - digit - '.') identifier-char*)?
+dotted-ident :=
+    sign? '.' ((identifier-char - digit) identifier-char*)?
+identifier-char :=
+    unicode - unicode-space - newline - [\\/(){};\[\]"#=]
+    - disallowed-literal-code-points
+disallowed-keyword-identifiers :=
+    'true' | 'false' | 'null' | 'inf' | '-inf' | 'nan'
+
+quoted-string :=
+    '"' single-line-string-body '"' |
+    '"""' newline
+    (multi-line-string-body newline)?
+    (unicode-space | ws-escape)* '"""'
+single-line-string-body := (string-character - newline)*
+multi-line-string-body := (('"' | '""')? string-character)*
+string-character :=
+    '\\' (["\\bfnrts] |
+    'u{' hex-unicode '}') |
+    ws-escape |
+    [^\\"] - disallowed-literal-code-points
+ws-escape := '\\' (unicode-space | newline)+
+hex-digit := [0-9a-fA-F]
+hex-unicode := hex-digit{1, 6} - surrogates
+surrogates := [dD][8-9a-fA-F]hex-digit{2}
+// U+D800-DFFF: D  8         00
+//              D  F         FF
+
+raw-string := '#' raw-string-quotes '#' | '#' raw-string '#'
+raw-string-quotes :=
+    '"' single-line-raw-string-body '"' |
+    '"""' newline
+    (multi-line-raw-string-body newline)?
+    unicode-space* '"""'
+single-line-raw-string-body :=
+    '' |
+    (single-line-raw-string-char - '"')
+        single-line-raw-string-char*? |
+    '"' (single-line-raw-string-char - '"')
+        single-line-raw-string-char*?
+single-line-raw-string-char :=
+    unicode - newline - disallowed-literal-code-points
+multi-line-raw-string-body :=
+    (unicode - disallowed-literal-code-points)*?
+
+// Numbers
+number := keyword-number | hex | octal | binary | decimal
+
+decimal := sign? integer ('.' integer)? exponent?
+exponent := ('e' | 'E') sign? integer
+integer := digit (digit | '_')*
+digit := [0-9]
+sign := '+' | '-'
+
+hex := sign? '0x' hex-digit (hex-digit | '_')*
+octal := sign? '0o' [0-7] [0-7_]*
+binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')*
+
+// Keywords and booleans.
+keyword := boolean | '#null'
+keyword-number := '#inf' | '#-inf' | '#nan'
+boolean := '#true' | '#false'
+
+// Specific code points
+bom := '\u{FEFF}'
+disallowed-literal-code-points :=
+    See Table (Disallowed Literal Code Points)
+unicode := Any Unicode Scalar Value
+unicode-space := See Table
+    (All White_Space unicode characters which are not `newline`)
+
+// Comments
+single-line-comment := '//' ^newline* (newline | eof)
+multi-line-comment := '/*' commented-block
+commented-block :=
+    '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block
+slashdash := '/-' line-space*
+
+// Whitespace
+ws := unicode-space | multi-line-comment
+escline := '\\' ws* (single-line-comment | newline | eof)
+newline := See Table (All Newline White_Space)
+// Whitespace where newlines are allowed.
+line-space := node-space | newline | single-line-comment
+// Whitespace within nodes,
+// where newline-ish things must be esclined.
+node-space := ws* escline ws* | ws+
+
+// Version marker
+version :=
+    '/-' unicode-space* 'kdl-version' unicode-space+ ('1' | '2')
+    unicode-space* newline
+
+
+
+
+

+4.1. Grammar language +

+

The grammar language syntax is a combination of ABNF with some regex spice thrown in. Specifically:

  • Single quotes (') are used to denote literal text. \ within a literal string is used for escaping other single-quotes, for initiating unicode characters using hex values (\u{FEFF}), and for escaping \ itself (\\).
  • * is used for "zero or more", + is used for "one or more", and ? is used for "zero or one". Per standard regex semantics, * and + are greedy; they match as many instances as possible without failing the match.
  • *? (used only in raw strings) indicates a non-greedy match; it matches as few instances as possible without failing the match.
  • ¶ is a cut point. It always matches and consumes no characters, but once matched, the parser is not allowed to backtrack past that point in the source. If a parser would rewind past the cut point, it must instead fail the overall parse, as if it had run out of options. (This is only used with the raw-string production, to ensure the first instance of the appropriate closing quote sequence is guaranteed to be the end of the raw string, rather than allowing it to potentially consume more of the document unexpectedly.)
  • () can be used to group matches that must be matched together.
  • a | b means a or b, whichever matches first. If multiple items are before a |, they are a single group. a b c | d is equivalent to (a b c) | d.
  • [] are used for regex-style character matches, where any character between the brackets will be a single match. \ is used to escape \, [, and ]. They also support character ranges (0-9), and negation (^).
  • - is used for "except for" or "minus" whatever follows it. For example, a - 'x' means "any a, except something that matches the literal 'x'".
  • The prefix ^ means "something that does not match" whatever follows it. For example, ^foo means "must not match foo".
  • A single definition may be split over multiple lines. Newlines are treated as spaces.
  • // followed by text on its own line is used as comment syntax.
+
+
+
+
+
+

+Authors' Addresses +

+
+
Katerina Zoé Marchán Salvá
+
Microsoft
+
+
+
The KDL Contributors
+
+
+
+ + +