Brought to attention by #911
Column names can contain any symbol. This is important to support reading and writing any format.
Accessors, however, don't support all symbols due to limitations of the JVM.
Identifiers need to follow the spec:
- (Letter | '_') {Letter | '_' | UnicodeDigit} is allowed without backticks
  - Letter: any unicode character of categories Lu, Ll, Lt, Lm or Lo
  - UnicodeDigit: any unicode character of category Nd
- '`' QuotedSymbol {QuotedSymbol} '`'
  - QuotedSymbol: any character excluding CR, LF and '`' (well, except the last part: we cannot write ` inside a name with backticks)
  - ., ;, [, ], /, <, >, :, \\ are never allowed
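As a rough illustration of the rules above, the sketch below sorts a column name into one of three groups: usable as a plain identifier, usable only inside backticks, or in need of conversion. `AccessorForm` and `classify` are hypothetical names, not part of the library. The sketch also ignores Kotlin keywords (which need backticks as well) and treats the empty name as needing conversion, which is an assumption.

```kotlin
// Hypothetical helper for illustration only; not part of the DataFrame API.
enum class AccessorForm { PLAIN, BACKTICKED, NEEDS_CONVERSION }

// Characters listed above as never allowed, plus CR, LF and the backtick itself.
private val forbidden = setOf('.', ';', '[', ']', '/', '<', '>', ':', '\\', '`', '\r', '\n')

fun classify(name: String): AccessorForm {
    // Assumption: an empty name cannot be used as an accessor at all.
    if (name.isEmpty() || name.any { it in forbidden }) return AccessorForm.NEEDS_CONVERSION

    // (Letter | '_') {Letter | '_' | UnicodeDigit}
    // Char.isLetter() covers categories Lu, Ll, Lt, Lm and Lo; Nd is DECIMAL_DIGIT_NUMBER.
    val first = name.first()
    val plain = (first.isLetter() || first == '_') &&
        name.all { it.isLetter() || it == '_' || it.category == CharCategory.DECIMAL_DIGIT_NUMBER }

    return if (plain) AccessorForm.PLAIN else AccessorForm.BACKTICKED
}
```

Under these assumptions, `classify("colName")` is `PLAIN`, `classify("my col")` is `BACKTICKED`, and `classify("name.first")` is `NEEDS_CONVERSION`.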
Source: https://kotlinlang.org/spec/syntax-and-grammar.html#identifiers
To support QuotedSymbol characters, our generator automatically inserts backticks where needed.
For disallowed characters, we use the following conversion:
With this conversion, columns from data are accessible like:
- "my::colName" -> df.`my - colName`
- "Dwayne `The Rock` Johnson" -> df.`Dwayne 'The Rock' Johnson`
- "name.first" -> df.`name first`
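A minimal sketch of how such a conversion could work, purely for discussion. The replacement table contains only the three mappings that can be read off the examples above and is an assumption, not the generator's actual table. `toAccessorName` is a hypothetical name, and the real generator would additionally wrap the result in backticks.

```kotlin
// Hypothetical replacement table, inferred from the three examples only.
// The generator's real table may differ and certainly covers more characters.
private val replacements = listOf(
    "::" to " - ",
    "`" to "'",
    "." to " ",
)

// Applies each replacement in order to turn a column name into an accessor name.
fun toAccessorName(columnName: String): String =
    replacements.fold(columnName) { acc, (from, to) -> acc.replace(from, to) }
```

With this table, `toAccessorName("name.first")` returns `name first`. A second column that is literally called "name first" would map to the same accessor, which is the kind of clash the choice of replacement characters has to weigh.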
These conversions are defined to cause as few clashes as possible, but some of the choices are confusing, for instance "." becoming " " instead of "_".
This needs some research and feedback.