
Revisit accessor name generation #922

Open
@Jolanrensen

Description


Brought to attention by #911

Column names can contain any symbol. This is important to support reading and writing any format.
Accessors, however, don't support all symbols due to limitations of the JVM.

Identifiers need to follow the spec:

  • (Letter | '_') {Letter | '_' | UnicodeDigit} is allowed without backticks
    • Letter: any Unicode character of categories Lu, Ll, Lt, Lm or Lo
    • UnicodeDigit: any Unicode character of category Nd
  • '`' QuotedSymbol {QuotedSymbol} '`'
    • QuotedSymbol: any character excluding CR, LF and '`' (so a backtick can never appear inside a backticked name)
  • ., ;, [, ], /, <, >, :, \ are never allowed, not even inside backticks
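The rules above can be sketched as a check on a column name. This is a minimal sketch, not the generator's actual code: the function names (`isPlainIdentifier`, `isQuotableIdentifier`, `toSourceName`) are hypothetical, it inspects UTF-16 `Char`s only (so characters outside the BMP are not handled), and it ignores keywords such as `val` and other reserved names that would also need backticks.

```kotlin
// Hypothetical sketch of the Kotlin identifier rules; not the library's implementation.
// Simplification: works on UTF-16 Chars and ignores keywords/reserved names.

val letterCategories = setOf(
    CharCategory.UPPERCASE_LETTER, // Lu
    CharCategory.LOWERCASE_LETTER, // Ll
    CharCategory.TITLECASE_LETTER, // Lt
    CharCategory.MODIFIER_LETTER,  // Lm
    CharCategory.OTHER_LETTER,     // Lo
)

fun isLetterOrUnderscore(c: Char): Boolean = c == '_' || c.category in letterCategories

fun isUnicodeDigit(c: Char): Boolean = c.category == CharCategory.DECIMAL_DIGIT_NUMBER // Nd

/** (Letter | '_') {Letter | '_' | UnicodeDigit}: usable without backticks. */
fun isPlainIdentifier(name: String): Boolean =
    name.isNotEmpty() &&
        isLetterOrUnderscore(name[0]) &&
        name.all { isLetterOrUnderscore(it) || isUnicodeDigit(it) }

/** Characters that are never allowed, not even inside backticks. */
val forbiddenChars = setOf('.', ';', '[', ']', '/', '<', '>', ':', '\\')

/** '`' QuotedSymbol {QuotedSymbol} '`' combined with the forbidden characters above. */
fun isQuotableIdentifier(name: String): Boolean =
    name.isNotEmpty() &&
        name.none { it == '\r' || it == '\n' || it == '`' || it in forbiddenChars }

/** The name as it could be written in source, or null if it must be converted first. */
fun toSourceName(name: String): String? = when {
    isPlainIdentifier(name) -> name
    isQuotableIdentifier(name) -> "`$name`"
    else -> null
}
```

For example, "first name" would be written as `` `first name` ``, while "name.first" yields null and has to go through a character conversion first.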

Source: https://kotlinlang.org/spec/syntax-and-grammar.html#identifiers

To support QuotedSymbol characters, our generator automatically inserts backticks where needed.
For disallowed characters, we use the following conversion:

[Image: conversion table for disallowed characters]

With this conversion, columns from data become accessible as follows:

  • "my::colName" -> df.`my - colName`
  • "Dwayne `The Rock` Johnson" -> df.`Dwayne 'The Rock' Johnson`
  • "name.first" -> df.`name first`
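A minimal sketch of such a conversion, assuming the replacement rules can be inferred from the three examples above. The actual mapping is in the table image and may differ (for instance, a single `:` may have its own rule); `replacements` and `toAccessorName` are hypothetical names, and wrapping the result in backticks is left to a separate step.

```kotlin
// Hypothetical replacement rules, inferred only from the examples in this issue;
// the real conversion table may contain more or different rules.
val replacements = listOf(
    "::" to " - ", // "my::colName" -> "my - colName"
    "`" to "'",    // a backtick cannot appear inside a backticked name
    "." to " ",    // "name.first" -> "name first"
)

/** Applies each replacement in order to produce the accessor name (without backticks). */
fun toAccessorName(columnName: String): String =
    replacements.fold(columnName) { acc, (from, to) -> acc.replace(from, to) }
```

Applying the rules in a fixed order keeps the result deterministic, which matters once several rules could touch the same characters.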

These conversions are defined to cause as few clashes as possible, but some of the choices are confusing.
For instance, "." becomes " " instead of "_".
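One way to explore this trade-off is to group a set of column names by the accessor name they would produce and keep only the groups that collide. The sketch below uses a hypothetical `findClashes` helper that takes the conversion as a parameter, so the "." → " " and "." → "_" choices can be compared on the same input.

```kotlin
/**
 * Hypothetical helper: groups column names by the accessor name that [convert]
 * produces and returns only the accessor names that more than one column maps to.
 */
fun findClashes(columnNames: List<String>, convert: (String) -> String): Map<String, List<String>> =
    columnNames.groupBy(convert).filterValues { it.size > 1 }
```

With the columns "name.first", "name first" and "name_first", mapping "." to " " makes "name.first" collide with "name first", while mapping it to "_" makes "name.first" collide with "name_first" instead. Neither choice avoids clashes on its own; which one is preferable depends on which kinds of names are more common in real data.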

This needs some research and feedback.

Labels: research (This requires a deeper dive to gather a better understanding)