@@ -506,7 +506,7 @@ mod gödel {
506506would be mangled as:
507507
508508```
509- _RNvNtNtC7mycrateu8gdel_Fqa6escher4bach
509+ _RNvNtNtC7mycrateu8gdel_5qa6escher4bach
510510 <-------->
511511 Unicode component
512512```
@@ -613,10 +613,10 @@ compiler generates mangled names.
613613
614614The syntax of mangled names is given in extended Backus-Naur form:
615615
616- - Non-terminals are within angle brackets (as in `<name-prefix>`)
616+ - Non-terminals are within angle brackets (as in `<path>`)
617617 - Terminals are within quotes (as in `"_R"`),
618- - Optional parts are in brackets (as in `[<decimal>]`),
619- - Repetition (zero or more times) is signified by curly braces (as in `{<name-prefix>}`)
618+ - Optional parts are in brackets (as in `[<disambiguator>]`),
619+ - Repetition (zero or more times) is signified by curly braces (as in `{<type>}`)
620620 - Comments are marked with `//`.
621621
622622Mangled names conform to the following grammar:
@@ -641,11 +641,13 @@ Mangled names conform to the following grammar:
641641<impl-path> = [<disambiguator>] <path>
642642
643643// The <decimal-number> is the length of the identifier in bytes.
644- // <bytes> is the identifier itself and must not start with a decimal digit.
644+ // <bytes> is the identifier itself. It may be preceded by a "_" that
645+ // separates it from its length; this "_" is mandatory if <bytes> starts
646+ // with a decimal digit or "_", so that the encoding stays unambiguous.
645647// If the "u" is present then <bytes> is Punycode-encoded.
646648<identifier> = [<disambiguator>] <undisambiguated-identifier>
647649<disambiguator> = "s" <base-62-number>
648- <undisambiguated-identifier> = ["u"] <decimal-number> <bytes>
650+ <undisambiguated-identifier> = ["u"] <decimal-number> ["_"] <bytes>
649651
650652// Namespace of the identifier in a (nested) path.
651653// It's an a-zA-Z character, with a-z reserved for implementation-internal
@@ -775,29 +777,22 @@ and, for now, only define a mangling for integer values.
775777### Punycode Identifiers
776778
777779Punycode generates strings of the form ` ([[:ascii:]]+-)?[[:alnum:]]+ ` .
778- This is problematic for two reasons:
780+ This is problematic because of the `-` character, which is not in the
781+ supported character set; Punycode uses it to separate the ASCII part
782+ (if it exists) from the base-36 encoding of the non-ASCII characters.
779783
780- - Generated strings can contain a ` - ` character; which is not in the
781- supported character set.
782- - Generated strings can start with a digit; which makes them clash
783- with the byte-count prefix of the ` <identifier> ` production.
784-
785- For these reasons, vanilla Punycode string are further encoded during mangling:
786-
787- - The ` - ` character is simply replaced by a ` _ ` character.
788- - The part of the Punycode string that encodes the non-ASCII characters
789- is a base-36 number, using ` [a-z0-9] ` as its "digits". We want to get
790- rid of the decimal digits in there, so we simply remap ` 0-9 ` to ` A-J ` .
784+ For this reason, we deviate from vanilla Punycode by replacing
785+ the `-` character with a `_` character.
791786
792787Here are some examples:
793788
794789| Original | Punycode | Punycode + Encoding |
795790| -----------------| -----------------| ---------------------|
796- | føø | f-5gaa | f_Fgaa |
797- | α_ω | _-ylb7e | __ylbHe |
798- | 铁锈 | n84amf | nIEamf |
799- | 🤦 | fq9h | fqJh |
800- | ρυστ | 2xaedc | Cxaedc |
791+ | føø | f-5gaa | f_5gaa |
792+ | α_ω | _-ylb7e | __ylb7e |
793+ | 铁锈 | n84amf | n84amf |
794+ | 🤦 | fq9h | fq9h |
795+ | ρυστ | 2xaedc | 2xaedc |
801796
802797With this post-processing in place the Punycode strings can be treated
803798like regular identifiers and need no further special handling.
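
The identifier rule and the Punycode post-processing changed in this diff can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not the compiler's implementation: the function name `mangle_undisambiguated_identifier` and its parameters are hypothetical, the input is assumed to be already Punycode-encoded when `is_punycode` is true (no Punycode encoder is included), and the optional `_` separator is emitted only where the grammar makes it mandatory.

```rust
/// Hypothetical helper: builds an `<undisambiguated-identifier>` from
/// identifier text that is either plain ASCII or already Punycode-encoded.
fn mangle_undisambiguated_identifier(ident: &str, is_punycode: bool) -> String {
    // Deviation from vanilla Punycode: `-` is not in the supported
    // character set, so it is replaced with `_`.
    let bytes = if is_punycode {
        ident.replace('-', "_")
    } else {
        ident.to_string()
    };

    let mut out = String::new();
    // The "u" prefix marks <bytes> as Punycode-encoded.
    if is_punycode {
        out.push('u');
    }
    // <decimal-number>: length of <bytes> in bytes. The separator is not
    // part of <bytes>, so it is not counted here.
    out.push_str(&bytes.len().to_string());
    // The "_" separator is mandatory when <bytes> starts with a decimal
    // digit or "_"; this sketch emits it only in that case.
    if bytes.starts_with(|c: char| c.is_ascii_digit() || c == '_') {
        out.push('_');
    }
    out.push_str(&bytes);
    out
}
```

Under these assumptions, the Punycode string `gdel-5qa` yields `u8gdel_5qa`, which matches the Unicode component in the mangled-name example above, while `2xaedc` yields `u6_2xaedc` because the encoded bytes start with a digit.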
@@ -1154,3 +1149,4 @@ pub static QUUX: u32 = {
11541149- Resolve question of complex constant data.
11551150- Add a recommended resolution for open question around Punycode identifiers.
11561151- Add a recommended resolution for open question around encoding function parameter types.
1152+ - Allow identifiers to start with a digit.