Onigmo applies https://www.unicode.org/Public/UNIDATA/SpecialCasing.txt both ways in case-insensitive mode, but only for literals, ranges, positive Unicode properties outside of charsets, positive or negated properties inside charsets, and maybe some other exotic cases.
This applies even if the mapping is between one and multiple chars, and even if the single char is part of a charset (e.g. /[ß]/i), allowing some unquantified charsets to match more than one char.
E.g. the literal ß maps to ["ss", "sS", "sſ", "Ss", "SS", "Sſ", "ſs", "ſS", "ſſ", "ẞ"], and vice versa, even if in a charset (/[ß]/), but only if present as a literal, part of a positive range, part of a positive property, or part of a negated property in a charset (i.e. not when part of a char type, a positive POSIX class, a negated charset containing literals, ranges, or char types, a plain negated property, a negated property in a negated charset, or a negated POSIX class).
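To see where the ten alternatives come from, here is a rough sketch that rebuilds the list by hand. It assumes the three chars that fold to "s" are s, S, and ſ (U+017F), and treats ẞ (U+1E9E) as the only single-char equivalent. `S_VARIANTS` and `expansionsOfSharpS` are made-up names for illustration, not anything from Onigmo or js_regex.

```javascript
// Assumption: these are the chars that case-fold to "s".
const S_VARIANTS = ["s", "S", "ſ"]; // U+0073, U+0053, U+017F

// Build every two-char combination of the "s" variants, then append
// the capital sharp s as the single-char equivalent of ß.
function expansionsOfSharpS() {
  const pairs = [];
  for (const first of S_VARIANTS) {
    for (const second of S_VARIANTS) {
      pairs.push(first + second);
    }
  }
  return [...pairs, "ẞ"]; // U+1E9E
}

console.log(expansionsOfSharpS());
```

So the list is just the 3 × 3 product of the "s" variants plus ẞ, which is probably the set a converter would have to emit as an alternation if it wanted to mimic this behavior.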
Not yet tested: codepoint lists, meta & control escapes, absence groups, effects of backrefs or subexp calls, nested charsets, combining various expressions in charsets, free-spacing mode, ...
Per your description, this case-insensitive mapping from one to multiple chars seems like a mess. I ran just your first test in Oniguruma, and it's doing the same thing as Onigmo.
Note that JS doesn't do this for Unicode case folding (which it applies when flag i is combined with u or v). So a good starting point might be to not yet worry about expansions to multiple chars, but still support other aspects of Unicode case folding (which js_regex doesn't yet do) like titlecase chars and Turkish İı.
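For reference, a quick check of what JS does with flags i and u, as far as I understand the simple (one-to-one) case folding it uses. The `results` object and its keys are only there for illustration.

```javascript
// JS with flags i + u: one-to-one case folding only, no length expansion.
const results = {
  // ß does not expand to "ss", neither as a literal nor in a charset.
  literalMatchesSS: /ß/iu.test("ss"),
  charsetMatchesSS: /^[ß]$/iu.test("ss"),
  // One-to-one foldings still apply: ẞ (U+1E9E) folds to ß, ſ (U+017F) folds to s.
  matchesCapitalSharpS: /ß/iu.test("ẞ"),
  longSMatchesS: /ſ/iu.test("s"),
};

console.log(results);
```

The first two should come out false and the last two true, which matches the idea of supporting the one-to-one foldings first and leaving multi-char expansions for later.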
Not sure what the recommendation is for case-insensitive length expansions in regular expressions per UTS 18 (i.e., not sure whether JS or Onigmo more closely follows the Unicode recommendations).
Issue extracted from #21.