Some case-swapping edge cases are handled incorrectly #25

Issue extracted from #21.

Onigmo applies https://www.unicode.org/Public/UNIDATA/SpecialCasing.txt both ways in case-insensitive mode, but only for literals, ranges, positive Unicode properties outside of charsets, positive or negated properties inside charsets, and maybe some other exotic cases.

This applies even if the mapping is between one and multiple chars, and even if the single char is part of a charset (e.g. `/[ß]/i`), allowing some unquantified charsets to match more than one char.

E.g. the literal `ß` maps to `["ss", "sS", "sſ", "Ss", "SS", "Sſ", "ſs", "ſS", "ſſ", "ẞ"]`, and vice versa, even inside a charset (`/[ß]/i`). This only happens if the char is present as a literal, as part of a positive range, as part of a positive property, or as part of a negated property in a charset. It does not happen when the char is only covered by a char type, a positive posix class, a negated charset containing literals, ranges, or char types, a plain negated property, a negated property in a negated charset, or a negated posix class.
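The ten variants listed above look like every two-char combination of `s`, `S`, and `ſ` (U+017F), plus the capital `ẞ` (U+1E9E). A minimal sketch, assuming that reading is correct, rebuilds the list and checks each entry against `/\Aß\z/i`. The constant names `S_VARIANTS` and `SHARP_S_VARIANTS` are made up for this illustration.

```ruby
# Assumption: the expansions of ß are all pairs of the case variants of "s",
# plus the capital sharp s. These constant names are illustrative only.
S_VARIANTS = %w[s S ſ].freeze
SHARP_S_VARIANTS = (S_VARIANTS.product(S_VARIANTS).map(&:join) + ['ẞ']).freeze

SHARP_S_VARIANTS.each do |variant|
  # The anchors force the whole variant to be consumed by the single literal ß.
  puts format('%-3s %s', variant, variant.match?(/\Aß\z/i))
end
```

Any variant that prints `false` here would contradict the list above and be worth reporting separately.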

Not yet tested: codepoint lists, meta & control escapes, absence groups, effects of backrefs or subexp calls, nested charsets, combining various expressions in charsets, free-spacing mode, ...

```ruby
RUBY_VERSION # => "3.3.6"

# literal matching cases
'SS'[/ß/i] # => "SS"
'ASSE'[/ß/i] # => "SS"
'ASSE'[/AßE/i] # => "ASSE"
'ASSE'[/A[ß]E/i] # => "ASSE" # (!)

# non-literal matching cases
'SS'[/\u00DF/i] # => "SS"
'SS'[/[\u00DE-\u00E0]/i] # => "SS"
'SS'[/[Þ-à]/i] # => "SS"
'SS'[/[Þ-à&&Þ-á]/i] # => "SS"
'SS'[/[T-\u{10FFFF}]/i] # => "S" # because s -> S ?
'SS'[/[t-\u{10FFFF}]/i] # => "S" # because ſ -> S ?
'SS'[/\p{word}/i] # => "S"
'ASSE'[/A\p{word}E/i] # => "ASSE"
'ASSE'[/A[\p{word}]E/i] # => "ASSE"
'ASSE'[/A[\p{^Mark}]E/i] # => "ASSE"

# non-matching cases
'ASSE'[/A.E/i] # => nil
'ASSE'[/A[.]E/i] # => nil
'ASSE'[/A[^x]E/i] # => nil
'ASSE'[/A[^x-y]E/i] # => nil
'ASSE'[/A[^\d]E/i] # => nil
'ASSE'[/A(?u:\w)E/i] # => nil
'ASSE'[/A\p{^Mark}E/i] # => nil
'ASSE'[/A[[:word:]]E/i] # => nil
'ASSE'[/A[[:^digit:]]E/i] # => nil
'ASSE'[/A[\p{^Mark}]E/i] # => nil
'ASSE'[/A[^\p{^word}]E/i] # => nil
'ASSE'[/A[ß]{2}E/i] # => nil

# inverse direction
'ß'[/SS/i] # => "ß"
'ß'[/ss/i] # => "ß"
```
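For the constructs listed as not yet tested, a small probe along these lines might help. It is only a sketch: `consumes_ss?` is a hypothetical helper, not part of Ruby or Onigmo, and the fragment list is just an example to extend.

```ruby
# Hypothetical helper, not part of Ruby or Onigmo: wraps a pattern fragment
# in A...E and reports whether it can consume the "SS" in 'ASSE' in
# case-insensitive mode, i.e. whether a one-to-many mapping was applied.
def consumes_ss?(fragment)
  Regexp.new("A#{fragment}E", Regexp::IGNORECASE).match?('ASSE')
end

# Fragments are single-quoted so backslashes reach the regexp engine intact.
['ß', '[ß]', '.', '\u00DF', '[[ß]]'].each do |fragment|
  puts format('%-10s %s', fragment, consumes_ss?(fragment))
end
```

A fragment that prints `true` matched two chars where a single char would normally be expected.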
