Regexp: support for case-insensitive Unicode matching #2130
balajirrao wants to merge 10 commits into mozilla:master
6a24f28 to 647c882
@balajirrao any plans for finishing this? Waiting for that to make the separate engine PR...
@rbri I thought I was going to finish it and then I hit a wall. I believe that in order to do this in the general case, we'd need icu4j. I'm considering creating a module outside of Rhino, say, rhino-icu4j, that when included would offer complete Unicode support in regexps and possibly in other cases too. How does that sound?
IMHO that's the right approach. An opt-in module that, if present, adds the capability. If not, we can error out with "not supported". It would be a good improvement on what we do now.
I'm not sure the complement classes present an insurmountable wall. icu4j would certainly offer a route to a complete implementation, but it would also be entirely reasonable to calculate classes, and their complements, when needed. Looping from 0 to MAX_CODE_POINT and building a range structure doesn't actually take much time, and most Unicode classes have ranges that can be represented pretty compactly.
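The "loop and build ranges" idea from the comment above can be sketched roughly as follows. This is only an illustration, not code from the PR: `CodePointRanges`, `collect`, and `complement` are hypothetical names, and a class is modelled as a plain `IntPredicate` (in practice it might be something like `Character::isUpperCase`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical helper (not part of Rhino): range lists for a character class. */
final class CodePointRanges {

    /** Scans every code point once and returns sorted inclusive {start, end} pairs. */
    static List<int[]> collect(IntPredicate inClass) {
        List<int[]> ranges = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a run"
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (inClass.test(cp)) {
                if (start < 0) {
                    start = cp; // a new run begins here
                }
            } else if (start >= 0) {
                ranges.add(new int[] {start, cp - 1}); // the run ended at cp - 1
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(new int[] {start, Character.MAX_CODE_POINT});
        }
        return ranges;
    }

    /** Complements a sorted, non-overlapping range list without rescanning. */
    static List<int[]> complement(List<int[]> ranges) {
        List<int[]> out = new ArrayList<>();
        int next = Character.MIN_CODE_POINT; // first code point not yet covered
        for (int[] r : ranges) {
            if (r[0] > next) {
                out.add(new int[] {next, r[0] - 1}); // gap before this range
            }
            next = r[1] + 1;
        }
        if (next <= Character.MAX_CODE_POINT) {
            out.add(new int[] {next, Character.MAX_CODE_POINT});
        }
        return out;
    }
}
```

The scan visits each of the 1,114,112 code points once, and the complement is derived from the range list itself, so a negated class should not need a second pass.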
fa34971 to e8f1bf2
# Conflicts:
#	rhino/src/test/java/org/mozilla/javascript/tests/NativeRegExpTest.java
For case-insensitive matching of Unicode surrogate pairs
e8f1bf2 to 462c8b0
462c8b0 to db0763e
I've finally managed to finish it up. @aardvark179 It turns out I didn't need to compute case fold of arbitrary Unicode regions in the @rbri would appreciate you taking a look when you have a chance!
@balajirrao I did a smoke test with this and also took the chance to ask some LLMs to create test cases for it. All looks good; I think we can go with it.
Enable Unicode case-insensitive regex matching (/iu flag combination) using approximate case folding.
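As a rough illustration of what code-point-level case folding can look like, here is a minimal sketch. It is not the PR's implementation: `SimpleFold` and its methods are hypothetical names, and the upper-then-lower round trip through `java.lang.Character` is only an assumed stand-in for the approximation the description mentions.

```java
/** Hypothetical illustration (not Rhino code): approximate, code-point-wise folding. */
final class SimpleFold {

    /** Assumed approximation: map to upper case, then back to lower case. */
    static int fold(int codePoint) {
        return Character.toLowerCase(Character.toUpperCase(codePoint));
    }

    /** Compares two strings by folded code point, so surrogate pairs stay intact. */
    static boolean equalsFolded(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i); // reads a full surrogate pair if present
            int cb = b.codePointAt(j);
            if (fold(ca) != fold(cb)) {
                return false;
            }
            i += Character.charCount(ca); // advance by 1 or 2 UTF-16 units
            j += Character.charCount(cb);
        }
        return i == a.length() && j == b.length();
    }
}
```

Because the mapping is one code point to one code point, a supplementary pair such as U+10400 and U+10428 (Deseret capital and small long I) compares equal, while a multi-character expansion such as "ß" against "SS" does not.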