Regexp: support for case-insensitive Unicode matching by balajirrao · Pull Request #2130 · mozilla/rhino

balajirrao · 2025-10-17T16:06:00Z

Enable Unicode case-insensitive regex matching (/iu flag combination) using approximate case folding.
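For readers unfamiliar with the idea, "approximate case folding" can be pictured roughly as follows. This is only a sketch and not the code from this PR: the class `ApproxCaseFold` and its methods are hypothetical names, and it assumes the approximation is built on `java.lang.Character`, which has no direct simple-case-folding API. Results may differ from the Unicode `CaseFolding.txt` mappings for some characters.

```java
/** Hypothetical helper for illustration; not taken from the Rhino sources. */
final class ApproxCaseFold {
    private ApproxCaseFold() {}

    /**
     * Approximates simple case folding for a single code point.
     * Upper-casing first and lower-casing second sends characters such as
     * U+212A (KELVIN SIGN) and 'k' to the same representative.
     */
    static int fold(int codePoint) {
        return Character.toLowerCase(Character.toUpperCase(codePoint));
    }

    /** True if the two code points are equal after approximate folding. */
    static boolean equalsIgnoreCase(int a, int b) {
        return a == b || fold(a) == fold(b);
    }
}
```

Because it works on `int` code points rather than `char` values, the same helper also covers characters outside the BMP that are encoded as surrogate pairs.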

rbri · 2025-11-21T06:30:48Z

@balajirrao any plans for finishing this? Waiting for that to make the separate engine PR...

balajirrao · 2025-11-21T09:13:47Z

@rbri I thought I was going to finish it and then I hit a wall. I believe that in order to do this in the general case, we'd need icu4j. I'm considering creating a module outside of rhino, say, rhino-icu4j, that when included would offer complete Unicode support in regexps and possibly in other cases too. How does that sound?

andreabergia · 2025-11-28T10:13:36Z

> @rbri I thought I was going to finish it and then I hit a wall. I believe that in order to do this in the general case, we'd need icu4j. I'm considering creating a module outside of rhino, say, rhino-icu4j that when included would offer complete Unicode support in regexps and possibly in other cases too. How does that sound ?

IMHO that's the right approach. An opt-in module that, if present, adds the capability. If not, we can error out with "not supported". It would be a good improvement on what we do now.
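One way such an opt-in module could be wired up is through `java.util.ServiceLoader`. The sketch below is purely illustrative: `OptionalUnicodeSupport` and the nested `Provider` interface are hypothetical names, not Rhino or icu4j APIs, and it assumes the optional module would register an implementation as a service provider.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

/** Hypothetical lookup for an optional Unicode-support module. */
final class OptionalUnicodeSupport {
    private OptionalUnicodeSupport() {}

    /** Hypothetical service interface an add-on module would implement. */
    interface Provider {
        int simpleCaseFold(int codePoint);
    }

    /** True if some provider is registered on the class path. */
    static boolean isAvailable() {
        return ServiceLoader.load(Provider.class).iterator().hasNext();
    }

    /** Returns the first registered provider, or fails with "not supported". */
    static Provider require() {
        Iterator<Provider> it = ServiceLoader.load(Provider.class).iterator();
        if (it.hasNext()) {
            return it.next();
        }
        throw new UnsupportedOperationException(
                "not supported: no Unicode support module found");
    }
}
```

With no provider on the class path, `require()` throws, which corresponds to the "error out" path described above.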

aardvark179 · 2025-12-01T14:07:59Z

I'm not sure the complement classes present an insurmountable wall. icu4j would certainly offer a route to a complete implementation, but it would also be entirely reasonable to calculate classes, and their complements, when needed. Looping from 0 to MAX_CODE_POINT and building a range structure doesn't actually take much time, and most Unicode classes have ranges that can be represented pretty compactly.
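The loop described above might look something like this. It is a sketch under assumptions, not code from Rhino: `CodePointRanges` is a hypothetical name, and class membership is modelled as an arbitrary `IntPredicate` over code points.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical helper that turns a membership test into sorted ranges. */
final class CodePointRanges {
    private CodePointRanges() {}

    /** Returns inclusive {start, end} pairs covering every matching code point. */
    static List<int[]> build(IntPredicate inClass) {
        List<int[]> ranges = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a range"
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (inClass.test(cp)) {
                if (start < 0) {
                    start = cp; // open a new range
                }
            } else if (start >= 0) {
                ranges.add(new int[] {start, cp - 1}); // close the open range
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(new int[] {start, Character.MAX_CODE_POINT});
        }
        return ranges;
    }

    /** The complement is the same scan over the negated predicate. */
    static List<int[]> complement(IntPredicate inClass) {
        return build(inClass.negate());
    }
}
```

For example, `CodePointRanges.build(Character::isLetter)` scans the whole code point space once and collapses the result into a list of ranges; caching the list per class would avoid repeating the scan.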

balajirrao · 2026-02-24T14:08:24Z

I've finally managed to finish it up.

@aardvark179 It turns out I didn't need to compute the case fold of arbitrary Unicode regions in u mode; that's needed only for v mode. The spec was clear about this; it was MDN that confused me.

@rbri would appreciate you taking a look when you have a chance!

rbri · 2026-03-01T15:12:25Z

@balajirrao I did a smoke test with this and also took the chance to ask some LLMs to create test cases for it. All looks good - I think we can go with it.

balajirrao force-pushed the regexp-unicode-caseinsensitive branch from 6a24f28 to 647c882 on October 17, 2025 16:06

balajirrao force-pushed the regexp-unicode-caseinsensitive branch 3 times, most recently from fa34971 to e8f1bf2 on February 24, 2026 09:44

balajirrao added 5 commits February 24, 2026 10:56

e782982 Allow 'u' and 'i' flags to be used together

9c73b8d Add approximate unicode case-folding

a3572a1 Change isWord to handle case-insensitive Unicode mode
    # Conflicts: # rhino/src/test/java/org/mozilla/javascript/tests/NativeRegExpTest.java

2b52a81 Introduce opcode REOP_UCSPFLAT1i
    For case-insensitive matching of Unicode surrogate pairs

64e9ece Case-insensitive matching with anchor

balajirrao force-pushed the regexp-unicode-caseinsensitive branch from e8f1bf2 to 462c8b0 on February 24, 2026 10:29

balajirrao marked this pull request as ready for review February 24, 2026 13:38

balajirrao added 5 commits February 24, 2026 15:07

7b6f835 case-insensitive unicode support for flatNIMatcher and flatNIBackward…
    … matchers

57c0769 case-insensitive matching support for classes

cce9253 Property escapes

33535f0 Backref matcher

db0763e Update test262.properties

balajirrao force-pushed the regexp-unicode-caseinsensitive branch from 462c8b0 to db0763e on February 24, 2026 14:08

balajirrao changed the title ~~Regexp: support for case-insensitive unicode matching~~ Regexp: support for case-insensitive Unicode matching Feb 24, 2026

rbri approved these changes Mar 1, 2026

balajirrao wants to merge 10 commits into mozilla:master from balajirrao:regexp-unicode-caseinsensitive

4 participants
