perf: Avoid re-canonicalizing the entire IntervalSet on push by Marwes · Pull Request #1308 · rust-lang/regex

Marwes · 2025-10-20T14:49:11Z

Canonicalize is taking up a significant amount of time for a regex with a huge number of character ranges (generated by [lalrpop](https://github.com/lalrpop/lalrpop)'s lexer expanding multiple `\w` in a token). While this could perhaps be fixed in lalrpop, I noticed the TODO in the code and addressed it: each push now unions and compresses the new range into the set instead of re-canonicalizing the entire set. That fixed the performance problem.

I did see the earlier attempt at this in #1051. It seems that change was reverted and regression tests were added afterwards, so I hope those and the existing tests are enough (I don't have a clear idea of what tests might be missing).
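To illustrate the idea of merging on push rather than re-sorting the whole set, here is a minimal sketch. It is not the code from this PR: `Range`, `IntervalSet`, and `push` below are simplified, hypothetical stand-ins that use inclusive `u32` bounds, and the sketch assumes the set is kept sorted, non-overlapping, and non-adjacent at all times.

```rust
/// Inclusive range of values; a hypothetical stand-in for the crate's interval types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Range {
    lo: u32,
    hi: u32,
}

/// Hypothetical simplified set. Invariant: `ranges` is sorted, and no two
/// ranges overlap or touch (so the set is always in canonical form).
#[derive(Debug, Default)]
struct IntervalSet {
    ranges: Vec<Range>,
}

impl IntervalSet {
    /// Insert `new` while preserving the invariant, without re-sorting
    /// or re-scanning the whole set.
    fn push(&mut self, new: Range) {
        assert!(new.lo <= new.hi);

        // Index of the first range that overlaps or is adjacent to `new`:
        // everything before it ends strictly before `new.lo - 1`.
        let start = self
            .ranges
            .partition_point(|r| r.hi.saturating_add(1) < new.lo);

        // One past the last range that overlaps or is adjacent to `new`:
        // everything from here on starts strictly after `new.hi + 1`.
        let end = self
            .ranges
            .partition_point(|r| r.lo <= new.hi.saturating_add(1));

        // Union `new` with every range in `start..end` (possibly none).
        let merged = if start < end {
            Range {
                lo: new.lo.min(self.ranges[start].lo),
                hi: new.hi.max(self.ranges[end - 1].hi),
            }
        } else {
            new
        };

        // Replace the affected ranges with the single merged range.
        drop(self.ranges.splice(start..end, std::iter::once(merged)));
    }
}
```

In this sketch, pushing `10..=20`, then `30..=40`, then `21..=29` leaves a single range `10..=40`. Each push costs two binary searches plus one `splice`, whereas sorting and merging the whole vector after every push repeats work that grows with the size of the set.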

Marwes wants to merge 1 commit into rust-lang:master from Marwes:interval_set_fast_push
