<regex>: Add multiline option and make non-multiline mode the default #5535
Resolves #73 (also tracked by DevCom-268592 / VSO-629739) and implements LWG-2503. Arguably also resolves DevCom-436138 to the degree that it is reasonable (namely, when the anchor appears in the regex before it starts branching); see the benchmark. Unblocks four libcxx tests.
This PR also aligns the multiline mode with ECMAScript's specification. The anchors now match at any of ECMAScript's line terminators: carriage returns, line feeds, line separators and paragraph separators. Before, the anchors only matched at line feeds.
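To make the anchor rule concrete, here is a minimal sketch of the ECMAScript line terminator check and of where `^` and `$` may match. The helper names `is_line_terminator`, `caret_matches_at` and `dollar_matches_at` are hypothetical and are not taken from the STL sources; the real matcher works on its own iterator and traits types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not the STL's code): true for the four
// ECMAScript line terminators listed above.
constexpr bool is_line_terminator(char32_t c) {
    return c == 0x000A     // line feed
        || c == 0x000D     // carriage return
        || c == 0x2028     // line separator
        || c == 0x2029;    // paragraph separator
}

// Hypothetical helper: may `^` match at offset pos of text?
// Non-multiline mode: only at the very start of the text.
// Multiline mode: additionally right after any line terminator.
inline bool caret_matches_at(const std::u32string& text, std::size_t pos, bool multiline) {
    if (pos == 0) {
        return true;
    }
    return multiline && pos <= text.size() && is_line_terminator(text[pos - 1]);
}

// Hypothetical helper: may `$` match at offset pos of text?
// Non-multiline mode: only at the very end of the text.
// Multiline mode: additionally right before any line terminator.
inline bool dollar_matches_at(const std::u32string& text, std::size_t pos, bool multiline) {
    if (pos == text.size()) {
        return true;
    }
    return multiline && pos < text.size() && is_line_terminator(text[pos]);
}
```

With this sketch, `caret_matches_at(U"foo\rbar", 4, true)` holds because offset 4 follows a carriage return, while the same call with `multiline == false` does not. The sketch ignores match flags such as `match_not_bol`.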
The PR provides `_REGEX_MAKE_MULTILINE_MODE_DEFAULT` as an escape hatch that restores multiline mode as the default; when it is defined, non-multiline mode is not available.

For POSIX grammars, the new `multiline` option has no effect. While I find this unfortunate, this behavior appears to have been specified in [re.synopt].

To simplify the logic in the matcher and avoid some preprocessor `#ifdef`s, the matcher's internal copy of the regex syntax flags, `_Sflags`, is mutated before matching starts:

- The `multiline` flag is set for all grammars when the escape hatch is defined.
- The `multiline` flag is cleared for POSIX grammars when the escape hatch is not defined.

These mutations ensure that multiline mode is enabled if and only if the `multiline` flag is set in `_Sflags`.

I see a potential concern with the implementation in this PR: even if the escape hatch is defined, the matcher still changes behavior and allows anchors to match not just at line feeds but at all ECMAScript line terminators. It can reasonably be argued that the behavior should be completely unchanged if the escape hatch is defined. Even so, I opted to submit the implementation with ECMAScript-conforming line terminators in this PR first because this simplifies the implementation a lot.
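The flag normalization described above can be sketched as follows. This is an illustration under assumptions, not the code in the PR: the flag constants and their values are hypothetical stand-ins for the `std::regex_constants` syntax options, only two POSIX grammars are listed, and the escape hatch is modeled as a runtime `bool` where the real code would test the macro in the preprocessor.

```cpp
#include <cassert>

// Hypothetical flag values for illustration only; the real ones are
// implementation-defined members of std::regex_constants.
constexpr unsigned flag_ecmascript = 0x01;
constexpr unsigned flag_basic      = 0x02;
constexpr unsigned flag_extended   = 0x04;
constexpr unsigned flag_multiline  = 0x800;

// Mask of POSIX grammars (assumption: other POSIX-style grammars are
// omitted here for brevity).
constexpr unsigned flag_posix_grammars = flag_basic | flag_extended;

// Returns the matcher's private copy of the syntax flags after normalization.
// escape_hatch_defined stands in for the escape hatch macro being defined.
constexpr unsigned normalize_flags(unsigned sflags, bool escape_hatch_defined) {
    return escape_hatch_defined
        ? (sflags | flag_multiline)             // escape hatch: always multiline
        : ((sflags & flag_posix_grammars) != 0
            ? (sflags & ~flag_multiline)        // POSIX grammars: option has no effect
            : sflags);                          // ECMAScript: respect the user's choice
}

// After normalization, a single bit test decides the anchor behavior.
constexpr bool multiline_enabled(unsigned normalized_flags) {
    return (normalized_flags & flag_multiline) != 0;
}
```

After normalization the matcher needs only the single check in `multiline_enabled`, which is the point of mutating the flags up front instead of branching on grammar and macro at every anchor.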
Benchmark
Only for pattern "^bibe" to show that this resolves DevCom-436138.