regex-syntax: some way to retain the AST Span of some punctuation marks?

Consider:

```rust
use regex_syntax::ast::parse::ParserBuilder;

fn main() {
    let parse = |pattern| {
        ParserBuilder::new()
            .ignore_whitespace(true)
            .build()
            .parse_with_comments(pattern)
            .unwrap()
    };

    let wc_1 = parse("a #c\n|b");
    let wc_2 = parse("a|#c\n b");
    // Desired: the two parses differ. In practice they compare equal,
    // so this assertion fails.
    assert_ne!(wc_1, wc_2);
}
```

The comment `#c` is attached to different alternatives in the two regexes, but the parse output is identical for both:

```rust
WithComments { 
    ast: Alternation(Alternation { 
        span: Span(Position(o: 0, l: 1, c: 1), Position(o: 7, l: 2, c: 3)), 
        asts: [
            Literal(Literal { 
                span: Span(Position(o: 0, l: 1, c: 1), Position(o: 1, l: 1, c: 2)), 
                kind: Verbatim, 
                c: 'a' 
            }), 
            Literal(Literal { 
                span: Span(Position(o: 6, l: 2, c: 2), Position(o: 7, l: 2, c: 3)), 
                kind: Verbatim, 
                c: 'b' 
            })
        ] 
    }), 
    comments: [
        Comment { 
            span: Span(Position(o: 2, l: 1, c: 3), Position(o: 5, l: 2, c: 1)), 
            comment: "c" 
        }
    ] 
}
```

$$\overbrace{\overbrace{\Huge\color{red} \texttt{a}\mathstrut}^{\textrm{Literal(0..1)}}{\Huge\color{blue}\texttt{␣ }}\underbrace{\Huge\color{green}\texttt{\\# c ↵}\mathstrut}_{\textrm{Comment(2..5)}}{\Huge\color{blue}\texttt{ |}}\overbrace{\Huge\color{red}\texttt{b}\mathstrut}^{\textrm{Literal(6..7)}}}^{\textrm{Alternation(0..7)}}$$

Without knowing the span of the `|` punctuation, we cannot tell from the output of `parse_with_comments()` alone whether the comment belongs to `a` or `b`. We have to refer back to the original pattern, at which point it is perhaps easier to just write the parser ourselves :shrug:
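For reference, the "refer back to the original pattern" workaround might look roughly like the sketch below. It is only an illustration: `Side`, `find_bar` and `comment_side` are hypothetical helpers, not part of `regex-syntax`, and the byte ranges are assumed to be taken from the `start.offset` and `end.offset` fields of the relevant `Span`s. It also assumes the gap between two adjacent alternatives contains nothing but whitespace, comments, and the `|` itself.

```rust
use std::ops::Range;

/// Which side of the `|` a comment falls on (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
enum Side {
    Before,
    After,
}

/// Finds the byte offset of the `|` inside `gap`, the byte range between the
/// end of one alternative and the start of the next. A `|` that lies inside
/// one of the `comments` ranges is skipped.
fn find_bar(pattern: &str, gap: Range<usize>, comments: &[Range<usize>]) -> Option<usize> {
    pattern
        .get(gap.clone())?
        .char_indices()
        .map(|(i, ch)| (gap.start + i, ch))
        .find(|&(at, ch)| ch == '|' && !comments.iter().any(|c| c.contains(&at)))
        .map(|(at, _)| at)
}

/// Reports whether `comments[which]` sits before or after the `|` in `gap`.
fn comment_side(
    pattern: &str,
    gap: Range<usize>,
    comments: &[Range<usize>],
    which: usize,
) -> Option<Side> {
    let bar = find_bar(pattern, gap, comments)?;
    let comment = comments.get(which)?;
    Some(if comment.end <= bar { Side::Before } else { Side::After })
}
```

With the spans from the output above (gap `1..6`, comment `2..5`), `comment_side` distinguishes the two patterns, but only because it rescans the source text.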

I think the `Ast` type itself should include the Span of these marks when their position cannot be inferred, like the `|` in `a|b|c` or the `,` in `a{3,100}`.
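To make the request concrete, here is a rough sketch of the kind of shape I have in mind. None of these types exist in `regex-syntax`: `ByteSpan`, `AlternationWithBars` and `alternative_for` are made-up names, and the real node would store full `Ast` values and `Span`s rather than bare byte offsets.

```rust
/// Hypothetical byte-offset span; stands in for `regex_syntax::ast::Span`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ByteSpan {
    start: usize,
    end: usize,
}

/// Hypothetical alternation node that also records where each `|` is.
#[derive(Debug, PartialEq, Eq)]
struct AlternationWithBars {
    span: ByteSpan,
    /// Spans of the alternatives (the real node stores `Ast` values here).
    alternatives: Vec<ByteSpan>,
    /// One span per `|`, so `bars.len() == alternatives.len() - 1`.
    bars: Vec<ByteSpan>,
}

impl AlternationWithBars {
    /// Index of the alternative whose side of the bars the comment is on:
    /// the number of `|` marks that end at or before the comment starts.
    fn alternative_for(&self, comment: ByteSpan) -> usize {
        self.bars
            .iter()
            .filter(|bar| bar.end <= comment.start)
            .count()
    }
}
```

With the bar spans recorded, the two patterns from the example would no longer compare equal, and the comment could be attributed without rescanning the pattern.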
