regex-syntax: some way to retain the AST Span of some punctuation marks? #1271

Open
@kennytm

Consider:

use regex_syntax::ast::parse::ParserBuilder;

fn main() {
    let parse = |pattern| {
        ParserBuilder::new()
            .ignore_whitespace(true)
            .build()
            .parse_with_comments(pattern)
            .unwrap()
    };

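    // The comment follows `a` in the first pattern and precedes `b` in the second.
    let wc_1 = parse("a #c\n|b");
    let wc_2 = parse("a|#c\n b");
    // This assertion fails: both patterns produce an identical parse result.
    assert_ne!(wc_1, wc_2);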
}

The comment #c is attached to different alternatives in the two regexes, but both produce the same parse output:

WithComments { 
    ast: Alternation(Alternation { 
        span: Span(Position(o: 0, l: 1, c: 1), Position(o: 7, l: 2, c: 3)), 
        asts: [
            Literal(Literal { 
                span: Span(Position(o: 0, l: 1, c: 1), Position(o: 1, l: 1, c: 2)), 
                kind: Verbatim, 
                c: 'a' 
            }), 
            Literal(Literal { 
                span: Span(Position(o: 6, l: 2, c: 2), Position(o: 7, l: 2, c: 3)), 
                kind: Verbatim, 
                c: 'b' 
            })
        ] 
    }), 
    comments: [
        Comment { 
            span: Span(Position(o: 2, l: 1, c: 3), Position(o: 5, l: 2, c: 1)), 
            comment: "c" 
        }
    ] 
}

$$\overbrace{\overbrace{\Huge\color{red} \texttt{a}\mathstrut}^{\textrm{Literal(0..1)}}{\Huge\color{blue}\texttt{␣ }}\underbrace{\Huge\color{green}\texttt{\# c ↵}\mathstrut}_{\textrm{Comment(2..5)}}{\Huge\color{blue}\texttt{ |}}\overbrace{\Huge\color{red}\texttt{b}\mathstrut}^{\textrm{Literal(6..7)}}}^{\textrm{Alternation(0..7)}}$$

Without knowing the span of the | punctuation we cannot know if the comment should belong to a or b from parse_with_comments() alone. We have to refer back to the original pattern. At which point perhaps it is easier to just write the parser ourselves 🤷

I think the Ast type itself should include the Span of these marks when their position cannot be inferred, like the | in a|b|c or the , in a{3,100}.
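
For reference, here is a minimal sketch of the "refer back to the original pattern" workaround. It uses only the standard library and takes the byte offsets shown in the output above (Literal 0..1, Literal 6..7, Comment 2..5). `find_bar`, `comment_side`, and `Side` are hypothetical helpers, not part of regex-syntax. The sketch assumes the gap between two adjacent alternatives contains only whitespace, comments, and a single `|`, so it would need more care for empty alternatives such as `a||b`.

```rust
/// Hypothetical helper (not part of regex-syntax): scan the byte range
/// `prev_end..next_start` of `pattern` for the `|` separating two adjacent
/// alternatives, skipping any byte that lies inside a comment span.
/// `comments` holds (start, end) byte offsets taken from `Comment::span`.
fn find_bar(
    pattern: &str,
    prev_end: usize,
    next_start: usize,
    comments: &[(usize, usize)],
) -> Option<usize> {
    let bytes = pattern.as_bytes();
    let mut i = prev_end;
    while i < next_start {
        // Jump over a comment so a `|` inside `# ...` is not mistaken
        // for the alternation operator.
        if let Some(&(_, end)) = comments.iter().find(|&&(s, e)| s <= i && i < e) {
            i = end;
            continue;
        }
        if bytes[i] == b'|' {
            return Some(i);
        }
        i += 1;
    }
    None
}

/// Which side of the `|` a comment falls on.
#[derive(Debug, PartialEq)]
enum Side {
    Before,
    After,
}

/// Hypothetical helper: classify a comment by comparing its start offset
/// with the offset of the `|`.
fn comment_side(bar: usize, comment_start: usize) -> Side {
    if comment_start < bar {
        Side::Before
    } else {
        Side::After
    }
}

fn main() {
    // Offsets copied from the parse output above; they are the same for
    // both patterns, which is exactly the problem.
    let comments = [(2, 5)];
    for pattern in ["a #c\n|b", "a|#c\n b"] {
        let bar = find_bar(pattern, 1, 6, &comments).expect("no `|` found");
        println!("{:?}: `|` at {}, comment is {:?}", pattern, bar, comment_side(bar, 2));
    }
}
```

This recovers the missing information, but only by re-scanning the source text with knowledge of the comment syntax, which is the duplication of parser logic that a span on the `|` would avoid.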
