Vectorscan does not support backreferences with indices >= 8 (HS_FLAG_PREFILTER is on) #209

@apismensky

The regex (written here with C string escaping, as in the test below) a([ -]?)a\\1a|b([ .-]?)b\\2b|c([ -]?)c\\3c|d([ -]?)d\\4d|e([ -]?)e\\5e|f([ -]?)f\\6f|g([ -]?)g\\7g|h([ -]?)h\\8h
should match all of the following strings: "a a a", "b b b", "c c c", "d d d", "e e e", "f f f", "g g g" and "h h h",
but it matches every one of them except "h h h".
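
For comparison, the expected behaviour can be sketched with a backtracking engine that evaluates backreferences exactly. The snippet below uses std::regex (ECMAScript grammar) purely as a reference. It is not Vectorscan and says nothing about how HS_FLAG_PREFILTER treats the pattern, and the helper name reference_search is made up for illustration.

```cpp
#include <regex>
#include <string>

// Reference check only: std::regex resolves \1..\8 as real backreferences.
// This is an assumption-laden stand-in for "what the pattern should match",
// not a model of Vectorscan's prefilter mode.
inline bool reference_search(const std::string &input) {
    // Same pattern as in the report, with C string escaping.
    static const std::regex re(
        "a([ -]?)a\\1a|b([ .-]?)b\\2b|c([ -]?)c\\3c|d([ -]?)d\\4d|"
        "e([ -]?)e\\5e|f([ -]?)f\\6f|g([ -]?)g\\7g|h([ -]?)h\\8h");
    // Unanchored search, comparable to scanning a block for any match.
    return std::regex_search(input, re);
}
```

With this reference, "h h h" is accepted just like "a a a", which is the behaviour the report expects from the eighth alternative as well.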

Test to reproduce:

TEST(order, alexey1) {
    // Single pattern (id 1) with eight alternatives, each using its own backreference.
    vector<pattern> patterns;
    patterns.push_back(pattern("a([ -]?)a\\1a|b([ .-]?)b\\2b|c([ -]?)c\\3c|d([ -]?)d\\4d|e([ -]?)e\\5e|f([ -]?)f\\6f|g([ -]?)g\\7g|h([ -]?)h\\8h", HS_FLAG_DOTALL | HS_FLAG_PREFILTER | HS_FLAG_MULTILINE | HS_FLAG_CASELESS | HS_FLAG_UCP | HS_FLAG_UTF8, 1));
    // Input that should be matched by the eighth alternative (backreference \8).
    const char *data = "h h h";

    hs_database_t *db = buildDB(patterns, HS_MODE_NOSTREAM);
    ASSERT_NE(nullptr, db);

    hs_scratch_t *scratch = nullptr;
    hs_error_t err = hs_alloc_scratch(db, &scratch);
    ASSERT_EQ(HS_SUCCESS, err);

    CallBackContext c;
    err = hs_scan(db, data, strlen(data), 0, scratch, record_cb,
                  (void *)&c);
    ASSERT_EQ(HS_SUCCESS, err);

    // Expected: one match for pattern id 1. This is the check that fails for "h h h".
    EXPECT_EQ(1, countMatchesById(c.matches, 1));
    err = hs_free_scratch(scratch);
    ASSERT_EQ(HS_SUCCESS, err);
    hs_free_database(db);
}

There is a comment about 8 and 9 in https://github.com/VectorCamp/vectorscan/blob/master/src/parser/Parser.rl#L1503, but I am not sure why 8 and 9 are special cases. Are we supposed to pass them as octal numbers?
