
Allow canceling hs_scan*() #139

@rschu1ze

Description


We (ClickHouse) recently encountered patterns that are extremely expensive to evaluate with vectorscan/hyperscan, for example bounded repeats "x{n,m}" (which are also documented as expensive). As a mitigation, we now check patterns on a best-effort basis and reject those that are likely to be expensive.
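A best-effort check of this kind can be as simple as scanning the pattern text for `{n}`, `{n,}` and `{n,m}` quantifiers and rejecting large counts. The sketch below is only an illustration, not ClickHouse's actual implementation: `has_large_bounded_repeat` and the threshold `MAX_REPEAT_BOUND` are hypothetical, and the scan deliberately ignores escapes and character classes.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical threshold: counts above this are treated as "likely expensive". */
#define MAX_REPEAT_BOUND 100

/* Returns true if `pattern` contains a quantifier {n}, {n,} or {n,m} whose
 * largest explicit count exceeds MAX_REPEAT_BOUND. Best-effort only: it does
 * not understand escaping (\{) or braces inside character classes. */
static bool has_large_bounded_repeat(const char *pattern)
{
    for (const char *p = pattern; *p != '\0'; ++p) {
        if (*p != '{' || !isdigit((unsigned char)p[1]))
            continue;
        char *end;
        unsigned long lo = strtoul(p + 1, &end, 10);
        unsigned long hi = lo;
        if (*end == ',') {
            if (isdigit((unsigned char)end[1]))
                hi = strtoul(end + 1, &end, 10); /* {n,m} */
            else
                ++end; /* {n,}: no upper bound, so judge by n */
        }
        if (*end == '}' && hi > MAX_REPEAT_BOUND)
            return true;
    }
    return false;
}
```

Such a textual check can produce false positives and negatives; it only trades precision for not having to call into the compiler at all.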

A better solution would be to either

  • add a new method to vectorscan/hyperscan that predicts runtime costs ("fast"/"slow" would be sufficient), or
  • (the preferred alternative) allow canceling the scan. The hs_scan*() functions (*) already accept a callback that can stop the scan, but it is only invoked when a match is found. Ideally, a second callback could be passed that is invoked regularly (every N "steps", whatever that means in the context of vectorscan). I know that vectorscan aims to stay API-compatible with hyperscan, so these callbacks could be added as new parameters with default values.
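For comparison, `hs_scan()` today takes a `match_event_handler`; if it returns non-zero, matching stops and `HS_SCAN_TERMINATED` is returned. The sketch below illustrates the proposed periodic callback on a toy scanner instead of the real engine; `progress_handler`, `toy_scan_with_progress`, `stop_after_budget` and the `TOY_*` codes are hypothetical names. Since the hyperscan API is plain C, which has no default arguments, such a callback would probably need a new entry point next to `hs_scan()` rather than an extra defaulted parameter.

```c
#include <stddef.h>

/* Hypothetical callback: invoked periodically during a scan.
 * Return non-zero to stop the scan early. */
typedef int (*progress_handler)(size_t bytes_scanned, void *context);

/* Hypothetical return codes for the toy scanner below. */
enum { TOY_SUCCESS = 0, TOY_SCAN_CANCELLED = 1 };

/* Toy block-mode "scan" that counts occurrences of `needle` in `data` and
 * calls `on_progress` every `interval` bytes. The matching itself is a
 * stand-in; only the cancellation plumbing is the point. */
static int toy_scan_with_progress(const char *data, size_t length, char needle,
                                  size_t interval, progress_handler on_progress,
                                  void *context, size_t *matches)
{
    *matches = 0;
    for (size_t i = 0; i < length; ++i) {
        if (on_progress != NULL && interval > 0 && i > 0 && i % interval == 0) {
            if (on_progress(i, context) != 0)
                return TOY_SCAN_CANCELLED; /* caller asked to stop */
        }
        if (data[i] == needle)
            ++*matches;
    }
    return TOY_SUCCESS;
}

/* Example handler: cancel once a byte budget is reached. A real caller might
 * instead check a deadline or a "query cancelled" flag. */
static int stop_after_budget(size_t bytes_scanned, void *context)
{
    size_t budget = *(const size_t *)context;
    return bytes_scanned >= budget;
}
```

Unlike the match callback, this handler fires regardless of whether anything matched, which is what makes it usable for bounding the runtime of a scan over input with no matches.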

EDIT: Just noticed that it is actually pattern compilation, i.e. hs_compile_multi(), that becomes slow, not the scan. A callback for canceling hs_compile*() would be great.
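A compile-time cancel callback could work like the scan case, but a natural caller-side pattern is a shared cancellation token that another thread (for example, a query-kill handler) sets and the compiler polls. The sketch assumes a hypothetical `compile_cancel_handler` and uses a toy `toy_compile_multi` in place of `hs_compile_multi()`; all names and the `TOY_COMPILE_*` codes are illustrative, not part of vectorscan.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback for a compile entry point: non-zero aborts compilation. */
typedef int (*compile_cancel_handler)(void *context);

enum { TOY_COMPILE_OK = 0, TOY_COMPILE_CANCELLED = 1 };

/* Cancellation token that another thread may set at any time. */
typedef struct {
    atomic_bool cancelled;
} cancel_token;

static int token_is_cancelled(void *context)
{
    cancel_token *token = context;
    return atomic_load(&token->cancelled);
}

/* Toy stand-in for a multi-pattern compile: treats each pattern as one unit
 * of work and polls the cancel handler before each unit. */
static int toy_compile_multi(const char *const *patterns, size_t count,
                             compile_cancel_handler should_cancel, void *context,
                             size_t *compiled)
{
    *compiled = 0;
    for (size_t i = 0; i < count; ++i) {
        if (should_cancel != NULL && should_cancel(context))
            return TOY_COMPILE_CANCELLED;
        (void)patterns[i]; /* real work (parsing, automaton construction) omitted */
        ++*compiled;
    }
    return TOY_COMPILE_OK;
}
```

Polling only between patterns is too coarse when a single pattern such as "x{n,m}" dominates the cost; a real implementation would likely need to check the handler inside the expensive inner steps as well.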

(*) ClickHouse actually only uses block mode, not streaming or vector modes.

Metadata


Labels: enhancement (New feature or request)
