We (ClickHouse) recently encountered some patterns that are extremely expensive to evaluate with vector/hyperscan, for example bounded repeats such as "x{n,m}" (which the documentation also describes as expensive). As a mitigation, we now check patterns on a best-effort basis and reject those that are likely to be expensive.
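To illustrate the kind of best-effort check meant here, a minimal sketch follows. It is not the actual ClickHouse code: the function name `hasLargeBoundedRepeat` and the `limit` threshold are made up for illustration, and the parsing is deliberately naive (for instance, it does not treat braces inside character classes specially).

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not ClickHouse's real check): returns true if the
// pattern contains a bounded repeat {n}, {n,} or {n,m} with a bound > limit.
bool hasLargeBoundedRepeat(std::string_view pattern, unsigned limit)
{
    for (std::size_t i = 0; i < pattern.size(); ++i)
    {
        if (pattern[i] == '\\')
        {
            ++i; // skip the escaped character, e.g. "\{"
            continue;
        }
        if (pattern[i] != '{')
            continue;

        unsigned long long bounds[2] = {0, 0};
        int idx = 0;
        bool has_digit = false;
        bool well_formed = false;

        for (std::size_t j = i + 1; j < pattern.size(); ++j)
        {
            const char c = pattern[j];
            if (std::isdigit(static_cast<unsigned char>(c)))
            {
                // Stop accumulating once the bound exceeds the limit,
                // so the value cannot overflow.
                if (bounds[idx] <= limit)
                    bounds[idx] = bounds[idx] * 10 + static_cast<unsigned>(c - '0');
                has_digit = true;
            }
            else if (c == ',' && idx == 0 && has_digit)
            {
                idx = 1; // switch from the lower to the upper bound
            }
            else
            {
                // '}' closes a repeat; anything else means "not a repeat".
                well_formed = (c == '}') && has_digit;
                break;
            }
        }

        if (well_formed && (bounds[0] > limit || bounds[1] > limit))
            return true;
    }
    return false;
}
```

With a threshold of 100, a pattern like `x{10,5000}` would be rejected while `x{1,3}` would pass. A check like this can only guess at the cost, which is why it is a mitigation rather than a fix.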
A better solution would be to either
- add a new method to vector/hyperscan that predicts runtime cost (a coarse "fast"/"slow" answer would be sufficient), or
- (the preferred alternative) allow canceling the scan. The hs_scan_*() functions (*) accept a callback that can stop the scan, but it is only called when a match is found. Ideally, a second callback could be provided that is called regularly (every N "steps", whatever that means in the context of vectorscan). I know that vectorscan attempts to stay API-compatible with hyperscan, so this callback could be added as a new parameter with a default value.
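To make the second option more concrete, here is a rough, self-contained sketch of how a periodic cancelation callback could look from the caller's side. Everything in it is hypothetical: `progress_callback`, `mockScan` and `cancelAfterDeadline` do not exist in hyperscan or vectorscan, and `mockScan` is only a stand-in loop, not a real scan. The one convention borrowed from the existing API is that a non-zero return value from the callback means "stop".

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical callback type: return non-zero to cancel, mirroring the
// convention of the existing match callback.
using progress_callback = int (*)(void * context);

struct Deadline
{
    std::chrono::steady_clock::time_point expires_at;
};

// Example callback: cancel once the deadline passed in via context is reached.
int cancelAfterDeadline(void * context)
{
    const auto * deadline = static_cast<const Deadline *>(context);
    return std::chrono::steady_clock::now() >= deadline->expires_at ? 1 : 0;
}

// Stand-in for a scan loop (not real library code): calls on_progress every
// `interval` steps and returns the number of steps performed before the
// loop finished or was canceled.
std::size_t mockScan(std::size_t total_steps, std::size_t interval,
                     progress_callback on_progress, void * context)
{
    for (std::size_t step = 1; step <= total_steps; ++step)
    {
        const bool due = interval != 0 && step % interval == 0;
        if (on_progress && due && on_progress(context) != 0)
            return step; // canceled by the caller
    }
    return total_steps;
}
```

The caller would pass a deadline (or a query-cancelation flag) as the context, so a long-running call could be aborted even if no match is ever reported.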
EDIT: I just noticed that it is pattern compilation, i.e. hs_compile_multi(), that becomes slow, not the scan. A callback for canceling hs_compile_*() would be great.
(*) ClickHouse actually only uses block mode, not streaming or vector modes.