
Performance results are not reproducible #3

@nalgeon


Hello Alex! I checked the claim on the Benchmarks page that sqlean-re is 6.57x slower than sqlite-regex and could not reproduce it. The test itself is also very unfair.

The unfair test

I'll start with the latter, as it is independent of the hardware. You are benchmarking sqlite-regex with this pattern:

\d{4}-\d{2}-\d{2}

While for sqlean-re, you use this pattern:

([0-9])([0-9])([0-9])([0-9])-([0-9])([0-9])-([0-9])([0-9])

These are very different patterns. You introduced eight capturing groups into the second pattern, which makes this regexp significantly slower. The first pattern, on the other hand, has zero capturing groups. I don't think you can compare these two patterns in the same benchmark.

To make the patterns roughly equivalent, the second one should be:

[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]
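To illustrate the difference, here is a minimal sketch using Python's `re` module rather than either SQLite extension, so it says nothing about their actual speed. It only shows that the three patterns above find the same matches on a few made-up sample strings while differing in the number of capturing groups. The `re.ASCII` flag is my assumption to make `\d` mean exactly `[0-9]`.

```python
import re

# The three patterns quoted above.
patterns = {
    "sqlite-regex": r"\d{4}-\d{2}-\d{2}",
    "sqlean-re (as benchmarked)": r"([0-9])([0-9])([0-9])([0-9])-([0-9])([0-9])-([0-9])([0-9])",
    "sqlean-re (suggested)": r"[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]",
}

# Made-up sample inputs, not taken from the benchmark data.
samples = ["2023-01-15", "on 1999-12-31 at noon", "no date here", "12-34-56"]


def first_match(rx, text):
    """Return the first matched substring, or None if nothing matches."""
    m = rx.search(text)
    return m.group(0) if m else None


# re.ASCII restricts \d to [0-9] so the character classes are identical.
compiled = {name: re.compile(p, re.ASCII) for name, p in patterns.items()}

# Number of capturing groups the engine has to track for each pattern.
group_counts = {name: rx.groups for name, rx in compiled.items()}

# What each pattern actually matches on the samples.
matches = {name: [first_match(rx, s) for s in samples] for name, rx in compiled.items()}

for name in patterns:
    print(f"{name}: {group_counts[name]} capturing groups, matches={matches[name]}")
```

All three report the same matches, but only the benchmarked sqlean-re pattern asks the engine to record eight submatches per match.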

The unreproducible results

After changing the pattern for sqlean-re to be equivalent to the one used for sqlite-regex, I ran your benchmark on a MacBook Air (M1, 2020) and got these results:

Benchmark 1: ./sqlite-regex.sh
  Time (mean ± σ):     433.4 ms ±   1.6 ms    [User: 426.2 ms, System: 5.6 ms]
  Range (min … max):   431.7 ms … 436.4 ms    10 runs
 
Benchmark 2: ./sqlean-re.sh
  Time (mean ± σ):     474.4 ms ±   3.5 ms    [User: 465.9 ms, System: 6.0 ms]
  Range (min … max):   469.3 ms … 479.0 ms    10 runs
 
Summary
  './sqlite-regex.sh' ran
    1.09 ± 0.01 times faster than './sqlean-re.sh'

So much for "6.57x slower".
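As a quick sanity check, the ratio in the summary follows directly from the two mean times reported above. This sketch only redoes that arithmetic; the variable names are mine.

```python
# Mean wall-clock times from the benchmark output above, in milliseconds.
mean_sqlite_regex_ms = 433.4
mean_sqlean_re_ms = 474.4

# How many times slower sqlean-re is relative to sqlite-regex.
ratio = mean_sqlean_re_ms / mean_sqlite_regex_ms

# The figure claimed on the Benchmarks page.
claimed_ratio = 6.57

print(f"measured: {ratio:.2f}x, claimed: {claimed_ratio}x")
```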

I don't think that the disclaimer "Benchmarks are hard and easy to game" justifies your claims about the relative performance of different regexp implementations.
