Compare256 function pointer #273

Draft
wants to merge 2 commits into
base: main


folkertdev
Collaborator

When a target feature such as avx2 is not enabled at compile time but is available at runtime, this approach reduces branching: the implementation is selected through a function pointer instead of being re-checked on each call. We still dispatch statically if the target feature is already enabled at compile time.
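
For readers who want to see the general shape of the idea, here is a minimal sketch of function-pointer dispatch. It is not the code in this PR: the names `Compare256Fn`, `select_compare256`, `compare256_scalar` and `compare256_avx2` are hypothetical, the "AVX2" body is a plain loop standing in for real intrinsics, and it assumes compare256 returns the number of equal leading bytes of two 256-byte buffers.

```rust
use std::sync::OnceLock;

/// Hypothetical signature shared by all implementations: the number of
/// equal leading bytes in two 256-byte buffers.
type Compare256Fn = fn(&[u8; 256], &[u8; 256]) -> usize;

/// Portable fallback that works on every target.
fn compare256_scalar(a: &[u8; 256], b: &[u8; 256]) -> usize {
    let mut n = 0;
    while n < 256 && a[n] == b[n] {
        n += 1;
    }
    n
}

/// Stand-in for a SIMD version. A real one would use AVX2 intrinsics;
/// this body is the same loop, only compiled with avx2 enabled.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn compare256_avx2_impl(a: &[u8; 256], b: &[u8; 256]) -> usize {
    let mut n = 0;
    while n < 256 && a[n] == b[n] {
        n += 1;
    }
    n
}

/// Safe wrapper so the AVX2 version fits the `Compare256Fn` pointer type.
#[cfg(target_arch = "x86_64")]
fn compare256_avx2(a: &[u8; 256], b: &[u8; 256]) -> usize {
    // SAFETY: this function is only selected after runtime detection of avx2.
    unsafe { compare256_avx2_impl(a, b) }
}

/// Runs the runtime feature check and picks an implementation.
#[cfg(target_arch = "x86_64")]
fn select_compare256() -> Compare256Fn {
    if std::arch::is_x86_feature_detected!("avx2") {
        compare256_avx2
    } else {
        compare256_scalar
    }
}

/// On other architectures this sketch only has the scalar version.
#[cfg(not(target_arch = "x86_64"))]
fn select_compare256() -> Compare256Fn {
    compare256_scalar
}

/// Entry point: the selection runs once, later calls reuse the stored pointer.
pub fn compare256(a: &[u8; 256], b: &[u8; 256]) -> usize {
    static IMPL: OnceLock<Compare256Fn> = OnceLock::new();
    let f = *IMPL.get_or_init(select_compare256);
    f(a, b)
}
```

In this sketch the pointer lives in a `OnceLock`, which is just one way to cache the choice; a hot loop could also fetch the pointer once up front and call it directly.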

@folkertdev folkertdev marked this pull request as draft December 24, 2024 17:52
@folkertdev
Collaborator Author

Results for compression level 1:

Benchmark 1 (62 runs): target/release/examples/compress-baseline 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          81.7ms ±  991us    80.3ms … 87.8ms          3 ( 5%)        0%
  peak_rss           26.7MB ± 53.8KB    26.6MB … 26.7MB         13 (21%)        0%
  cpu_cycles          305M  ± 3.65M      298M  …  328M           5 ( 8%)        0%
  instructions        661M  ±  278       661M  …  661M           0 ( 0%)        0%
  cache_references   19.8M  ±  185K     19.5M  … 20.8M           2 ( 3%)        0%
  cache_misses        432K  ± 86.9K      336K  …  783K           2 ( 3%)        0%
  branch_misses      2.94M  ± 8.26K     2.88M  … 2.96M           2 ( 3%)        0%
Benchmark 2 (63 runs): target/release/examples/blogpost-compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          80.3ms ±  584us    78.7ms … 83.1ms          2 ( 3%)        ⚡-  1.6% ±  0.3%
  peak_rss           26.7MB ± 63.2KB    26.5MB … 26.7MB          0 ( 0%)          -  0.0% ±  0.1%
  cpu_cycles          299M  ± 2.38M      293M  …  312M           3 ( 5%)        ⚡-  2.0% ±  0.4%
  instructions        620M  ±  255       620M  …  620M           1 ( 2%)        ⚡-  6.2% ±  0.0%
  cache_references   19.7M  ±  146K     19.5M  … 20.2M           3 ( 5%)          -  0.8% ±  0.3%
  cache_misses        409K  ± 74.2K      311K  …  664K           1 ( 2%)          -  5.3% ±  6.6%
  branch_misses      2.95M  ± 15.7K     2.87M  … 2.96M           2 ( 3%)          +  0.6% ±  0.2%
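
As a sanity check on how to read the delta column, the relative change can be recomputed from the reported means. This is a small sketch with a hypothetical helper, `relative_delta_percent`; it assumes the benchmark tool works from unrounded means, so rows recomputed from the rounded values shown above may differ by a fraction of a percent.

```rust
/// Relative change of `new` against `baseline`, in percent.
/// Negative values mean the new build needed less of the measured resource.
fn relative_delta_percent(baseline: f64, new: f64) -> f64 {
    (new - baseline) / baseline * 100.0
}

/// The `instructions` row from the level 1 run above: 661M before, 620M after.
/// The result rounds to the reported -6.2%.
fn instructions_delta() -> f64 {
    relative_delta_percent(661.0, 620.0)
}
```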

@brian-pane

brian-pane commented Dec 28, 2024

This shows an improvement on x86_64 Intel as well (compiled without RUSTFLAGS="-Ctarget-cpu=native", in order to generate the new function-pointer code path).
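
To illustrate why the build flags matter here, the following sketch reports which path a given build would take. The names `AVX2_AT_COMPILE_TIME` and `dispatch_kind` are hypothetical and not taken from this PR; the sketch only assumes that avx2 is the feature in question. With `-Ctarget-cpu=native` on an AVX2-capable machine, the feature is already enabled at compile time, so the static path is chosen and the function-pointer path is never built.

```rust
/// True when this build already has avx2 enabled at compile time,
/// for example via RUSTFLAGS="-Ctarget-cpu=native" on an AVX2 machine.
const AVX2_AT_COMPILE_TIME: bool =
    cfg!(all(target_arch = "x86_64", target_feature = "avx2"));

/// Names the dispatch strategy this build would use in the sketch.
fn dispatch_kind() -> &'static str {
    if AVX2_AT_COMPILE_TIME {
        // Feature known at compile time: call the SIMD version directly.
        "static"
    } else {
        // Feature unknown at compile time: detect at runtime and go
        // through a function pointer.
        "function pointer"
    }
}
```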

Benchmark 1 (64 runs): ./blogpost-compress-baseline 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          78.9ms ±  852us    77.6ms … 83.1ms          1 ( 2%)        0%
  peak_rss           26.6MB ± 60.6KB    26.5MB … 26.7MB          0 ( 0%)        0%
  cpu_cycles          307M  ±  536K      305M  …  308M           0 ( 0%)        0%
  instructions        661M  ±  290       661M  …  661M           0 ( 0%)        0%
  cache_references    265K  ± 3.68K      262K  …  285K           4 ( 6%)        0%
  cache_misses        231K  ± 6.74K      205K  …  237K           8 (13%)        0%
  branch_misses      2.93M  ± 3.84K     2.92M  … 2.94M           1 ( 2%)        0%
Benchmark 2 (67 runs): ./target/release/examples/blogpost-compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          75.4ms ±  526us    74.6ms … 76.7ms          0 ( 0%)        ⚡-  4.3% ±  0.3%
  peak_rss           26.6MB ± 60.4KB    26.6MB … 26.7MB          0 ( 0%)          +  0.0% ±  0.1%
  cpu_cycles          292M  ±  545K      291M  …  294M           0 ( 0%)        ⚡-  4.6% ±  0.1%
  instructions        620M  ±  261       620M  …  620M           0 ( 0%)        ⚡-  6.2% ±  0.0%
  cache_references    266K  ± 5.65K      262K  …  298K           4 ( 6%)          +  0.2% ±  0.6%
  cache_misses        231K  ± 7.01K      203K  …  237K           5 ( 7%)          -  0.2% ±  1.0%
  branch_misses      3.03M  ± 4.91K     3.02M  … 3.04M           0 ( 0%)        💩+  3.6% ±  0.1%

@folkertdev folkertdev force-pushed the compare256-function-pointer branch from 4e432fd to c2ce875 Compare January 20, 2025 09:38

codecov bot commented Jan 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Files with missing lines Coverage Δ
zlib-rs/src/deflate/algorithm/quick.rs 97.72% <100.00%> (+0.05%) ⬆️
zlib-rs/src/deflate/compare256.rs 98.31% <100.00%> (+0.34%) ⬆️

... and 36 files with indirect coverage changes

@folkertdev folkertdev force-pushed the compare256-function-pointer branch from c2ce875 to 4c71540 Compare January 20, 2025 13:32
@folkertdev folkertdev force-pushed the main branch 2 times, most recently from 72c2c57 to af07d45 Compare February 6, 2025 11:06
2 participants