add benchmarks using pytest-benchmark and codspeed #3562
Conversation
@zarr-developers/steering-council I don't have permission to register this repo with codspeed. I submitted a request to register it; could someone approve it?
done
Does anyone have opinions about benchmarks? Feel free to suggest something concrete. Otherwise, I think we should take this as-is and deal with later benchmarks (like partial shard reads/writes) in a subsequent PR.
.github/workflows/codspeed.yml (outdated)

```yaml
uses: CodSpeedHQ/action@v4
with:
  mode: instrumentation
  run: hatch run test.py3.11-1.26-minimal:run-benchmark
```
can we test the latest instead? seems more appropriate...
The latest version of python? What's the reasoning? I'd rather update this file when we drop a supported version vs when a new version of python comes out.
Because we'd want to catch a perf regression from upstream changes too? I'm suggesting the latest versions of released libraries: py=3.13, np=2.2.
we don't have an upper bound on numpy versions, so I don't think this particular workflow will help us catch regressions from upstream changes -- we would need to update this workflow every time a new version of numpy is released. IMO that's something we should do in a separate benchmark workflow. This workflow here will run on every PR, and in that case the oldest version of numpy we support seems better.
We also don't have to use a pre-baked hatch environment here; we could define a dependency set specific to benchmarking. But my feeling is that benchmarking against older versions of stuff gives us a better measure of what users will actually experience.
Indexing, please. That'll exercise the codec pipeline too. A peakmem metric would be good to track also, if possible.
I don't think codspeed or pytest-benchmark do memory profiling; we would need https://pytest-memray.readthedocs.io/en/latest/ or something equivalent for that. An indexing benchmark sounds like a great idea, but I don't think I have the bandwidth for it in this PR right now.
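For reference, if peak-memory tracking is added later, a pytest-memray based check could look roughly like the sketch below. This is a minimal illustration and not part of this PR; the workload and the 64 MB limit are placeholders.

```python
# Hypothetical sketch using pytest-memray's limit_memory marker.
# Requires installing pytest-memray and running pytest with --memray;
# the array size and memory limit below are placeholders.
import numpy as np
import pytest


@pytest.mark.limit_memory("64 MB")
def test_peak_memory_of_bulk_copy():
    data = np.ones((1024, 1024), dtype="float64")  # ~8 MB allocation
    _ = data.copy()  # another ~8 MB; fails the test if peak allocations exceed the limit
```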
(Enabled the app)
maxrjones left a comment
IMO it'd be better to skip the tests/benchmarks during regular test runs in the interest of speed
Co-authored-by: Max Jones <[email protected]>
I think this makes sense -- on my workstation the current benchmark suite takes 40s to run as regular tests, which is a big addition to our total test runtime. The latest changes to this branch skip the benchmarks during regular test runs.
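One common way to do this kind of opt-in skipping is a conftest.py hook keyed on a marker. The sketch below is illustrative only; the `--run-benchmarks` option and `benchmark` marker names are assumptions, not necessarily what this branch uses.

```python
# conftest.py -- hypothetical sketch of skipping benchmark tests by default;
# the option and marker names here are illustrative, not taken from this PR.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--run-benchmarks",
        action="store_true",
        default=False,
        help="run benchmark tests (skipped by default)",
    )


def pytest_collection_modifyitems(config, items):
    # Leave the benchmark tests alone when explicitly requested.
    if config.getoption("--run-benchmarks"):
        return
    skip_bench = pytest.mark.skip(reason="need --run-benchmarks option to run")
    for item in items:
        if "benchmark" in item.keywords:
            item.add_marker(skip_bench)
```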
I'm seeing some flakiness in the codspeed CI run: https://github.com/zarr-developers/zarr-python/actions/runs/21025818794/job/60449955113#step:5:253. Since this is a codspeed-specific issue (from what I can tell), and it doesn't detract from the utility of the benchmark suite for local benchmarking, I'm going to work around this failure by ignoring this warning.
Since #3554 was an unpopular direction, I'm going instead with codspeed + pytest-benchmark. Opening as a draft because I haven't looked into how codspeed works at all, but I'd like people to weigh in on whether these initial benchmarks make sense. Naturally we can add more specific ones later, but I figured just some bulk array read/write workloads would be a good start.
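For anyone skimming, a bulk read/write benchmark with pytest-benchmark looks roughly like the sketch below. This is a simplified illustration, not the exact code in this PR; the shapes, chunking, store locations, and fixture names are made up.

```python
# Hypothetical sketch of bulk read/write benchmarks with pytest-benchmark;
# shapes, chunks, and the on-disk store location are placeholders.
import numpy as np
import pytest
import zarr


@pytest.fixture
def data():
    return np.random.default_rng(0).integers(0, 255, size=(2048, 2048), dtype="uint8")


def test_bulk_write(benchmark, tmp_path, data):
    z = zarr.open(str(tmp_path / "write.zarr"), mode="w", shape=data.shape,
                  chunks=(256, 256), dtype=data.dtype)
    # benchmark() calls the function repeatedly and records timing statistics.
    benchmark(z.__setitem__, slice(None), data)


def test_bulk_read(benchmark, tmp_path, data):
    z = zarr.open(str(tmp_path / "read.zarr"), mode="w", shape=data.shape,
                  chunks=(256, 256), dtype=data.dtype)
    z[:] = data
    benchmark(z.__getitem__, slice(None))
```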