
Conversation

@alexandrebouchard
Member

No description provided.

@miguelbiron
Collaborator

I was thinking that, instead of / in addition to creating new artificial tests, we could just grab the timing information in the DefaultTestSet structure. We could either do this at the top level only or recursively for every single testset.
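
Something like this could do the recursive version, assuming Julia ≥ 1.8 where DefaultTestSet carries time_start/time_end fields; the helper name and exact field names are just a sketch and may need adjusting per Julia version:

```julia
using Test

# Recursively collect wall-clock time per (nested) testset from a finished
# DefaultTestSet. Relies on the `time_start`/`time_end` fields added in Julia 1.8.
function collect_testset_times!(out::Dict{String,Float64}, ts::Test.DefaultTestSet, prefix::String = "")
    name = isempty(prefix) ? ts.description : string(prefix, "/", ts.description)
    if ts.time_end !== nothing
        out[name] = ts.time_end - ts.time_start
    end
    for r in ts.results
        # nested testsets are stored among the results of their parent
        r isa Test.DefaultTestSet && collect_testset_times!(out, r, name)
    end
    return out
end

# Usage: `ts = @testset "Pigeons" begin ... end` returns the DefaultTestSet;
# `collect_testset_times!(Dict{String,Float64}(), ts)` then maps name => seconds.
```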

@trevorcampbell
Collaborator

@miguelbiron do you mean that we'd collect/report benchmarking results for the tests in our test suite? I'm not sure there will be a tonne of overlap between the things we want to benchmark and the things we want to use for unit testing.

@miguelbiron
Collaborator

That's true... Also there's no alloc data in the testset.

@trevorcampbell
Collaborator

trevorcampbell commented Mar 29, 2025

@miguelbiron @alexandrebouchard I've added the prototype code for the workflows. A brief explanation:

  • the current version of the benchmarking results for main is always kept up to date in test/benchmark.csv
  • any time there is a commit to main, benchmark_update.yml runs, refreshes the benchmarking results, and commits them back to main.
  • any time there is a PR targeting main, benchmark_compare.yml runs on the PR and compares the benchmarking results stored in the CSV file against those produced by the new PR code. The results are posted to the PR thread as a markdown table in a comment. Whenever the PR is updated, the comparison re-runs and the markdown table is updated.
  • both workflows have a 60 min timeout
  • test/benchmark.jl is responsible for running the benchmark suite (I made a minor modification to the format of the CSV)
  • test/compare_benchmarks.jl is responsible for comparing two benchmarking CSVs and outputting the difference as a markdown table (a minimal sketch of that step is shown after this list)
  • Right now this runs on a GitHub-hosted runner, but with a small (currently commented-out) change we can make it self-hosted once we are ready to spin up the runner
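
Not the actual compare_benchmarks.jl, just a minimal sketch of the comparison step it performs, assuming CSV.jl and DataFrames.jl and hypothetical column names test and time_s in both CSVs:

```julia
# Hedged sketch of the CSV comparison step; the column names `test` and `time_s`
# are assumptions, not necessarily what test/benchmark.jl writes.
using CSV, DataFrames

function benchmark_diff_markdown(base_csv::AbstractString, pr_csv::AbstractString)
    base = CSV.read(base_csv, DataFrame)
    pr   = CSV.read(pr_csv, DataFrame)
    # suffix the non-key columns so both timings survive the join
    joined = innerjoin(base, pr; on = :test, renamecols = "_base" => "_pr")
    lines = ["| test | main (s) | PR (s) | change |",
             "|------|----------|--------|--------|"]
    for row in eachrow(joined)
        rel = 100 * (row.time_s_pr - row.time_s_base) / row.time_s_base
        push!(lines, "| $(row.test) | $(round(row.time_s_base; digits = 3)) | " *
                     "$(round(row.time_s_pr; digits = 3)) | $(round(rel; digits = 1))% |")
    end
    return join(lines, "\n")
end

# e.g. println(benchmark_diff_markdown("benchmark_main.csv", "benchmark_pr.csv"))
```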

One issue we should fix before merging: both benchmark.jl and compare_benchmarks.jl trigger Julia to download/install/build all kinds of packages in the workflow. The benchmarking itself is almost instantaneous, but the package building/installation is super slow (especially for compare_benchmarks.jl, it's around 15-20 minutes).

I am not entirely sure how we do environment setup in the benchmark.jl script, but I just copied it for compare_benchmarks.jl -- I imagine we can probably be smarter about that somehow? Thoughts?

@trevorcampbell
Collaborator

I also had an idea for something neat we could do: include a plot of benchmarking results as a function of time in the Pigeons documentation.

This command:

git log --pretty=format:"%H" --follow -- test/benchmark.csv

will spit out a list of git commit hashes corresponding to commits where test/benchmark.csv was changed. When we build the docs, we can iterate through those, collect all the result CSVs, join them, create PlotlyJS plots for each test, and output an HTML file that is linked from the docs (rough sketch below).
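
Rough sketch of the collection step (assumes CSV.jl/DataFrames.jl; the PlotlyJS/HTML part is left out, and the error handling is just a guess at what we'd need for commits where the file isn't readable):

```julia
# Sketch: rebuild the benchmark history by reading test/benchmark.csv at every
# commit that touched it, tagging each row with its commit hash.
using CSV, DataFrames

csv_path = "test/benchmark.csv"
hashes = readlines(`git log --pretty=format:"%H" --follow -- $csv_path`)

history = DataFrame()
for h in hashes
    content = try
        read(`git show $h:$csv_path`, String)
    catch
        nothing   # file may be missing or unreadable at some commits
    end
    content === nothing && continue
    df = CSV.read(IOBuffer(content), DataFrame)
    df[!, :commit] .= h
    append!(history, df; cols = :union)
end
# `history` now has one row per (commit, benchmark); feed it to PlotlyJS to plot
# each benchmark's timing over time and write the HTML page linked from the docs.
```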

Thoughts?

@trevorcampbell
Collaborator

Ah, one more thing @alexandrebouchard: we should probably modify test/benchmark.jl to run more than one trial for each benchmark and report the median / 25th / 75th percentiles, or something like that.

(At least on mvn1000 it seems the results regularly vary by around 10% for a single trial.)
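
Something along these lines, maybe (Statistics is stdlib; run_target and trials = 5 are placeholders for whatever each benchmark actually times):

```julia
using Statistics

# Run one benchmark target several times and summarize with the median and
# 25th/75th percentiles; `run_target` is a stand-in for the actual work.
function benchmark_quantiles(run_target; trials::Int = 5)
    times = [@elapsed run_target() for _ in 1:trials]
    return (median = median(times),
            q25    = quantile(times, 0.25),
            q75    = quantile(times, 0.75))
end

# e.g., with `using Pigeons`:
# benchmark_quantiles(() -> pigeons(target = toy_mvn_target(1000)))
```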

@miguelbiron
Collaborator

Ditto the last comment; also, we probably need some basic metadata about the environment. At the very least, we should store the exact Julia version used.
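
For example, something like this could be written out with each run (stdlib only; the field names are illustrative, not a fixed schema):

```julia
using Dates, Pkg

# Basic environment metadata to store next to the benchmark results.
pigeons_version = only(d.version for d in values(Pkg.dependencies()) if d.name == "Pigeons")

metadata = (
    julia_version   = string(VERSION),
    pigeons_version = string(pigeons_version),
    os              = string(Sys.KERNEL),
    nthreads        = Threads.nthreads(),
    timestamp       = string(now(UTC)),
)
```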

@codecov

codecov bot commented Mar 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.82%. Comparing base (e492fa7) to head (c837962).
⚠️ Report is 34 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #333      +/-   ##
==========================================
+ Coverage   87.40%   87.82%   +0.42%     
==========================================
  Files         107      107              
  Lines        2660     2654       -6     
==========================================
+ Hits         2325     2331       +6     
+ Misses        335      323      -12     

☔ View full report in Codecov by Sentry.

@miguelbiron
Collaborator

@alexandrebouchard @trevorcampbell the failing docs build is due to the old non-reproducible errors that seem to have been fixed with the new DPPL version. We should just merge this PR so that we can then merge #328, which doesn't suffer from this.

@trevorcampbell
Collaborator

trevorcampbell commented Apr 18, 2025

OK I think I'm giving up on reducing precompilation time for now. Let's just push that off to a later PR and get this one sorted.

Seems like the last TODOs here are:

  • include basic metadata in the results CSV
  • run each benchmark more than once to average and get estimates of the standard error
  • store the CI console log and Manifest files somewhere

@trevorcampbell
Collaborator

OK @alexandrebouchard this should be all sorted now. Example PR thread message with diff below:

[screenshot: example PR comment with the benchmark diff table]

@trevorcampbell
Collaborator

trevorcampbell commented Aug 16, 2025

Ah, actually @alexandrebouchard, one thing I'd like your eye on before we merge is how packages are activated/installed/used in benchmark.jl and compare_benchmarks.jl.

You'll notice the `using Pigeons` call takes 25s (previously it was 2s) according to the benchmark, which seems off to me (maybe that includes some precompilation, not sure -- you have a better grasp of how Julia manages packages and the Pigeons ecosystem than I do).

But once that's sorted we can merge, and I'll get the runner set up properly for this repo.

@trevorcampbell
Collaborator

trevorcampbell commented Aug 18, 2025

Improved the look & feel of the diff table and included the metadata in the diff:

[screenshot: updated diff table including environment metadata]

@alexandrebouchard
Member Author

This looks really great and will be super useful. I'll add more targets very shortly; I'm building a collection.

@trevorcampbell
Collaborator

@alexandrebouchard FYI I replaced versioninfo(verbose=true) with just versioninfo() to keep the noisy/unreliable CPU measurements from polluting the diffs. It still has almost all the information we'd want (except memory, but c'est la vie).
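
(For reference, the kind of capture I mean, assuming we keep storing it as plain text in the metadata; InteractiveUtils is needed outside the REPL:)

```julia
using InteractiveUtils

# Capture the non-verbose versioninfo() output as a string so it can be stored
# with the benchmark results. versioninfo(verbose = true) additionally prints
# per-core CPU details and memory, which vary between CI runs and were
# polluting the diffs.
env_info = sprint(versioninfo)
```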
