You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Adds the plumbing for fetching real datasets: per-dataset TODO comments in registry.yml (with API hints for 4TU.ResearchData and Mendeley), a fetch_dataset() skeleton in pm_bench/fetch.py, a _cache.py helper for ~/.cache/pm-bench/, and a pm-bench fetch <name> CLI subcommand. No real URLs or sha256 hashes are guessed — all nullable fields remain null.
Checklist
Registry (pm_bench/registry.yml)
bpi2012: resolve 4TU direct download URL and pin sha256
bpi2017: resolve 4TU direct download URL and pin sha256
bpi2018: resolve 4TU direct download URL and pin sha256
bpi2019: resolve 4TU direct download URL and pin sha256
bpi2020: decide which sub-files to include; resolve individual URLs and pin sha256
sepsis: resolve 4TU direct download URL and pin sha256
helpdesk: resolve Mendeley direct CSV download URL and pin sha256
Fetch implementation (pm_bench/fetch.py)
HTTP download with resume support (Range header)
sha256 verification after download
Atomic move from .tmp to final path
Wire _cache.cache_dir() as the default cache root
Tests (tests/test_fetch.py)
Fill in all TODO tests once fetch_dataset is implemented
Closing as superseded — every TODO in this draft is now complete on main (commit 9c00b47):
Registry — fetch + hash machinery shipped (pm-bench fetch <name> [--pin]); per-dataset hash pins remain pending the one-time TOS-gated downloads (a human step, not a code task).
Fetch implementation — pm_bench/fetch.py ships full HTTP download, atomic move from .part to final path, sha256 verification, PID+UUID-staged tmp files for concurrency safety, partial-download cleanup, and explicit handling for the bundled synthetic-toy case.
Cache — pm_bench/cache.py (note: dropped the leading underscore, since it's part of the public API surface for tests) handles PM_BENCH_CACHE override and per-dataset paths.
Tests — tests/test_fetch.py (16 tests including a tmp-HTTP-server test for the auto-download path) and tests/test_cache.py (path resolution).
The CLI gained pm-bench fetch, plus stats, validate, compare, leaderboard, predict, discover, prefixes, score — all wired up.
If you'd like to keep TODO.md for tracking the per-dataset pinning step (the only remaining v0.1 work), I can open a fresh PR adding just that file.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Scope
Adds the plumbing for fetching real datasets: per-dataset TODO comments in
registry.yml(with API hints for 4TU.ResearchData and Mendeley), afetch_dataset()skeleton inpm_bench/fetch.py, a_cache.pyhelper for~/.cache/pm-bench/, and apm-bench fetch <name>CLI subcommand. No real URLs or sha256 hashes are guessed — all nullable fields remainnull.Checklist
Registry (
pm_bench/registry.yml)Fetch implementation (
pm_bench/fetch.py).tmpto final path_cache.cache_dir()as the default cache rootTests (
tests/test_fetch.py)fetch_datasetis implementedRoadmap context
See the Roadmap section of the README.
Generated by Claude Code