
Streamline parcels-benchmarks #42

Merged

VeckoTheGecko merged 44 commits into main from improvements
Apr 2, 2026

Conversation

@VeckoTheGecko
Contributor

This PR reworks parcels-benchmarks in a way that (I hope) is much easier to work with. Follow the README and let me know what you think.

Changes:

  • Replaces the parcels_benchmarks internal package (which provided the CLI tool for adding dataset hashes etc.). Now instead:

    • An intake-xarray catalog is defined in catalogs/parcels-benchmarks/catalog.yml. The top of the file has a comment which contains the link to the ZIP to be downloaded.
      • This streamlines our approach, making it easier for the benchmarking scripts to go straight from data on disk to xarray dataset.
      • We can use other options available via intake
      • This approach allows us to get familiar with intake which will likely be used for our HPC systems after v4 is released.
    • A script (scripts/download-catalog.py) downloads the data for a catalog and takes an output_dir (both via CLI args). This uses curl to download the dataset, and then unzips all nested zip files (deleting the original zips). This script also copies the catalog file into the output_dir (which is good, since the datasets in the catalog are defined relative to this catalog file).
      • If a catalog is already downloaded (i.e., if the folder already exists), it's skipped
      • Pro: The use of curl here means this approach is quite transparent - one can easily see download speeds and decide to cancel
      • Con: There is no longer the concept of "known hashes" - this is something we can get back if we want in the future [1]
    • Pixi is used, via the setup-data task, to download all the datasets.
      • This makes our data approach much more flexible should we want to change it in future
  • Requires a PARCELS_BENCHMARKS_DATA_FOLDER environment variable to be explicitly set, which then acts as the working space for the data. This environment variable is used in the download and benchmarking code.
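For concreteness, an intake-xarray catalog entry of the kind described above might look like the sketch below. The source name, description, and file path are hypothetical, not the real entries in catalogs/parcels-benchmarks/catalog.yml; `{{ CATALOG_DIR }}` is intake's template variable for the directory containing the catalog file, which is why copying the catalog next to the data matters.

```yaml
# Illustrative sketch only - the dataset name and paths are made up.
# The top of the real file carries a comment with the ZIP download link.
sources:
  example_dataset:
    description: Example hydrodynamic fields for benchmarking
    driver: netcdf
    args:
      # Paths resolve relative to the catalog file itself.
      urlpath: "{{ CATALOG_DIR }}/example_dataset/*.nc"
      xarray_kwargs:
        combine: by_coords
```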

We needed the following things to ease development:

  • Download all datasets before running benchmarks

  • Make the download progress of datasets transparent

Footnotes

  1. Given we are the sole owners of our data sources, I don't think this is a concern.

@VeckoTheGecko
Contributor Author

Not all the benchmarks are running. Once this is merged I'll fix the rest in #40 .

Let me know what you think of this @fluidnumerics-joe

@VeckoTheGecko
Contributor Author

Oh, and since Parcels is now a submodule, I think you'll need to run `git submodule update --init --recursive` (if you aren't doing a fresh clone from the README)

@fluidnumerics-joe
Contributor

Is it intentional to not have the FESOM and ICON datasets in the catalog? I'm confused about where they went.

@VeckoTheGecko
Contributor Author

I should have mentioned it in the PR description: I was planning on adding them in a future PR (I wanted to avoid conflicts with the other reworking of the ingestion code)

@VeckoTheGecko
Contributor Author

Also, I need to figure out exactly how intake integrates with Uxarray. The fact that Uxarray doesn't initialise from an xarray dataset (i.e., it has `uxr.open_mfdataset` as the main entry point) slightly complicates things

Member


What's the difference between the catalogues in parcels-benchmarks and the parcels-examples? They seem to be the same now?

Contributor Author


Yes, to be updated in a future PR (mainly focussing on the actual downloading of the datasets - will fix the catalogs and ingestion at the same time)

- Refactor path variables and move catalogue definitions to a separate file
- Since the downloading script now relies on the
Using the `pyproject.toml` file to specify the project dependencies (including the local dependency on Parcels). This helps ensure that the dependencies are also available to ASV.
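Under pixi's pyproject.toml convention, the local Parcels dependency and the setup-data task might be declared roughly as below. This is a hedged sketch: the table names follow pixi's documented `[tool.pixi.*]` schema, but the specific values (project name, platforms, submodule path, task command) are hypothetical.

```toml
[project]
name = "parcels-benchmarks"
version = "0.1.0"

[tool.pixi.project]
channels = ["conda-forge"]
platforms = ["linux-64"]

[tool.pixi.pypi-dependencies]
# Local checkout of Parcels (the git submodule), installed editable
parcels = { path = "./parcels", editable = true }

[tool.pixi.tasks]
# Downloads all datasets into $PARCELS_BENCHMARKS_DATA_FOLDER
setup-data = "python scripts/download-catalog.py catalogs/parcels-benchmarks/catalog.yml"
```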

Maybe this ASV <-> pixi interaction needs to be investigated further... ASV's own environment management is something I find confusing.
Looks like stripping the environments out of ASV is on the roadmap (airspeed-velocity/asv#1581), which will mean that we can fully manage them with pixi (which would be great)
@VeckoTheGecko VeckoTheGecko merged commit 17c74e1 into main Apr 2, 2026
1 check passed
@VeckoTheGecko VeckoTheGecko deleted the improvements branch April 2, 2026 13:45