
Commit 17c74e1

VeckoTheGecko, erikvansebille, and pre-commit-ci[bot] authored
Streamline parcels-benchmarks (#42)
Co-authored-by: Erik van Sebille <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 92cbd4f commit 17c74e1

File tree

19 files changed: +6576 −1125 lines changed

.gitignore

Lines changed: 4 additions & 1 deletion

@@ -10,6 +10,9 @@ credentials.json
 *.egg-info
 __pycache__
 build/
-parcels/
 .asv/
 html/
+.DS_Store
+
+data
+.env

.gitmodules

Lines changed: 3 additions & 0 deletions

@@ -0,0 +1,3 @@
+[submodule "Parcels"]
+	path = Parcels
+	url = git@github.com:Parcels-code/Parcels

Parcels

Submodule Parcels added at c6f11dc

README.md

Lines changed: 22 additions & 11 deletions

@@ -6,15 +6,30 @@ This repository houses performance benchmarks for [Parcels](https://github.com/O
 
 ## Development instructions
 
+This project uses a combination of [Pixi](https://pixi.sh/dev/installation/), [ASV](https://asv.readthedocs.io/), and [intake-xarray](https://github.com/intake/intake-xarray) to coordinate setting up and running the benchmarks.
+
+- Scripts download the required datasets into the correct location
+- intake-xarray defines data catalogues that can be easily accessed from within benchmark scripts
+- ASV runs the benchmarks (see the [Writing the benchmarks](#writing-the-benchmarks) section)
+- Pixi orchestrates all of the above into a convenient, user-friendly workflow
+
+You can run `pixi task list` to see the list of available tasks in the workspace.
+
+In brief, you can set up the data and run the benchmarks by doing:
+
 - [install Pixi](https://pixi.sh/dev/installation/) `curl -fsSL https://pixi.sh/install.sh | bash`
 - `pixi install`
-- `pixi run asv run`
+- `PARCELS_BENCHMARKS_DATA_FOLDER=./data pixi run benchmarks`
 
-You can run the linting with `pixi run lint`
+> [!NOTE]
+> The syntax `PARCELS_BENCHMARKS_DATA_FOLDER=./data pixi run ...` sets the environment variable for that task only, but you can also set environment variables [in other ways](https://askubuntu.com/a/58828).
 
 > [!IMPORTANT]
-> The default path for the benchmark data is set by [pooch.os_cache](https://www.fatiando.org/pooch/latest/api/generated/pooch.os_cache.html), which typically is a subdirectory of your home directory. Currently, you will need at least 50GB of disk space available to store the benchmark data.
-> To change the location of the benchmark data cache, you can set the environment variable `PARCELS_DATADIR` to a preferred location to store the benchmark data.
+> Currently, you will need at least 50GB of disk space to store the unzipped benchmark data. Since the zip archives are deleted only after being downloaded and extracted, you need about 80GB of disk space in total.
+> You must explicitly set where the benchmark data will be saved via the `PARCELS_BENCHMARKS_DATA_FOLDER` environment variable; it is used both when downloading the data and in the definition of the benchmarks.
 
 To view the benchmark data
 
@@ -34,7 +49,7 @@ Members of the Parcels community can contribute benchmark data using the followi
 2. Clone your fork onto your system
 
    ```
-   git clone git@github.com:<your-github-handle>/parcels-benchmarks.git ~/parcels-benchmarks
+   git clone --recurse-submodules git@github.com:<your-github-handle>/parcels-benchmarks.git
   ```
 
 3. Run the benchmarks
@@ -61,13 +76,9 @@ Adding benchmarks for parcels typically involves adding a dataset and defining t
 ### Adding new data
 
 Data is hosted remotely on a SurfDrive managed by the Parcels developers. You will need to open an issue on this repository to start the process of getting your data hosted in the shared SurfDrive.
-Once your data is hosted in the shared SurfDrive, you can easily add your dataset to the benchmark dataset manifest using
-
-```
-pixi run benchmark-setup pixi add-dataset --name "Name for your dataset" --file "Path to ZIP archive in the SurfDrive"
-```
+Once your data is hosted in the shared SurfDrive, you can add your dataset to the benchmark dataset catalogue by modifying `catalogs/parcels-benchmarks/catalog.yml`.
 
-During this process, the dataset will be downloaded and a complete entry will be added to the [parcels_benchmarks/benchmarks.json](./parcels_benchmarks/benchmarks.json) manifest file. Once updated, this file can be committed to this repository and contributed via a pull request.
+In the benchmark you can now use this catalogue entry.
 
 ### Writing the benchmarks
 
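The NOTE in the README diff above, about the two ways of setting the environment variable, can be made concrete with a short shell sketch (the `$HOME/parcels-data` path is purely illustrative; `sh -c '...'` stands in for `pixi run benchmarks`, which reads the same variable):

```shell
# Per-command: the variable applies only to this one invocation
PARCELS_BENCHMARKS_DATA_FOLDER=./data sh -c 'echo "data folder: $PARCELS_BENCHMARKS_DATA_FOLDER"'

# Session-wide: export once, then every subsequent command sees it
export PARCELS_BENCHMARKS_DATA_FOLDER="$HOME/parcels-data"
echo "data folder: $PARCELS_BENCHMARKS_DATA_FOLDER"
```

The per-command form keeps the variable out of your shell environment, which is handy when switching between data folders.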

asv.conf.json

Lines changed: 0 additions & 25 deletions
This file was deleted.

asv.conf.jsonc

Lines changed: 31 additions & 0 deletions

@@ -0,0 +1,31 @@
+{
+    "version": 1,
+    "project": "parcels",
+    "project_url": "https://github.com/Parcels-Code/parcels",
+    "repo": "./Parcels",
+    "dvcs": "git",
+    "branches": ["main"],
+    "environment_type": "rattler",
+    "conda_channels": [
+        "conda-forge",
+        "defaults",
+        "https://repo.prefix.dev/parcels",
+    ],
+    "default_benchmark_timeout": 1800,
+    "env_dir": ".asv/env",
+    "results_dir": "results",
+    "html_dir": "html",
+    "build_command": ["python -m build --wheel -o {build_cache_dir} {build_dir}"],
+    // "install_command": [
+    //     "in-dir={conf_dir} python -m pip install .",
+    //     "in-dir={build_dir} python -m pip install ."
+    // ],
+    // "uninstall_command": [
+    //     "return-code=any python -m pip uninstall -y parcels parcels_benchmarks"
+    // ]
+    "matrix": {
+        "req": {
+            "intake-xarray": [],
+        },
+    },
+}

benchmarks/__init__.py

Lines changed: 27 additions & 0 deletions

@@ -0,0 +1,27 @@
+import logging
+import os
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+PIXI_PROJECT_ROOT = os.environ.get("PIXI_PROJECT_ROOT")
+if PIXI_PROJECT_ROOT is not None:
+    PIXI_PROJECT_ROOT = Path(PIXI_PROJECT_ROOT)
+
+PIXI_PROJECT_ROOT: Path | None
+
+try:
+    PARCELS_BENCHMARKS_DATA_FOLDER = Path(os.environ["PARCELS_BENCHMARKS_DATA_FOLDER"])
+except KeyError:
+    # Default to `./data`
+    PARCELS_BENCHMARKS_DATA_FOLDER = Path("./data")
+    logger.info("PARCELS_BENCHMARKS_DATA_FOLDER was not set. Defaulting to `./data`")
+
+if not PARCELS_BENCHMARKS_DATA_FOLDER.is_absolute():
+    if PIXI_PROJECT_ROOT is None:
+        raise RuntimeError(
+            "PARCELS_BENCHMARKS_DATA_FOLDER is a relative path, but PIXI_PROJECT_ROOT env variable is not set. We don't know where to store the data."
+        )
+    PARCELS_BENCHMARKS_DATA_FOLDER = PIXI_PROJECT_ROOT / str(
+        PARCELS_BENCHMARKS_DATA_FOLDER
+    )

benchmarks/catalogs.py

Lines changed: 12 additions & 0 deletions

@@ -0,0 +1,12 @@
+import intake
+
+from . import PARCELS_BENCHMARKS_DATA_FOLDER
+
+
+class Catalogs:
+    CAT_EXAMPLES = intake.open_catalog(
+        f"{PARCELS_BENCHMARKS_DATA_FOLDER}/surf-data/parcels-examples/catalog.yml"
+    )
+    CAT_BENCHMARKS = intake.open_catalog(
+        f"{PARCELS_BENCHMARKS_DATA_FOLDER}/surf-data/parcels-benchmarks/catalog.yml"
+    )
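The catalogues opened above are intake YAML files. A minimal entry might look like the following sketch (the entry name, description, and data path are hypothetical, not taken from the actual catalogue; the `netcdf` driver and `{{ CATALOG_DIR }}` template are provided by intake-xarray and intake respectively):

```yaml
sources:
  fesom_baroclinic_gyre:            # hypothetical entry name
    description: FESOM baroclinic gyre benchmark data
    driver: netcdf                  # registered by intake-xarray
    args:
      urlpath: "{{ CATALOG_DIR }}/data/*.nc"
      xarray_kwargs:
        combine: by_coords
```

A benchmark could then open such an entry as, e.g., `Catalogs.CAT_BENCHMARKS.fesom_baroclinic_gyre.to_dask()`, which returns a lazily loaded xarray dataset.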

benchmarks/fesom2.py

Lines changed: 14 additions & 14 deletions

@@ -1,5 +1,6 @@
 import numpy as np
 import uxarray as ux
+import xarray as xr
 from parcels import (
     FieldSet,
     Particle,
@@ -8,39 +9,38 @@
 )
 from parcels.kernels import AdvectionRK2_3D
 
-from parcels_benchmarks.benchmark_setup import PARCELS_DATADIR, download_example_dataset
+from . import PARCELS_BENCHMARKS_DATA_FOLDER
 
 runtime = np.timedelta64(1, "D")
 dt = np.timedelta64(2400, "s")
 
 
-def _load_ds(datapath):
+def _load_ds():
     """Helper function to load uxarray dataset from datapath"""
 
-    grid_file = f"{datapath}/mesh/fesom.mesh.diag.nc"
-    data_files = f"{datapath}/*.nc"
-    return ux.open_mfdataset(grid_file, data_files, combine="by_coords")
+    grid_file = xr.open_mfdataset(
+        f"{PARCELS_BENCHMARKS_DATA_FOLDER}/surf-data/parcels-benchmarks/data/Parcelsv4_Benchmarking_data/Parcels_Benchmarks_FESOM-baroclinic-gyre/data/mesh/fesom.mesh.diag.nc"
+    )
+    data_files = xr.open_mfdataset(
+        f"{PARCELS_BENCHMARKS_DATA_FOLDER}/surf-data/parcels-benchmarks/data/Parcelsv4_Benchmarking_data/Parcels_Benchmarks_FESOM-baroclinic-gyre/data/*.nc"
+    )
+
+    grid = ux.open_grid(grid_file)
+    return ux.UxDataset(data_files, uxgrid=grid)
 
 
 class FESOM2:
     params = ([10000], [AdvectionRK2_3D])
     param_names = ["npart", "integrator"]
 
-    def setup(self, npart, integrator):
-        # Ensure the dataset is downloaded in the desired data_home
-        # and obtain the path to the dataset
-        self.datapath = download_example_dataset(
-            "FESOM-baroclinic-gyre", data_home=PARCELS_DATADIR
-        )
-
     def time_load_data(self, npart, integrator):
-        ds = _load_ds(self.datapath)
+        ds = _load_ds()
         for i in range(min(ds.coords["time"].size, 2)):
             _u = ds["u"].isel(time=i).compute()
             _v = ds["v"].isel(time=i).compute()
 
     def pset_execute(self, npart, integrator):
-        ds = _load_ds(self.datapath)
+        ds = _load_ds()
         ds = convert.fesom_to_ugrid(ds)
         fieldset = FieldSet.from_ugrid_conventions(ds)
 
benchmarks/moi_curvilinear.py

Lines changed: 7 additions & 5 deletions

@@ -6,12 +6,16 @@
 import xgcm
 from parcels.interpolators import XLinear
 
-from parcels_benchmarks.benchmark_setup import PARCELS_DATADIR, download_example_dataset
-
 runtime = np.timedelta64(2, "D")
 dt = np.timedelta64(15, "m")
 
 
+PARCELS_DATADIR = ...  # TODO: Replace with intake
+
+
+def download_dataset(*args, **kwargs): ...  # TODO: Replace with intake
+
+
 def _load_ds(datapath, chunk):
     """Helper function to load xarray dataset from datapath with or without chunking"""
 
@@ -72,9 +76,7 @@ class MOICurvilinear:
     ]
 
     def setup(self, interpolator, chunk, npart):
-        self.datapath = download_example_dataset(
-            "MOi-curvilinear", data_home=PARCELS_DATADIR
-        )
+        self.datapath = download_dataset("MOi-curvilinear", data_home=PARCELS_DATADIR)
 
     def time_load_data_3d(self, interpolator, chunk, npart):
         """Benchmark that times loading the 'U' and 'V' data arrays only for 3-D"""

0 commit comments