
Use obstore in data processing workflows#988

Open
spencerkclark wants to merge 1 commit into main from feature/obstore-in-data-processing

Conversation


@spencerkclark spencerkclark commented Mar 19, 2026

This follows the approach we have taken in the ERA5 data processing workflow of using obstore when reading from and writing to zarr stores in the cloud. This is expected to provide a meaningful performance improvement, particularly when reading from stores with small inner chunks, namely in the stats and time coarsening parts of our workflow.

Test workflows on the same 10-year dataset:

  • Without obstore: compute-fme-dataset-ensemble-fzl9k
  • With obstore: compute-fme-dataset-ensemble-cjp5g

Timing results:

|                 | Dataset computation | Stats computation |
| --------------- | ------------------- | ----------------- |
| Without obstore | 48m                 | 34m               |
| With obstore    | 38m                 | 35m               |

Surprisingly, we do not see any meaningful change in the stats computation time, though the dataset computation step is noticeably faster.

Changes:

  • Adds a `get_zarr_store` function in `compute_dataset.py` and `get_stats.py` and uses it in `compute_stats.py`, `get_stats.py`, and `time_coarsen.py`.
  • Updates the processing image to include obstore. The new image is tagged `v2026.03.0`, and the Argo workflow now uses it in all steps.
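For context, a minimal sketch of what a helper like `get_zarr_store` might look like, assuming zarr v3's `ObjectStore` adapter and obstore's `from_url` constructor; the actual implementation in this PR may differ in supported schemes and configuration:

```python
from urllib.parse import urlparse


def get_zarr_store(path: str):
    """Return a store suitable for zarr read/write calls.

    Cloud URLs are wrapped in an obstore-backed store; local paths are
    returned unchanged, since zarr accepts filesystem paths directly.
    Hypothetical sketch, not the exact helper added in this PR.
    """
    scheme = urlparse(path).scheme
    if scheme in ("gs", "s3", "az"):
        # obstore.store.from_url infers the backend from the URL scheme;
        # zarr.storage.ObjectStore adapts it to the zarr v3 store interface.
        import obstore.store
        import zarr.storage

        return zarr.storage.ObjectStore(obstore.store.from_url(path))
    return path
```

Routing all store construction through one function keeps the obstore dependency in a single place, so the reading and writing call sites in the individual workflow scripts stay unchanged.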

