
Switch to custom netcdf4/hdf5 backend #395

Merged
12 commits merged into main from hdfbackend on Jan 30, 2025

Conversation

@jsignell (Contributor) commented Jan 28, 2025

  • Switches autodetected backend selection (sketched below)
  • Updates tests to require kerchunk less often
  • Only tests the kerchunk HDF reader if kerchunk is available
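
For a sense of what "autodetected backend selection" means here, a minimal sketch of magic-byte dispatch (the function name and return values are hypothetical; the real selection logic, which this PR points at the custom HDF backend, lives in virtualizarr/backend.py):

def autodetect_backend(path: str) -> str:
    # Hypothetical sketch: read the file signature and pick a reader.
    with open(path, "rb") as f:
        magic = f.read(8)
    if magic.startswith(b"\x89HDF\r\n\x1a\n"):
        # netCDF4 files are HDF5 containers, so both formats route to
        # the custom HDF reader rather than the kerchunk-based one.
        return "hdf"
    if magic.startswith(b"CDF"):
        return "netcdf3"
    raise NotImplementedError(f"unrecognized file signature: {magic!r}")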

codecov bot commented Jan 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.23%. Comparing base (443928f) to head (529a5b5).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #395      +/-   ##
==========================================
+ Coverage   77.75%   84.23%   +6.47%     
==========================================
  Files          31       31              
  Lines        1821     1821              
==========================================
+ Hits         1416     1534     +118     
+ Misses        405      287     -118     
Files with missing lines | Coverage | Δ
virtualizarr/backend.py  | 95.65% <ø> | -1.45% ⬇️

... and 16 files with indirect coverage changes

jsignell (Contributor Author):

These mostly still do not run in a kerchunk-free env; see #376 for that piece.
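
The usual pytest pattern for gating tests on an optional dependency looks something like the sketch below (names are illustrative; see #376 for the actual kerchunk-free test work):

import importlib.util

import pytest

# Skip marker for tests that genuinely need kerchunk installed.
requires_kerchunk = pytest.mark.skipif(
    importlib.util.find_spec("kerchunk") is None,
    reason="kerchunk is not installed",
)

@requires_kerchunk
def test_kerchunk_hdf_reader():
    ...  # would exercise the kerchunk-based HDF5 reader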


vds = open_virtual_dataset(simple_netcdf4, loadable_variables=["foo"])
assert vds.virtualize.nbytes == 48
assert vds.virtualize.nbytes == 104
jsignell (Contributor Author):

I guess I am not concerned that the nbytes is different now?

Member:

That seems weird... I would have expected them to be the same.

jsignell (Contributor Author):

I think this has to do with coordinates being populated for dimensions that are supposed to be "without coordinates"; see https://github.com/zarr-developers/VirtualiZarr/pull/395/files#r1934273183.

@TomNicholas added the labels Kerchunk (Relating to the kerchunk library / specification itself), testing, and dependencies (Updates a dependency) on Jan 29, 2025
@maxrjones added the label v3-migration (Required for migration to Zarr-Python 3.0) on Jan 29, 2025
vds = open_virtual_dataset(
    netcdf4_file_with_data_in_multiple_groups,
    group="subgroup",
    indexes={},
    backend=HDF5VirtualBackend,
)
jsignell (Contributor Author):

This is what the output looks like for the kerchunk-based reader:

<xarray.Dataset> Size: 16B
Dimensions:  (dim_0: 2)
Dimensions without coordinates: dim_0
Data variables:
    bar      (dim_0) int64 16B ManifestArray<shape=(2,), dtype=int64, chunks=...

vs the custom reader:

<xarray.Dataset> Size: 32B
Dimensions:  (dim_0: 2)
Coordinates:
    dim_0    (dim_0) float64 16B 0.0 0.0
Data variables:
    bar      (dim_0) int64 16B ManifestArray<shape=(2,), dtype=int64, chunks=...

For reference, here is what it looks like if I just naively open it as a regular dataset:

(Pdb) xr.open_dataset(netcdf4_file_with_data_in_multiple_groups, group="subgroup")
<xarray.Dataset> Size: 16B
Dimensions:  (dim_0: 2)
Dimensions without coordinates: dim_0
Data variables:
    bar      (dim_0) int64 16B ...

I think this is probably the source of the nbytes difference too (the spurious dim_0 coordinate is two float64 values, exactly the 16B gap between the two reprs above).
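
A hedged sketch reproducing the comparison above (the backend class and fixture names come from this PR, while the import path for HDF5VirtualBackend is an assumption):

import xarray as xr
from virtualizarr import open_virtual_dataset
from virtualizarr.readers.hdf5 import HDF5VirtualBackend  # import path assumed

# kerchunk-based reader: dim_0 stays a "dimension without coordinates"
vds_kerchunk = open_virtual_dataset(
    netcdf4_file_with_data_in_multiple_groups,  # path from the test fixture
    group="subgroup",
    indexes={},
    backend=HDF5VirtualBackend,
)
assert "dim_0" not in vds_kerchunk.coords

# custom reader (the new default): materializes a spurious dim_0 coordinate
vds_custom = open_virtual_dataset(
    netcdf4_file_with_data_in_multiple_groups,
    group="subgroup",
    indexes={},
)
assert "dim_0" in vds_custom.coords  # the behaviour tracked in #401

# plain xarray agrees with the kerchunk-based reader
ds = xr.open_dataset(netcdf4_file_with_data_in_multiple_groups, group="subgroup")
assert "dim_0" not in ds.coords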

Member:

That seems like a bug in the HDF reader. If xarray doesn't think this dimension has a coordinate, then virtualizarr's HDF reader shouldn't create one either.

Opened #401 to track this (FYI @sharkinsspatial )

@jsignell (Contributor Author):

@TomNicholas @sharkinsspatial I pushed a commit (e18e647) to encode #401 in tests. I think this is good to merge now.

@TomNicholas (Member) left a review:

Amazing @jsignell! Only one minor suggestion.

Also, very good call not to be overzealous and remove the kerchunk-dependent tests entirely.

assert isinstance(vds["bar"].data, ManifestArray)
assert vds["bar"].shape == (2,)

- def test_open_root_group_manually(self, netcdf4_file_with_data_in_multiple_groups):
+ def test_open_root_group_manually(
Member:

Couldn't this test be combined with the one below by parameterizing group over ("", None)?
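
A sketch of the suggested parametrization (the fixture and call come from the surrounding diff; the root-group variable name is assumed):

import pytest

@pytest.mark.parametrize("group", ["", None])
def test_open_root_group(self, netcdf4_file_with_data_in_multiple_groups, group):
    vds = open_virtual_dataset(
        netcdf4_file_with_data_in_multiple_groups,
        group=group,
        indexes={},
    )
    # "foo" is assumed to be a variable in the root group
    assert isinstance(vds["foo"].data, ManifestArray)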

Member:

Done in 8706af5

@TomNicholas TomNicholas merged commit 81a76f0 into zarr-developers:main Jan 30, 2025
11 checks passed
@jsignell jsignell deleted the hdfbackend branch January 30, 2025 13:53
jsignell added a commit to jsignell/VirtualiZarr that referenced this pull request Jan 31, 2025
sharkinsspatial added a commit that referenced this pull request Jan 31, 2025
Do not create variables for non coordinate dimension hdf datasets. (#410)

* Do not create variables for non coordinate dimension hdf datasets.

* Revert test changes to avoid HDFVirtualBackend errors from #395.

* Re-enable xfailed roundtrip integration test.

* Fix HDF5 type usage.

* Fix indent error for scanning HDF5 items.
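
Regarding the first bullet above: netCDF4 marks coordinate-less dimensions with a well-known HDF5 dimension-scale attribute, so a fix can skip them along these lines (a sketch; the function name is hypothetical and not necessarily what #410 does):

import h5py

def is_dimension_without_coordinate(dset: h5py.Dataset) -> bool:
    # netCDF4 writes dimension scales for coordinate-less dimensions
    # with this NAME attribute; skipping them avoids materializing
    # spurious coordinate variables like dim_0 above (issue #401).
    name = dset.attrs.get("NAME")
    return isinstance(name, bytes) and name.startswith(
        b"This is a netCDF dimension but not a netCDF variable"
    )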
TomNicholas pushed a commit that referenced this pull request Jan 31, 2025
* Use open_dataset_kerchunk in roundtrip tests that don't otherwise require kerchunk

* Make it clear that integration tests require zarr-python

* Add in-memory icechunk tests to existing roundtrip tests

* Playing around with icechunk / zarr / xarray upgrade

* Passing icechunk tests

* Update tests to latest kerchunk

* Remove icechunk roundtripping

* Fixed some warnings

* Fixed codec test

* Fix warnings in test_backend.py

* Tests passing

* Remove obsolete comment

* Add fill value to fixture

* Remove obsolete conditional to ds.close()

* Reset workflows with --cov

* Reset conftest.py fixtures (air encoding)

* Reset contributing (--cov) removed

* Remove context manager from readers/common.py

* Reset test_backend with ds.dims

* Reset test_icechunk (air encoding)

* Fix change that snuck in on #395

---------

Co-authored-by: Aimee Barciauskas <[email protected]>
Labels
dependencies (Updates a dependency), Kerchunk (Relating to the kerchunk library / specification itself), testing, v3-migration (Required for migration to Zarr-Python 3.0)
Development

Successfully merging this pull request may close these issues.

Switch tests to use HDF reader instead of kerchunk-based HDF5 reader
3 participants