HDF reader creating spurious dimension coordinates #401
Comments
I'll need to verify, but I think this is the issue captured by this xfailed test: https://github.com/zarr-developers/VirtualiZarr/blob/main/virtualizarr/tests/test_readers/test_hdf/test_hdf_integration.py#L30-L42
@sharkinsspatial to answer your question in the meeting just now: the expected behaviour here should be to create whatever coordinate variables xarray creates (though the creation of the index backing each one is handled by a later layer, so the variable just needs to be in / not be in your returned list of Variable objects).
* Switch to custom netcdf4/hdf5 backend
* Switches autodetected backend selection
* updates tests to require kerchunk less often
* only test kerchunk hdf reader if kerchunk is available
* Allow for kerchunk-based backend
* Rename to parametrize_over_hdf_backends
* Run group tests
* Respect dimensions without coordinates
* Fix #402 so that nested groups are ignored
* Encode #401 behavior in tests
* Fix min deps tests
* Make mypy happy
* Add to release notes
* combine two tests into one

Co-authored-by: TomNicholas <[email protected]>
@TomNicholas I was struggling a bit to understand how I would be able to differentiate between the empty HDF5 datasets that are created for dimensions when serializing to netCDF vs "real" empty HDF5 datasets that users would want represented as variables. As @keewis identified in #260, there are likely other cases I am not covering, but I believe we can differentiate these empty HDF5 datasets for dimensions using logic similar to what I've outlined here: https://nbviewer.org/gist/sharkinsspatial/a00afe480186bbd953d5265e560fa24e. I'm sure xarray has more comprehensive logic for this determination, but I wasn't able to locate where in the stack this is handled. I'll implement this now and get our tests passing, but it would be a good idea to get someone who understands the xarray logic better to review our approach as well.
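For anyone following along, here is a minimal sketch of the kind of check discussed above. It is not necessarily the logic in the linked notebook, nor what xarray does internally. It assumes the netCDF-4 convention that a dimension with no coordinate variable is stored as an HDF5 dimension scale whose NAME attribute starts with a fixed marker string. The helper names (`is_phantom_dimension`, `variable_names`) are hypothetical, and the datasets are modelled as plain attribute dicts so the snippet runs without h5py.

```python
from typing import Any, List, Mapping

# Marker netCDF-4 writes into the NAME attribute of a dimension scale that has
# no coordinate variable behind it (assumption: the file was written by netCDF-4).
NETCDF_PHANTOM_DIM_PREFIX = "This is a netCDF dimension but not a netCDF variable."


def _as_str(value: Any) -> str:
    """HDF5 attribute values often come back as bytes; normalise to str."""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return str(value)


def is_phantom_dimension(attrs: Mapping[str, Any]) -> bool:
    """True if the attributes look like a dimension-only placeholder dataset."""
    if _as_str(attrs.get("CLASS", "")) != "DIMENSION_SCALE":
        return False
    return _as_str(attrs.get("NAME", "")).startswith(NETCDF_PHANTOM_DIM_PREFIX)


def variable_names(datasets: Mapping[str, Mapping[str, Any]]) -> List[str]:
    """Names of datasets that should be kept as variables (hypothetical helper)."""
    return [name for name, attrs in datasets.items() if not is_phantom_dimension(attrs)]


# Illustrative attribute dicts, not taken from the file in this issue.
example = {
    "x": {
        "CLASS": b"DIMENSION_SCALE",
        "NAME": b"This is a netCDF dimension but not a netCDF variable.         3",
    },
    "time": {"CLASS": b"DIMENSION_SCALE", "NAME": b"time"},
    "empty": {},
}
kept = variable_names(example)
```

With h5py, the same check could presumably be fed `dict(dataset.attrs)` for each dataset in a group. A "real" empty dataset such as `empty` above carries no dimension-scale marker, so it is kept as a variable.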
vs the custom reader:
for reference here is what it looks like if I just naively open it as a regular dataset:
I think this is probably the source of the nbytes difference too.
Originally posted by @jsignell in #395 (comment)