
Zarr store over HTTP fetches the same time-axis variable several times #10560

@raphaeljolivet


What happened?

When loading a Dataset from a Zarr store over HTTP, the chunk of the `time` variable is fetched three times from the remote location.

Example Zarr dataset (in a Google Cloud Storage bucket):
https://storage.googleapis.com/test_zarr_oie/IEA_PVPS.zarr/ADR/zarr.json
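The chunk paths requested from this store (such as `time/c/0` and `station_name/c` in the logs below) are consistent with the Zarr v3 "default" chunk key encoding: the prefix `c`, followed, for each chunk-grid coordinate, by the separator and the decimal coordinate. The sketch below reproduces that rule, assuming the default `/` separator; `default_chunk_key` is a hypothetical helper written for illustration, not a zarr API.

```python
def default_chunk_key(array_path: str, chunk_coords: tuple, separator: str = "/") -> str:
    """Build a chunk path using the Zarr v3 'default' chunk key encoding.

    The key is "c" followed, for each grid coordinate, by the separator and
    the decimal coordinate, so a zero-dimensional array has the key "c".
    The array path and the key are always joined with "/" (the store path separator).
    """
    key = "c" + "".join(f"{separator}{i}" for i in chunk_coords)
    return f"{array_path}/{key}"


print(default_chunk_key("time", (0,)))        # time/c/0
print(default_chunk_key("station_name", ()))  # station_name/c
```

Under this rule, `time/c/0` is chunk 0 of the one-dimensional `time` array, and the `station_name/c` request is consistent with a zero-dimensional `station_name` variable.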

What did you expect to happen?

I expect the following URL to be fetched only once:
https://storage.googleapis.com/test_zarr_oie/IEA_PVPS.zarr/ADR/time/c/0

Instead, it is fetched three times (logs from mitmproxy):

16:28:07 HTTPS GET    storage.googleapis.com /test_zarr_oie/IEA_PVPS.zarr/ADR/.zattrs                                                       404 application/xml  201b 358ms 
16:28:07 HTTPS GET    storage.googleapis.com /test_zarr_oie/IEA_PVPS.zarr/ADR/.zgroup                                                       404 application/xml  201b 384ms 
16:28:07 HTTPS GET    storage.googleapis.com /test_zarr_oie/IEA_PVPS.zarr/ADR/.zmetadata                                                    404 application/xml  204b 346ms 
16:28:07 HTTPS GET    storage.googleapis.com /test_zarr_oie/IEA_PVPS.zarr/ADR/zarr.json                                                     200 …plication/json 40.5k 204ms 
16:28:07 HTTPS GET    storage.googleapis.com /test_zarr_oie/IEA_PVPS.zarr/ADR/time/c/0                                                      200 …n/octet-stream ….04k  26ms 
16:28:07 HTTPS GET    storage.googleapis.com /test_zarr_oie/IEA_PVPS.zarr/ADR/time/c/0                                                      200 …n/octet-stream ….04k  24ms 
16:28:07 HTTPS GET    storage.googleapis.com /test_zarr_oie/IEA_PVPS.zarr/ADR/station_name/c                                                200 …n/octet-stream   27b  31ms 
16:28:07 HTTPS GET    storage.googleapis.com /test_zarr_oie/IEA_PVPS.zarr/ADR/time/c/0                                                200 …n/octet-stream ….04k  25ms 
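To tally repeated requests in a longer session, a short script can count GETs per path in a mitmproxy flow listing like the one above. This is a sketch that assumes the plain-text column layout shown here (time, scheme, method, host, path, then status and other columns); it parses copied text and is not a mitmproxy API. The sample is an abbreviated copy of the listing above.

```python
import re
from collections import Counter

# Matches "<time> <scheme> <method> <host> <path> ..." as in the listing above.
LINE_RE = re.compile(r"^\S+\s+HTTPS?\s+(?P<method>[A-Z]+)\s+(?P<host>\S+)\s+(?P<path>\S+)")


def count_requests(log_text: str, method: str = "GET") -> Counter:
    """Count how many times each path was requested with the given HTTP method."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match["method"] == method:
            counts[match["path"]] += 1
    return counts


def duplicates(counts: Counter) -> dict:
    """Keep only the paths that were fetched more than once."""
    return {path: n for path, n in counts.items() if n > 1}


log = """\
16:28:07 HTTPS GET storage.googleapis.com /test_zarr_oie/IEA_PVPS.zarr/ADR/.zattrs 404
16:28:07 HTTPS GET storage.googleapis.com /test_zarr_oie/IEA_PVPS.zarr/ADR/zarr.json 200
16:28:07 HTTPS GET storage.googleapis.com /test_zarr_oie/IEA_PVPS.zarr/ADR/time/c/0 200
16:28:07 HTTPS GET storage.googleapis.com /test_zarr_oie/IEA_PVPS.zarr/ADR/time/c/0 200
16:28:07 HTTPS GET storage.googleapis.com /test_zarr_oie/IEA_PVPS.zarr/ADR/station_name/c 200
16:28:07 HTTPS GET storage.googleapis.com /test_zarr_oie/IEA_PVPS.zarr/ADR/time/c/0 200
"""
print(duplicates(count_requests(log)))
# {'/test_zarr_oie/IEA_PVPS.zarr/ADR/time/c/0': 3}
```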

Minimal Complete Verifiable Example

import xarray as xr  # HTTP access also requires fsspec and aiohttp

# To inspect the requests, you can first set up a local reverse proxy:
# `mitmproxy --mode reverse:https://storage.googleapis.com/ --listen-port 1111`

# Initial URL
# URL = "https://storage.googleapis.com/test_zarr_oie/IEA_PVPS.zarr/ADR"

# Proxy URL (the same store, routed through mitmproxy)
URL = "http://localhost:1111/test_zarr_oie/IEA_PVPS.zarr/ADR"

ds = xr.open_dataset(URL, engine="zarr")
print(ds)
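If mitmproxy is not available, a rough alternative is to serve a local copy of the store with Python's standard library and count GET requests per path. This is a sketch under assumptions: it needs the store downloaded to a local directory (`STORE_DIR` below is a placeholder), and `CountingHandler` and `start_counting_server` are hypothetical names, not part of xarray or zarr. `SimpleHTTPRequestHandler` ignores HTTP `Range` headers and always returns whole files, so this server is only suitable for counting requests.

```python
import functools
import threading
from collections import Counter
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Per-path GET tally shared by all handler instances (hypothetical helper state).
REQUEST_COUNTS: Counter = Counter()
_COUNTS_LOCK = threading.Lock()


class CountingHandler(SimpleHTTPRequestHandler):
    """Serve files from a directory and count every GET request by path."""

    def do_GET(self):
        with _COUNTS_LOCK:
            REQUEST_COUNTS[self.path] += 1
        super().do_GET()  # 404 for missing keys such as .zattrs, as in the logs

    def log_message(self, format, *args):
        pass  # silence per-request logging; inspect REQUEST_COUNTS instead


def start_counting_server(directory: str, port: int = 0) -> ThreadingHTTPServer:
    """Start the server in a background thread; port=0 picks a free port."""
    handler = functools.partial(CountingHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Usage (STORE_DIR is a placeholder for a local copy of IEA_PVPS.zarr):
# server = start_counting_server(STORE_DIR)
# url = f"http://127.0.0.1:{server.server_address[1]}/ADR"
# ds = xr.open_dataset(url, engine="zarr")
# print({path: n for path, n in REQUEST_COUNTS.items() if n > 1})
# server.shutdown()
```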

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.


Anything else we need to know?

No response

Environment
