add tensorstore as an optional backend #13

Open
d-v-b opened this issue Sep 18, 2023 · 1 comment

d-v-b commented Sep 18, 2023

e.g., to_zarr(engine='tensorstore') vs to_zarr(engine='zarr_python')
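A rough sketch of what an engine keyword could look like, assuming a small registry that maps engine names to writer functions. The registry, the decorator, and the placeholder writer are all hypothetical names made up for illustration; nothing here calls zarr-python or tensorstore.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: engine name -> function that writes a spec to storage.
_ENGINES: Dict[str, Callable[..., Any]] = {}


def register_engine(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that records a writer function under an engine name."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        _ENGINES[name] = func
        return func
    return decorator


@register_engine("zarr_python")
def _to_zarr_python(spec: Any, store: Any, path: str) -> Any:
    # Placeholder body: a real writer would create the group with zarr-python.
    return ("zarr_python", path)


def to_zarr(spec: Any, store: Any, path: str, engine: str = "zarr_python") -> Any:
    """Dispatch to whichever writer is registered for `engine`."""
    try:
        writer = _ENGINES[engine]
    except KeyError:
        known = ", ".join(sorted(_ENGINES))
        raise ValueError(f"unknown engine {engine!r}; known engines: {known}") from None
    return writer(spec, store, path)
```

With this shape, an engine that was never registered (for example because its library is not installed) fails with a ValueError at call time rather than at import time.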

d-v-b commented Oct 29, 2023

and zarrita too. This is currently the only way to properly test Zarr v3. However, from an API perspective, I'm struggling to see how multiple backends could work here.

If we consider just the case of creating groups, at the moment we have this (ref):

to_zarr(spec: GroupSpec, store: BaseStore, path: str) -> zarr.Group

Suppose we change this to

to_zarr(spec: GroupSpec, store: Union[BaseStore, tensorstore.StoreLike, zarrita.StoreLike], path: str) -> Union[zarr.Group, tensorstore.GroupLike, zarrita.Group]

This assumes that tensorstore and zarrita both provide a StoreLike class for creating Zarr groups. First, unless I'm missing something really basic about Python type hints, we can't type this function properly unless we make tensorstore and zarrita mandatory dependencies of the project, which isn't great.
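One possible way around the typing problem, offered as an untested sketch rather than a settled design: import the optional libraries only under typing.TYPE_CHECKING and write the annotations as strings, so nothing is imported at runtime. The names tensorstore.StoreLike and tensorstore.GroupLike are the same assumptions as in the signature above and may not exist; GroupSpec is a local placeholder here so the snippet stands alone.

```python
from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Only static type checkers follow these imports; at runtime this block
    # is skipped, so none of the three libraries has to be installed.
    import tensorstore
    import zarr
    import zarrita
    from zarr.storage import BaseStore


class GroupSpec:
    """Placeholder standing in for the real GroupSpec model."""


def to_zarr(
    spec: "GroupSpec",
    # String annotations are never evaluated at runtime.
    # StoreLike and GroupLike are assumed names, not confirmed APIs.
    store: "Union[BaseStore, tensorstore.StoreLike, zarrita.StoreLike]",
    path: str,
) -> "Union[zarr.Group, tensorstore.GroupLike, zarrita.Group]":
    raise NotImplementedError("engine dispatch is left out of this sketch")
```

The catch is that a type checker can only resolve these names when the optional packages are installed in the environment where it runs, so the hints degrade for users who lack them. It also does nothing about the second problem, the return type varying with the engine.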

Second, unless we define Group classes in this repo, the return type varies with the engine. As far as I can tell, tensorstore has no representation of a Zarr group, only Zarr arrays, so the only way to create Zarr groups with that library is via zarrita or zarr-python. to_zarr would thus return a hacked zarrita or zarr-python group that delegates array IO to tensorstore. Not an attractive option.

zarrita can represent Zarr groups, but its API is very different from that of groups in zarr-python, so even with only zarrita in the mix, to_zarr would still return very different things depending on the engine. I don't really like this outcome.

Alternatives

  • Find the lowest common denominator: change the return type of to_zarr to str, which would be either the path to the object in storage (i.e., exactly the path argument to to_zarr, which is a little dumb) or a URL to the object in storage, which again is information the caller of to_zarr already has. However, a str representation of a Zarr array or group is common to all engines.
  • Leave GroupSpec.to_zarr as-is, and write zarrita- and tensorstore-specific routines as functions defined behind an import check, then later combine things into a nice API. I'm leaning toward this option.
