add tensorstore as an optional backend #13

d-v-b · 2023-09-18T12:06:12Z

e.g., to_zarr(engine='tensorstore') vs to_zarr(engine='zarr_python')

d-v-b · 2023-10-29T09:32:58Z

And zarrita too. This is currently the only way to properly test Zarr v3. However, from an API perspective, I'm struggling to see how multiple backends could work here.

If we consider just the case of creating groups, at the moment we have this (ref):

to_zarr(spec: GroupSpec, store: BaseStore, path: str) -> zarr.Group

Suppose we change this to

to_zarr(spec: GroupSpec, store: Union[BaseStore, tensorstore.StoreLike, zarrita.StoreLike], path: str) -> Union[zarr.Group, tensorstore.GroupLike, zarrita.Group]

This assumes that tensorstore and zarrita both provide a StoreLike class for creating Zarr groups. First, unless I'm missing something really basic about Python type hints, we can't type this function properly unless we make tensorstore and zarrita mandatory dependencies of the project, which isn't great.
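One pattern that might soften this first problem, though it does not remove it: put the optional imports behind `typing.TYPE_CHECKING` and defer annotation evaluation, so the module imports fine at runtime without tensorstore or zarrita. This is only a sketch. `StoreLike` and `GroupLike` are the hypothetical classes assumed above, not confirmed APIs of either library, `GroupSpec` is a stand-in, and the `AnyStore` / `AnyGroup` aliases are names made up for illustration. A type checker still needs the optional packages installed in its own environment to resolve these names, so the typing is only as good as the environment it is checked in.

```python
from __future__ import annotations  # annotations are kept as strings, not evaluated at runtime

from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Only a static type checker follows this branch; at runtime none of
    # these packages has to be installed.
    import tensorstore
    import zarr
    import zarrita
    from zarr.storage import BaseStore

    # StoreLike / GroupLike are hypothetical names taken from the
    # signature above, not confirmed tensorstore or zarrita classes.
    AnyStore = Union[BaseStore, tensorstore.StoreLike, zarrita.StoreLike]
    AnyGroup = Union[zarr.Group, tensorstore.GroupLike, zarrita.Group]


class GroupSpec:
    """Stand-in for the real GroupSpec model, for illustration only."""


def to_zarr(spec: GroupSpec, store: AnyStore, path: str) -> AnyGroup:
    """Placeholder: a real implementation would dispatch on the store type."""
    raise NotImplementedError
```

This keeps the optional libraries out of the runtime dependency list, but it does nothing about the second problem below: the return type still depends on the engine.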

Second, unless we define Group classes in this repo, the return type varies with the engine. As far as I can tell, tensorstore does not have a representation of a Zarr group, only Zarr arrays, so the only way to create Zarr groups with that library is via zarrita or zarr-python. to_zarr would thus return a hacked zarrita group or zarr-python group that delegates array IO to tensorstore. Not an attractive option.

zarrita can represent Zarr groups, but with an API that is very different from groups in zarr-python, so even if we just have zarrita in the mix, to_zarr still returns very different things depending on the engine. I don't really like this outcome.

alternatives

- Find the lowest common denominator: change the return type of to_zarr to str, which would either be the path to the object in storage (i.e., exactly the path argument to to_zarr, which is a little dumb) or a URL to the object in storage, which again is information that the caller of to_zarr already has access to. However, a str representation of a Zarr array / group is something that is common to all engines.
- Leave GroupSpec.to_zarr as-is, and write zarrita- and tensorstore-specific routines as functions that are defined behind an import check, and later combine things into a nice API. I'm leaning toward this option.
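A rough sketch of what the second option could look like, assuming the backend-specific routines live in a single module. The names `has_module`, `to_zarrita`, and `to_tensorstore` are hypothetical, and the bodies are deliberately left unimplemented because the actual group-creation calls for each library are not settled here.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if a top-level module is importable, without importing it."""
    return find_spec(name) is not None


# GroupSpec.to_zarr (the zarr-python path) stays untouched.
# The functions below are hypothetical and only get defined when the
# corresponding optional dependency is installed.

if has_module("zarrita"):

    def to_zarrita(spec, store, path):
        """Hypothetical: create the group described by `spec` using zarrita."""
        import zarrita  # noqa: F401  (guarded by the check above)

        raise NotImplementedError("zarrita-specific group creation goes here")


if has_module("tensorstore"):

    def to_tensorstore(spec, store, path):
        """Hypothetical: create the arrays described by `spec` using tensorstore."""
        import tensorstore  # noqa: F401  (guarded by the check above)

        raise NotImplementedError("tensorstore-specific array creation goes here")
```

Each function can then return whatever its backend naturally produces, which avoids the union return type, and a unified API can be layered on top later.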
