Big data support #1281
-
Thanks for sharing. This sounds like a perfect application for Zarr.
The metadata size is independent of the array size, as you can see from the spec. Arbitrarily large arrays can be stored in Zarr. This is a fundamental goal of the project.
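To see why metadata size does not grow with the data, here is a minimal stdlib sketch that builds a Zarr v2-style `.zarray` document by hand. The field names follow the v2 spec. The helper `zarray_metadata`, the shapes, the dtype and the compressor choice are hypothetical, picked only for illustration. Only the shape and chunk numbers change between the two calls, so the JSON stays small whether the array holds a thousand values or trillions.

```python
import json
import math


def zarray_metadata(shape, chunks, dtype="<f4"):
    """Return a Zarr v2-style .zarray document as a JSON string (hypothetical helper)."""
    meta = {
        "zarr_format": 2,
        "shape": list(shape),
        "chunks": list(chunks),
        "dtype": dtype,
        "compressor": {"id": "zlib", "level": 5},  # assumed codec choice
        "fill_value": 0.0,
        "order": "C",
        "filters": None,
    }
    return json.dumps(meta)


# A tiny array and a (hypothetical) enormous one: only a few digits differ.
small = zarray_metadata((10, 10, 10), (5, 5, 5))
huge = zarray_metadata((3_000_000, 3_000_000, 365), (1_000, 1_000, 365))

n_small = math.prod((10, 10, 10))
n_huge = math.prod((3_000_000, 3_000_000, 365))
print(f"{n_small} elements -> {len(small)} bytes of metadata")
print(f"{n_huge} elements -> {len(huge)} bytes of metadata")
```

The element count grows by many orders of magnitude while the metadata grows only by the extra digits in `shape` and `chunks`.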
Zarr has no concept of coordinates. Just groups and arrays. Perhaps you're thinking of Xarray?
Can you clarify what you mean by "query"? Zarr-python supports accessing arrays via numpy-style indexing as described in the docs. The speed at which data are returned will likely depend entirely on your storage system.
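Numpy-style indexing only needs the chunks that overlap the requested region, which is one reason read speed depends so much on how quickly the store can return those objects. The sketch below computes that overlap with the stdlib. The helper `chunks_touched` and the shapes are hypothetical, and this is not zarr-python's internal indexing code.

```python
import itertools


def chunks_touched(selection, chunk_shape):
    """Return the grid coordinates of every chunk a rectangular selection overlaps.

    `selection` is a tuple of (start, stop) pairs, stop exclusive, like Python slices.
    """
    per_dim = []
    for (start, stop), c in zip(selection, chunk_shape):
        if stop <= start:
            return []  # an empty selection touches no chunks
        first = start // c
        last = (stop - 1) // c  # stop is exclusive
        per_dim.append(range(first, last + 1))
    return list(itertools.product(*per_dim))


# Hypothetical array chunked as (100, 100, 365): read rows 250-399, cols 0-49, all 365 steps.
touched = chunks_touched(((250, 400), (0, 50), (0, 365)), (100, 100, 365))
print(len(touched), touched)
```

Only two chunks are needed here, regardless of how large the rest of the array is.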
This has me worried. Very few storage media are happy with that many files / objects. How do you plan to store your data? More details would be helpful: what explicit lat, lon and time dimension sizes and chunk sizes do you have in mind?
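The number of chunks, and therefore roughly the number of files or objects, follows directly from the dimension sizes and chunk sizes asked about here. A minimal sketch, assuming a regular chunk grid and no sharding, so each written chunk becomes one stored object. The helper `n_chunks` and both example shapes are hypothetical, not figures from this thread.

```python
def n_chunks(shape, chunks):
    """Number of chunks in a regular chunk grid (ceil division per dimension)."""
    total = 1
    for s, c in zip(shape, chunks):
        total *= -(-s // c)  # integer ceil, avoids float rounding on large sizes
    return total


# Hypothetical (lat, lon, time) array; only the chunking differs between the two cases.
shape = (1_000_000, 3_000_000, 73)
coarse = n_chunks(shape, (1_000, 1_000, 73))
fine = n_chunks(shape, (256, 256, 1))
print(f"{coarse:,} chunks with (1000, 1000, 73) chunking")
print(f"{fine:,} chunks with (256, 256, 1) chunking")
```

The same array can need millions or billions of objects depending purely on the chunk shape, which is why the chunk sizes matter as much as the dimensions.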
-
Thanks for your answer. I probably wasn't clear in my questions, as I mixed concepts from zarr and from xarray.
-
Yes. If you have a 3D Zarr Array and use numpy indexing to retrieve a value by position, e.g.
I don't understand what you mean by "map coordinates". Can you clarify? Why do you think xarray will have to read 3 million records? Can you say more about the access pattern you have in mind?
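A hedged illustration of the positional case: retrieving a single element by index touches exactly one chunk, which integer division can locate. This sketch assumes Zarr v2's default "." chunk-key separator (v3's default encoding would produce keys like `c/1/56/0`). The helper `chunk_key` and the shapes are hypothetical.

```python
def chunk_key(index, chunk_shape, separator="."):
    """Map a positional index like z[i, j, k] to the key of the one chunk holding it.

    Also returns the element's offset inside that chunk.
    """
    grid = tuple(i // c for i, c in zip(index, chunk_shape))
    offset = tuple(i % c for i, c in zip(index, chunk_shape))
    return separator.join(str(g) for g in grid), offset


# Hypothetical 3D array chunked as (1000, 1000, 365).
key, offset = chunk_key((1234, 56789, 10), (1000, 1000, 365))
print(key, offset)
```

However large the array is, only that one chunk object has to be fetched and decoded.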
-
If your data are on an irregular grid (or for any other reason you need to look up values by something other than their index in the array), you'll need to use xarray, which can read from a zarr array with particular metadata. IIRC, though, it stores its coordinate indices in the zarr metadata, i.e. JSON, so depending on how often it has to deserialise 24MB of coordinates, there might be some issues there. If your data aren't on a grid at all, I don't think zarr or xarray can help you.
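Looking up by coordinate value rather than position boils down to turning a label into an index and then indexing positionally. In practice xarray's `.sel(..., method="nearest")` does this (backed by a pandas index). The helper below is a hypothetical stdlib stand-in using `bisect`, not xarray's implementation. It assumes each dimension has its own sorted 1-D coordinate (irregularly spaced but rectilinear). Curvilinear 2-D lat/lon grids need a different approach. For scale, 3 million float64 coordinates take 3,000,000 × 8 = 24,000,000 bytes, the 24MB figure above.

```python
import bisect


def nearest_index(coords, value):
    """Return the position of the coordinate closest to `value`.

    `coords` must be sorted ascending; ties go to the lower index.
    """
    if not coords:
        raise ValueError("empty coordinate array")
    pos = bisect.bisect_left(coords, value)
    if pos == 0:
        return 0
    if pos == len(coords):
        return len(coords) - 1
    before, after = coords[pos - 1], coords[pos]
    return pos - 1 if value - before <= after - value else pos


# Hypothetical irregularly spaced longitudes (degrees).
lons = [-10.0, -9.7, -9.1, -8.0, -6.5]
i = nearest_index(lons, -8.9)
print(i, lons[i])
```

The resulting integer can then be used for ordinary positional indexing into the zarr array.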
-
Hello,
We plan to generate a super large data array (lat, lon, time) from the 56,000 Sentinel-2 tiles (per year). The longitude dimension would have about 3 million coordinates, and the array would be split into about 2 billion chunks.
Thank you very much for your support.