Replies: 2 comments 5 replies
It's important to distinguish between zarr the format and zarr-python, one implementation of it. We are trying to make indexing faster in zarr-python.
I think the tensorstore tutorial is a good place to start. You probably wouldn't want to use the format in that example (n5); instead, you can use tensorstore to read and write zarr v2 and v3 arrays (to write zarr groups you can just use zarr-python).
I'm looking for ways to improve the performance of my data-loading pipeline and I found Zarr. To get an idea about throughput, I wrote a small benchmark script in Python. For a baseline, I also ran tests using Numpy memory-mapped arrays.
I'm working with 4D arrays which are quite large. One of my requirements is that I need to access them as a key-value store. Within each value, I read random indices along the first axis.
I created some dummy arrays to test throughput.
Here is my complete benchmarking code that compares Zarr to accessing raw Numpy arrays on disk:
It turns out that the memory-mapped Numpy arrays outperform Zarr by a factor of ~6-7.
My disk's maximum speed is 500 MB/s, and I reach roughly 400 MB/s using numpy. With Zarr I see a throughput of ~50-60 MB/s.
This difference is so big that I feel like I must be missing something. I tried different chunk sizes and disabled compression completely. Still, Zarr never reaches a throughput that comes even close to Numpy's memory mapped arrays.
Does anyone have a hint on what I'm missing? Is Zarr generally slow for my use case of randomly accessing large 4D arrays?
I'd appreciate any help.