Performance Using partial_decompress=True
#1138 · Unanswered · ryanhausen asked this question in Q&A · 0 replies
Hi, I am experimenting with zarr to see if it would be a good fit for storing a dataset of approximately 48 TB. We currently store the data in shards of (512, 512, 512) containing (8, 8, 8) chunks, using custom software for quick read access. zarr seems to read a whole chunk in before accessing the data, which for our larger chunks is time-consuming. I tried the `partial_decompress=True` flag, but didn't see any noticeable improvement in access times. I think I am using it right, but could be wrong. Below is a toy example of how I am using zarr. Am I using it correctly? Should I expect a speed improvement using `partial_decompress`?

Using smaller chunks like (64, 64, 64) is much faster, but would create a large number of files. I saw that sharding is currently being worked on in #877, #1111, and zarr-developers/zarr-specs#152, which looks really promising for our use case.
Thanks for your help!