-
Hi, zarr is just amazing. I'm about to rewrite some of my ML processing pipelines to use zarr for its distributed nature, and I'd like to ask for advice on my specific use case. I've read the multiprocessing documentation, which shows how to use a process synchronizer:

```python
synchronizer = zarr.ProcessSynchronizer('data/example.sync')
z = zarr.open_array('data/example', mode='w', shape=(10000, 10000),
                    chunks=(1000, 1000), dtype='i4',
                    synchronizer=synchronizer)
```

But this is a superset of what I'm trying to achieve; the output layout is fixed for me.
What I want is to open the array from multiple processes and have each process work on its own partition independently. Is there a more efficient way to open/create the array? Pseudocode API:

```python
synchronizer = zarr.DistributedSynchronizer(world_size=N, rank=k)
z = zarr.open_array('data/example', mode='w', shape=(10000, 10000),
                    chunks=(1000, 1000), dtype='i4',
                    synchronizer=synchronizer)
```

A more low-level workaround would also work for me.
Replies: 1 comment
-
I found it in the docs.