Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create_array w/ overwrite=True doesn't clean up an existing array #2719

Open
d-v-b opened this issue Jan 16, 2025 · 1 comment
Open

create_array w/ overwrite=True doesn't clean up an existing array #2719

d-v-b opened this issue Jan 16, 2025 · 1 comment
Labels
bug Potential issues with the zarr-python library

Comments

@d-v-b
Copy link
Contributor

d-v-b commented Jan 16, 2025

MRE

# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "zarr == 3.0.0",
# ]
# ///

from zarr import create_array
import numpy as np
shapes = (2,1), (2,)
for shape in shapes:
    data = np.arange(np.prod(shape) , dtype='uint8').reshape(shape)
    arr = create_array(
        'data/v2_array_example.zarr',
        shape=data.shape,
        dtype=data.dtype,
        zarr_format=2,
        compressors='auto',
        overwrite=True,
        chunk_key_encoding={'name': 'v2', 'configuration': {'separator': '/'}},
        )

    arr[:] = data # fill the array with values
    print(arr.info_complete())

Traceback (the latter bit):

...
  File "/Users/bennettd/Library/Caches/uv/archive-v0/Dt_Vb9Xzyh8CGtlKxZyUO/lib/python3.11/site-packages/zarr/storage/_local.py", line 62, in _put
    with path.open(mode=mode) as f:
         ^^^^^^^^^^^^^^^^^^^^
  File "/Users/bennettd/.pyenv/versions/3.11.9/lib/python3.11/pathlib.py", line 1044, in open
    return io.open(self, mode, buffering, encoding, errors, newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IsADirectoryError: [Errno 21] Is a directory: 'data/v2_array_example.zarr/0'

This is a simple bug, but it also reveals the meaning of "overwrite" is not clear.

  • I would expect creating an array, then creating an array at the same path withoverwrite=True to completely remove the pre-existing array.
  • I'm not sure I would expect overwrite=True to erase absolutely everything prefixed underneath the array. So if I had saved cat.jpgin the same directory as my array, I might not expect that to get erased.

It might be reasonable to define "create_x(overwrite=True)" to mean "ensure that all keys necessary to create object x are available, but no other keys will be touched". But issuing a huge volume of delete_object requests against metered cloud storage could be much more expensive than a simple "delete_prefix". I don't know much about this, someone good at s3 billing should help here.

Curious to hear what other people think about what "overwrite" should mean here.

@jhamman
Copy link
Member

jhamman commented Jan 16, 2025

I feel that we should be clearing chunks whenever overwrite is used. Calling Store.delete_dir whenever overwite is called seems like the prudent action.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Potential issues with the zarr-python library
Projects
None yet
Development

No branches or pull requests

2 participants