Implement Zarr V3 protocol (#898)
* add v3 store classes

Define the StoreV3 class and create v3 versions of most existing stores

Add a test_storage_v3.py with test classes inheriting from their v2
counterparts. Only a subset of methods involving differences in v3
behavior were overridden.

* add TODO comment to meta.py

* fix flake8 errors

* follow zarr v3 spec when dealing with extension data types

* fixes to v3 dtype handling

* flake8 cleanup

* remove duplicate lines in Metadata2.encode_array_metadata

* Fix fields in array metadata

zarr_version should not be in the array metadata, only the base store metadata

compressor should be absent when there is no compression

* Fix encode/decode of codec metadata

classmethods adapted from zarrita code

* add missing level to Zlib in _decode_codec_metadata

* add extensions entry to v3 array metadata

* dimension_separator should not be in the array metadata for v3

* update Attributes, adding StoreV3 support

avoid pytest error about missing fixture

fix flake8 error related to zarr_version fixture

* add StoreV3 support to core Array object

* update hexdigests

* handle additional codecs that were not implemented in zarrita

update hexdigests

* fix

* fix hexdigests

* fix indentation

* add StoreV3 support to Group, open_group, etc.

* add StoreV3 support to creation routines

* Handle dimension_separator appropriately in open_array

Specifically, we want to be able to infer the dimension_separator from the store if possible

* TST: add tests for open_array and dimension_separator

* only allow Codec not a simple str as compressor during array initialization

* add StoreV3 support to most convenience routines

consolidated metadata functions haven't been updated yet

* set convenience routines default to zarr_version=None

This will infer the version from the store if it is a BaseStore. Otherwise it will use
2 for backwards compatibility

* adjust test now that the dimension_separator key was removed from v3 metadata

* add underscores to imported test classes in test_storage_v3.py

avoids these tests running a second time when this file is called

* add underscore to imported TestArrayWithPath in test_core_v3.py

prevents this test class from being run a second time

* refactor _valid_keys and add tests

test _ensure_store(None)

* move KVStoreV3 logic from StoreV3.__eq__ to KVStoreV3.__eq__

* expand tests for _ensure_store

* test exception for v2 store input to _get_hierarchy_metadata

* test exception for init_array with path=None

* remove unneeded checks from Attributes

The store can reject invalid v3 keys.
_update_nosync calls _get_nosync which will add the 'attributes' key if missing

* test __repr__ of LazyLoader

* test load of individual array

* Add simple test case for zarr.tree convenience method

* add tests for copy_store with a V3 store class

* test raising of exception on initialization with mismatched store and chunk_store protocol versions

* add key validation on setitem in v3 stores

enable missing test_hierarchy for v3 stores. This required fixes to
a number of the rename and rmdir methods for the V3 stores

* Fix core V3 tests now that keys are validated on __setitem__

* pep8 in storage_v3 tests

* flake8 in test_convenience.py

* pep8

* fix test_attrs.py

validate_key requires attr key to start with meta/ or data/ in v3

* Fix SQLiteStore

changes to rmdir were intended for SQLiteStoreV3, not SQLiteStore

* fix failing hierarchy test

* update ZipStore tests to make sure they all run on V3

* add default rmdir implementation to all StoreV3 classes

without these, rmdir can be overridden by the V2 parent class that comes earlier in the MRO

* fix test_sync.py

* all rmdir methods for StoreV3 classes need to remove associated metadata

* avoid warning from test_entropy.py

* pep8 fixes

* greatly reduce code duplication in test_storage_v3.py

instead add v3 code path to existing test methods in test_storage.py

* remove redundant test_hexdigest methods

only need to define expected() for each class

reduce redundant code in test_core_v3.py

* move test_core_v3.py functions back into test_core.py

* typing fixes for mypy

* can assume self.keys() exists since BaseStore inherits from MutableMapping

* refactor rmdir methods for v3 and improve coverage

* improve coverage of core.py

* improve coverage of convenience.py

* expand info tests

needed to also test with a size > 10**12 to improve coverage

* Expand tests of Array.view

* improve coverage of creation.py

* improve coverage of hierarchy.py

* improve coverage of meta.py

* pep8

* skip FSStoreV3 test when fsspec not installed

* test raising of PermissionError for setter on views

* remove redundant check (_normalize_store_arg will already raise here)

* improve coverage and fix bugs in normalize_store_arg

* improve coverage of storage.py

remove redundant getsize methods

* pep8

* fix StoreV3 tests

* fix duplicate zarr_fsstore entry

* fix rename

* remove debug statements

* fix typo

* skip unavailable NumPy dtypes

* pep8

* mypy fixes

* remove redundant check (already done above)

* remove KeyError check. list_prefix only returns keys that exist

* coverage fixes

* implemented ConsolidatedMetadataStoreV3

Parametrize test_consolidate_metadata:
removes the need for a separate test_consolidated_with_chunk_store

* expand ConsolidatedMetadataStoreV3 tests

update _ensure_store to disallow mismatched Store versions

* remove debug statement

* fix tests: restore clobber=True

* test error path in consolidate_metadata

* add pragma: no cover for lines in test_meta.py that will only be visited on some architectures

* flake8 fixes

* flake8

* ENH: add ABSStoreV3

* flake8

* fix ABSStore.rmdir test coverage

* always use / in path

* remove remaining use of clobber argument in new tests

* remove NestedDirectoryStoreV3

No need for this class as DirectoryStoreV3 with / chunk separator can be used instead

* flake8

* remove rmdir_abs: call the rmdir method of the ABSStore parent class in ABSStoreV3 instead

* define meta_root and data_root variables

These define the root path for metadata and data, respectively

* move _valid_key_characters to be a StoreV3 class field

* make _get_hierarchy_metadata strictly require 'zarr.json'

Still use a default set of metadata in __init__ method of Group or Array classes.
Add a _get_metadata_suffix helper that defaults to '.json' if metadata is not present.

* ignore type checks for _get_metadata_suffix

* remove unneeded if/else in Array and Hierarchy class __init__

default metadata already gets added by
Metadata3.encode_hierarchy_metadata when meta=None

* remove unused import

* define DEFAULT_ZARR_VERSION so we can later more easily change from 2 to 3

* add test_get_hierarchy_metadata to test the v3 _get_hierarchy_metadata helper
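
The zarr_version=None default described in the message above (infer the version from the store if it is a BaseStore, otherwise use 2) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: BaseStoreSketch, StoreV3Sketch, the _store_version attribute and resolve_zarr_version are hypothetical names, and only DEFAULT_ZARR_VERSION is taken from the commit message.

```python
# Named in the commit message; 2 is the value used "for backwards compatibility".
DEFAULT_ZARR_VERSION = 2


class BaseStoreSketch(dict):
    """Hypothetical stand-in for a versioned store (not zarr's BaseStore)."""
    _store_version = 2


class StoreV3Sketch(BaseStoreSketch):
    """Hypothetical stand-in for a v3 store."""
    _store_version = 3


def resolve_zarr_version(store, zarr_version=None):
    """Return the protocol version to use for ``store``.

    An explicit ``zarr_version`` wins. With ``None``, the version is read
    from the store when it is a versioned store class; anything else
    (a path string, a plain dict, ...) falls back to the default.
    """
    if zarr_version is not None:
        if zarr_version not in (2, 3):
            raise ValueError(f"unsupported zarr_version: {zarr_version!r}")
        return zarr_version
    if isinstance(store, BaseStoreSketch):
        return store._store_version
    return DEFAULT_ZARR_VERSION
```

Keeping the fallback in a single constant is what makes a later switch of the default from 2 to 3 a one-line change.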
grlee77 authored Mar 23, 2022
1 parent 3f8a309 commit 2c13b95
Showing 22 changed files with 5,383 additions and 1,359 deletions.
56 changes: 55 additions & 1 deletion zarr/_storage/absstore.py
@@ -3,7 +3,7 @@
 import warnings
 from numcodecs.compat import ensure_bytes
 from zarr.util import normalize_storage_path
-from zarr._storage.store import Store
+from zarr._storage.store import _get_metadata_suffix, data_root, meta_root, Store, StoreV3
 
 __doctest_requires__ = {
     ('ABSStore', 'ABSStore.*'): ['azure.storage.blob'],
Expand Down Expand Up @@ -209,3 +209,57 @@ def getsize(self, path=None):

def clear(self):
self.rmdir()


class ABSStoreV3(ABSStore, StoreV3):

def list(self):
return list(self.keys())

def __eq__(self, other):
return (
isinstance(other, ABSStoreV3) and
self.client == other.client and
self.prefix == other.prefix
)

def __setitem__(self, key, value):
self._validate_key(key)
super().__setitem__(key, value)

def rmdir(self, path=None):

if not path:
# Currently allowing clear to delete everything as in v2

# If we disallow an empty path then we will need to modify
# TestABSStoreV3 to have the create_store method use a prefix.
ABSStore.rmdir(self, '')
return

meta_dir = meta_root + path
meta_dir = meta_dir.rstrip('/')
ABSStore.rmdir(self, meta_dir)

# remove data folder
data_dir = data_root + path
data_dir = data_dir.rstrip('/')
ABSStore.rmdir(self, data_dir)

# remove metadata files
sfx = _get_metadata_suffix(self)
array_meta_file = meta_dir + '.array' + sfx
if array_meta_file in self:
del self[array_meta_file]
group_meta_file = meta_dir + '.group' + sfx
if group_meta_file in self:
del self[group_meta_file]

# TODO: adapt the v2 getsize method to work for v3
# For now, calling the generic keys-based _getsize
def getsize(self, path=None):
from zarr.storage import _getsize # avoid circular import
return _getsize(self, path)


ABSStoreV3.__doc__ = ABSStore.__doc__
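
To make the key layout behind ABSStoreV3.rmdir easier to follow, here is a minimal sketch of the same three steps (metadata folder, data folder, the node's own metadata documents) against a plain dict instead of Azure Blob Storage. The values of META_ROOT and DATA_ROOT are assumptions standing in for meta_root and data_root, SUFFIX is the '.json' default mentioned in the commit message, and rmdir_v3 is a hypothetical helper, not part of zarr.

```python
META_ROOT = "meta/root/"  # assumed value of meta_root
DATA_ROOT = "data/root/"  # assumed value of data_root
SUFFIX = ".json"          # default metadata suffix named in the commit message


def rmdir_v3(store, path=None):
    """Remove the node at ``path`` from a dict-like store.

    Mirrors the structure of the rmdir method above: delete the
    metadata folder, the data folder, and then the node's own
    ``.array`` / ``.group`` metadata documents.
    """
    path = (path or "").strip("/")
    if not path:
        # Empty path: delete everything, as the v2 behaviour does.
        store.clear()
        return

    meta_dir = META_ROOT + path
    data_dir = DATA_ROOT + path

    # Keys nested under the node. The trailing "/" keeps siblings that
    # merely share a name prefix (e.g. "foo" vs "foobar") untouched.
    doomed = [
        key for key in store
        if key.startswith(meta_dir + "/") or key.startswith(data_dir + "/")
    ]
    # The node's own metadata documents sit next to its folder, not inside it.
    doomed += [meta_dir + ".array" + SUFFIX, meta_dir + ".group" + SUFFIX]

    for key in doomed:
        store.pop(key, None)
```

With a store holding a group foo, an array foo/bar and an unrelated array foobar, calling rmdir_v3(store, "foo") removes only the keys that belong to foo and leaves foobar and the top-level zarr.json in place.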