Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix FSStore.listdir behavior for nested directories #802

Merged

Conversation

grlee77
Copy link
Contributor

@grlee77 grlee77 commented Aug 2, 2021

This PR is to address the two test failures seen in #786. The fix is to remove any leading '/' from the key, so that after the replacement of '/' with . it becomes a valid chunk key.

The failures are not seen in the test suite with the currently pinned fsspec 2021.6.0, so I have also bumped the fsspec version to 2021.7.0 (which was showing the same failures as reported in #786 in local testing).

closes #786
closes #797 (specifying fsstore[s3] so that the separate s3fs dependency is not longer needed)

TODO:

  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@joshmoore
Copy link
Member

Note: I've updated the required status checks so the hanging build (3.8) and build (3.8, ==1.16.4) should no longer hang. I'm going to see if closing and re-opening will make them go away.

@grlee77
Copy link
Contributor Author

grlee77 commented Aug 3, 2021

Actually, I see one other test failure in test_storage.py that occurs with 2021.07.0, but not 2021.06.1 (so not due to the same commit).

=========================================================================================================== FAILURES ===========================================================================================================
___________________________________________________________________________________________________ TestFSStore.test_create ____________________________________________________________________________________________________

self = <zarr.tests.test_storage.TestFSStore object at 0x7fb2efd165b0>

    def test_create(self):
        import zarr
        path1 = tempfile.mkdtemp()
        path2 = tempfile.mkdtemp()
        g = zarr.open_group("file://" + path1, mode='w',
                            storage_options={"auto_mkdir": True})
        a = g.create_dataset("data", shape=(8,))
        a[:4] = [0, 1, 2, 3]
        assert "data" in os.listdir(path1)
        assert ".zgroup" in os.listdir(path1)
    
        g = zarr.open_group("simplecache::file://" + path1, mode='r',
                            storage_options={"cache_storage": path2,
                                             "same_names": True})
>       assert g.data[:].tolist() == [0, 1, 2, 3, 0, 0, 0, 0]

zarr/tests/test_storage.py:965: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
zarr/core.py:662: in __getitem__
    return self.get_basic_selection(selection, fields=fields)
zarr/core.py:787: in get_basic_selection
    return self._get_basic_selection_nd(selection=selection, out=out,
zarr/core.py:830: in _get_basic_selection_nd
    return self._get_selection(indexer=indexer, out=out, fields=fields)
zarr/core.py:1125: in _get_selection
    self._chunk_getitems(lchunk_coords, lchunk_selection, out, lout_selection,
zarr/core.py:1836: in _chunk_getitems
    cdatas = self.chunk_store.getitems(ckeys, on_error="omit")
zarr/storage.py:1085: in getitems
    results = self.map.getitems(keys_transformed, on_error="omit")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <fsspec.mapping.FSMap object at 0x7fb2efd56ca0>, keys = ['data/0'], on_error = 'omit'

    def getitems(self, keys, on_error="raise"):
        """Fetch multiple items from the store
    
        If the backend is async-able, this might proceed concurrently
    
        Parameters
        ----------
        keys: list(str)
            They keys to be fetched
        on_error : "raise", "omit", "return"
            If raise, an underlying exception will be raised (converted to KeyError
            if the type is in self.missing_exceptions); if omit, keys with exception
            will simply not be included in the output; if "return", all keys are
            included in the output, but the value will be bytes or an exception
            instance.
    
        Returns
        -------
        dict(key, bytes|exception)
        """
        keys2 = [self._key_to_str(k) for k in keys]
        oe = on_error if on_error == "raise" else "return"
        try:
            out = self.fs.cat(keys2, on_error=oe)
        except self.missing_exceptions as e:
            raise KeyError from e
        out = {
            k: (KeyError() if isinstance(v, self.missing_exceptions) else v)
>           for k, v in out.items()
        }
E       AttributeError: 'bytes' object has no attribute 'items'

This PR fixed the two failurs in test_core.py, but the CI here will apparently still fail due to that.

@martindurant
Copy link
Member

I believe that last one is fixed in main

@grlee77 grlee77 force-pushed the fix-fsstore-listdir-nested branch from e5b48ca to ac7ed24 Compare August 3, 2021 18:54
@grlee77 grlee77 force-pushed the fix-fsstore-listdir-nested branch from ac7ed24 to 407cd98 Compare August 3, 2021 19:05
@grlee77
Copy link
Contributor Author

grlee77 commented Aug 3, 2021

I rebased on master and switched to "/" as requested, but there is now the other error mentioned previously: https://github.com/zarr-developers/zarr-python/pull/802/checks?check_run_id=3234273246#step:7:139

@martindurant
Copy link
Member

Is this using fsspec main?

@grlee77
Copy link
Contributor Author

grlee77 commented Aug 3, 2021

No, I just saw that it was fixed in fsspec master. I had missunderstood your previous comment to be referring to zarr-python master.

The CI here is using 2021.7.0 so the failure is expected.

joshmoore added a commit to joshmoore/zarr-python that referenced this pull request Aug 17, 2021
@joshmoore
Copy link
Member

As a test, I tried 2021.7.0 on #773 without luck. (https://github.com/zarr-developers/zarr-python/pull/773/checks?check_run_id=3350238418)

@martindurant : any thoughts on a release for a 7.1 or should I pin to 6.*?

joshmoore added a commit to joshmoore/zarr-python that referenced this pull request Aug 18, 2021
joshmoore added a commit that referenced this pull request Aug 20, 2021
* Drop skip_if_nested_chunks from test_storage.py

* Add failing nested test

* Make DirectoryStore dimension_separator aware

* Migrate key logic to core rather than storage

Previous tests (now commented out) used logic in the store
classes to convert "0/0" keys into "0.0" keys, forcing the
store to be aware of array details. This tries to swap the
logic so that stores are responsible for passing dimension
separator values down to the arrays only. Since arrays can
also get the dimension_separator value from a .zarray file
they are now in charge.

* Fix linting in new test

* Extend the test suite for dim_sep

* Try fsspec 2021.7 (see #802)

* Revert "Try fsspec 2021.7 (see #802)"

This reverts commit 68adca5.

* Fix N5Store

* Re-activate contested N5 test
@bnavigator bnavigator mentioned this pull request Aug 20, 2021
@bnavigator
Copy link

For the record: test_core::*::test_array_2d still fail on 2.9.0 with fsspec-2021.7 and fsspec/filesystem_spec#710 applied to it.

@joshmoore
Copy link
Member

Thanks, @bnavigator. Unless someone has another suggestion, I'm going to start actively avoiding 2021.07.0 in the wild:

ome/ome-zarr-py@741e952

@bnavigator
Copy link

Oh, sorry, I should clarify that I referred to 2.9.0 without this PR. With this patch plus the patch from fsspec/filesystem_spec#710 applied, all (offline) tests succeed.

@joshmoore
Copy link
Member

@bnavigator : noted, but fsspec/filesystem_spec#710 is not yet released, right?

@bnavigator
Copy link

AFAICT, no. @martindurant any plans to do so soon?

@martindurant
Copy link
Member

Yes, just trying to fix up a couple of s3 things first. If I don't make progress in the next ~day, can release fsspec/gcsfs/s3fs anyway.

@codecov
Copy link

codecov bot commented Sep 1, 2021

Codecov Report

Merging #802 (d3fbcc9) into master (1a2eeed) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #802   +/-   ##
=======================================
  Coverage   99.94%   99.94%           
=======================================
  Files          31       31           
  Lines       10680    10681    +1     
=======================================
+ Hits        10674    10675    +1     
  Misses          6        6           
Impacted Files Coverage Δ
zarr/storage.py 100.00% <100.00%> (ø)

Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting prep for 2.9.5.

Thanks, @martindurant and @grlee77!

@joshmoore joshmoore merged commit 4e98567 into zarr-developers:master Sep 1, 2021
joshmoore added a commit that referenced this pull request Sep 19, 2021
* Drop skip_if_nested_chunks from test_storage.py

* Add failing nested test

* Make DirectoryStore dimension_separator aware

* Migrate key logic to core rather than storage

Previous tests (now commented out) used logic in the store
classes to convert "0/0" keys into "0.0" keys, forcing the
store to be aware of array details. This tries to swap the
logic so that stores are responsible for passing dimension
separator values down to the arrays only. Since arrays can
also get the dimension_separator value from a .zarray file
they are now in charge.

* Fix linting in new test

* Extend the test suite for dim_sep

* add n5fsstore and tests

* slightly smarter kwarg interception

* remove outdated unittest ref and fix the name of a test func

* fix massive string block and fix default key_separator kwarg for FSStore

* flake8

* promote n5store to toplevel import and fix examples in docstring

* Try fsspec 2021.7 (see #802)

* Revert "Try fsspec 2021.7 (see #802)"

This reverts commit 68adca5.

* Add missing core tests for N5FSStore, and rchanges required for making them pass

* tmp: debug

* uncomment N5 chunk ordering test

* more commented tests get uncommented

* add dimension_separator to array metadata adaptor

* Revert "tmp: debug"

This reverts commit ee9cdbc.

* Attempt failed: keeping '.' and switching

* Revert "Attempt failed: keeping '.' and switching"

This reverts commit 51b3109.

* regex: attempt failed due to slight diff in files

* Revert "regex: attempt failed due to slight diff in files"

This reverts commit 3daea7c.

* N5: use "." internally for dimension separation

This allows N5 to detect the split between key and chunks
and pre-process them (re-ordering and changing the separator).

see: #773 #793

* move FSSpec import guard

* remove os.path.sep concatenation in listdir that was erroring a test, and add a mea culpa docstring about the dimension_separator for n5 stores

* resolve merge conflicts in favor of upstream

* make listdir implementation for n5fsstore look more like fsstore's listdir, and add crucial lstrip

* Update hexdigest tests for N5Stores to account for the presence of the dimension_separator keyword now present in metadata

* Add tests for dimension_separator in array meta for N5Stores

* N5FSStore: try to increase code coverage

 * Adds a test for the dimension_separator warning
 * uses the parent test_complex for listdir
 * "nocover" the import error since fsspec is ever present

* flake8

* add chunk nesting test to N5FSStore test suite

* make array_meta_key, group_meta_key, attrs_key private

* N5FSStore: Remove ImportError test

FSStore only throws ModuleNotFoundError on initialization
rather than on import. Therefore N5FSStore does the same.
If this *weren't* the case, then the import in zarr/init
would need to test the import as well, which isn't the case.

Co-authored-by: jmoore <[email protected]>
Co-authored-by: Josh Moore <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants