Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

more extensive api x store testing #2447

Open
wants to merge 46 commits into
base: main
Choose a base branch
from

Conversation

d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Oct 30, 2024

The api module contains a lot of our user-facing code, but we are essentially only testing it against LocalStore and MemoryStore. A better situation would be to test the functions in api against all the stores. That's what this PR adds. Note that there are many test failures. I will try to get them all fixed.

This PR also contains a fix for #2444, but the central addition is the (currently failing) test for that scenario.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@d-v-b
Copy link
Contributor Author

d-v-b commented Oct 30, 2024

this became a bit of a saga.

S3 test fixtures

I moved the s3 test fixture to the main conftest.py so that any test can depend on a RemoteStore instance. I was getting strange errors from our s3 test fixture. they went away after I made the following changes:

  • set the region to us-east-1
  • set the moto port automatically instead of hard-coding it. this makes the moto server URL dynamic, so I wired up the test fixtures that need that url to consume the s3_base fixture.

store test fixture

I also refactored the store test fixture. It's now possible to request a store class and a store mode with the appropriately formatted string, e.g. @pytest.mark.parametrize('store', ['local_a', 'local_w'], indirect=True) will request a LocalStore with mode="a", and a LocalStore with mode="w". This test fixture was completely broken for RemoteStore, so I fixed that. I also removed a redundant store test fixture from the group tests, and I removed the individual memory_store, local_store, etc test fixtures.

we should default to testing all the stores

IMO our default behavior should be to test relevant APIs against as many stores as possible, at least until slow tests becomes an issue. That being said, I'm happy to narrow the scope of certain tests, e.g. consolidated metadata tests, to only run on MemoryStore if we think that's OK.

ZipStore problems

We have a lot of tests that involve creating an array or group with a store, then opening a new array or group with the same store but with a different mode. This is impossible today because ZipStore doesn't do with_mode, so tests where I couldn't hack around this limitation are xfailed (there are several in test_api).

open logic

I had test failures relating to some of the logic in zarr.api.asynchronous.open. The version of this function in main is very hard to understand, the version in this PR is marginally simpler. I think it's possible in main to run open(...shape=(10,)) and get back a group, which should not be possible (and is not possible after my changes).

Failing tests

Here's the currently failing tests. I think these are all real problems exposed by adding more tests, but I don't really have any idea how to solve them. Help would be appreciated here!

FAILED tests/test_group.py::test_group_update_attributes_async[zarr2-remote] - botocore.exceptions.HTTPClientError: An HTTP Client raised an unhandled exception: Task <Task pending name='Task-147164' coro=<set_or_delete() running at /Users/runner/work/zarr-python/zarr-python/src/zarr/abc/store.py:425> cb=[gather.<locals>._done_callback() at /Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/tasks.py:820]> got Future <Future pending cb=[BaseSelectorEventLoop._sock_write_done()()]> attached to a different loop
FAILED tests/test_group.py::test_group_update_attributes_async[zarr3-remote] - botocore.exceptions.HTTPClientError: An HTTP Client raised an unhandled exception: Task <Task pending name='Task-147207' coro=<set_or_delete() running at /Users/runner/work/zarr-python/zarr-python/src/zarr/abc/store.py:425> cb=[gather.<locals>._done_callback() at /Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/tasks.py:820]> got Future <Future pending cb=[BaseSelectorEventLoop._sock_write_done()()]> attached to a different loop
FAILED tests/test_group.py::test_members_name[zarr2-remote-True] - DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
FAILED tests/test_group.py::test_members_name[zarr2-remote-False] - DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
FAILED tests/test_group.py::test_members_name[zarr3-remote-True] - DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
FAILED tests/test_group.py::test_members_name[zarr3-remote-False] - DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).

@d-v-b
Copy link
Contributor Author

d-v-b commented Oct 31, 2024

The remaining test falures are in the python 3.13 tests where botocore.auth uses datetime in a deprecated way, resulting in DeprecationWarning, which test_members_name treats as an error. See https://github.com/zarr-developers/zarr-python/actions/runs/11620549084/job/32362570100?pr=2447. Since this is purely a botocore x py3.13 issue, this test qua Zarr test should probably not fail. @jhamman any suggestions for changes to the test_members_name that would avoid this error?

@d-v-b d-v-b marked this pull request as ready for review November 5, 2024 22:25
Comment on lines +232 to +245
def test_open_store_with_mode_r(store: Store) -> None:
# 'r' means read only (must exist)
with pytest.raises(FileNotFoundError):
zarr.open(store=store, mode="r")

z1 = zarr.ones(store=store, shape=(3, 3))
assert z1.fill_value == 1

z2 = zarr.open(store=store, mode="r")
assert isinstance(z2, Array)
assert z2.fill_value == 1
assert (z2[:] == 1).all()
with pytest.raises(ValueError):
z2[:] = 3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhamman this test is failing in this PR. What's the expected behavior here, given the new semantics of read_only for stores? Should we raise an error in zarr.open if the mode is r but the store is not read only?

Note the test below, which is copied from main, passes, but in that case zarr.open takes a path / string and thus has the opportunity to create the store in read only mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@d-v-b
Copy link
Contributor Author

d-v-b commented Nov 26, 2024

getting the lasts tests to pass required adding two new methods to the stores: _as_mutable and _as_immutable, that return a non-read-only or a read-only copy of a store class, respectively. This is necessary because without some means of creating read-only copies of stores, there's no way for zarr.open(mutable_store, mode='r') to correctly produce a read-only store class.

I think this PR is pretty important insofar as it opens up the possibility of testing the store classes much more uniformly, it would be great to get this looked at soon!

@d-v-b
Copy link
Contributor Author

d-v-b commented Nov 26, 2024

A major downside to this PR is the additional time required for tests -- with these changes, tests in CI take nearly 20m to complete! On the other hand, I have found many bugs by testing extensively. Not sure how to find balance here.

@jhamman jhamman added the V3 label Nov 29, 2024
@dcherian
Copy link
Contributor

dcherian commented Dec 12, 2024

Not sure how to find balance here.

Hypothesis property tests! You can even configure it to run for longer on CI and shorter locally. You can also ensure specific cases always run

@@ -412,6 +412,18 @@ async def getsize_prefix(self, prefix: str) -> int:
sizes = await concurrent_map(keys, self.getsize, limit=limit)
return sum(sizes)

def _as_immutable(self: Self) -> Self:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in icechunk we have these as set_read_only and set_writeable which I mildly prefer. Simple words are nice.

Comment on lines +32 to +33
@pytest.mark.parametrize("store", ["local", "memory", "remote"], indirect=True)
def test_create_array(store: Store) -> None:
Copy link
Contributor

@dcherian dcherian Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@pytest.mark.parametrize("store", ["local", "memory", "remote"], indirect=True)
def test_create_array(store: Store) -> None:
from hypothesis import given
import hypothesis.strategies as st
stores = st.sampled_from(["local", "memory", "remote"])
@given(store=stores())
def test_create_array(store: Store) -> None:

@dstansby dstansby removed the V3 label Dec 12, 2024
@@ -412,6 +412,18 @@ async def getsize_prefix(self, prefix: str) -> int:
sizes = await concurrent_map(keys, self.getsize, limit=limit)
return sum(sizes)

def _as_immutable(self: Self) -> Self:
"""
Return a mutable copy of the store.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like perhaps the words "mutable" and "immutable" should be swapped between this method and the _as_mutable method, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, good catch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants