-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test ManifestStore against S3 and fix key parsing #507
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool, but I'm not totally sure I understand what you're intending to test here, or how it relates to the ManifestStore
work. Surely dealing with remote object storage is the responsibility of obstore
and icechunk
?
virtualizarr/tests/test_remote_integration/test_remote_icechunk.py
Outdated
Show resolved
Hide resolved
virtualizarr/tests/test_remote_integration/test_remote_icechunk.py
Outdated
Show resolved
Hide resolved
I would like to test that I built this as a separate PR under the assumption that PR scope should be as constrained as possible and that there were some Icechunk tests that would benefit from minio., but didn't pick a good test to start from. I'll modify this PR to test VirtualiZarr/virtualizarr/tests/test_backend.py Lines 206 to 232 in 206277f
|
Thanks for the explanation, that makes a lot more sense. I just want to avoid the rabbit hole of adding tests to VirtualiZarr that are really just testing that Icechunk does what it says it does.
FWIW I would be happy to merge your |
Oh wow, that'd make this so much easier! |
parsed_key = urlparse(request_key) | ||
return StoreRequest(store=stores[key], key=parsed_key.path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only non-test related change in the PR, which makes the ManifestStore work for both local and object stores
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This does seem useful.
|
||
|
||
@pytest.fixture() | ||
@requires_obstore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this test not require obstore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These markers don't make any impact on fixtures, so I removed it. The fixture is generated if any tests using the fixtures are captured by pytest, so the markers should all go on tests rather than fixtures.
client_options={"allow_http": True}, | ||
) | ||
filepath = "data.tmp" | ||
obs.put( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You only did one put, but now there are 4 chunks in there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a docstring explaining the approach in 874c6df
(#507). This is kinda similar to libraries that concurrently compress each chunk (or not for uncompressed data), concatenate them, and then write then to a file in one go. At least, I think that's a thing.
@pytest.mark.asyncio | ||
@pytest.mark.parametrize( | ||
"manifest_store", | ||
["local_store", pytest.param("s3_store", marks=pytest.mark.minio)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the mark be requires_minio
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're synonymous, but I will re-add requires_minio since it's more descriptive
"path": f"s3://{minio_bucket['bucket']}/{filepath}", | ||
"offset": 12, | ||
"length": 4, | ||
}, | ||
} | ||
manifest = ChunkManifest(entries=chunk_dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to split this fixture up a bit, to more clearly delineate the creation of data in a (minio) s3-like obstore from the creation of the ManifestStore
object which contains the byte references pointing to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, done in 874c6df
(#507)
This PR test the ManifestStore against S3 using MinIO and fixes a small issue with translating ManifestArray keys to the appropriate key for
obstore.get()
.docs/releases.rst
api.rst