Fix gcs integration tests and use default multipart size #5765
Now that the flakiness is fixed by using a random bucket name, we see that the multipart tests fail: https://github.com/quickwit-oss/quickwit/actions/runs/14833830101. This is a regression that was introduced during the last OpenDAL upgrade (#5748).
// TODO: Uncomment storage_test_multi_part_upload when the XML API is
// supported in the emulated GCS server
// (https://github.com/fsouza/fake-gcs-server/pull/1164)

// quickwit_storage::storage_test_multi_part_upload(&mut object_storage)
//     .await
//     .context("test multipart upload failed")?;
The latest version of OpenDAL's GCS implementation now uses the XML API to support multipart uploads. Disabling the multipart test because the test GCS server does not support this API for now (fsouza/fake-gcs-server#1164).
For further reference, this is what the mock GCS server was receiving with the earlier OpenDAL version:
time=2025-05-05T11:20:52.416Z level=INFO msg="172.16.7.1 - - [05/May/2025:11:20:52 +0000] \"POST /upload/storage/v1/b/sample-bucket/o?uploadType=media&name=integration-tests/test-azure-compatible-storage/hello_large.txt HTTP/1.1\" 200 617\n"
time=2025-05-05T11:20:52.417Z level=INFO msg="172.16.7.1 - - [05/May/2025:11:20:52 +0000] \"GET /storage/v1/b/sample-bucket/o/integration-tests%2Ftest-azure-compatible-storage%2Fhello_large.txt HTTP/1.1\" 200 617\n"
Now OpenDAL generates this request (XML API):
time=2025-05-05T11:24:05.479Z level=INFO msg="172.16.7.1 - - [05/May/2025:11:24:05 +0000] \"POST /sample-bucket-ahhum/integration-tests/test-gcs-storage/hello_large.txt?uploads HTTP/1.1\" 404 59\n"
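To make the difference between the two logged requests easier to see, here is a minimal sketch that rebuilds both request paths from a bucket and an object key. The function names are hypothetical and are not part of OpenDAL or Quickwit. The path shapes are copied from the log lines above, and no URL encoding is applied, which is an assumption that only holds for simple keys like the ones in these tests.

```rust
/// Hypothetical helper: path of the single-request upload seen with the
/// earlier OpenDAL version (shape copied from the first log line above).
/// Assumption: the key needs no URL encoding.
fn json_api_upload_path(bucket: &str, key: &str) -> String {
    format!("/upload/storage/v1/b/{bucket}/o?uploadType=media&name={key}")
}

/// Hypothetical helper: path of the XML API request that starts a
/// multipart upload (shape copied from the last log line above).
/// The emulated GCS server answered this request with a 404.
fn xml_api_initiate_multipart_path(bucket: &str, key: &str) -> String {
    format!("/{bucket}/{key}?uploads")
}
```

The first path stays under `/upload/storage/v1/`, while the second addresses the object directly as `/<bucket>/<key>`, a route the emulated server does not handle for `?uploads`.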
01182a4 to 800b902
@@ -63,8 +63,7 @@ impl MultiPartPolicy {
    }
}

// Default values from https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
This is not true, since target_part_num_bytes was set to 5GB.
pub struct OpendalStorage {
    uri: Uri,
    op: Operator,
    multipart_policy: MultiPartPolicy,
This should be configured now because OpenDAL does use multipart uploads.
quickwit/quickwit-storage/src/opendal_storage/google_cloud_storage.rs
// TODO: Uncomment storage_test_multi_part_upload when the XML API is
// supported in the emulated GCS server
// (https://github.com/fsouza/fake-gcs-server/pull/1164)
Were you able to confirm this test would pass on real GCS storage?
Yes, and I checked in the GCS audit logs that it was indeed a multipart upload (they didn't show the part size, but there were 3 parts, which matches 15MB / 5MB).
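The part count quoted above is a ceiling division of the object size by the part size. A minimal sketch of that arithmetic, assuming a 15MB object and 5MB parts as in the comment; `num_parts` is a hypothetical helper, not the actual `MultiPartPolicy` API:

```rust
/// Hypothetical helper: how many parts a multipart upload is split into
/// for a given object size and target part size (ceiling division).
fn num_parts(object_num_bytes: u64, part_num_bytes: u64) -> u64 {
    assert!(part_num_bytes > 0, "part size must be non-zero");
    // Full parts, plus one trailing part if there is a remainder.
    object_num_bytes / part_num_bytes + u64::from(object_num_bytes % part_num_bytes != 0)
}
```

With a 15MB object and 5MB parts this gives 3, which is the number of parts seen in the audit logs. An object one byte larger would need a fourth, smaller part.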
Description
After fixing the coverage tests (#5735), the GCS tests also showed some failures.
This PR also updates the GCS storage to use the same multipart default as S3, because the last OpenDAL upgrade enabled multipart uploads by default.
Closes #5398
How was this PR tested?
Integration tests should now pass.