Skip to content

Fix gcs integration tests and use default multipart size #5765

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 7, 2025
Merged

Conversation

rdettai
Copy link
Collaborator

@rdettai rdettai commented May 5, 2025

Description

After fixing coverage tests #5735 the GCS tests also showed some failures.

This PR also updates the GCS storage to use the same default as S3 for multipart because the last OpenDAL upgrade enabled multipart by default.

Closes #5398

How was this PR tested?

Integration tests should now pass.

@rdettai
Copy link
Collaborator Author

rdettai commented May 5, 2025

Now that the flakiness is fixed by using random bucket name, we see that the tests fail in the multipart tests https://github.com/quickwit-oss/quickwit/actions/runs/14833830101. This is a regression that was introduced during the last Opendal updgrade #5748

Comment on lines 89 to 95
// TODO: Uncomment storage_test_multi_part_upload when the XML API is
// supported in the emulated GCS server
// (https://github.com/fsouza/fake-gcs-server/pull/1164)

// quickwit_storage::storage_test_multi_part_upload(&mut object_storage)
// .await
// .context("test multipart upload failed")?;
Copy link
Collaborator Author

@rdettai rdettai May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the latest version of opendal's GCS implementation now uses the XML API to support multipart uploads. Disabling the multipart test because the test GCS server does not support this API for now (fsouza/fake-gcs-server#1164)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For further ref, this is what the mock GCS server was receiving with the earlier opendal version:

time=2025-05-05T11:20:52.416Z level=INFO msg="172.16.7.1 - - [05/May/2025:11:20:52 +0000] \"POST /upload/storage/v1/b/sample-bucket/o?uploadType=media&name=integration-tests/test-azure-compatible-storage/hello_large.txt HTTP/1.1\" 200 617\n"
time=2025-05-05T11:20:52.417Z level=INFO msg="172.16.7.1 - - [05/May/2025:11:20:52 +0000] \"GET /storage/v1/b/sample-bucket/o/integration-tests%2Ftest-azure-compatible-storage%2Fhello_large.txt HTTP/1.1\" 200 617\n"

Now opendal generates this request (XML API)

time=2025-05-05T11:24:05.479Z level=INFO msg="172.16.7.1 - - [05/May/2025:11:24:05 +0000] \"POST /sample-bucket-ahhum/integration-tests/test-gcs-storage/hello_large.txt?uploads HTTP/1.1\" 404 59\n"

@rdettai rdettai requested a review from trinity-1686a May 5, 2025 12:32
@rdettai rdettai changed the title Fix gcs tests with dynamic bucket Fix gcs tests using a dynamic bucket name May 5, 2025
@rdettai rdettai changed the title Fix gcs tests using a dynamic bucket name Fix gcs integration tests May 5, 2025
@rdettai rdettai force-pushed the fix-gcs-tests branch 2 times, most recently from 01182a4 to 800b902 Compare May 5, 2025 13:34
@@ -63,8 +63,7 @@ impl MultiPartPolicy {
}
}

// Default values from https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not true since target_part_num_bytes was set to 5GB

pub struct OpendalStorage {
uri: Uri,
op: Operator,
multipart_policy: MultiPartPolicy,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be configured now because OpenDAL does use multipart upload

@rdettai
Copy link
Collaborator Author

rdettai commented May 5, 2025

@rdettai rdettai changed the title Fix gcs integration tests Fix gcs integration tests and use default multipart size May 5, 2025
Comment on lines +89 to +92
// TODO: Uncomment storage_test_multi_part_upload when the XML API is
// supported in the emulated GCS server
// (https://github.com/fsouza/fake-gcs-server/pull/1164)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

were you able to confirm this test would pass on a real GCS storage?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, and I checked in the GCS audit logs that it was indeed a multipart upload (it didn't show the part size but there were 3 parts which matches 15MB / 5MB)

@rdettai rdettai requested a review from trinity-1686a May 7, 2025 07:57
@rdettai rdettai enabled auto-merge (squash) May 7, 2025 08:00
@rdettai rdettai merged commit e31edf6 into main May 7, 2025
8 checks passed
@rdettai rdettai deleted the fix-gcs-tests branch May 7, 2025 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test_write_and_bulk_delete in the google_cloud_storage_test_suite fails consistently after first run
2 participants