aws: support multipart copy for objects larger than 5GB #561
I think this probably warrants a higher-level ticket to discuss how we should support this. As a start, it would be good to understand how other stores, i.e. GCS and Azure, handle this, so that we can develop an abstraction that makes sense. In particular, I wonder if adding this functionality would make more sense as part of the multipart upload functionality? This of course depends on what other stores support. In general, filing an issue first to get consensus on an approach is a good idea before jumping in on an implementation.
Great, created #563.
@tustvold updated to pass CI and a small tweak to avoid overflow.
Thanks @james-rms and @tustvold -- the high-level idea seems reasonable to me, but I think this code needs tests (maybe unit tests?) or something similar; otherwise we may inadvertently break the functionality in some future refactor.
@tustvold I've refactored slightly for unit tests and added a couple of integration tests as well. Please take another look when you have some time.
Which issue does this PR close?
Rationale for this change
Today, users that attempt to copy a >5GB object in S3 using object_store will see the copy fail with an error. The way to get around this problem, per AWS's docs, is to do the copy in several parts using multipart copies. This PR adds that functionality to the AWS client.
It adds two additional configuration parameters:
The defaults are chosen to minimise surprise: if people are used to copies not requiring several requests, we don't switch to that method until it's absolutely necessary, and when necessary, we use as few parts as possible.
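For readers unfamiliar with multipart copies, here is a minimal sketch of how a threshold and a part size could decide between a single-request copy and a series of ranged part copies. It is not the code in this PR: `plan_copy`, `copy_source_range`, and `SINGLE_COPY_LIMIT` are hypothetical names, and the sketch assumes the 5GB limit means 5 GiB. Using the limit for both the threshold and the part size gives the "as few parts as possible" behaviour described above.

```rust
use std::ops::Range;

/// Assumed single-request copy limit: 5 GiB (hypothetical constant name).
const SINGLE_COPY_LIMIT: u64 = 5 * 1024 * 1024 * 1024;

/// Hypothetical planner. Returns `None` when the object is small enough for a
/// single-request copy, otherwise the half-open byte ranges to copy as parts.
fn plan_copy(size: u64, threshold: u64, part_size: u64) -> Option<Vec<Range<u64>>> {
    assert!(part_size > 0, "part_size must be non-zero");
    if size <= threshold {
        return None;
    }
    let mut parts = Vec::new();
    let mut start = 0u64;
    while start < size {
        // saturating_add keeps `start + part_size` from overflowing u64.
        let end = start.saturating_add(part_size).min(size);
        parts.push(start..end);
        start = end;
    }
    Some(parts)
}

/// Formats a non-empty half-open range as the inclusive `bytes=first-last`
/// value that S3 expects in the `x-amz-copy-source-range` header.
fn copy_source_range(range: &Range<u64>) -> String {
    format!("bytes={}-{}", range.start, range.end - 1)
}
```

With these assumed defaults, a 12 GiB object is planned as three parts (5 GiB, 5 GiB, 2 GiB), while an object of exactly 5 GiB still takes the single-request path.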
What changes are included in this PR?
See above.
Are there any user-facing changes?
Yes - these configuration parameters should be covered by the docstring changes.