Allow AWS S3 Access Points #1826
Comments
I'm not sure whether the underlying library that implements AWS S3 support can handle this. Have you tried using the access point in the S3 URL instead of the bucket name?
Using the access point in the S3 URL submitted to Nextflow results in an 'InvalidBucketName' error from the S3 service. However, the S3 CLI supports S3 Access Points as a direct replacement for bucket names; the following are equivalent:
In the log file it looks like the S3 AP ARN is not matching an S3 pattern so is not being treated as S3:
Log file: nextflow.log.df00f000-9180-4fae-8ead-30649b935077.1.zip
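To illustrate why an access point ARN is rejected when it is passed through as a literal bucket name, here is a minimal sketch in Python. The regular expression is a simplified version of the S3 general-purpose bucket naming rules and is only an assumption for illustration; it is not the pattern Nextflow or its S3 client actually uses. The region, account ID, and access point name are placeholders.

```python
import re

# Simplified S3 bucket naming rule: 3-63 characters, lowercase letters,
# digits, dots and hyphens, starting and ending with a letter or digit.
# Illustration only; NOT the pattern used by Nextflow.
BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_bucket_name(name: str) -> bool:
    """Return True if `name` satisfies the simplified bucket naming rule."""
    return BUCKET_NAME.fullmatch(name) is not None


# Placeholder region, account ID and access point name.
ap_arn = "arn:aws:s3:us-east-1:123456789012:accesspoint/my-ap"

print(looks_like_bucket_name("my-bucket"))  # True
print(looks_like_bucket_name(ap_arn))       # False: ':' and '/' are not allowed
```

Any check along these lines fails on the colons and the slash in the ARN, which would be consistent with the ARN not being recognised as an S3 location.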
I agree that this feature would be useful. We either need an AWS expert to work on the S3 client used by Nextflow, or we need to wait for an official AWS implementation that supports NIO. In any case, a pull request supporting this feature is welcome.
If this feature were available in s3fs-nio, would it be possible for you to use that library?
We may switch to it when it's stable.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@crabba any idea if this allows bypassing S3 rate limits? I mean https://aws.amazon.com/premiumsupport/knowledge-center/s3-request-limit-avoid-throttling
I don't think that using S3 Access Points would change any request rate limits on the underlying bucket, as an access point is just a different way of implementing S3 access policies.
All S3 Access Points have an access point alias, which is not an ARN and is (usually) compatible with Java URI paths, unlike an ARN. This is probably what should be used here, as with frameworks like Spark and other NIO libraries. https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-points-alias.html The general pattern would be:
I have done some testing of this with https://github.com/awslabs/aws-java-nio-spi-for-s3 and it seems to work, so it will probably work with Nextflow's implementation of the s3-nio library.
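As a rough illustration of why an alias fits a URI more cleanly than an ARN, the sketch below splits both forms with Python's `urllib.parse`. This is only an analogy for the Java URI handling mentioned above, not a test of Nextflow or any NIO library. The alias value is hypothetical (real aliases are generated by S3 and end in `-s3alias`), and the region, account ID, and object keys are placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical alias; real aliases are generated by S3 and end in "-s3alias".
alias_uri = "s3://my-ap-exampleexampleexample-s3alias/my-fastq-data/sample_1.fastq.gz"
# Placeholder region, account ID and access point name.
arn_uri = "s3://arn:aws:s3:us-east-1:123456789012:accesspoint/my-ap/my-fastq-data/sample_1.fastq.gz"


def split_s3_uri(uri: str) -> tuple:
    """Split a URI into (authority, path) the way a generic URI parser would."""
    parts = urlsplit(uri)
    return parts.netloc, parts.path.lstrip("/")


print(split_s3_uri(alias_uri))
# ('my-ap-exampleexampleexample-s3alias', 'my-fastq-data/sample_1.fastq.gz')
print(split_s3_uri(arn_uri))
# ('arn:aws:s3:us-east-1:123456789012:accesspoint', 'my-ap/my-fastq-data/sample_1.fastq.gz')
```

With the alias, the authority component is the whole "bucket" and the path is the object key. With the ARN, the generic split cuts the ARN at its `/`, so the access point name ends up in the path and the authority contains colons.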
New feature: Allow AWS S3 Access Points
AWS S3 Access Points are unique hostnames attached to an S3 bucket, each with dedicated access policies. This allows large-scale access control to be delegated to multiple APs, each dedicated to providing access to one user, rather than combining all access control in one large bucket policy. Larger-scale users are increasingly using APs to simplify bucket access control.
Usage scenario
Allow S3 APs to be used as a parameter, for example input files:
"--reads", "s3://arn:aws:s3:<region>:<account-id>:accesspoint/<ap-name>/my-fastq-data/*_{1,2}.fastq.gz"
Suggest implementation
The AWS S3 CLI, SDKs, and REST API support Access Points. Currently, using an S3 AP ARN in place of a bucket name results in an error 'The specified bucket is not valid', so it seems the ARN is being used as a literal bucket name. Enabling this feature would involve recognising the AP ARN and using it appropriately in CLI or API calls.
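A minimal sketch of the "recognise the AP ARN" step, written in Python for brevity rather than in Nextflow's own code base. The names `S3Location` and `parse_s3_uri` are hypothetical, and the regular expressions are simplified assumptions about the ARN format `arn:<partition>:s3:<region>:<account-id>:accesspoint/<ap-name>`. The idea is to keep the whole ARN together as the "bucket" value, so that it can be handed unchanged to an SDK call that accepts an access point ARN in place of a bucket name.

```python
import re
from typing import NamedTuple


class S3Location(NamedTuple):
    bucket: str  # plain bucket name, or a full access point ARN
    key: str     # object key or key pattern (may be empty)


# Assumed ARN shape: arn:<partition>:s3:<region>:<account-id>:accesspoint/<ap-name>
_AP_URI = re.compile(
    r"s3://(arn:[a-z0-9-]+:s3:[a-z0-9-]+:\d{12}:accesspoint/[a-z0-9-]{3,50})(?:/(.*))?"
)
# Fallback: everything up to the first '/' is the bucket name.
_BUCKET_URI = re.compile(r"s3://([^/]+)(?:/(.*))?")


def parse_s3_uri(uri: str) -> S3Location:
    """Split an s3:// URI into bucket (or access point ARN) and key."""
    match = _AP_URI.fullmatch(uri) or _BUCKET_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return S3Location(match.group(1), match.group(2) or "")


# Placeholder region, account ID and access point name.
loc = parse_s3_uri(
    "s3://arn:aws:s3:us-east-1:123456789012:accesspoint/my-ap/my-fastq-data/*_{1,2}.fastq.gz"
)
print(loc.bucket)  # arn:aws:s3:us-east-1:123456789012:accesspoint/my-ap
print(loc.key)     # my-fastq-data/*_{1,2}.fastq.gz
```

Trying the access point pattern before the plain bucket pattern matters here: the fallback alone would stop at the first `/` and split the ARN in two.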