S3 download leak connection #5355

Open
@sbxz

Description

Describe the bug

We are experiencing connection leaks with the AWS SDK when downloading a file from S3.
If an error occurs just before subscribing to the download stream (or if we simply never subscribe to the stream), the connection is never released.

Expected Behavior

I assume the connection should be released after a certain period of inactivity? However, none of the timeouts I have configured seem to have any effect.

Current Behavior

Once all the connections in the pool are occupied, we get connection acquisition errors, and they persist even after an hour.

Caused by: java.lang.Throwable: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.
Consider taking any of the following actions to mitigate the issue: increase max connections, increase acquire timeout, or slowing the request rate.
Increasing the max connections can increase client throughput (unless the network interface is already fully utilized), but can eventually start to hit operation system limitations on the number of file descriptors used by the process. If you already are fully utilizing your network interface or cannot further increase your connection count, increasing the acquire timeout gives extra time for requests to acquire a connection before timing out. If the connections doesn't free up, the subsequent requests will still timeout.
If the above mechanisms are not able to fix the issue, try smoothing out your requests so that large traffic bursts cannot overload the client, being more efficient with the number of times you need to call AWS, or by increasing the number of hosts sending requests.
	at software.amazon.awssdk.http.nio.netty.internal.utils.NettyUtils.decorateException(NettyUtils.java:69) ~[netty-nio-client-2.25.24.jar:na]
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.handleFailure(NettyRequestExecutor.java:307) ~[netty-nio-client-2.25.24.jar:na]
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.makeRequestListener(NettyRequestExecutor.java:188) ~[netty-nio-client-2.25.24.jar:na]
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.DefaultPromise.access$200(DefaultPromise.java:35) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:503) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na]
Caused by: java.util.concurrent.TimeoutException: Acquire operation took longer than 10000 milliseconds.
	at software.amazon.awssdk.http.nio.netty.internal.HealthCheckedChannelPool.timeoutAcquire(HealthCheckedChannelPool.java:77) ~[netty-nio-client-2.25.24.jar:na]
	at software.amazon.awssdk.http.nio.netty.internal.HealthCheckedChannelPool.lambda$acquire$0(HealthCheckedChannelPool.java:67) ~[netty-nio-client-2.25.24.jar:na]
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	... 7 common frames omitted

Reproduction Steps

For example, here is a simple test with a single connection in the pool (maxConcurrency(1)). If I call /download twice, the second call is never able to acquire a connection.

    @GetMapping("/download")
    public Mono<Void> download() {
        return Mono.fromFuture(this.s3AsyncClient.getObject(r ->
                    r.bucket(BUCKET)
                        .key(S3_KEY),
                AsyncResponseTransformer.toPublisher()))
            .then();
    }
    
    @Bean
    public S3AsyncClient s3AsyncClients(final S3Configuration s3Configuration,
        final SdkAsyncHttpClient sdkAsyncHttpClient)
        throws URISyntaxException {

        return S3AsyncClient.builder()
            .httpClient(sdkAsyncHttpClient)
            .serviceConfiguration(s3Configuration)
            .forcePathStyle(Boolean.TRUE)
            .endpointOverride(new URI(ENDPOINT))
            .region(Region.of(REGION))
            .credentialsProvider(() -> AwsBasicCredentials.create(ACCESS_KEY, PRIVATE_KEY))
            .build();
    }

    @Bean
    public SdkAsyncHttpClient sdkAsyncHttpClient() {
        return NettyNioAsyncHttpClient.builder()
            .maxConcurrency(1)
            .connectionTimeToLive(Duration.ofSeconds(2))
            .connectionTimeout(Duration.ofSeconds(2))
            .connectionAcquisitionTimeout(Duration.ofSeconds(10))
            .connectionMaxIdleTime(Duration.ofSeconds(2))
            .writeTimeout(Duration.ofSeconds(2))
            .tlsNegotiationTimeout(Duration.ofSeconds(2))
            .readTimeout(Duration.ofSeconds(2))
            .tcpKeepAlive(false)
            .useIdleConnectionReaper(true)
            .build();
    }

    @Bean
    public S3Configuration s3Configuration() {
        return S3Configuration.builder()
            .checksumValidationEnabled(Boolean.FALSE)
            .chunkedEncodingEnabled(Boolean.TRUE)
            .build();
    }
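
The failure mode in the reproduction can be illustrated with a toy model in plain Java. This is only a sketch under assumptions: `LeakModel`, `Pool`, and `Body` are hypothetical names, not SDK classes, and the model assumes a connection is returned to the pool only when the response body is consumed or cancelled. It does not reflect the SDK's actual implementation; it only shows why a body that is never subscribed to would keep a single-connection pool exhausted.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy model (hypothetical, not SDK code) of a pool starved by an unconsumed body. */
final class LeakModel {

    /** Stand-in for a connection pool; one permit models maxConcurrency(1). */
    static final class Pool {
        private final Semaphore permits;

        Pool(int maxConcurrency) {
            this.permits = new Semaphore(maxConcurrency);
        }

        /** Returns a body handle, or null if no connection frees up within the timeout. */
        Body acquire(long timeoutMillis) {
            try {
                boolean ok = permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
                return ok ? new Body(permits) : null;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return null;
            }
        }

        int available() {
            return permits.availablePermits();
        }
    }

    /** Stand-in for a response body that owns the connection until consumed or cancelled. */
    static final class Body {
        private final Semaphore owner;
        private boolean released;

        Body(Semaphore owner) {
            this.owner = owner;
        }

        /** Models subscribing and reading the stream to completion. */
        synchronized void consume() {
            release();
        }

        /** Models subscribing and cancelling immediately. */
        synchronized void cancel() {
            release();
        }

        // Returns the permit at most once, so repeated calls are harmless.
        private void release() {
            if (!released) {
                released = true;
                owner.release();
            }
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool(1);

        Body first = pool.acquire(100);   // first "download": body is never consumed
        Body second = pool.acquire(100);  // second "download": acquisition times out
        System.out.println("second acquired: " + (second != null)); // false

        first.cancel();                   // releasing the first body frees the connection
        Body third = pool.acquire(100);
        System.out.println("third acquired: " + (third != null));   // true
    }
}
```

In this model, the second acquisition fails for as long as the first body is neither consumed nor cancelled, which matches the symptom reported above. Whether the SDK should additionally reclaim such connections after a timeout is the open question of this issue.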

Possible Solution

No response

Additional Information/Context

No response

AWS Java SDK version used

2.26.12

JDK version used

21

Operating System and version

Windows 11

Metadata

Assignees

No one assigned

    Labels

    bug (This issue is a bug.), p2 (This is a standard priority issue)
