
Add aio/aiohttp.py module #658


Status: Open. TingDaoK wants to merge 39 commits into main.

Conversation

@TingDaoK (Contributor) commented Jun 11, 2025

Issue #, if available:

  • add aio/aiohttp module, that provides async interface for HTTP functions.

Description of changes:

Connection Classes

  1. AIOHttpClientConnectionUnified

    • Base class for both HTTP/1.1 and HTTP/2 connections
    • Provides common connection management methods
    • Supports asynchronous connection establishment and closure
  2. AIOHttpClientConnection

    • HTTP/1.1 specific implementation
    • Created via AIOHttpClientConnection.new() async class method
  3. AIOHttp2ClientConnection

    • HTTP/2 specific implementation
    • Supports HTTP/2 settings configuration
    • Created via AIOHttp2ClientConnection.new() async class method
    • Supports callbacks for remote settings changes
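
For illustration, a minimal sketch of how these connection classes might be used. The `new()` async class method is described above; the `host_name`/`port` parameter names and the `close()` spelling are assumptions, not the final API:

    import asyncio

    from awscrt.aio.aiohttp import AIOHttpClientConnection

    async def main():
        # Establish an HTTP/1.1 connection via the async class method.
        connection = await AIOHttpClientConnection.new(
            host_name="example.com",  # assumed parameter name
            port=443)

        # ... issue requests on the connection ...

        # Close asynchronously and wait for shutdown to complete.
        await connection.close()

    asyncio.run(main())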

Stream Classes

  1. AIOHttpClientStreamUnified

    • Base class for HTTP streams
    • Handles asynchronous request/response exchanges
    • Provides methods for retrieving response status, headers, and body chunks
  2. AIOHttpClientStream

    • HTTP/1.1 specific stream implementation
  3. AIOHttp2ClientStream

    • HTTP/2 specific stream implementation
    • Supports incremental body sending via async generators
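
As a sketch of the HTTP/2 incremental-body flow described above (the public spelling of the body-generator hook and the response accessor are assumptions):

    from typing import AsyncIterator

    async def body_chunks() -> AsyncIterator[bytes]:
        # Yield the request body incrementally, e.g. as data arrives
        # from disk or another network source.
        for i in range(3):
            yield f"chunk-{i}\n".encode()

    async def send_request(stream):  # stream: AIOHttp2ClientStream
        # Chunks are written to the stream as the generator yields
        # them, rather than being buffered up front.
        await stream.set_request_body_generator(body_chunks())  # assumed name
        status = await stream.get_response_status()             # assumed accessor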

TODO:

Support incremental body sending via async generators for HTTP/1.1, which will be bound to chunked transfer encoding.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@TingDaoK changed the title from Asyncio to Add http_asyncio on Jun 12, 2025
Comment on lines 324 to 337
async def write_data_async(self,
                           data_stream: Union[InputStream, Any],
                           end_stream: bool = False) -> None:
    """Write data to the stream asynchronously.

    Args:
        data_stream (Union[InputStream, Any]): Data to write.
        end_stream (bool): Whether this is the last data to write.

    Returns:
        None: When the write completes.
    """
    future = self.write_data(data_stream, end_stream)
    await asyncio.wrap_future(future)


This is an async method, but it doesn't look like I can actually pass an async data stream in. I would expect this to be able to take an AsyncIterator[bytes] as a data stream. I would also expect the http/1.1 stream to have this method.

Currently a major problem with using bidirectional streaming in async is that we have to wrap async iterators and treat them like a sync data stream. We can make that work, but it works poorly, because the only way to get the CRT to call read later is to throw a particular IO error that gets retried. That retry is immediate, though, so you can end up in a situation where the CRT is running a hot loop on the read method while it waits for data to become available in the background.

To solve this, we really need the CRT to read async. Ideally by taking an AsyncIterator, but there might be other ways with Futures to make this an easier transition.
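
A rough sketch of the problematic pattern being described (the specific retryable error type used here is an assumption):

    import queue

    class SyncWrappedAsyncBody:
        # An async iterator forced to behave like a synchronous stream.
        def __init__(self):
            self._buffer = queue.Queue()  # filled by a background task

        def read(self, size: int = -1) -> bytes:
            try:
                return self._buffer.get_nowait()
            except queue.Empty:
                # The only way to tell the CRT "try again later" is to
                # raise a retryable IO error, but the retry is immediate,
                # so the CRT spins calling read() until data arrives.
                raise BlockingIOError()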

@TingDaoK (Contributor Author)

Thank you!

> Currently a major problem with using bidirectional streaming in async is that we have to wrap async iterators and treat them like a sync data stream. We can make that work, but it works poorly, because the only way to get the CRT to call read later is to throw a particular IO error that gets retried. That retry is immediate, though, so you can end up in a situation where the CRT is running a hot loop on the read method while it waits for data to become available in the background.

I'll look into taking AsyncIterator[bytes].

But I think the major problem here has been resolved by this interface already. I updated smithy-python here to use this new interface.
Instead of expecting the InputStream to throw the particular IO error, it just keeps providing data when it's available. I tested with the sample, and CPU usage dropped from 30% to below 10%, since this API no longer loops reading from the input stream; it waits until more input is provided.
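
Concretely, with the push-style write_data_async() from this PR the producer drives the writes, so nothing polls (a sketch; signalling end-of-body with a final empty write is an assumption):

    import io
    from typing import AsyncIterator

    async def pump(stream, chunks: AsyncIterator[bytes]) -> None:
        # The caller writes data only when it is available; the CRT
        # waits for the next write instead of retrying read() in a loop.
        async for chunk in chunks:
            await stream.write_data_async(io.BytesIO(chunk), end_stream=False)
        await stream.write_data_async(io.BytesIO(b""), end_stream=True)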

@TingDaoK TingDaoK marked this pull request as ready for review June 18, 2025 21:26
async def _set_request_body_generator(self, body_iterator: AsyncIterator[bytes]):
    try:
        async for chunk in body_iterator:
            await self._write_data(io.BytesIO(chunk), False)
@JordonPhillips Jun 19, 2025

Is there a way to do this without wrapping this in a BytesIO? I'm concerned about potential extra copies. It shouldn't happen so long as you do no mutations, since it starts by just holding a reference, but I'm not sure what APIs are being called.

@TingDaoK (Contributor Author)

It should already avoid the extra copy: the chunk is wrapped by our input stream interface, which uses readinto.
https://github.com/awslabs/aws-crt-python/blob/main/awscrt/io.py#L689-L727
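
For illustration, a minimal demonstration of why this avoids an extra copy (not the CRT code path itself): BytesIO over an immutable bytes object starts by holding a reference, and readinto() then copies once, directly into the caller-supplied buffer.

    import io

    chunk = b"x" * 1024
    body = io.BytesIO(chunk)   # holds a reference to chunk; no copy yet

    dest = bytearray(1024)     # stand-in for the CRT-owned buffer
    n = body.readinto(dest)    # the single copy, straight into dest
    assert n == 1024 and bytes(dest) == chunk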

Comment on lines 365 to 366
async def _set_request_body_generator(self, body_iterator: AsyncIterator[bytes]):
...


Is HTTP/1.1 not able to share the implementation?

@TingDaoK (Contributor Author)

HTTP/1.1 transfer encoding may need to add trailer headers and chunk extensions, so the API may need to be a bit different. Maybe let the iterator yield a chunk-info struct that can carry extra info like trailers, instead of plain bytes.

Or we can leave trailer headers unsupported for now.

Anyhow, I'd want to leave the HTTP/1.1 implementation for a quick follow-up PR if we want it.
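
One hypothetical shape for such a struct (names invented here for illustration, not part of this PR):

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class ChunkInfo:
        # Body bytes for this chunk.
        data: bytes
        # Optional HTTP/1.1 chunk extensions, e.g. [("sig", "abc123")].
        extensions: List[Tuple[str, str]] = field(default_factory=list)
        # Trailer headers; only meaningful on the final chunk.
        trailers: List[Tuple[str, str]] = field(default_factory=list)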

@TingDaoK TingDaoK requested a review from JordonPhillips June 20, 2025 17:02
@TingDaoK changed the title from Add http_asyncio to Add aio/aiohttp.oy module on Jun 23, 2025
@TingDaoK changed the title from Add aio/aiohttp.oy module to Add aio/aiohttp.py module on Jun 23, 2025
bootstrap: Optional[ClientBootstrap] = None,
socket_options: Optional[SocketOptions] = None,
tls_connection_options: Optional[TlsConnectionOptions] = None,
proxy_options: Optional['HttpProxyOptions'] = None) -> "AIOHttpClientConnectionUnified":
Contributor

why does HttpProxyOptions need to be forward declared?

future.set_result(None)

_awscrt.http2_client_stream_write_data(self, body_stream, end_stream, on_write_complete)
await asyncio.wrap_future(future)
Contributor

Why do we want to use a concurrent Future here and then wrap it into an asyncio one? Why not use the asyncio one from the start?

@TingDaoK (Contributor Author)

The concurrent future is thread-safe, but the asyncio one is not.

If we just use an asyncio future, we need to invoke self._loop.call_soon_threadsafe() to set it. Either way, it's a bit awkward.

But, yeah, I did use the asyncio future directly in some other places; I guess it would be nice to keep it consistent.
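
The two patterns being compared, as a sketch (on_write_complete stands in for the CRT callback, which runs on a non-asyncio thread):

    import asyncio
    import concurrent.futures

    async def with_concurrent_future():
        # concurrent.futures.Future is thread-safe, so the CRT callback
        # can complete it directly from any thread.
        future = concurrent.futures.Future()

        def on_write_complete():
            future.set_result(None)

        # ... hand on_write_complete to the CRT, then:
        await asyncio.wrap_future(future)

    async def with_asyncio_future():
        # asyncio futures are not thread-safe; the callback must hop
        # onto the event loop via call_soon_threadsafe().
        loop = asyncio.get_running_loop()
        future = loop.create_future()

        def on_write_complete():
            loop.call_soon_threadsafe(future.set_result, None)

        # ... hand on_write_complete to the CRT, then:
        await future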

@dacevedo12 (comment marked as off-topic)

@TingDaoK (Contributor Author)

> just out of curiosity, why aiohttp instead of httpx?

I don't get what the question is here; I assume you confused this with the aiohttp project.
We do not depend on it: the awscrt.aio.aiohttp module is an async wrapper for awscrt.http.
The name of the module is unfortunately confusing, but just as our regular HTTP module is named http.py, I don't think it will be a big concern to name our async-io HTTP module aio/aiohttp.py.

@@ -7,6 +7,7 @@
'auth',
'crypto',
'http',
'aio.aiohttp',
Contributor

nit: aiohttp is an existing library that's already widely used. Would it make more sense for this to just be aio.http, since we already have that precedent with the existing http module?

Comment on lines +107 to +112
class AIOHttpClientConnection(AIOHttpClientConnectionUnified):
    """
    An async HTTP/1.1 only client connection.

    Use `AIOHttpClientConnection.new()` to establish a new connection.
    """
Contributor

What is the intended vision for this connection type? HTTP/1.1 is inherently stateful, and we can't multiplex the same way that h2 does. The request() method is also not async, so I'm curious what value this provides in its current state.

Is the idea that you take the stream and then write to it periodically until we get a signal the input is exhausted, then read?

@TingDaoK (Contributor Author)

AIOHttpClientConnection is designed to be HTTP/1.1 only. The protocol doesn't support multiplexing, but we support it in a limited way: the next request/response simply waits in a queue and is sent after the previous request/response finishes.

> The request() method is also not async

There is nothing asynchronous in the request() method itself: it starts the request under the hood, and the stream interface then handles the response asynchronously.
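
As a sketch of that flow (the response accessor name is an assumption):

    async def fetch_two(connection, request_a, request_b):
        # request() returns immediately with a stream; on HTTP/1.1 the
        # second exchange queues behind the first on the connection.
        stream_a = connection.request(request_a)
        stream_b = connection.request(request_b)
        status_a = await stream_a.get_response_status()  # assumed accessor
        status_b = await stream_b.get_response_status()
        return status_a, status_b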
