Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New mmap strategy breaks consuming binary http streams #23

Open
trifle opened this issue Nov 19, 2024 · 2 comments
Open

New mmap strategy breaks consuming binary http streams #23

trifle opened this issue Nov 19, 2024 · 2 comments

Comments

@trifle
Copy link

trifle commented Nov 19, 2024

Hi, I noticed you've added mmap to the code, which is great!

Unfortunately, it breaks consuming http streams, because urllib3 raw streams (urllib3.response.HTTPResponse) also (surprisingly) have a fileno!

So we get an exception:

    in get_memoryview(data)
     98 # Handle file object opened in 'rb' mode
     99 if hasattr(data, "fileno"):
--> 100     mm = mmap.mmap(data.fileno(), 0, access=mmap.ACCESS_READ)
    101     return memoryview(mm)
    103 # Handle BufferedReader

OSError: [Errno 22] Invalid argument

I've checked the set difference of attributes of a binary file handle and a HTTPResponse, and these attributes are (on my macos machine) exclusively available on files:

['_dealloc_warn',
 '_finalizing',
 'detach',
 'mode',
 'name',
 'peek',
 'raw',
 'readinto1',
 'write']

So it may be worth either switching the check or alternatively adding an extra condition that prevents mmap on urllib3.response.HTTPResponse objects.

Cheers!

@titusz
Copy link
Member

titusz commented Nov 19, 2024

Nice catch. You are welcome to send a pull request. Or I will look after it when I find some time :)

@trifle
Copy link
Author

trifle commented Nov 19, 2024

I'm not sure if I have the time, but if so I'll happily create a PR.

The easiest fix is of course to simply reject those streams.
(geturl seems like an attribute that is very, very unlikely to be present on file handles, but can be swapped of course depending on how strict the check should be):

    if hasattr(data, "fileno") and not hasattr(data, 'geturl'):

But to be honest, it seems like a loss to exclude any and all binary streams. A better option would be to properly support them. That means either creating a separate code path bypassing memoryviews entirely (as was the case about a year ago).

Or, if you think that even streams might benefit from memoryviews, we could transform each chunk read from the stream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants