-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
Crash on _ssl__SSLContext_load_cert_chain_impl (requests running w/ cert in multi-threading) #134698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Maybe some race condition. @ZeroIntensity want to have a look at this one? It may also be an issue in OpenSSL though as we're actually releasing the GIL here: PySSL_BEGIN_ALLOW_THREADS_S(pw_info.thread_state);
r = SSL_CTX_use_PrivateKey_file(self->ctx,
PyBytes_AS_STRING(keyfile ? keyfile_bytes : certfile_bytes),
SSL_FILETYPE_PEM);
PySSL_END_ALLOW_THREADS_S(pw_info.thread_state); when accessing the cert/key/etc. Note that entire function is in a critical section as well so I'm not sure we can do PySSL_BEGIN_ALLOW_THREADS_S. |
At a quick glance, releasing the GIL/detaching the tstate looks unsafe, because threads are free to run there without synchronization, which OpenSSL doesn't like. That includes free-threaded builds, because it will release the critical section and also break thread-safety. I'm not sure what the best way to fix this is, because we should definitely be detaching around long operations. Maybe a dedicated mutex? |
Here's a shorter repro: import ssl
import threading
ctx = ssl.create_default_context()
def race():
ctx.load_cert_chain("./Lib/test/certdata/keycert.pem")
threads = [threading.Thread(target=race) for _ in range(8)]
for thread in threads:
thread.start() |
@ZeroIntensity Will the patch by any chance backport to older versions of Python, or will we have to wait for Python 3.15? |
I think we can backport the fix to 3.13 and 3.14, unless this is causing security problems (e.g., a denial-of-service in production web servers), in which case we can get it all the way back to 3.9. I suspect it's the former, though. |
This seems like a known issue in requests as in psf/requests#6726 and psf/requests#6872 |
I think is covered under the note at https://docs.python.org/3/library/ssl.html#ssl.SSLContext as unsafe use
|
Yeah, I think requests has its own problem. I don't think that documentation note fully applies here, especially in my repro, where there's no connection being used at all. |
I think the term "connection" here really means concurrent writing i.e. it is safe to use context in parallel after configuration are done sequentially but after that modifications are not thread safe. |
What does "configuration" mean in this context? |
Calling things like |
Ok. FWIW, the relevant issue that added that note is #118596. It's not clear to me whether it's is in reference to general concurrent modifications, or whether "thread-safety" refers to data races in OpenSSL. I suspect it's more about weird behavior, not crashes. |
I think it would be good to get @tiran's opinion on this. Reading through openssl/openssl#2165 and especially David Benjamin's comments (e.g. openssl/openssl#2165 (comment)), my understanding is that we should:
E: also I suppose the reference count doesn't need to be atomic if we are locking around it's modification.. |
Tiran has been inactive for quite some time and @gpshead is the de facto maintainer |
That sounds a lot more complex than just locking. I'm getting a little bit confused here. Why don't we want to lock around detached calls? |
Sorry, I think there are two problems:
|
Perhaps using a |
I think this is at least related, but maybe a duplicate of #114653 ? |
There might be problems with writer starvation, but aren't most of the calls writes and not reads? Anyway, I'm a bit skeptical that using |
Okay sorry, let me back up and go over my understanding in detail: _ssl SafetySSL_CTX/
|
Uh oh!
There was an error while loading. Please reload this page.
Crash report
What happened?
Hi.
We've been investigating random crashes of our FastAPI application for over 6 months, and we think we've found the culprit.
When calling requests with a custom cert (like in the code below) in a multi-threaded paradigm, it can crashes. On some versions, like 3.12/3.13, it can in some case even block Python in a zombie state, where the process isn't killed but keep being hung.
I tested on all the versions mentionned, and the crash happened on all versions. In the latest ones (>=3.12), it feels like I get more often double free.
Here's the crash dump:
Since I did run the test on multiple versions:
openssl version
:OpenSSL 3.0.16 11 Feb 2025 (Library: OpenSSL 3.0.16 11 Feb 2025)
openssl version
:OpenSSL 3.4.1 11 Feb 2025 (Library: OpenSSL 3.4.1 11 Feb 2025)
python -VV
:Python 3.15.0a0 (heads/main-dirty:1729468016, May 23 2025, 14:52:55) [GCC 14.2.1 20250207]
Manjaro Linux 25.0.0
/proc/version
:Linux version 6.11.5-lqx1-1-lqx (linux-lqx@archlinux) (gcc (GCC) 14.2.1 20240910, GNU ld (GNU Binutils) 2.43.0) #1 ZEN SMP PREEMPT Tue, 22 Oct 2024 15:40:56 +0000
CPython versions tested on:
3.10, 3.11, 3.12, 3.13, 3.14, CPython main branch, 3.9
Operating systems tested on:
Linux
Output from running 'python -VV' on the command line:
No response
Linked PRs
ssl
#134724The text was updated successfully, but these errors were encountered: