
Crash on _ssl__SSLContext_load_cert_chain_impl (requests running w/ cert in multi-threading) #134698

Open
Conobi opened this issue May 26, 2025 · 21 comments
Labels
extension-modules C modules in the Modules dir topic-SSL type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@Conobi

Conobi commented May 26, 2025

Crash report

What happened?

Hi.
We've been investigating random crashes of our FastAPI application for over 6 months, and we think we've found the culprit.
When calling requests with a custom cert (as in the code below) from multiple threads, it can crash. On some versions, like 3.12/3.13, it can in some cases even leave Python in a zombie state, where the process isn't killed but stays hung.

I tested on all the versions mentioned, and the crash happened on all of them. On the latest ones (>=3.12), it feels like I get a double free more often.

import threading
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta
from pathlib import Path

import requests
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives.serialization import pkcs12
from cryptography.x509.oid import NameOID

CERT_PEM = "client_cert.pem"
KEY_PEM = "client_key.pem"
PFX_FILE = "client_cert.pfx"
CERT_PASSWORD = b"password"  # For PFX export


def generate_and_save_cert() -> None:
    """Generate RSA key and self-signed cert, save PEM and PFX. Can be commented out. """
    key = rsa.generate_private_key(
        public_exponent=65537,
        key_size=2048,
    )
    subject = issuer = x509.Name(
        [
            x509.NameAttribute(NameOID.COUNTRY_NAME, "US"),
            x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, "California"),
            x509.NameAttribute(NameOID.LOCALITY_NAME, "San Francisco"),
            x509.NameAttribute(NameOID.ORGANIZATION_NAME, "Test Org"),
            x509.NameAttribute(NameOID.COMMON_NAME, "localhost"),
        ]
    )
    cert = (
        x509.CertificateBuilder()
        .subject_name(subject)
        .issuer_name(issuer)
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(datetime.utcnow())  # naive UTC, for backward compatibility
        .not_valid_after(datetime.utcnow() + timedelta(days=365))
        .sign(key, hashes.SHA256())
    )

    Path(CERT_PEM).write_bytes(cert.public_bytes(serialization.Encoding.PEM))
    Path(KEY_PEM).write_bytes(
        key.private_bytes(
            encoding=serialization.Encoding.PEM,
            format=serialization.PrivateFormat.TraditionalOpenSSL,
            encryption_algorithm=serialization.NoEncryption(),
        )
    )

    pfx = pkcs12.serialize_key_and_certificates(
        name=b"client",
        key=key,
        cert=cert,
        cas=None,
        encryption_algorithm=serialization.BestAvailableEncryption(CERT_PASSWORD),
    )
    Path(PFX_FILE).write_bytes(pfx)


def post_with_cert(url: str, idx: int) -> None:
    """Perform a POST request using PEM cert."""
    try:
        response = requests.post(
            url,
            data={"test": f"thread-{idx}"},
            cert=(CERT_PEM, KEY_PEM),  # PEM files
            timeout=10,
        )
        print(f"Thread {idx}: Status {response.status_code}")
    except Exception as exc:
        print(f"Thread {idx}: Exception {exc}")


def main() -> None:
    generate_and_save_cert()
    url = "https://example.com"  # <-- Change this!
    with ThreadPoolExecutor(max_workers=150) as executor:
        for i in range(150):
            executor.submit(post_with_cert, url, i)


if __name__ == "__main__":
    main()

Here's the crash dump:

Core was generated by `/usr/local/bin/python3.15 foo.py'.
Program terminated with signal SIGABRT, Aborted.
#0  0x0000771454ecf624 in ?? () from /usr/lib/libc.so.6
[Current thread is 1 (Thread 0x7713720006c0 (LWP 87696))]

#0  0x0000771454ecf624 in ?? () from /usr/lib/libc.so.6
#1  0x0000771454e75ba0 in raise () from /usr/lib/libc.so.6
#2  0x0000771454e5d582 in abort () from /usr/lib/libc.so.6
#3  0x0000771454e5e3bf in ?? () from /usr/lib/libc.so.6
#4  0x0000771454ed9765 in ?? () from /usr/lib/libc.so.6
#5  0x0000771454edbc8a in ?? () from /usr/lib/libc.so.6
#6  0x0000771454ede9ab in free () from /usr/lib/libc.so.6
#7  0x0000771453dc2eb5 in RSA_free () from /usr/lib/libcrypto.so.3
#8  0x0000771453d5ebd2 in ?? () from /usr/lib/libcrypto.so.3
#9  0x0000771453d5f498 in EVP_PKEY_free () from /usr/lib/libcrypto.so.3
#10 0x0000771454198e8a in ?? () from /usr/lib/libssl.so.3
#11 0x000077145419e4d6 in SSL_CTX_use_PrivateKey_file () from /usr/lib/libssl.so.3
#12 0x000077145429f899 in _ssl__SSLContext_load_cert_chain_impl (self=0x7714536f9b50, certfile=<optimized out>, keyfile=<optimized out>, password=<optimized out>) at ./Modules/_ssl.c:4148
#13 _ssl__SSLContext_load_cert_chain (self=0x7714536f9b50, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at ./Modules/clinic/_ssl.c.h:1429
#14 0x0000586280553ca0 in _PyObject_VectorcallTstate (tstate=0x58629578de30, callable=0x771454583ba0, args=0x7714540e89e0, nargsf=<optimized out>, kwnames=0x0)
    at ./Include/internal/pycore_call.h:169
#15 PyObject_Vectorcall (callable=0x771454583ba0, args=args@entry=0x771371ffe468, nargsf=<optimized out>, kwnames=kwnames@entry=0x0) at Objects/call.c:327
#16 0x00005862806cda53 in _PyEval_EvalFrameDefault (tstate=0x58629578de30, frame=<optimized out>, throwflag=<optimized out>) at Python/generated_cases.c.h:1619
#17 0x00005862806d9efc in _PyEval_EvalFrame (tstate=0x58629578de30, frame=<optimized out>, throwflag=0) at ./Include/internal/pycore_ceval.h:119
#18 _PyEval_Vector (tstate=0x58629578de30, func=0x7714538ee090, locals=0x0, args=0x771452165530, argcount=<optimized out>, kwnames=<optimized out>) at Python/ceval.c:1961
#19 0x0000586280557fc2 in _PyObject_VectorcallTstate (tstate=0x58629578de30, callable=0x7714538ee090, args=0x771452165530, nargsf=4, kwnames=0x7714521800b0)
    at ./Include/internal/pycore_call.h:169
#20 method_vectorcall (method=<optimized out>, args=0x771452165538, nargsf=<optimized out>, kwnames=0x7714521800b0) at Objects/classobject.c:64
#21 0x0000586280555b98 in _PyVectorcall_Call (tstate=0x58629578de30, func=0x586280557e30 <method_vectorcall>, callable=0x77145217da00, tuple=<optimized out>, kwargs=<optimized out>)
    at Objects/call.c:285
#22 _PyObject_Call (tstate=0x58629578de30, callable=0x77145217da00, args=<optimized out>, kwargs=<optimized out>) at Objects/call.c:348
#23 PyObject_Call (callable=0x77145217da00, args=<optimized out>, kwargs=<optimized out>) at Objects/call.c:373
#24 0x00005862806cde12 in _PyEval_EvalFrameDefault (tstate=0x58629578de30, frame=<optimized out>, throwflag=<optimized out>) at Python/generated_cases.c.h:2654
#25 0x00005862806d9efc in _PyEval_EvalFrame (tstate=0x58629578de30, frame=<optimized out>, throwflag=0) at ./Include/internal/pycore_ceval.h:119
#26 _PyEval_Vector (tstate=0x58629578de30, func=0x77145361b1c0, locals=0x0, args=0x771452157730, argcount=<optimized out>, kwnames=<optimized out>) at Python/ceval.c:1961
#27 0x0000586280557fc2 in _PyObject_VectorcallTstate (tstate=0x58629578de30, callable=0x77145361b1c0, args=0x771452157730, nargsf=2, kwnames=0x771452164160)
    at ./Include/internal/pycore_call.h:169
#28 method_vectorcall (method=<optimized out>, args=0x771452157738, nargsf=<optimized out>, kwnames=0x771452164160) at Objects/classobject.c:64
#29 0x0000586280555b98 in _PyVectorcall_Call (tstate=0x58629578de30, func=0x586280557e30 <method_vectorcall>, callable=0x771452157680, tuple=<optimized out>, kwargs=<optimized out>)
    at Objects/call.c:285
#30 _PyObject_Call (tstate=0x58629578de30, callable=0x771452157680, args=<optimized out>, kwargs=<optimized out>) at Objects/call.c:348
#31 PyObject_Call (callable=0x771452157680, args=<optimized out>, kwargs=<optimized out>) at Objects/call.c:373
#32 0x00005862806cde12 in _PyEval_EvalFrameDefault (tstate=0x58629578de30, frame=<optimized out>, throwflag=<optimized out>) at Python/generated_cases.c.h:2654
#33 0x00005862806d9efc in _PyEval_EvalFrame (tstate=0x58629578de30, frame=<optimized out>, throwflag=0) at ./Include/internal/pycore_ceval.h:119
#34 _PyEval_Vector (tstate=0x58629578de30, func=0x77145361be20, locals=0x0, args=0x771452157b70, argcount=<optimized out>, kwnames=<optimized out>) at Python/ceval.c:1961
#35 0x0000586280557fc2 in _PyObject_VectorcallTstate (tstate=0x58629578de30, callable=0x77145361be20, args=0x771452157b70, nargsf=2, kwnames=0x771452164700)
    at ./Include/internal/pycore_call.h:169
#36 method_vectorcall (method=<optimized out>, args=0x771452157b78, nargsf=<optimized out>, kwnames=0x771452164700) at Objects/classobject.c:64
#37 0x0000586280555b98 in _PyVectorcall_Call (tstate=0x58629578de30, func=0x586280557e30 <method_vectorcall>, callable=0x771452156b80, tuple=<optimized out>, kwargs=<optimized out>)
    at Objects/call.c:285
#38 _PyObject_Call (tstate=0x58629578de30, callable=0x771452156b80, args=<optimized out>, kwargs=<optimized out>) at Objects/call.c:348
#39 PyObject_Call (callable=0x771452156b80, args=<optimized out>, kwargs=<optimized out>) at Objects/call.c:373
#40 0x00005862806cde12 in _PyEval_EvalFrameDefault (tstate=0x58629578de30, frame=<optimized out>, throwflag=<optimized out>) at Python/generated_cases.c.h:2654
#41 0x00005862806d9efc in _PyEval_EvalFrame (tstate=0x58629578de30, frame=<optimized out>, throwflag=0) at ./Include/internal/pycore_ceval.h:119
#42 _PyEval_Vector (tstate=0x58629578de30, func=0x77145361b8a0, locals=0x0, args=0x7714521553f0, argcount=<optimized out>, kwnames=<optimized out>) at Python/ceval.c:1961
#43 0x0000586280557fc2 in _PyObject_VectorcallTstate (tstate=0x58629578de30, callable=0x77145361b8a0, args=0x7714521553f0, nargsf=1, kwnames=0x771452127d60)
    at ./Include/internal/pycore_call.h:169
#44 method_vectorcall (method=<optimized out>, args=0x7714521553f8, nargsf=<optimized out>, kwnames=0x771452127d60) at Objects/classobject.c:64
#45 0x0000586280555b98 in _PyVectorcall_Call (tstate=0x58629578de30, func=0x586280557e30 <method_vectorcall>, callable=0x771452154c40, tuple=<optimized out>, kwargs=<optimized out>)
    at Objects/call.c:285
#46 _PyObject_Call (tstate=0x58629578de30, callable=0x771452154c40, args=<optimized out>, kwargs=<optimized out>) at Objects/call.c:348
#47 PyObject_Call (callable=0x771452154c40, args=<optimized out>, kwargs=<optimized out>) at Objects/call.c:373
#48 0x00005862806cde12 in _PyEval_EvalFrameDefault (tstate=0x58629578de30, frame=<optimized out>, throwflag=<optimized out>) at Python/generated_cases.c.h:2654
#49 0x00005862806d9efc in _PyEval_EvalFrame (tstate=0x58629578de30, frame=<optimized out>, throwflag=0) at ./Include/internal/pycore_ceval.h:119
#50 _PyEval_Vector (tstate=0x58629578de30, func=0x7714549d1170, locals=0x0, args=0x771371fff938, argcount=<optimized out>, kwnames=<optimized out>) at Python/ceval.c:1961
#51 0x000058628055802b in _PyObject_VectorcallTstate (tstate=0x58629578de30, callable=0x7714549d1170, args=0x771371fff938, nargsf=1, kwnames=0x0) at ./Include/internal/pycore_call.h:169
#52 method_vectorcall (method=<optimized out>, args=0x771371fffbd8, nargsf=<optimized out>, kwnames=0x0) at Objects/classobject.c:72
#53 0x00005862806f784c in _PyObject_VectorcallTstate (tstate=0x58629578de30, callable=0x771452154800, args=0x771371fffbd8, nargsf=<optimized out>, kwnames=0x0)
    at ./Include/internal/pycore_call.h:169
#54 context_run (self=0x7714521549c0, args=0x771371fffbd0, nargs=<optimized out>, kwnames=0x0) at Python/context.c:728
#55 0x00005862806ceefe in _PyEval_EvalFrameDefault (tstate=0x58629578de30, frame=<optimized out>, throwflag=<optimized out>) at Python/generated_cases.c.h:3764
#56 0x00005862806d9efc in _PyEval_EvalFrame (tstate=0x58629578de30, frame=<optimized out>, throwflag=0) at ./Include/internal/pycore_ceval.h:119
#57 _PyEval_Vector (tstate=0x58629578de30, func=0x7714549d1220, locals=0x0, args=0x771371fffdd8, argcount=<optimized out>, kwnames=<optimized out>) at Python/ceval.c:1961
#58 0x000058628055802b in _PyObject_VectorcallTstate (tstate=0x58629578de30, callable=0x7714549d1220, args=0x771371fffdd8, nargsf=1, kwnames=0x0) at ./Include/internal/pycore_call.h:169
#59 method_vectorcall (method=<optimized out>, args=0x586280a26958 <_PyRuntime+90104>, nargsf=<optimized out>, kwnames=0x0) at Objects/classobject.c:72
#60 0x00005862807ed50c in thread_run (boot_raw=0x58629578ddf0) at ./Modules/_threadmodule.c:368
#61 0x000058628076d957 in pythread_wrapper (arg=<optimized out>) at Python/thread_pthread.h:242
#62 0x0000771454ecd70a in ?? () from /usr/lib/libc.so.6
#63 0x0000771454f51aac in ?? () from /usr/lib/libc.so.6

Since I ran the test on multiple versions, here are the environment details:

  • I ran the test on the official Docker Python images (except for 3.9/3.10/3.14/3.15)
  • My Docker openssl version: OpenSSL 3.0.16 11 Feb 2025 (Library: OpenSSL 3.0.16 11 Feb 2025)
  • My host openssl version: OpenSSL 3.4.1 11 Feb 2025 (Library: OpenSSL 3.4.1 11 Feb 2025)
  • For CPython main branch, here's my python -VV: Python 3.15.0a0 (heads/main-dirty:1729468016, May 23 2025, 14:52:55) [GCC 14.2.1 20250207]
  • My distrib: Manjaro Linux 25.0.0
  • /proc/version: Linux version 6.11.5-lqx1-1-lqx (linux-lqx@archlinux) (gcc (GCC) 14.2.1 20240910, GNU ld (GNU Binutils) 2.43.0) #1 ZEN SMP PREEMPT Tue, 22 Oct 2024 15:40:56 +0000

CPython versions tested on:

3.10, 3.11, 3.12, 3.13, 3.14, CPython main branch, 3.9

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

No response

Linked PRs

@Conobi Conobi added the type-crash A hard crash of the interpreter, possibly with a core dump label May 26, 2025
@Conobi Conobi changed the title Crash on _ssl__SSLContext_load_cert_chain_impl Crash on _ssl__SSLContext_load_cert_chain_impl (requests running w/ cert in multi-threading) May 26, 2025
@picnixz picnixz added extension-modules C modules in the Modules dir topic-SSL labels May 26, 2025
@picnixz
Member

picnixz commented May 26, 2025

Maybe some race condition. @ZeroIntensity want to have a look at this one?

It may also be an issue in OpenSSL though as we're actually releasing the GIL here:

    PySSL_BEGIN_ALLOW_THREADS_S(pw_info.thread_state);
    r = SSL_CTX_use_PrivateKey_file(self->ctx,
        PyBytes_AS_STRING(keyfile ? keyfile_bytes : certfile_bytes),
        SSL_FILETYPE_PEM);
    PySSL_END_ALLOW_THREADS_S(pw_info.thread_state);

when accessing the cert/key/etc. Note that the entire function is in a critical section as well, so I'm not sure we can do PySSL_BEGIN_ALLOW_THREADS_S.

@ZeroIntensity
Member

At a quick glance, releasing the GIL/detaching the tstate looks unsafe, because threads are free to run there without synchronization, which OpenSSL doesn't like. That includes free-threaded builds, because it will release the critical section and also break thread-safety. I'm not sure what the best way to fix this is, because we should definitely be detaching around long operations. Maybe a dedicated mutex?

@ZeroIntensity
Member

Here's a shorter repro:

import ssl
import threading

ctx = ssl.create_default_context()

def race():
    ctx.load_cert_chain("./Lib/test/certdata/keycert.pem")

threads = [threading.Thread(target=race) for _ in range(8)]
for thread in threads:
    thread.start()
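
Until a fix lands in _ssl itself, an application that controls its own SSLContext could serialize the reconfiguration calls from Python. The following is only a sketch of that idea: the lock and the helper name load_cert_chain_serialized are hypothetical, not part of the ssl module, and it assumes every caller goes through the helper.

```python
import threading

# Hypothetical application-level lock; the ssl module provides nothing like it.
_ctx_config_lock = threading.Lock()


def load_cert_chain_serialized(ctx, certfile, keyfile=None, password=None):
    """Call ctx.load_cert_chain() while holding a process-wide lock.

    Only one thread at a time can be inside load_cert_chain(), so the
    window where the GIL is released is never entered concurrently.
    """
    with _ctx_config_lock:
        ctx.load_cert_chain(certfile, keyfile, password)
```

In the repro above, race() would call load_cert_chain_serialized(ctx, "./Lib/test/certdata/keycert.pem") instead of calling ctx.load_cert_chain directly. This does not help when the context is created and configured inside code you don't control.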

@Conobi
Author

Conobi commented May 26, 2025

@ZeroIntensity Will the patch by any chance be backported to older versions of Python, or will we have to wait for Python 3.15?

@ZeroIntensity
Member

ZeroIntensity commented May 26, 2025

I think we can backport the fix to 3.13 and 3.14, unless this is causing security problems (e.g., a denial-of-service in production web servers), in which case we can get it all the way back to 3.9. I suspect it's the former, though.

@kumaraditya303
Contributor

kumaraditya303 commented May 26, 2025

This seems like a known issue in requests as in psf/requests#6726 and psf/requests#6872

@kumaraditya303
Contributor

I think this is covered under the note at https://docs.python.org/3/library/ssl.html#ssl.SSLContext as unsafe use:

> Note: SSLContext is designed to be shared and used by multiple connections. Thus, it is thread-safe as long as it is not reconfigured after being used by a connection.

@ZeroIntensity
Member

Yeah, I think requests has its own problem. I don't think that documentation note fully applies here, especially in my repro, where there's no connection being used at all.

@kumaraditya303
Contributor

kumaraditya303 commented May 26, 2025

> Yeah, I think requests has its own problem. I don't think that documentation note fully applies here, especially in #134698 (comment), where there's no connection being used at all.

I think the term "connection" here really means concurrent writing, i.e. it is safe to use the context in parallel once configuration has been done sequentially, but modifications after that are not thread-safe.

@ZeroIntensity
Member

What does "configuration" mean in this context?

@kumaraditya303
Contributor

> What does "configuration" mean in this context?

Calling things like load_cert_chain and load_verify_locations and similar.
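
A minimal sketch of the "configure sequentially, then share" pattern described here, under a few assumptions: build_context and new_tls_object are hypothetical helper names, the hostname is a placeholder, and ssl.MemoryBIO is used so that no network access or certificate file is needed. All configuration happens in one thread before the pool starts; the worker threads only create TLS objects from the finished context.

```python
import ssl
from concurrent.futures import ThreadPoolExecutor


def build_context() -> ssl.SSLContext:
    """Do all configuration up front, in a single thread."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Calls such as ctx.load_cert_chain(...) or ctx.load_verify_locations(...)
    # would also go here, before any other thread can see the context.
    return ctx


def new_tls_object(ctx: ssl.SSLContext, hostname: str) -> ssl.SSLObject:
    """Use the shared context without modifying it (no network I/O)."""
    return ctx.wrap_bio(ssl.MemoryBIO(), ssl.MemoryBIO(), server_hostname=hostname)


ctx = build_context()
with ThreadPoolExecutor(max_workers=8) as pool:
    objects = list(pool.map(lambda _: new_tls_object(ctx, "example.com"), range(8)))
```

The point of the split is that nothing after build_context() returns ever writes to the context again.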

@ZeroIntensity
Member

Ok. FWIW, the relevant issue that added that note is #118596. It's not clear to me whether it is in reference to general concurrent modifications, or whether "thread-safety" refers to data races in OpenSSL. I suspect it's more about weird behavior, not crashes.

@emmatyping
Member

emmatyping commented May 26, 2025

I think it would be good to get @tiran's opinion on this.

Reading through openssl/openssl#2165 and especially David Benjamin's comments (e.g. openssl/openssl#2165 (comment)), my understanding is that we should:

  • lock around SSL_new calls to ensure those are not done concurrently
  • keep an atomic reference count in PySSLContext tracking references to PySSLSockets (modified by SSL_new/SSL_free). If there are referrers, raise errors on things that would modify the PySSLContext.

E: also, I suppose the reference count doesn't need to be atomic if we are locking around its modification.

@picnixz
Member

picnixz commented May 26, 2025

Tiran has been inactive for quite some time and @gpshead is the de facto maintainer

@ZeroIntensity
Member

> keep an atomic reference count in PySSLContext tracking references to PySSLSockets (modified by SSL_new/SSL_free). If there are referrers, raise errors on things that would modify the PySSLContext.

That sounds a lot more complex than just locking.

I'm getting a little bit confused here. Why don't we want to lock around detached calls?

@emmatyping
Member

Sorry, I think there are two problems:

  1. concurrent writing to contexts is unsafe (I agree locking is probably the right solution to this)
  2. additionally, writes to the context after a new SSL is made are unsafe, except where allowed. Perhaps this should be an independent issue.

@emmatyping
Member

Perhaps using a _PyRWMutex would resolve concerns about the performance impact of adding locks? I think we need to enforce exclusivity when writing, but allowing concurrent reads should be safe as I understand it.

@emmatyping
Member

I think this is at least related, but maybe a duplicate of #114653 ?

@ZeroIntensity
Member

> Perhaps using a _PyRWMutex would resolve concerns about the performance impact of adding locks? I think we need to enforce exclusivity when writing, but allowing concurrent reads should be safe as I understand it.

There might be problems with writer starvation, but aren't most of the calls writes and not reads?

Anyway, I'm a bit skeptical that using PyMutex will have any significant performance impact for single-threaded code. They're quite fast when they're uncontended. There shouldn't be any multithreaded cases where we have to worry about performance regressions, because coincidentally, those cases crash right now!

@emmatyping
Member

Okay sorry, let me back up and go over my understanding in detail:

_ssl Safety

SSL_CTX/SSLContext

(quotes below by David Benjamin)

  1. "A thread may not reconfigure an SSL_CTX while another thread is accessing it."
  2. "if there is an SSL attached to the SSL_CTX, reconfiguring the SSL_CTX should probably be undefined (except when documented otherwise)"

A lock prevents 1) but not 2). After thinking it over more, I rescind my suggestion to use a _PyRWMutex; I think it is probably fine to use a PyMutex to guard SSL_CTX. An SSL_CTX can be used to read statistics about connections and a few other things, but I expect that to be less common. Today, concurrent writes violate 1) and likely lead to a crash, so adding a PyMutex around SSL_CTX is probably fine. We should probably also fix 2), but that can be a separate issue.

SSL/SSLSocket

(quote below by Matt Caswell, an OpenSSL maintainer)

  1. Common pattern is to create them from an SSL_CTX in a thread
  2. "Make sure that the SSL object is only ever accessed by one thread."

I believe the intent (based on this follow-up comment) is that SSL objects should also be locked, and only accessed by one thread at a time.

So to me it sounds like we should:

  1. add per-object PyMutexes to guard SSL and SSL_CTX (which PR #134724, "gh-134698: Hold a lock when the thread state is detached in ssl", does)
  2. Ensure changing an SSLContext after it has attached to an SSLSocket raises an error
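
To make the two proposals concrete, here is a toy pure-Python model of them. It is not how _ssl is implemented: GuardedContext, attach, detach, and the settings dict are all hypothetical stand-ins for the C-level SSL_CTX, SSL_new, and SSL_free. A per-object lock serializes every access, and a plain counter (which need not be atomic, since it is only touched under the lock) makes reconfiguration fail once a socket is attached.

```python
import threading


class GuardedContext:
    """Toy model: a per-object lock plus a count of attached sockets."""

    def __init__(self):
        self._lock = threading.Lock()
        self._attached = 0   # sockets currently created from this context
        self.settings = {}   # stands in for the SSL_CTX state

    def configure(self, key, value):
        """Reconfigure the context; refused while any socket is attached."""
        with self._lock:
            if self._attached:
                raise RuntimeError("context is in use; cannot reconfigure")
            self.settings[key] = value

    def attach(self):
        """Model creating a socket from the context (SSL_new)."""
        with self._lock:
            self._attached += 1

    def detach(self):
        """Model destroying that socket (SSL_free)."""
        with self._lock:
            self._attached -= 1
```

With this model, configure() succeeds before attach() and after the matching detach(), and raises RuntimeError in between.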

_ssl Performance

The performance concerns enter when we look at how these objects get used in a multi-threaded setting. Generally, you make an SSLContext then set up a threadpool to make new SSLSockets whenever a new connection is initiated. We would need to lock around the creation of new SSLSockets which would increase the connection latency. For high load servers there could be a fair amount of contention on the SSLContext. I believe that this pattern currently works? (need to verify).

In a single threaded context however, the lock should have little to no overhead since there will be no contention.

The bug at hand

All that being said (and I think these issues are important to address!), I believe the crash in this thread is actually #114653, where OpenSSL's atexit handlers get called before threads finalize.
