Skip to content

Commit 8cc5aa4

Browse files
jbmsgpshead
andauthored
gh-87135: Hang non-main threads that attempt to acquire the GIL during finalization (GH-105805)
Instead of surprise crashes and memory corruption, we now hang threads that attempt to re-enter the Python interpreter after Python runtime finalization has started. These are typically daemon threads (our long standing mis-feature) but could also be threads spawned by extension modules that then try to call into Python. This marks the `PyThread_exit_thread` public C API as deprecated as there is no plausible safe way to accomplish that on any supported platform in the face of things like C++ code with finalizers anywhere on a thread's stack. Doing this was the least bad option. Co-authored-by: Gregory P. Smith <[email protected]>
1 parent 113b2d7 commit 8cc5aa4

File tree

10 files changed

+247
-29
lines changed

10 files changed

+247
-29
lines changed

Diff for: Doc/c-api/init.rst

+60-16
Original file line numberDiff line numberDiff line change
@@ -430,7 +430,11 @@ Initializing and finalizing the interpreter
430430
Some memory allocated by extension modules may not be freed. Some extensions may not
431431
work properly if their initialization routine is called more than once; this can
432432
happen if an application calls :c:func:`Py_Initialize` and :c:func:`Py_FinalizeEx`
433-
more than once.
433+
more than once. :c:func:`Py_FinalizeEx` must not be called recursively from
434+
within itself. Therefore, it must not be called by any code that may be run
435+
as part of the interpreter shutdown process, such as :py:mod:`atexit`
436+
handlers, object finalizers, or any code that may be run while flushing the
437+
stdout and stderr files.
434438

435439
.. audit-event:: cpython._PySys_ClearAuditHooks "" c.Py_FinalizeEx
436440

@@ -960,6 +964,37 @@ thread, where the CPython global runtime was originally initialized.
960964
The only exception is if :c:func:`exec` will be called immediately
961965
after.
962966
967+
.. _cautions-regarding-runtime-finalization:
968+
969+
Cautions regarding runtime finalization
970+
---------------------------------------
971+
972+
In the late stage of :term:`interpreter shutdown`, after attempting to wait for
973+
non-daemon threads to exit (though this can be interrupted by
974+
:class:`KeyboardInterrupt`) and running the :mod:`atexit` functions, the runtime
975+
is marked as *finalizing*: :c:func:`_Py_IsFinalizing` and
976+
:func:`sys.is_finalizing` return true. At this point, only the *finalization
977+
thread* that initiated finalization (typically the main thread) is allowed to
978+
acquire the :term:`GIL`.
979+
980+
If any thread, other than the finalization thread, attempts to acquire the GIL
981+
during finalization, either explicitly via :c:func:`PyGILState_Ensure`,
982+
:c:macro:`Py_END_ALLOW_THREADS`, :c:func:`PyEval_AcquireThread`, or
983+
:c:func:`PyEval_AcquireLock`, or implicitly when the interpreter attempts to
984+
reacquire it after having yielded it, the thread enters **a permanently blocked
985+
state** where it remains until the program exits. In most cases this is
986+
harmless, but this can result in deadlock if a later stage of finalization
987+
attempts to acquire a lock owned by the blocked thread, or otherwise waits on
988+
the blocked thread.
989+
990+
Gross? Yes. This prevents random crashes and/or unexpectedly skipped C++
991+
finalizations further up the call stack when such threads were forcibly exited
992+
here in CPython 3.13 and earlier. The CPython runtime GIL acquiring C APIs
993+
have never had any error reporting or handling expectations at GIL acquisition
994+
time that would've allowed for graceful exit from this situation. Changing that
995+
would require new stable C APIs and rewriting the majority of C code in the
996+
CPython ecosystem to use those with error handling.
997+
963998
964999
High-level API
9651000
--------------
@@ -1033,11 +1068,14 @@ code, or when embedding the Python interpreter:
10331068
ensues.
10341069
10351070
.. note::
1036-
Calling this function from a thread when the runtime is finalizing
1037-
will terminate the thread, even if the thread was not created by Python.
1038-
You can use :c:func:`Py_IsFinalizing` or :func:`sys.is_finalizing` to
1039-
check if the interpreter is in process of being finalized before calling
1040-
this function to avoid unwanted termination.
1071+
Calling this function from a thread when the runtime is finalizing will
1072+
hang the thread until the program exits, even if the thread was not
1073+
created by Python. Refer to
1074+
:ref:`cautions-regarding-runtime-finalization` for more details.
1075+
1076+
.. versionchanged:: next
1077+
Hangs the current thread, rather than terminating it, if called while the
1078+
interpreter is finalizing.
10411079
10421080
.. c:function:: PyThreadState* PyThreadState_Get()
10431081
@@ -1092,11 +1130,14 @@ with sub-interpreters:
10921130
to call arbitrary Python code. Failure is a fatal error.
10931131
10941132
.. note::
1095-
Calling this function from a thread when the runtime is finalizing
1096-
will terminate the thread, even if the thread was not created by Python.
1097-
You can use :c:func:`Py_IsFinalizing` or :func:`sys.is_finalizing` to
1098-
check if the interpreter is in process of being finalized before calling
1099-
this function to avoid unwanted termination.
1133+
Calling this function from a thread when the runtime is finalizing will
1134+
hang the thread until the program exits, even if the thread was not
1135+
created by Python. Refer to
1136+
:ref:`cautions-regarding-runtime-finalization` for more details.
1137+
1138+
.. versionchanged:: next
1139+
Hangs the current thread, rather than terminating it, if called while the
1140+
interpreter is finalizing.
11001141
11011142
.. c:function:: void PyGILState_Release(PyGILState_STATE)
11021143
@@ -1374,17 +1415,20 @@ All of the following functions must be called after :c:func:`Py_Initialize`.
13741415
If this thread already has the lock, deadlock ensues.
13751416
13761417
.. note::
1377-
Calling this function from a thread when the runtime is finalizing
1378-
will terminate the thread, even if the thread was not created by Python.
1379-
You can use :c:func:`Py_IsFinalizing` or :func:`sys.is_finalizing` to
1380-
check if the interpreter is in process of being finalized before calling
1381-
this function to avoid unwanted termination.
1418+
Calling this function from a thread when the runtime is finalizing will
1419+
hang the thread until the program exits, even if the thread was not
1420+
created by Python. Refer to
1421+
:ref:`cautions-regarding-runtime-finalization` for more details.
13821422
13831423
.. versionchanged:: 3.8
13841424
Updated to be consistent with :c:func:`PyEval_RestoreThread`,
13851425
:c:func:`Py_END_ALLOW_THREADS`, and :c:func:`PyGILState_Ensure`,
13861426
and terminate the current thread if called while the interpreter is finalizing.
13871427
1428+
.. versionchanged:: next
1429+
Hangs the current thread, rather than terminating it, if called while the
1430+
interpreter is finalizing.
1431+
13881432
:c:func:`PyEval_RestoreThread` is a higher-level function which is always
13891433
available (even when threads have not been initialized).
13901434

Diff for: Include/internal/pycore_pythread.h

+13
Original file line numberDiff line numberDiff line change
@@ -152,6 +152,19 @@ PyAPI_FUNC(int) PyThread_join_thread(PyThread_handle_t);
152152
* a non-zero value on failure.
153153
*/
154154
PyAPI_FUNC(int) PyThread_detach_thread(PyThread_handle_t);
155+
/*
156+
* Hangs the thread indefinitely without exiting it.
157+
*
158+
* gh-87135: There is no safe way to exit a thread other than returning
159+
* normally from its start function. This is used during finalization in lieu
160+
* of actually exiting the thread. Since the program is expected to terminate
161+
* soon anyway, it does not matter if the thread stack stays around until then.
162+
*
163+
* This is unfortunate for embedders who may not be terminating their process
164+
* when they're done with the interpreter, but our C API design does not allow
165+
* for safely exiting threads attempting to re-enter Python post finalization.
166+
*/
167+
void _Py_NO_RETURN PyThread_hang_thread(void);
155168

156169
#ifdef __cplusplus
157170
}

Diff for: Include/pythread.h

+20-1
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,26 @@ typedef enum PyLockStatus {
1717

1818
PyAPI_FUNC(void) PyThread_init_thread(void);
1919
PyAPI_FUNC(unsigned long) PyThread_start_new_thread(void (*)(void *), void *);
20-
PyAPI_FUNC(void) _Py_NO_RETURN PyThread_exit_thread(void);
20+
/* Terminates the current thread. Considered unsafe.
21+
*
22+
* WARNING: This function is only safe to call if all functions in the full call
23+
* stack are written to safely allow it. Additionally, the behavior is
24+
* platform-dependent. This function should be avoided, and is no longer called
25+
* by Python itself. It is retained only for compatibility with existing C
26+
* extension code.
27+
*
28+
* With pthreads, calls `pthread_exit` causes some libcs (glibc?) to attempt to
29+
* unwind the stack and call C++ destructors; if a `noexcept` function is
30+
* reached, they may terminate the process. Others (macOS) do unwinding.
31+
*
32+
* On Windows, calls `_endthreadex` which kills the thread without calling C++
33+
* destructors.
34+
*
35+
* In either case there is a risk of invalid references remaining to data on the
36+
* thread stack.
37+
*/
38+
Py_DEPRECATED(3.14) PyAPI_FUNC(void) _Py_NO_RETURN PyThread_exit_thread(void);
39+
2140
PyAPI_FUNC(unsigned long) PyThread_get_thread_ident(void);
2241

2342
#if (defined(__APPLE__) || defined(__linux__) || defined(_WIN32) \

Diff for: Lib/test/test_threading.py

+70
Original file line numberDiff line numberDiff line change
@@ -1171,6 +1171,76 @@ def __del__(self):
11711171
self.assertEqual(out.strip(), b"OK")
11721172
self.assertIn(b"can't create new thread at interpreter shutdown", err)
11731173

1174+
@cpython_only
1175+
def test_finalize_daemon_thread_hang(self):
1176+
if support.check_sanitizer(thread=True, memory=True):
1177+
# the thread running `time.sleep(100)` below will still be alive
1178+
# at process exit
1179+
self.skipTest(
1180+
"https://github.com/python/cpython/issues/124878 - Known"
1181+
" race condition that TSAN identifies.")
1182+
# gh-87135: tests that daemon threads hang during finalization
1183+
script = textwrap.dedent('''
1184+
import os
1185+
import sys
1186+
import threading
1187+
import time
1188+
import _testcapi
1189+
1190+
lock = threading.Lock()
1191+
lock.acquire()
1192+
thread_started_event = threading.Event()
1193+
def thread_func():
1194+
try:
1195+
thread_started_event.set()
1196+
_testcapi.finalize_thread_hang(lock.acquire)
1197+
finally:
1198+
# Control must not reach here.
1199+
os._exit(2)
1200+
1201+
t = threading.Thread(target=thread_func)
1202+
t.daemon = True
1203+
t.start()
1204+
thread_started_event.wait()
1205+
# Sleep to ensure daemon thread is blocked on `lock.acquire`
1206+
#
1207+
# Note: This test is designed so that in the unlikely case that
1208+
# `0.1` seconds is not sufficient time for the thread to become
1209+
# blocked on `lock.acquire`, the test will still pass, it just
1210+
# won't be properly testing the thread behavior during
1211+
# finalization.
1212+
time.sleep(0.1)
1213+
1214+
def run_during_finalization():
1215+
# Wake up daemon thread
1216+
lock.release()
1217+
# Sleep to give the daemon thread time to crash if it is going
1218+
# to.
1219+
#
1220+
# Note: If due to an exceptionally slow execution this delay is
1221+
# insufficient, the test will still pass but will simply be
1222+
# ineffective as a test.
1223+
time.sleep(0.1)
1224+
# If control reaches here, the test succeeded.
1225+
os._exit(0)
1226+
1227+
# Replace sys.stderr.flush as a way to run code during finalization
1228+
orig_flush = sys.stderr.flush
1229+
def do_flush(*args, **kwargs):
1230+
orig_flush(*args, **kwargs)
1231+
if not sys.is_finalizing:
1232+
return
1233+
sys.stderr.flush = orig_flush
1234+
run_during_finalization()
1235+
1236+
sys.stderr.flush = do_flush
1237+
1238+
# If the follow exit code is retained, `run_during_finalization`
1239+
# did not run.
1240+
sys.exit(1)
1241+
''')
1242+
assert_python_ok("-c", script)
1243+
11741244
class ThreadJoinOnShutdown(BaseTestCase):
11751245

11761246
def _run_and_join(self, script):
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
Attempting to acquire the GIL after runtime finalization has begun in a
2+
different thread now causes the thread to hang rather than terminate, which
3+
avoids potential crashes or memory corruption caused by attempting to
4+
terminate a thread that is running code not specifically designed to support
5+
termination. In most cases this hanging is harmless since the process will
6+
soon exit anyway.
7+
8+
The ``PyThread_exit_thread`` function is now deprecated. Its behavior is
9+
inconsistent across platforms, and it can only be used safely in the
10+
unlikely case that every function in the entire call stack has been designed
11+
to support the platform-dependent termination mechanism. It is recommended
12+
that users of this function change their design to not require thread
13+
termination. In the unlikely case that thread termination is needed and can
14+
be done safely, users may migrate to calling platform-specific APIs such as
15+
``pthread_exit`` (POSIX) or ``_endthreadex`` (Windows) directly.

Diff for: Modules/_testcapimodule.c

+30
Original file line numberDiff line numberDiff line change
@@ -3310,6 +3310,35 @@ test_critical_sections(PyObject *module, PyObject *Py_UNUSED(args))
33103310
Py_RETURN_NONE;
33113311
}
33123312

3313+
// Used by `finalize_thread_hang`.
3314+
#ifdef _POSIX_THREADS
3315+
static void finalize_thread_hang_cleanup_callback(void *Py_UNUSED(arg)) {
3316+
// Should not reach here.
3317+
Py_FatalError("pthread thread termination was triggered unexpectedly");
3318+
}
3319+
#endif
3320+
3321+
// Tests that finalization does not trigger pthread cleanup.
3322+
//
3323+
// Must be called with a single nullary callable function that should block
3324+
// (with GIL released) until finalization is in progress.
3325+
static PyObject *
3326+
finalize_thread_hang(PyObject *self, PyObject *callback)
3327+
{
3328+
// WASI builds some pthread stuff but doesn't have these APIs today?
3329+
#if defined(_POSIX_THREADS) && !defined(__wasi__)
3330+
pthread_cleanup_push(finalize_thread_hang_cleanup_callback, NULL);
3331+
#endif
3332+
PyObject_CallNoArgs(callback);
3333+
// Should not reach here.
3334+
Py_FatalError("thread unexpectedly did not hang");
3335+
#if defined(_POSIX_THREADS) && !defined(__wasi__)
3336+
pthread_cleanup_pop(0);
3337+
#endif
3338+
Py_RETURN_NONE;
3339+
}
3340+
3341+
33133342
static PyMethodDef TestMethods[] = {
33143343
{"set_errno", set_errno, METH_VARARGS},
33153344
{"test_config", test_config, METH_NOARGS},
@@ -3449,6 +3478,7 @@ static PyMethodDef TestMethods[] = {
34493478
{"test_weakref_capi", test_weakref_capi, METH_NOARGS},
34503479
{"function_set_warning", function_set_warning, METH_NOARGS},
34513480
{"test_critical_sections", test_critical_sections, METH_NOARGS},
3481+
{"finalize_thread_hang", finalize_thread_hang, METH_O, NULL},
34523482
{NULL, NULL} /* sentinel */
34533483
};
34543484

Diff for: Python/ceval_gil.c

+17-9
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@
77
#include "pycore_pylifecycle.h" // _PyErr_Print()
88
#include "pycore_pymem.h" // _PyMem_IsPtrFreed()
99
#include "pycore_pystats.h" // _Py_PrintSpecializationStats()
10+
#include "pycore_pythread.h" // PyThread_hang_thread()
1011

1112
/*
1213
Notes about the implementation:
@@ -277,10 +278,9 @@ drop_gil(PyInterpreterState *interp, PyThreadState *tstate, int final_release)
277278
/* Take the GIL.
278279
279280
The function saves errno at entry and restores its value at exit.
281+
It may hang rather than return if the interpreter has been finalized.
280282
281-
tstate must be non-NULL.
282-
283-
Returns 1 if the GIL was acquired, or 0 if not. */
283+
tstate must be non-NULL. */
284284
static void
285285
take_gil(PyThreadState *tstate)
286286
{
@@ -293,12 +293,18 @@ take_gil(PyThreadState *tstate)
293293

294294
if (_PyThreadState_MustExit(tstate)) {
295295
/* bpo-39877: If Py_Finalize() has been called and tstate is not the
296-
thread which called Py_Finalize(), exit immediately the thread.
296+
thread which called Py_Finalize(), this thread cannot continue.
297297
298298
This code path can be reached by a daemon thread after Py_Finalize()
299299
completes. In this case, tstate is a dangling pointer: points to
300-
PyThreadState freed memory. */
301-
PyThread_exit_thread();
300+
PyThreadState freed memory.
301+
302+
This used to call a *thread_exit API, but that was not safe as it
303+
lacks stack unwinding and local variable destruction important to
304+
C++. gh-87135: The best that can be done is to hang the thread as
305+
the public APIs calling this have no error reporting mechanism (!).
306+
*/
307+
PyThread_hang_thread();
302308
}
303309

304310
assert(_PyThreadState_CheckConsistency(tstate));
@@ -342,7 +348,9 @@ take_gil(PyThreadState *tstate)
342348
if (drop_requested) {
343349
_Py_unset_eval_breaker_bit(holder_tstate, _PY_GIL_DROP_REQUEST_BIT);
344350
}
345-
PyThread_exit_thread();
351+
// gh-87135: hang the thread as *thread_exit() is not a safe
352+
// API. It lacks stack unwind and local variable destruction.
353+
PyThread_hang_thread();
346354
}
347355
assert(_PyThreadState_CheckConsistency(tstate));
348356

@@ -383,7 +391,7 @@ take_gil(PyThreadState *tstate)
383391

384392
if (_PyThreadState_MustExit(tstate)) {
385393
/* bpo-36475: If Py_Finalize() has been called and tstate is not
386-
the thread which called Py_Finalize(), exit immediately the
394+
the thread which called Py_Finalize(), gh-87135: hang the
387395
thread.
388396
389397
This code path can be reached by a daemon thread which was waiting
@@ -393,7 +401,7 @@ take_gil(PyThreadState *tstate)
393401
/* tstate could be a dangling pointer, so don't pass it to
394402
drop_gil(). */
395403
drop_gil(interp, NULL, 1);
396-
PyThread_exit_thread();
404+
PyThread_hang_thread();
397405
}
398406
assert(_PyThreadState_CheckConsistency(tstate));
399407

Diff for: Python/pylifecycle.c

+1-1
Original file line numberDiff line numberDiff line change
@@ -2020,7 +2020,7 @@ _Py_Finalize(_PyRuntimeState *runtime)
20202020
/* Ensure that remaining threads are detached */
20212021
_PyEval_StopTheWorldAll(runtime);
20222022

2023-
/* Remaining daemon threads will automatically exit
2023+
/* Remaining daemon threads will be trapped in PyThread_hang_thread
20242024
when they attempt to take the GIL (ex: PyEval_RestoreThread()). */
20252025
_PyInterpreterState_SetFinalizing(tstate->interp, tstate);
20262026
_PyRuntimeState_SetFinalizing(runtime, tstate);

Diff for: Python/thread_nt.h

+8
Original file line numberDiff line numberDiff line change
@@ -291,6 +291,14 @@ PyThread_exit_thread(void)
291291
_endthreadex(0);
292292
}
293293

294+
void _Py_NO_RETURN
295+
PyThread_hang_thread(void)
296+
{
297+
while (1) {
298+
SleepEx(INFINITE, TRUE);
299+
}
300+
}
301+
294302
/*
295303
* Lock support. It has to be implemented as semaphores.
296304
* I [Dag] tried to implement it with mutex but I could find a way to

0 commit comments

Comments
 (0)