Skip to content

GIL is held for the entire duration of cursor.execute(), starving other threads / asyncio event loop #540

@StrayCatDBA

Description

@StrayCatDBA

Describe the bug

mssql-python holds the Python GIL for the entire duration of a query
executed via cursor.execute(...). Other Python threads — including the
asyncio event loop on the main thread when the query is dispatched via
asyncio.to_thread — are completely starved while the driver waits on the
network round-trip to SQL Server.

This is a regression versus pyodbc, which releases the GIL around the
blocking ODBC calls (SQLExecute, SQLExecDirect, SQLFetch, ...). Any
user combining mssql-python with asyncio.to_thread,
concurrent.futures.ThreadPoolExecutor, FastAPI/Starlette background
threads, a Flask threaded server, or even a simple background heartbeat
thread will see the entire process freeze for the duration of every query —
even though the query is dispatched to a worker thread specifically to
avoid that.

No exception is raised; the symptom is a stalled event loop / unresponsive
threads.

To reproduce

Self-contained repro using asyncio.to_thread and WAITFOR DELAY so no
schema or table is needed. The main thread runs an asyncio "heartbeat"
coroutine that prints a tick every 2 seconds; the query is dispatched to a
worker thread. If the GIL were released around the ODBC call, the heartbeat
would keep ticking. It does not — it freezes for the full 10 seconds.

"""
Minimal repro: mssql-python holds the GIL for the entire duration of a query.

Setup:
    pip install mssql-python
    set DB_CONNECTION_STRING=Server=...;Database=...;UID=...;PWD=...;Encrypt=yes;

Run:
    python repro_gil_held.py
"""

import asyncio
import os
import time

import mssql_python

QUERY_DELAY_SECONDS = 10
HEARTBEAT_INTERVAL = 2.0


def run_blocking_query() -> None:
    """Runs in a worker thread via asyncio.to_thread."""
    conn_str = os.environ["DB_CONNECTION_STRING"]
    conn = mssql_python.connect(conn_str)
    try:
        cursor = conn.cursor()
        t0 = time.monotonic()
        print(f"[query  ] {t0:8.3f} starting WAITFOR DELAY '00:00:{QUERY_DELAY_SECONDS:02d}'")
        cursor.execute(f"WAITFOR DELAY '00:00:{QUERY_DELAY_SECONDS:02d}'")
        cursor.fetchall() if cursor.description else None
        t1 = time.monotonic()
        print(f"[query  ] {t1:8.3f} query returned (elapsed {t1 - t0:.3f}s)")
        cursor.close()
    finally:
        conn.close()


async def heartbeat(stop: asyncio.Event) -> None:
    """Prints a tick every HEARTBEAT_INTERVAL seconds on the event loop thread."""
    last = time.monotonic()
    while not stop.is_set():
        now = time.monotonic()
        gap = now - last
        marker = "  <-- STALLED (GIL held by query thread)" if gap > HEARTBEAT_INTERVAL * 1.5 else ""
        print(f"[hbeat  ] {now:8.3f} tick (gap {gap:5.3f}s){marker}")
        last = now
        try:
            await asyncio.wait_for(stop.wait(), timeout=HEARTBEAT_INTERVAL)
        except asyncio.TimeoutError:
            pass


async def main() -> None:
    if "DB_CONNECTION_STRING" not in os.environ:
        raise SystemExit("Set DB_CONNECTION_STRING env var first.")

    stop = asyncio.Event()
    hb_task = asyncio.create_task(heartbeat(stop))

    await asyncio.sleep(2 * HEARTBEAT_INTERVAL)
    await asyncio.to_thread(run_blocking_query)
    await asyncio.sleep(2 * HEARTBEAT_INTERVAL)

    stop.set()
    await hb_task


if __name__ == "__main__":
    print(f"mssql_python version: {getattr(mssql_python, '__version__', 'unknown')}")
    asyncio.run(main())

Observed output (1.6.0):

mssql_python version: 1.6.0
[hbeat  ]    0.000 tick (gap 0.000s)
[hbeat  ]    2.001 tick (gap 2.001s)
[query  ]    4.003 starting WAITFOR DELAY '00:00:10'
[hbeat  ]   14.012 tick (gap 12.011s)  <-- STALLED (GIL held by query thread)
[query  ]   14.014 query returned (elapsed 10.011s)
[hbeat  ]   16.014 tick (gap 2.002s)

The 12-second gap between heartbeat ticks is the smoking gun: the asyncio
event loop on the main thread cannot run at all while a worker thread is
inside cursor.execute(...).

Expected behavior

The blocking ODBC calls should release the GIL around the
network/server-bound portion of the call, the way pyodbc does. The
heartbeat in the repro should continue to fire every ~2 s while the query
runs:

[hbeat  ]    0.000 tick (gap 0.000s)
[hbeat  ]    2.001 tick (gap 2.001s)
[query  ]    4.003 starting WAITFOR DELAY '00:00:10'
[hbeat  ]    4.001 tick (gap 2.000s)
[hbeat  ]    6.001 tick (gap 2.000s)
[hbeat  ]    8.001 tick (gap 2.000s)
[hbeat  ]   10.001 tick (gap 2.000s)
[hbeat  ]   12.002 tick (gap 2.001s)
[hbeat  ]   14.002 tick (gap 2.000s)
[query  ]   14.014 query returned (elapsed 10.011s)

Further technical details

Python version: 3.13.1 (CPython, 64-bit)
SQL Server version: SQL Server 2022
Operating system: Windows
mssql-python version: 1.6.0 (latest at time of filing)

Additional context

Root cause (from reading mssql_python/pybind/)

A grep across the entire pybind/ tree shows the GIL is released in only
two places, both in connection/connection.cpp:

$ rg "gil_scoped_release|call_guard|Py_BEGIN_ALLOW_THREADS" mssql_python/pybind
mssql_python/pybind/connection/connection.cpp:94    SQLDriverConnect
mssql_python/pybind/connection/connection.cpp:156   SQLDisconnect

ddbc_bindings.cpp (~288 KB, contains every statement-level wrapper)
contains zero gil_scoped_release and zero
py::call_guard<py::gil_scoped_release>(). As a result, every blocking
statement-level ODBC call holds the GIL across the network round-trip:

Wrapper C++ function ODBC call Approx. line in ddbc_bindings.cpp
DDBCSQLExecDirect SQLExecDirect_wrap SQLExecDirectW 1705
DDBCSQLExecute SQLExecute_wrap SQLExecute (+ DAE: SQLParamData, SQLPutData) 1881
SQLExecuteMany SQLExecuteMany_wrap repeated SQLExecute (+ DAE) 2750, 2770
DDBCSQLFetch SQLFetch_wrap SQLFetch 2944
DDBCSQLFetchOne/Many/All FetchOne_wrap / FetchMany_wrap / FetchAll_wrap SQLFetch / SQLFetchScroll (+ SQLGetData for LOBs) 5483+, 5587, 5621+
DDBCSQLFetchScroll SQLFetchScroll_wrap SQLFetchScroll (registered at 5810)
DDBCSQLMoreResults SQLMoreResults_wrap SQLMoreResults 5649
DDBCSQLGetData FetchLobColumnData SQLGetData (LOB streaming loop) 2958
DDBCSQLTables/Columns/PrimaryKeys/ForeignKeys/Procedures/SpecialColumns/Statistics/GetTypeInfo catalog wrappers corresponding catalog ODBC calls — all blocking 5806–5862
Connection.commit / rollback (in connection.cpp) SQLEndTran declared at 354

Suggested fix

The maintainers clearly know the pattern — it's already used correctly for
SQLDriverConnect and SQLDisconnect in
mssql_python/pybind/connection/connection.cpp line 94:

SQLRETURN ret;
{
    py::gil_scoped_release release;
    ret = SQLDriverConnect_ptr(_dbcHandle->get(), nullptr, connStrPtr,
                               SQL_NTS, nullptr, 0, nullptr,
                               SQL_DRIVER_NOPROMPT);
}

The same pattern needs to be applied to every blocking ODBC call in
ddbc_bindings.cpp. Two equivalent options:

Option A — explicit scope (required for wrappers that build Python
objects, e.g. the Fetch*_wrap family and SQLExecute_wrap's DAE loop,
which interleave Python-object construction with ODBC calls):

SQLRETURN ret;
{
    py::gil_scoped_release release;
    ret = SQLExecDirect_ptr(StatementHandle->get(), queryPtr, SQL_NTS);
}

Option B — pybind11 call_guard on the binding (only safe for wrappers
that do not touch any Python object between entry and the ODBC call, e.g.
SQLFetch_wrap, SQLMoreResults_wrap):

m.def("DDBCSQLFetch",       &SQLFetch_wrap,       "...",
      py::call_guard<py::gil_scoped_release>());
m.def("DDBCSQLMoreResults", &SQLMoreResults_wrap, "...",
      py::call_guard<py::gil_scoped_release>());

Releasing the GIL around just SQLExecute_wrap, SQLExecDirect_wrap, and
the Fetch*_wrap family would resolve the vast majority of user-visible
contention, including this repro.

Metadata

Metadata

Assignees

Labels

triage doneIssues that are triaged by dev team and are in investigation.

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions