Add async task background worker #4591

Open

srothh wants to merge 22 commits into srothh/worker-class-hierarchy from srothh/async-task-worker

Conversation

srothh (Member) commented Jul 17, 2025

Add a new implementation of the transport background worker based on an async task. This worker mostly mirrors the functionality of the thread-based worker, except that it exposes a non-blocking async flush (which can be awaited from an async context). Furthermore, the worker itself is not thread-safe and should be invoked via run_coroutine_threadsafe or similar when called from another thread (the transport handles this). I have kept the fork check from the threaded worker, but I am not sure it is necessary, as forking in an async application would also break the event loop.
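
For illustration, a minimal sketch of the thread-safety caveat described above (the bridging helper and its name are hypothetical, not part of this PR):

import asyncio

def flush_from_other_thread(worker, loop, timeout=2.0):
    # The AsyncWorker is not thread-safe, so from a foreign thread we
    # hand the coroutine to the worker's event loop and block on the
    # returned concurrent.futures.Future until the flush completes.
    future = asyncio.run_coroutine_threadsafe(worker.flush_async(timeout), loop)
    future.result(timeout)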

GH-4581

codecov bot commented Jul 17, 2025

❌ 54 Tests Failed:

Tests completed  Failed  Passed  Skipped
21089            54      21035   1098
View the top 3 failed test(s) by shortest run time
tests.integrations.huggingface_hub.test_huggingface_hub::test_bad_chat_completion
Stack Traces | 0.124s run time
.../integrations/huggingface_hub/test_huggingface_hub.py:149: in test_bad_chat_completion
    client.text_generation(prompt="hello")
sentry_sdk/integrations/huggingface_hub.py:84: in new_text_generation
    raise e from None
sentry_sdk/integrations/huggingface_hub.py:80: in new_text_generation
    res = f(*args, **kwargs)
.tox/py3.8-huggingface_hub-v0.30.2/lib/python3.8.../huggingface_hub/inference/_client.py:2351: in text_generation
    request_parameters = provider_helper.prepare_request(
.tox/py3.8-huggingface_hub-v0.30.2/lib/python3.8.../inference/_providers/_common.py:64: in prepare_request
    mapped_model = self._prepare_mapped_model(model)
.tox/py3.8-huggingface_hub-v0.30.2/lib/python3.8.../inference/_providers/hf_inference.py:35: in _prepare_mapped_model
    _check_supported_task(model_id, self.task)
.tox/py3.8-huggingface_hub-v0.30.2/lib/python3.8.../inference/_providers/hf_inference.py:164: in _check_supported_task
    raise ValueError(
E   ValueError: Model 'mistralai/Mistral-Nemo-Instruct-2407' doesn't support task 'text-generation'. Supported tasks: 'None', got: 'text-generation'
tests.integrations.huggingface_hub.test_huggingface_hub::test_bad_chat_completion
Stack Traces | 0.124s run time
.../integrations/huggingface_hub/test_huggingface_hub.py:149: in test_bad_chat_completion
    client.text_generation(prompt="hello")
sentry_sdk/integrations/huggingface_hub.py:84: in new_text_generation
    raise e from None
sentry_sdk/integrations/huggingface_hub.py:80: in new_text_generation
    res = f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^
.tox/py3.12-huggingface_hub-v0.33.4/lib/python3.12.../huggingface_hub/inference/_client.py:2297: in text_generation
    request_parameters = provider_helper.prepare_request(
.tox/py3.12-huggingface_hub-v0.33.4/lib/python3.12.../inference/_providers/_common.py:93: in prepare_request
    provider_mapping_info = self._prepare_mapping_info(model)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.tox/py3.12-huggingface_hub-v0.33.4/lib/python3.12.../inference/_providers/hf_inference.py:38: in _prepare_mapping_info
    _check_supported_task(model_id, self.task)
.tox/py3.12-huggingface_hub-v0.33.4/lib/python3.12.../inference/_providers/hf_inference.py:187: in _check_supported_task
    raise ValueError(
E   ValueError: Model 'mistralai/Mistral-Nemo-Instruct-2407' doesn't support task 'text-generation'. Supported tasks: 'None', got: 'text-generation'
tests.integrations.huggingface_hub.test_huggingface_hub::test_nonstreaming_chat_completion[False-True-False]
Stack Traces | 0.125s run time
.../integrations/huggingface_hub/test_huggingface_hub.py:56: in test_nonstreaming_chat_completion
    response = client.text_generation(
sentry_sdk/integrations/huggingface_hub.py:84: in new_text_generation
    raise e from None
sentry_sdk/integrations/huggingface_hub.py:80: in new_text_generation
    res = f(*args, **kwargs)
.tox/py3.8-huggingface_hub-v0.30.2/lib/python3.8.../huggingface_hub/inference/_client.py:2351: in text_generation
    request_parameters = provider_helper.prepare_request(
.tox/py3.8-huggingface_hub-v0.30.2/lib/python3.8.../inference/_providers/_common.py:64: in prepare_request
    mapped_model = self._prepare_mapped_model(model)
.tox/py3.8-huggingface_hub-v0.30.2/lib/python3.8.../inference/_providers/hf_inference.py:35: in _prepare_mapped_model
    _check_supported_task(model_id, self.task)
.tox/py3.8-huggingface_hub-v0.30.2/lib/python3.8.../inference/_providers/hf_inference.py:164: in _check_supported_task
    raise ValueError(
E   ValueError: Model 'mistralai/Mistral-Nemo-Instruct-2407' doesn't support task 'text-generation'. Supported tasks: 'None', got: 'text-generation'


srothh added 7 commits July 21, 2025 11:44
…ted a sync transport HTTP subclass

Moved shared sync/async logic into a new superclass (HttpTransportCore) and moved sync-transport-specific code into a new subclass (BaseSyncHttpTransport), from which the current transport implementations inherit.

Fixes GH-4568
Removed an unnecessary TODO message and reverted a class name change for BaseHTTPTransport.

GH-4568
Adds test coverage for the error handling path when HTTP requests return
error status codes.

GH-4568
Restore comments accidentally removed in a previous commit.
Refactored class names so that BaseHttpTransport now has the same functionality as before the hierarchy refactor.

GH-4568
Add a new flush_async method to the Transport ABC. This is needed for the async transport: calling it from the client while preserving execution order in close requires flush to be a coroutine, not a plain function.

GH-4568
Move flush_async down to the specific async transport subclass. This makes more sense anyway, as it will only be required by the async transport. If more async transports are expected, another shared superclass can be created.

GH-4568
@srothh srothh force-pushed the srothh/worker-class-hierarchy branch from 1261319 to 2896602 on July 21, 2025 09:58
@srothh srothh force-pushed the srothh/async-task-worker branch from 7ada4b3 to 1a129f7 on July 21, 2025 10:42
srothh added 6 commits July 23, 2025 15:51
Add necessary type annotations to the core HttpTransport to accommodate the async transport.

GH-4568
Add an abstract base class for implementations of the background worker. This provides a shared interface for the current threaded worker in the sync context as well as the upcoming async task-based worker implementation.

GH-4578
Add a new factory method instead of direct instantiation of the threaded background worker.
This allows for easy extension to other types of workers, such as the upcoming task-based async worker.

GH-4578
Add a new flush_async method to worker ABC. This is necessary because the async transport cannot use a
synchronous blocking flush.

GH-4578
Move flush_async down to the concrete subclass so as not to break existing tests. This makes sense,
as it will only really be needed by the async worker and is therefore not shared logic.

GH-4578
Coroutines have a return value; however, the current function signature for the worker methods does not
accommodate this. Therefore, the signature was changed.

GH-4578
@srothh srothh force-pushed the srothh/worker-class-hierarchy branch from 2896602 to 268ea1a on July 23, 2025 14:02
srothh added 9 commits July 23, 2025 16:03
Add a new implementation of the worker interface, implementing the worker as an async task. This is
to be used by the upcoming async transport.

GH-4581
Refactor the flush method in the async worker to use the async_flush coroutine.

GH-4581
…unctions

Add a check to see whether callbacks are awaitable coroutines or plain functions, as coroutines need to be awaited.

GH-4581
…coroutines

Coroutines may return values other than None, so this needs to be considered in the callback parameter of the worker. Previously,
only callbacks with return type None were accepted.

GH-4581
Enable concurrent callbacks on async task worker by firing them as a task rather than awaiting them. A done callback handles the necessary queue and exception logic.

GH-4581
Changed kill to also use the _TERMINATOR sentinel, so on kill the queue is still drained up to that point instead of being cancelled immediately. This should also fix potential race conditions with flush_async.

GH-4581
Add proper type annotation to worker task list to fix linting problems

GH-4581
@srothh srothh force-pushed the srothh/async-task-worker branch from 1a129f7 to 97c5e3d on July 23, 2025 14:04
@srothh srothh marked this pull request as ready for review July 24, 2025 07:49
@srothh srothh requested a review from a team as a code owner July 24, 2025 07:49
antonpirker (Member) left a comment

Really great work. I have some comments for improvement.

Comment on lines +197 to +200
try:
    self._queue.put_nowait(_TERMINATOR)
except asyncio.QueueFull:
    logger.debug("async worker queue full, kill failed")
Member:

Could we use the full() method from below? This way we would only have one way to check if the queue is full.

srothh (Member, Author):

I did it this way because the threaded worker does the same, but I think there should be no functional difference, so yes, it is probably nicer!
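
A minimal sketch of the suggested change, assuming the worker exposes the full() helper the reviewer refers to (names follow this PR's excerpts):

def kill(self) -> None:
    # Hypothetical variant: reuse the worker's own full() check instead
    # of catching asyncio.QueueFull. Safe in a single event loop because
    # no await happens between the check and put_nowait.
    if self.full():
        logger.debug("async worker queue full, kill failed")
    else:
        self._queue.put_nowait(_TERMINATOR)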

    pending = self._queue.qsize() + 1
    logger.error("flush timed out, dropped %s events", pending)

async def flush_async(self, timeout: float, callback: Optional[Any] = None) -> None:
Member:

Why is this called flush_async and not flush? Shouldn't there be a flush() that maybe calls flush_async, so all the workers can be used the same way?

srothh (Member, Author):

This is how I had it initially. However, when integrating this with the SDK there is an issue:

The client exposes a synchronous flush/close method, and close expects flush to fully complete before shutting down the transport. If the worker/transport used a synchronous flush method, it could only create an async flush task (which is necessary to avoid deadlocking and to use the async queue) in a fire-and-forget way. This means there is no real way to properly order the flush before the shutdown in the client, unless the client itself spawns an async flush task using the transport.

I could rename it back to flush and keep it a coroutine; I just thought this way it is less confusing.
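
To illustrate the ordering problem (a hypothetical sketch, not this PR's API): a synchronous flush wrapper on the async worker could only schedule the coroutine and return immediately, so a synchronous close() has no way to wait for it:

def flush(self, timeout: float) -> None:
    # Fire-and-forget: the task is scheduled but not awaited, so the
    # caller returns before flushing has actually completed.
    self._loop.create_task(self.flush_async(timeout))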


async def flush_async(self, timeout: float, callback: Optional[Any] = None) -> None:
    logger.debug("background worker got flush request")
    if self.is_alive and timeout > 0.0:
Member:

I know this is also in the BackgroundWorker implementation, but do you know why a timeout of 0.0 should not flush the worker?

srothh (Member, Author):

I am not really sure. But the client falls back to a "shutdown_timeout" option if the timeout is not set. Maybe setting shutdown_timeout to 0.0 is supposed to immediately quit the application on a call to close?

# Create a strong reference to the task so it can be cancelled on kill
# and does not get garbage collected while running
self._active_tasks.add(task)
task.add_done_callback(self._on_task_complete)
Member:

Maybe we should add the done callback before adding it to _active_tasks. (Just a gut feeling that the task might be finishing before the callback is in place..)

srothh (Member, Author):

I actually oriented myself on the official documentation for add_done_callback here. However, if I understand it correctly, I do not think it matters, as the task can only start once the event loop regains control from the current function, which should only happen on an await or return. If it is more readable, I can change it, however.
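
A small self-contained demo of the point made above (standard asyncio behavior, not code from this PR): a task created with create_task cannot run until the creating coroutine yields control, so bookkeeping placed right after create_task always runs before the task body:

import asyncio

async def child():
    print("child runs")

async def main():
    task = asyncio.create_task(child())
    # The event loop only regains control at the await below, so this
    # line is guaranteed to run before child() starts.
    print("bookkeeping done before child started")
    await task

asyncio.run(main())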

try:
    await asyncio.wait_for(self._queue.join(), timeout=initial_timeout)
except asyncio.TimeoutError:
    pending = self._queue.qsize() + 1
Member:

Because there could be multiple tasks in self._active_tasks, maybe the more correct version would be self._queue.qsize() + len(self._active_tasks)?

srothh (Member, Author):

So pending should be not just the number of callbacks still waiting in the queue, but the number of non-completed tasks in general? I think this makes sense, thanks!
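
A sketch of the corrected count per this suggestion (assuming _active_tasks holds the in-flight tasks, as in this PR's excerpts):

except asyncio.TimeoutError:
    # Count items still queued plus tasks already dequeued but not
    # yet finished.
    pending = self._queue.qsize() + len(self._active_tasks)
    logger.error("flush timed out, dropped %s events", pending)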

    self._task = self._loop.create_task(self._target())
    self._task_for_pid = os.getpid()
except RuntimeError:
    # There is no event loop running
Member:

I think we should at least log a warning here, so we do not swallow failures silently.
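
A sketch of the suggested change, reconstructed around the excerpt above (the exact message wording is an assumption):

try:
    self._loop = asyncio.get_running_loop()
    self._task = self._loop.create_task(self._target())
    self._task_for_pid = os.getpid()
except RuntimeError:
    # There is no event loop running; warn instead of failing silently.
    logger.warning("No running event loop found, async worker not started")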

    await callback()
else:
    # Callback is a sync function, need to call it
    callback()
Member:

This will block the event loop. Maybe we should do await asyncio.to_thread(callback) here? @sl0thentr0py, what is your take on this: blocking or creating threads?

Member:

Just make a fully async version first; do we really need to mix sync and async? We can patch that on later if really necessary. Please keep it simple for now.

srothh (Member, Author):

After checking again, I agree that this is not needed. I confused this with the flush callbacks from when I initially explored the SDK, which are synchronous functions. Those are, however, processed in the flush method, and I believe only async requests should pass through here currently. Thanks for catching that!
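
A sketch of the simplified, fully async handling that results from this decision (the helper name is hypothetical):

async def _process_callback(self, callback: Any) -> None:
    # Callbacks submitted to the async worker are assumed to be
    # coroutine functions, so they are simply awaited; no sync branch.
    await callback()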


class AsyncWorker(Worker):
    def __init__(self, queue_size: int = DEFAULT_QUEUE_SIZE) -> None:
        self._queue: asyncio.Queue[Any] = asyncio.Queue(queue_size)
Member:

I think in older Pythons (3.7-3.9) it can be a problem if a Queue is initialized when there is no running event loop. I would not init it here, but in start() after we have the event loop, something like:

# in start()
self._loop = asyncio.get_running_loop()
if self._queue is None:
    self._queue = asyncio.Queue(self._queue_size)

srothh (Member, Author):

Thank you, I was not aware of this!
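
Expanded into a minimal sketch of the lazy-initialization pattern (Worker and DEFAULT_QUEUE_SIZE come from this PR's module; the Optional typing is an assumption):

import asyncio
from typing import Any, Optional

class AsyncWorker(Worker):
    def __init__(self, queue_size: int = DEFAULT_QUEUE_SIZE) -> None:
        # Defer queue creation: on Python 3.7-3.9, asyncio.Queue binds
        # to the current event loop at construction time.
        self._queue: Optional[asyncio.Queue[Any]] = None
        self._queue_size = queue_size

    def start(self) -> None:
        self._loop = asyncio.get_running_loop()
        if self._queue is None:
            self._queue = asyncio.Queue(self._queue_size)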

pending = self._queue.qsize() + 1
logger.debug("%d event(s) pending on flush", pending)
if callback is not None:
    callback(pending, timeout)
Member:

This will block the entire event loop. I guess the callback for the AsyncWorker should always be a coroutine that can be awaited.

srothh (Member, Author):

From what I can tell, this functionality is only used by the SDK's atexit handler, which currently only passes synchronous functions. If the blocking is a problem, run_in_executor could be used? But as this only happens once, on exit, I am not sure it is an issue.
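
For reference, a sketch of the run_in_executor variant mentioned above (offloading the synchronous callback to the default thread pool so the event loop is not blocked):

if callback is not None:
    # Run the sync callback in the default executor; awaiting the
    # returned future keeps the coroutine's control flow intact.
    await self._loop.run_in_executor(None, callback, pending, timeout)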

self._active_tasks.add(task)
task.add_done_callback(self._on_task_complete)
# Yield to let the event loop run other tasks
await asyncio.sleep(0)
Member:

I think this is not necessary, because the callback = await self._queue.get() in the loop also gives up control.

srothh (Member, Author):

I am not 100% sure about this, but if there are already items in the queue, does the await also give up control? If not, it might have the same issue, but I am not sure.
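
A small runnable demo of this concern (standard asyncio, not code from this PR): awaiting get() on a non-empty queue returns without suspending, so the consumer drains every queued item before the other task gets to run:

import asyncio

async def consumer(queue: asyncio.Queue) -> None:
    while True:
        item = await queue.get()  # returns immediately while items remain
        print("consumed", item)
        queue.task_done()

async def other() -> None:
    print("other task ran")

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    for i in range(3):
        queue.put_nowait(i)
    task = asyncio.create_task(consumer(queue))
    asyncio.create_task(other())
    await queue.join()
    # Prints "consumed 0/1/2" before "other task ran": the gets never
    # yielded control while the queue still had items.
    task.cancel()

asyncio.run(main())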

@srothh srothh force-pushed the srothh/worker-class-hierarchy branch from 1fbf85f to ef780f3 on July 28, 2025 08:55