-
Notifications
You must be signed in to change notification settings - Fork 393
Remove sentinel
logcontext in Clock
utilities (looping_call
, looping_call_now
, call_later
)
#18907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove sentinel
logcontext in Clock
utilities (looping_call
, looping_call_now
, call_later
)
#18907
Conversation
We will need this in #18868 but not for now
Clock
utilities (looping_call
, looping_call_now
, call_later
)sentinel
logcontext in Clock
utilities (looping_call
, looping_call_now
, call_later
)
# test is done once d2 finishes | ||
return d2 | ||
|
||
@logcontext_clean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added @logcontext_clean
everywhere as it makes sense to fail when we see logcontext_error
used in the logcontext tests.
Interestingly, I do still see some logcontext_error
(ex. 2025-09-10 20:20:17-0500 [-] synapse.logging.context - 90 - WARNING - one - Re-starting finished log context one
) logs when running these run_in_background
test logs but the tests still aren't failing 🤔
These logs also happen on develop
(even develop
from days ago before any logcontext changes) so this is a long-standing problem.
Something to solve in an another PR ⏩
synapse/util/__init__.py
Outdated
with context.PreserveLoggingContext(): | ||
callback(*args, **kwargs) | ||
|
||
with context.PreserveLoggingContext(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PreserveLoggingContext()
doesn't seem necessary but please double-check. We aren't calling anything at this point. Only scheduling something for the reactor to run later
Unless we think that a reactor implementation might call immediately when delay
is 0
(reactor.callLater(0, func)
). Although, I'm assuming this has the same semantics as setTimeout(code, 0)
in JavaScript where it should be run as soon as possible but not right away (put it on the queue).
Please cross-check with whether the test_call_later
is good enough to stress this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems fair to me. I can't rule out some weird non-obvious interaction but if that was the case, there should've been a comment to that extent.
synapse/util/__init__.py
Outdated
assert context.current_context() is context.SENTINEL_CONTEXT, ( | ||
"Expected `call_later` callback from the reactor to start with the sentinel logcontext " | ||
f"but saw {context.current_context()}. In other words, another task shouldn't have " | ||
"leaked their logcontext to us." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we ok with breaking the world when we run into this kind of problem?
It would be much nicer to catch this and get a report (what we're doing now) than logging in the background.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, this has already caught the problem in our email pusher tests where we were leaking logcontexts -> 9116e74
self.assertTrue( | ||
callback_finished, | ||
"Callback never finished which means the test probably didn't wait long enough", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this to ensure we're actually getting through everything as expected.
Previously, we didn't run through the callback completely and missed the competing
assertion for example.
d: Deferred = Deferred() | ||
self.email_attempts.append((d, args, kwargs)) | ||
return d | ||
return make_deferred_yieldable(d) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were leaking the logcontext into the reactor
This took some debug diving to find the culprit but turns out we already had the same fix in the HTTP pusher tests since a267c2e (matrix-org/synapse#4204)
Previously, this was causing some tests to fail:
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.push.test_email.EmailPusherTests.test_invite_to_empty_room_sends_email
_trial_temp/test.log
2025-09-11 01:01:23-0500 [-] synapse.util - 308 - ERROR - emailpush.process-2 - Looping call died
Traceback (most recent call last):
File "/virtualenvs/matrix-synapse-xCtC9ulO-py3.13/lib/python3.13/site-packages/twisted/internet/defer.py", line 216, in maybeDeferred
result = f(*args, **kwargs)
File "synapse/synapse/util/__init__.py", line 202, in wrapped_f
assert context.current_context() is context.SENTINEL_CONTEXT, (
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Expected `looping_call` callback from the reactor to start with the sentinel logcontext but saw emailpush.process-2. In other words, another task shouldn't have leaked their logcontext to us.
We caught this because we added that new sentinel
logcontext assertion (discussed in another thread above)
See #18907 (comment) Prior art here uses `one` here but I find it difficult to write about in a comment. For example, "in one context" is kinda generic sounding when I'm talking about the specific context.
See #18907 (comment) > We should wrap this in a `try: ... finally: callback_finished = True` so we can see any potential underlying error that occurs in the `competing_callback()`
synapse/util/__init__.py
Outdated
Note that the function will be called with generic `looping_call` logcontext, so | ||
if it is anything other than a trivial task, you probably want to wrap it in | ||
`run_as_background_process` to give it more specific label and track metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not your problem to deal with, but I've often wondered why we don't just make looping_call
take a name and do this wrapping itself. I can't think of a generic looping_call
ctx ever being the desirable outcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A possible future 👍
synapse/util/__init__.py
Outdated
with context.PreserveLoggingContext(): | ||
callback(*args, **kwargs) | ||
|
||
with context.PreserveLoggingContext(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems fair to me. I can't rule out some weird non-obvious interaction but if that was the case, there should've been a comment to that extent.
Conflicts: synapse/util/__init__.py
See #18914 for more docs on how deferreds interact with logcontexts
Thanks for the review @reivilibre 🦔 |
Remove
sentinel
logcontext inClock
utilities (looping_call
,looping_call_now
,call_later
)Part of #18905
Lints for ensuring we use
Clock.call_later
instead ofreactor.callLater
, etc are coming in #18944Testing strategy
DEBUG
levelpoetry run synapse_homeserver --config-path homeserver.yaml
Total database time
log lineBefore (
sentinel
):After (
looping_call
):Dev notes
Other related fixes matrix-org/synapse#4209
Pull Request Checklist
EventStore
toEventWorkerStore
.".code blocks
.