Skip to content

Conversation

MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented Sep 19, 2025

This PR can be reviewed commit-by-commit.

Introduce Clock.call_when_running(...) to wrap startup code in a logcontext, ensuring we can identify which server generated the logs.

Background:

Ideally, nothing from the Synapse homeserver would be logged against the `sentinel`
logcontext as we want to know which server the logs came from. In practice, this is not
always the case yet especially outside of request handling.
Global things outside of Synapse (e.g. Twisted reactor code) should run in the
`sentinel` logcontext. It's only when it calls into application code that a logcontext
gets activated. This means the reactor should be started in the `sentinel` logcontext,
and any time an awaitable yields control back to the reactor, it should reset the
logcontext to be the `sentinel` logcontext. This is important to avoid leaking the
current logcontext to the reactor (which would then get picked up and associated with
the next thing the reactor does).

Also adds a lint to prefer Clock.call_when_running(...) over reactor.callWhenRunning(...)

Part of #18905

Dev notes

Originally, we covered Synapse startup by manually adding a logcontext, see #18870

Adding logcontext to looping_call and call_later in #18907

Mypy lint pattern came from #18828

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

@@ -0,0 +1 @@
Introduce `Clock.call_when_running(...)` to wrap startup code in a logcontext, ensuring we can identify which server generated the logs.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR can be reviewed commit-by-commit. There are lots of changes because I split out JSON and Clock utilities to avoid circular imports.

return None


def check_call_when_running(ctx: MethodSigContext) -> CallableType:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lint pattern is coming from #18828 where we're also going to lint call_later and looping_call in the future.

if not ignore_errs:
raise

def call_when_running(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new function is here Clock.call_when_running

We're wrapping reactor.callWhenRunning so there is always a logcontext active and therefore know which server generated the logs

d.addErrback(log_failure, "Looping call died", consumeErrors=False)
return call

def call_later(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other functions like looping_call and call_later are getting their own logcontext wrappers in #18944

@MadLittleMods MadLittleMods merged commit 5a9ca1e into develop Sep 22, 2025
44 checks passed
@MadLittleMods MadLittleMods deleted the madlittlemods/hs-clock-utils-logcontext branch September 22, 2025 15:28
@MadLittleMods
Copy link
Contributor Author

Thanks for the review @reivilibre 🐩

MadLittleMods added a commit that referenced this pull request Sep 22, 2025
…ooping_call_now`, `call_later`) (#18907)

Part of #18905

Lints for ensuring we use `Clock.call_later` instead of
`reactor.callLater`, etc are coming in
#18944

### Testing strategy

 1. Configure Synapse to log at the `DEBUG` level
1. Start Synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Wait 10 seconds for the [database profiling
loop](https://github.com/element-hq/synapse/blob/9cc400177822805e2a08d4d934daad6f3bc2a4df/synapse/storage/database.py#L711)
to execute
1. Notice the logcontext being used for the `Total database time` log
line

Before (`sentinel`):

```
2025-09-10 16:36:58,651 - synapse.storage.TIME - 707 - DEBUG - sentinel - Total database time: 0.646% {room_forgetter_stream_pos(2): 0.131%, reap_monthly_active_users(1): 0.083%, get_device_change_last_converted_pos(1): 0.078%}
```

After (`looping_call`):

```
2025-09-10 16:36:58,651 - synapse.storage.TIME - 707 - DEBUG - looping_call - Total database time: 0.646% {room_forgetter_stream_pos(2): 0.131%, reap_monthly_active_users(1): 0.083%, get_device_change_last_converted_pos(1): 0.078%}
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants