@rfratto rfratto commented Nov 18, 2025

Note

I had to refactor workerConn a little for this; I isolated those changes into the first commit for easier review.

This PR adds a base set of scheduler metrics for monitoring:

  • loki_engine_scheduler_tasks_total: Counter for task transitions (by state)
  • loki_engine_scheduler_tasks_inflight: Gauge for current tasks (by state)
  • loki_engine_scheduler_streams_total: Counter for stream transitions (by state)
  • loki_engine_scheduler_streams_inflight: Gauge for current streams (by state)
  • loki_engine_scheduler_workers: Gauge for current workers (by state)
  • loki_engine_scheduler_threads: Gauge for current threads (by state)
  • loki_engine_scheduler_connections_total: Counter for total incoming conns
  • loki_engine_scheduler_connections_active: Gauge for current active conns
  • loki_engine_scheduler_task_queue_seconds: Tracks how long tasks are enqueued
  • loki_engine_scheduler_task_exec_seconds: Tracks how long tasks take to finish
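
The difference between the `_total` counters and the `_inflight` gauges can be sketched as below. This is a minimal, standard-library-only illustration, not the PR's implementation: the `stateMetrics` type, the `transition` method, and the state names `pending` and `running` are all assumptions made for the example, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// stateMetrics is a hypothetical in-memory stand-in for one
// counter/gauge pair such as tasks_total and tasks_inflight.
type stateMetrics struct {
	total    map[string]int // counter: transitions into each state, never decreases
	inflight map[string]int // gauge: number of items currently in each state
}

func newStateMetrics() *stateMetrics {
	return &stateMetrics{
		total:    make(map[string]int),
		inflight: make(map[string]int),
	}
}

// transition records a move between states. An empty from value
// means the item is new and has no previous state to leave.
func (m *stateMetrics) transition(from, to string) {
	if from != "" {
		m.inflight[from]--
	}
	m.inflight[to]++
	m.total[to]++
}

func main() {
	m := newStateMetrics()
	m.transition("", "pending")        // hypothetical state names
	m.transition("pending", "running") // the task leaves pending

	fmt.Println(m.total["pending"], m.inflight["pending"])
	fmt.Println(m.total["running"], m.inflight["running"])
}
```

After both transitions, the counter for `pending` still reads 1 while its gauge has dropped back to 0, which is the distinction the two metric families are meant to capture.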

Worker and thread state is a new concept, introduced for monitoring. Each worker or thread is in one of the following states:

  • busy: A task is executing
  • ready: Tasks have been requested
  • idle: No tasks have been requested or are running

The overall worker state is inferred from the state of its threads, in precedence order of busy > ready > idle.
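
The precedence rule can be sketched as picking the highest-ranked state among a worker's threads. The `threadState` type, its constants, and the `workerState` function are hypothetical names for this example, and treating a worker with no threads as idle is an assumption of the sketch rather than something the PR states.

```go
package main

import "fmt"

// threadState is a hypothetical type; constants are ordered so that
// a larger value means higher precedence (busy > ready > idle).
type threadState int

const (
	stateIdle  threadState = iota // no tasks requested or running
	stateReady                    // tasks have been requested
	stateBusy                     // a task is executing
)

func (s threadState) String() string {
	switch s {
	case stateBusy:
		return "busy"
	case stateReady:
		return "ready"
	default:
		return "idle"
	}
}

// workerState returns the highest-precedence state found among the
// given threads. An empty slice yields idle (an assumption).
func workerState(threads []threadState) threadState {
	state := stateIdle
	for _, t := range threads {
		if t > state {
			state = t
		}
	}
	return state
}

func main() {
	fmt.Println(workerState([]threadState{stateIdle, stateReady, stateBusy}))
	fmt.Println(workerState([]threadState{stateIdle, stateReady}))
	fmt.Println(workerState(nil))
}
```

Encoding precedence in the constant order keeps the inference to a single pass over the threads.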

Observing the scheduler will require accessing workerConn concurrently
from the /metrics endpoint and the connection handler. To prepare for
this, all state handling for workerConn has been moved into methods, and
all fields are now protected by a mutex.
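
A minimal sketch of that pattern follows, assuming a simplified connection type. `trackedConn` and its methods are hypothetical stand-ins, not the real `workerConn`: the point is only that every field is private, every mutation goes through a method that takes the mutex, and the metrics reader gets a consistent copy through `Snapshot`.

```go
package main

import (
	"fmt"
	"sync"
)

// trackedConn is a hypothetical stand-in for workerConn. All fields
// are guarded by mu and are only touched inside methods.
type trackedConn struct {
	mu    sync.Mutex
	ready int // threads that have requested a task
	busy  int // threads currently executing a task
}

// MarkReady records that one more thread has requested a task.
func (c *trackedConn) MarkReady() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.ready++
}

// Assign moves one ready thread to busy. It reports false when no
// thread is ready.
func (c *trackedConn) Assign() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.ready == 0 {
		return false
	}
	c.ready--
	c.busy++
	return true
}

// Complete marks one busy thread as finished (back to idle).
func (c *trackedConn) Complete() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.busy > 0 {
		c.busy--
	}
}

// Snapshot returns a consistent copy of the counts for a reader
// such as a metrics collector.
func (c *trackedConn) Snapshot() (ready, busy int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.ready, c.busy
}

func main() {
	c := &trackedConn{}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.MarkReady() // concurrent writers are serialized by the mutex
		}()
	}
	wg.Wait()
	for i := 0; i < 4; i++ {
		c.Assign()
	}
	ready, busy := c.Snapshot()
	fmt.Println(ready, busy)
}
```

Returning copies from `Snapshot` rather than exposing the fields means the /metrics handler never observes a half-applied update.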

Signed-off-by: Robert Fratto <[email protected]>
Add a base set of scheduler metrics for monitoring:

* loki_scheduler_tasks_total: Counter for task transitions (by state)
* loki_scheduler_tasks_inflight: Gauge for current tasks (by state)
* loki_scheduler_streams_total: Counter for stream transitions (by
  state)
* loki_scheduler_streams_inflight: Gauge for current streams (by state)
* loki_scheduler_workers: Gauge for current workers (by state)
* loki_scheduler_threads: Gauge for current threads (by state)
* loki_scheduler_connections_total: Counter for total incoming conns
* loki_scheduler_connections_active: Gauge for current active conns
* loki_scheduler_task_queue_seconds: Tracks how long tasks are enqueued
* loki_scheduler_task_exec_seconds: Tracks how long tasks take to finish

Worker and thread state is a new concept, introduced for monitoring.
Each worker or thread is in one of the following states:

* busy: A task is executing
* ready: Tasks have been requested
* idle: No tasks have been requested or are running

The overall worker state is inferred from the state of its threads, in
precedence order of busy > ready > idle.

Signed-off-by: Robert Fratto <[email protected]>
@rfratto rfratto requested a review from a team as a code owner November 18, 2025 15:38
@ashwanthgoli ashwanthgoli left a comment

lgtm

sched: sched,

tasksInflight: prometheus.NewDesc(
"loki_scheduler_tasks_inflight",

nit: can be confused with the old scheduler, maybe loki_engine_scheduler_*

Changes the prefix as suggested to avoid confusion with the
loki_query_scheduler_ prefix.

Signed-off-by: Robert Fratto <[email protected]>
@rfratto rfratto merged commit 2fce7da into main Nov 19, 2025
69 checks passed
@rfratto rfratto deleted the thor-scheduler-metrics branch November 19, 2025 14:19