Part of #7445. Depends on #7446 (the /metrics endpoint + initialized Scope must exist first).
Summary
Instrument the actions service with Prometheus metrics: implement the existing dropped-updates counter TODO, and add throughput / latency / queue-depth metrics for the TaskAction watcher.
Background
The actions service is already partly wired for metrics — it just has nothing to plug into yet:
- actions/setup.go:39 already passes sc.Scope into NewActionsClient(...).
- actions/k8s/client.go:91 already uses scope.NewSubScope("actions_filter") for the dedup bloom filter.
- actions/k8s/client.go:65 has an explicit TODO: // TODO: add a prometheus counter for dropped updates when metrics are wired up.
Note on the metrics scope: When run via the unified manager (manager/cmd/main.go:75), sc.Scope is already initialized (promutils.NewScope("flyte")) before actions.Setup runs, so the bloom-filter sub-scope at client.go:91 is created without a panic. The dependency on #7446 exists because #7446 mounts the /metrics endpoint; without it, the metrics added here are registered into the default registry but never exposed to a scrape. (#7446 also initializes sc.Scope at the framework level, which additionally makes the standalone actions/cmd/main.go binary safe. That path currently leaves sc.Scope nil, so the scope.NewSubScope(...) call at client.go:90-91 would panic there, since RecordFilterSize defaults to 1 << 23, which is > 0, so that code path is taken.)
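The panic in the standalone path comes from calling a method through a nil interface value. The sketch below reproduces that shape in isolation, assuming (as the note implies) that the sub-scope is only created when RecordFilterSize > 0. Scope, namedScope, and newFilterScope are hypothetical stand-ins; only the NewSubScope name and the 1 << 23 default come from this issue, and the recover exists purely so the demo can report the panic instead of crashing.

```go
package main

import "fmt"

// Scope is a hypothetical one-method stand-in for promutils.Scope; only the
// NewSubScope name is taken from the issue.
type Scope interface {
	NewSubScope(name string) Scope
}

// namedScope is a toy implementation; the ":" joining is illustrative only.
type namedScope struct{ name string }

func (s namedScope) NewSubScope(name string) Scope { return namedScope{s.name + ":" + name} }

// newFilterScope mimics the shape described for client.go:90-91: the scope is
// only used when the record filter is enabled (size > 0). The deferred recover
// turns the panic into an error so the demo can print it.
func newFilterScope(scope Scope, recordFilterSize int) (sub Scope, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered panic: %v", r)
		}
	}()
	if recordFilterSize > 0 {
		sub = scope.NewSubScope("actions_filter") // panics if scope is a nil interface
	}
	return sub, nil
}

func main() {
	const defaultRecordFilterSize = 1 << 23 // default cited in the issue

	// Unified manager path: Scope is initialized before Setup runs.
	_, err := newFilterScope(namedScope{"flyte"}, defaultRecordFilterSize)
	fmt.Println("initialized scope, error:", err)

	// Standalone binary path today: Scope is left nil but the filter is enabled.
	var nilScope Scope
	_, err = newFilterScope(nilScope, defaultRecordFilterSize)
	fmt.Println("nil scope panicked:", err != nil)
}
```

Running it prints "initialized scope, error: <nil>" followed by "nil scope panicked: true"; passing a size of 0 with the nil scope returns without touching it.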
What to do
Using the Scope available on ActionsClient (passed in via NewActionsClient), add metrics under a dedicated sub-scope (e.g. scope.NewSubScope("watcher")):
- Dropped updates counter: implement the TODO at actions/k8s/client.go:65. Increment a counter whenever a watch update is dropped (e.g. the buffer is full and a channel send would block).
- Watcher throughput: a counter of TaskAction events processed, labeled by result (success/error).
- Processing latency: a timer/histogram around per-event handling in the watch worker loop.
- Queue/buffer depth: a gauge for the watch buffer occupancy (config WatchBufferSize), updated as events are enqueued/dequeued (or sampled periodically).
Acceptance criteria
- /metrics exposes a dropped-updates counter, watcher event throughput (by result), processing latency, and buffer depth for the actions service.
- The TODO at actions/k8s/client.go:65 is implemented and removed.
Pointers
- actions/k8s/client.go: the watcher, worker loop, buffer, and the dropped-updates TODO (line 65); the constructor NewActionsClient (line 77) already receives a promutils.Scope.
- actions/setup.go:31-40: where NewActionsClient is constructed with sc.Scope.
- flytestdlib/promutils/scope.go: Scope helpers (MustNewCounter, MustNewGauge, MustNewStopWatch, NewSubScope).
Notes for contributors
- … Scope from [flyte2] Add /metrics endpoint and initialize metrics Scope in the app framework #7446.