[flyte2] Instrument the runs service DB repository layer with Prometheus metrics #7448

@pingsutw

Part of #7445. Depends on #7446 (the /metrics endpoint + Scope must exist first).

Summary

Add Prometheus metrics to the runs service database repository layer so we can observe DB call volume, error rates, and latency per operation.

Background

The repository implementations live in runs/repository/impl/ (e.g. task.go, trigger.go, and the run repo). These wrap GORM/DB calls and currently emit no metrics — there is no visibility into how often each query runs, how long it takes, or how often it fails.

What to do

  1. Thread the metrics Scope (added in #7446 and available via app.SetupContext.Scope) into the repository constructor(s). runs/setup.go calls repository.NewRepository(sc.DB, cfg.Database) and impl.NewProjectRepo(sc.DB); extend these (or wrap the repo) to accept a promutils.Scope.

  2. For each DB operation (create/get/list/update/delete), record:

    • call count (labeled by operation name)
    • error count (labeled by operation name)
    • latency (a Prometheus timer / stopwatch)

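    A small helper that wraps each DB call keeps this DRY: it starts the stopwatch (e.g. timer := stopwatch.Start(); defer timer.Stop()), increments the call counter, and increments the error counter when the call returns an error.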
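
The wrapping helper described in step 2 could look roughly like the sketch below. It is a stdlib-only illustration, not the actual implementation: opMetrics, observe, taskRepo, and errNotFound are hypothetical names, and the in-memory maps stand in for the counters and stopwatch that would really be created from a promutils.Scope.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// opMetrics is a hypothetical, stdlib-only stand-in for the per-operation
// call counter, error counter, and stopwatch created from a promutils.Scope.
type opMetrics struct {
	mu      sync.Mutex
	calls   map[string]int
	errs    map[string]int
	latency map[string][]time.Duration
}

func newOpMetrics() *opMetrics {
	return &opMetrics{
		calls:   map[string]int{},
		errs:    map[string]int{},
		latency: map[string][]time.Duration{},
	}
}

// observe runs fn and records one call, its latency, and (if fn fails)
// one error, all keyed by the operation name.
func (m *opMetrics) observe(op string, fn func() error) error {
	start := time.Now()
	err := fn()
	elapsed := time.Since(start)

	m.mu.Lock()
	defer m.mu.Unlock()
	m.calls[op]++
	m.latency[op] = append(m.latency[op], elapsed)
	if err != nil {
		m.errs[op]++
	}
	return err
}

// errNotFound is a hypothetical error standing in for a failed DB lookup.
var errNotFound = errors.New("record not found")

// taskRepo is a hypothetical repository; the map stands in for the DB.
type taskRepo struct {
	metrics *opMetrics
	rows    map[string]string
}

// Get wraps the lookup in observe so that every call is instrumented
// without repeating the timing and counting code in each method.
func (r *taskRepo) Get(id string) (string, error) {
	var out string
	err := r.metrics.observe("get", func() error {
		v, ok := r.rows[id]
		if !ok {
			return errNotFound
		}
		out = v
		return nil
	})
	return out, err
}
```

With this shape, each repository method only names its operation and passes a closure; swapping the maps for real Prometheus metrics would change observe alone.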

Acceptance criteria

  • /metrics exposes per-operation DB call count, error count, and latency for the runs repository.
  • Metrics are created once (no duplicate-registration panics) using a dedicated sub-scope, e.g. scope.NewSubScope("db").
  • Unit tests assert that a repository operation increments the expected counter / records latency.
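
To illustrate the "created once" criterion, here is a minimal sketch of why the metric handles should be built a single time from the "db" sub-scope and then shared. Everything in it is a hypothetical stand-in written against the standard library only: registry mimics a registerer that panics on duplicate names, scope mimics a metrics scope, and the ":" delimiter and metric names are illustrative assumptions rather than the real promutils behavior.

```go
package main

import "fmt"

// registry is a hypothetical stand-in for a metrics registerer that,
// like a "must register" call, panics when a name is registered twice.
type registry struct{ names map[string]bool }

func (r *registry) mustRegister(name string) {
	if r.names[name] {
		panic(fmt.Sprintf("duplicate metric registration: %s", name))
	}
	r.names[name] = true
}

// scope is a hypothetical stand-in for a metrics scope: it prefixes
// metric names and hands out sub-scopes with a longer prefix.
type scope struct {
	prefix string
	reg    *registry
}

func (s scope) newSubScope(name string) scope {
	return scope{prefix: s.prefix + name + ":", reg: s.reg}
}

// mustNewCounter registers a counter name and returns the full name,
// which stands in for a real counter handle in this sketch.
func (s scope) mustNewCounter(name string) string {
	full := s.prefix + name
	s.reg.mustRegister(full)
	return full
}

// dbMetrics holds the handles; build it once and share it across repos.
type dbMetrics struct{ calls, errs string }

func newDBMetrics(parent scope) *dbMetrics {
	db := parent.newSubScope("db")
	return &dbMetrics{
		calls: db.mustNewCounter("calls"),
		errs:  db.mustNewCounter("errors"),
	}
}

// panics reports whether fn panicked; handy for asserting in tests that
// a second construction would hit a duplicate registration.
func panics(fn func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	fn()
	return false
}
```

In this sketch, calling newDBMetrics a second time with the same parent scope panics, which is the failure the criterion asks to avoid; constructing the metrics once in setup and passing the result to each repository sidesteps it.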

Pointers

  • runs/repository/impl/ — repository implementations to instrument.
  • runs/repository/repository.go (the NewRepository constructor) and runs/setup.go:40 where it's called.
  • flytestdlib/promutils/scope.go: Scope helpers (MustNewCounter, MustNewStopWatch, NewSubScope).
