[flyte2] Add gRPC/Connect RPC metrics interceptor to the runs service #7447

@pingsutw

Description

Part of #7445. Depends on #7446 (the /metrics endpoint + Scope must exist first).

Summary

Add RPC-level Prometheus metrics (request count, error count, latency) to the runs service by attaching a shared Connect interceptor to every service handler.

Background

The runs service is a Connect (connectrpc.com/connect) server. Handlers are mounted in runs/setup.go via calls like:

runsPath, runsHandler := workflowconnect.NewRunServiceHandler(runsSvc)
sc.Mux.Handle(runsPath, runsHandler)

There are currently no interceptors anywhere in the v2 tree, so no RPC metrics are emitted.

What to do

  1. Write a Connect interceptor (a connect.UnaryInterceptorFunc, or a full connect.Interceptor if streaming RPCs should be measured as well) that records, per RPC procedure:

    • request count (e.g. requests_total labeled by procedure)
    • error count (labeled by procedure, and ideally connect.CodeOf(err))
    • latency (a Prometheus histogram / Scope.MustNewStopWatch style timer)

    Create the metrics from the sc.Scope introduced in #7446 ("[flyte2] Add /metrics endpoint and initialize metrics Scope in the app framework"), for example under a sub-scope such as sc.Scope.NewSubScope("grpc").

  2. Pass the interceptor to every New*ServiceHandler(...) call in runs/setup.go via connect.WithInterceptors(...), e.g.:

    interceptors := connect.WithInterceptors(metricsInterceptor)
    runsPath, runsHandler := workflowconnect.NewRunServiceHandler(runsSvc, interceptors)

    Apply it to RunService, InternalRunService, TaskService, IdentityService, AuthMetadataService, TriggerService, ProjectService (and RunLogsService when mounted).
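
The two steps above can be sketched without any dependencies. This is only an illustration of the counting and timing logic, not the intended implementation: `request` and `unaryFunc` are local stand-ins for connect.AnyRequest and connect.UnaryFunc, and `rpcMetrics` is a hypothetical in-memory substitute for the counters and stopwatch that would be created from sc.Scope. In the real interceptor the procedure name would come from req.Spec().Procedure.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// request and unaryFunc are local stand-ins for connect.AnyRequest and
// connect.UnaryFunc so that this sketch compiles without dependencies.
type request struct {
	Procedure string // in Connect: req.Spec().Procedure
}

type unaryFunc func(ctx context.Context, req request) (any, error)

// rpcMetrics is a hypothetical in-memory substitute for metrics created
// from sc.Scope. Every map is keyed by procedure name.
type rpcMetrics struct {
	mu       sync.Mutex
	requests map[string]int
	failures map[string]int
	latency  map[string]time.Duration // cumulative time per procedure
}

func newRPCMetrics() *rpcMetrics {
	return &rpcMetrics{
		requests: map[string]int{},
		failures: map[string]int{},
		latency:  map[string]time.Duration{},
	}
}

// intercept wraps next so that every call is counted and timed.
func (m *rpcMetrics) intercept(next unaryFunc) unaryFunc {
	return func(ctx context.Context, req request) (any, error) {
		start := time.Now()
		resp, err := next(ctx, req)
		elapsed := time.Since(start)

		m.mu.Lock()
		defer m.mu.Unlock()
		m.requests[req.Procedure]++
		m.latency[req.Procedure] += elapsed
		if err != nil {
			m.failures[req.Procedure]++
		}
		return resp, err
	}
}

// callOnce drives one RPC through the interceptor. When fail is true the
// wrapped handler returns an error.
func callOnce(m *rpcMetrics, procedure string, fail bool) error {
	handler := func(ctx context.Context, req request) (any, error) {
		if fail {
			return nil, errors.New("simulated failure")
		}
		return "ok", nil
	}
	_, err := m.intercept(handler)(context.Background(), request{Procedure: procedure})
	return err
}

func main() {
	m := newRPCMetrics()                    // created once, shared by every handler
	const proc = "/demo.v1.DemoService/Get" // hypothetical procedure name
	_ = callOnce(m, proc, false)
	_ = callOnce(m, proc, true)
	fmt.Println(m.requests[proc], m.failures[proc]) // prints "2 1"
}
```

Because the metrics object is built once and only the wrapping happens per call, the same interceptor value can be passed to every New*ServiceHandler(...) call, which is what the "shared/created once" acceptance criterion asks for.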

Acceptance criteria

  • After making RPC calls, /metrics exposes per-procedure request count, error count, and latency metrics.
  • The interceptor is shared/created once and reused across all handlers.
  • A unit test verifies the interceptor increments the request counter (and error counter on error) for a sample procedure.

Pointers

  • runs/setup.go — all the sc.Mux.Handle(...) registrations (lines ~78-120+).
  • Connect interceptor docs: https://connectrpc.com/docs/go/interceptors/
  • flytestdlib/promutils/scope.go: Scope helpers (MustNewCounter, MustNewStopWatch, NewSubScope, etc.).

Notes for contributors

  • Keep label cardinality bounded — label by procedure name and status code, not by arbitrary user input.
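
A minimal sketch of what bounded cardinality means in practice. Everything here is hypothetical and stands in for real Prometheus label handling: `seriesKey` models the label set of one counter series, `allowedCodes` is an arbitrary small set of code labels, and `runID` represents a user-controlled request field. With Connect, the code label would be derived from connect.CodeOf(err).

```go
package main

import "fmt"

// seriesKey is the complete label set of one counter series.
type seriesKey struct {
	Procedure string
	Code      string
}

// allowedCodes is a hypothetical, deliberately small set of code labels.
var allowedCodes = map[string]bool{"ok": true, "not_found": true, "internal": true}

// record increments the series for (procedure, code). Codes outside the
// allowed set collapse into "unknown", and runID is intentionally ignored:
// user-controlled values must not become label values.
func record(counts map[seriesKey]int, procedure, code, runID string) {
	if !allowedCodes[code] {
		code = "unknown"
	}
	counts[seriesKey{Procedure: procedure, Code: code}]++
}

func main() {
	counts := map[seriesKey]int{}
	const proc = "/demo.v1.DemoService/GetRun" // hypothetical procedure name
	for _, id := range []string{"run-a", "run-b", "run-c"} {
		record(counts, proc, "ok", id)
	}
	record(counts, proc, "some-unexpected-code", "run-d")
	fmt.Println(len(counts)) // prints "2": series count does not grow with run IDs
}
```

The number of series is at most (number of procedures) × (number of code labels), regardless of how many distinct requests are served.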
