Replace Valkey-based Live Stats with Prometheus-backed Implementation #8252

@seedspirit

Background

The current live statistics system uses a custom implementation where:

  • Agents collect metrics via StatContext and store them in Valkey (Redis) with
    short TTLs (8-120 seconds)
  • Manager serves these stats through GraphQL live_stat fields (kernel.live_stat,
    agent.live_stat)
  • Data is serialized with msgpack and queried via ValkeyStatClient

Meanwhile, a separate Prometheus integration (ContainerMetricService) already exists. It queries metrics from Prometheus but is not used for the live_stat API.

Problem

  • Duplicate data paths: Metrics are sent to both Valkey (for live stats) and
    Prometheus (for monitoring/dashboards)
  • No historical data: Valkey-based stats are ephemeral with short TTLs, making
    trend analysis impossible
  • Maintenance burden: Two separate metric pipelines to maintain
  • Inconsistency risk: Valkey stats and Prometheus metrics may diverge

Proposed Solution

Replace the Valkey-based live stats system with a Prometheus-backed implementation:

  1. Agent side: Remove Valkey stat publishing; ensure metrics are exported to
    Prometheus
  2. Manager side: Modify live_stat GraphQL resolvers to query Prometheus via
    ContainerMetricService instead of ValkeyStatClient
  3. Deprecate: Phase out ValkeyStatClient for statistics (keep ValkeyLiveClient for
    service discovery)
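
As a rough sketch of step 2, a resolver could turn a Prometheus instant-query response into a live_stat-style mapping. The metric name `backendai_container_utilization`, the label names `kernel_id` and `container_metric_name`, the helper names, and the output shape are all assumptions made for illustration; they are not the actual ContainerMetricService interface. Only the `/api/v1/query` endpoint and its response envelope come from the Prometheus HTTP API.

```python
import json
import urllib.parse
import urllib.request
from typing import Any


def build_query(kernel_id: str) -> str:
    # Hypothetical metric and label names; the real exporter may differ.
    return f'backendai_container_utilization{{kernel_id="{kernel_id}"}}'


def parse_instant_result(payload: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Convert a Prometheus instant-query response into a per-metric mapping."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    stats: dict[str, dict[str, Any]] = {}
    for sample in data["result"]:
        # Assumed label that carries the stat name (e.g. cpu_util, mem).
        name = sample["metric"].get("container_metric_name", "unknown")
        timestamp, value = sample["value"]  # Prometheus returns [ts, "value"]
        stats[name] = {"current": value, "timestamp": timestamp}
    return stats


def fetch_live_stat(base_url: str, kernel_id: str, timeout: float = 5.0) -> dict[str, dict[str, Any]]:
    """Run an instant query against Prometheus and parse the result."""
    params = urllib.parse.urlencode({"query": build_query(kernel_id)})
    url = f"{base_url.rstrip('/')}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_instant_result(json.load(resp))
```

Keeping the parsing separate from the HTTP call would let the resolver logic be tested without a running Prometheus server.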

Benefits

  • Single source of truth: All metrics flow through Prometheus
  • Historical queries: Access to time-series data for trends and analysis
  • Ecosystem integration: Native compatibility with Grafana, alerting, etc.
  • Reduced complexity: Eliminate Valkey stat storage and TTL management
  • Lower Valkey load: Remove high-frequency stat read/write operations
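
To illustrate the historical-query benefit, the sketch below builds a request URL for the Prometheus `/api/v1/query_range` endpoint, which takes `query`, `start`, `end`, and `step` parameters. The function name and the example PromQL expression are hypothetical; how the manager would actually expose range queries is not specified in this issue.

```python
import urllib.parse


def build_range_query_url(
    base_url: str,
    promql: str,
    start: float,
    end: float,
    step_seconds: int,
) -> str:
    """Build a Prometheus range-query URL for the window [start, end]."""
    if end <= start:
        raise ValueError("end must be later than start")
    if step_seconds <= 0:
        raise ValueError("step_seconds must be positive")
    params = urllib.parse.urlencode(
        {
            "query": promql,
            "start": f"{start:.3f}",  # Unix timestamps in seconds
            "end": f"{end:.3f}",
            "step": f"{step_seconds}s",  # resolution as a duration string
        }
    )
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"
```

With Valkey TTLs of 8 to 120 seconds, a window like this (for example, the last hour at 60-second resolution) could not be served at all.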

JIRA Issue: BA-4039
