Support node replicas in /api/v2/monitor/health response #62518

@zach-overflow

Description

Modify the /api/v2/monitor/health response to surface the state of every horizontally scaled / replicated component. Since all the components described in the current response are horizontally scalable, it would be much more useful to return a list of state payloads for each component. For example, in a high-availability setup running 2 schedulers, the response would include status information for both schedulers.

A very rough outline of what the proposed API response might look like is below:

{
  "metadatabase": {
    "status": "string"
  },
  "scheduler": [
    {
      "hostname": "scheduler-1",
      "status": "healthy",
      "latest_scheduler_heartbeat": "scheduler-1 latest heartbeat here"
    },
    {
      "hostname": "scheduler-2",
      "status": "healthy",
      "latest_scheduler_heartbeat": "scheduler-2 latest heartbeat here"
    }
  ],
  "triggerer": [
    ...
  ],
  "dag_processor": [
    ...
  ]
}
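To illustrate how a monitoring client might consume a response of this shape, here is a minimal sketch. The sample payload is hypothetical (the hostnames, statuses, and the empty dag_processor list are made up for illustration), and unhealthy_replicas is an illustrative helper, not part of Airflow.

```python
import json

# Hypothetical payload in the proposed shape; all values are made up.
SAMPLE = """
{
  "metadatabase": {"status": "healthy"},
  "scheduler": [
    {"hostname": "scheduler-1", "status": "healthy"},
    {"hostname": "scheduler-2", "status": "unhealthy"}
  ],
  "triggerer": [
    {"hostname": "triggerer-1", "status": "healthy"}
  ],
  "dag_processor": []
}
"""

# Components that the proposal would return as lists of replicas.
REPLICATED = ("scheduler", "triggerer", "dag_processor")


def unhealthy_replicas(payload: dict) -> list:
    """Return (component, hostname) pairs whose status is not "healthy"."""
    found = []
    for component in REPLICATED:
        # A missing or null component is treated as having no replicas.
        for replica in payload.get(component) or []:
            if replica.get("status") != "healthy":
                found.append((component, replica.get("hostname", "<unknown>")))
    return found


print(unhealthy_replicas(json.loads(SAMPLE)))
# [('scheduler', 'scheduler-2')]
```

With the list-based shape, a client can alert on a specific replica instead of on the component as a whole.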

Concretely, I believe the most noteworthy changes necessary for this would be:

  1. Adding a hostname field to the (Scheduler|Triggerer|DagProcessor)InfoResponse models in airflow.api_fastapi.core_api.datamodels.monitor
  2. Changing the scheduler, triggerer, and dag_processor fields of the HealthInfoResponse model to be lists of the corresponding per-component models.
  3. Adding an index on the hostname column in the jobs table (this particular change would benefit a number of other things unrelated to this ticket).
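Changes 1 and 2 could look roughly like the sketch below. It uses standard-library dataclasses as a stand-in for the actual Pydantic models so that it runs without dependencies. The class names follow the ones mentioned above, but the exact field sets, defaults, and the BaseInfoResponse base class are assumptions, not the current Airflow definitions.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field


@dataclass
class BaseInfoResponse:
    status: str | None = None


@dataclass
class SchedulerInfoResponse(BaseInfoResponse):
    hostname: str | None = None  # proposed new field (change 1)
    latest_scheduler_heartbeat: str | None = None


@dataclass
class TriggererInfoResponse(BaseInfoResponse):
    hostname: str | None = None  # proposed new field (change 1)
    latest_triggerer_heartbeat: str | None = None


@dataclass
class DagProcessorInfoResponse(BaseInfoResponse):
    hostname: str | None = None  # proposed new field (change 1)
    latest_dag_processor_heartbeat: str | None = None


@dataclass
class HealthInfoResponse:
    metadatabase: BaseInfoResponse
    # Change 2: one entry per replica instead of a single object.
    scheduler: list[SchedulerInfoResponse] = field(default_factory=list)
    triggerer: list[TriggererInfoResponse] = field(default_factory=list)
    dag_processor: list[DagProcessorInfoResponse] = field(default_factory=list)


# Illustrative values only.
health = HealthInfoResponse(
    metadatabase=BaseInfoResponse(status="healthy"),
    scheduler=[
        SchedulerInfoResponse(
            hostname="scheduler-1",
            status="healthy",
            latest_scheduler_heartbeat="2025-01-01T00:00:00+00:00",
        ),
        SchedulerInfoResponse(
            hostname="scheduler-2",
            status="healthy",
            latest_scheduler_heartbeat="2025-01-01T00:00:03+00:00",
        ),
    ],
)

# asdict() recurses into the nested lists, giving a JSON-serializable dict.
print([s["hostname"] for s in asdict(health)["scheduler"]])
# ['scheduler-1', 'scheduler-2']
```

Defaulting the lists to empty keeps the response well-formed for deployments that do not run a given component.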

Use case/motivation

As of version 3.1.7, responses from the /api/v2/monitor/health endpoint offer a single status and a single latest_*_heartbeat field per Airflow component. This is sufficient for any Airflow deployment with a single scheduler, a single triggerer, and so on. However, the endpoint does not provide reliable insight into the system's health if one or more of those components are horizontally scaled, because one status value cannot describe several replicas.
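The limitation can be made concrete with a small sketch. It assumes the single status is derived only from the most recent heartbeat across all replicas, which is an assumption about the current behavior and is not stated in this issue. The 30-second threshold and the function names are likewise illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumed liveness threshold for illustration; not necessarily Airflow's setting.
HEALTH_THRESHOLD = timedelta(seconds=30)


def is_alive(heartbeat: datetime, now: datetime) -> bool:
    return now - heartbeat <= HEALTH_THRESHOLD


def single_status(heartbeats: dict, now: datetime) -> str:
    """Collapse all replicas into one status using only the newest heartbeat."""
    newest = max(heartbeats.values())
    return "healthy" if is_alive(newest, now) else "unhealthy"


def per_replica_status(heartbeats: dict, now: datetime) -> dict:
    """Report one status per hostname, as the proposal suggests."""
    return {
        host: "healthy" if is_alive(hb, now) else "unhealthy"
        for host, hb in heartbeats.items()
    }


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
heartbeats = {
    "scheduler-1": now - timedelta(seconds=5),   # recent heartbeat
    "scheduler-2": now - timedelta(minutes=10),  # stale heartbeat
}

print(single_status(heartbeats, now))
# healthy
print(per_replica_status(heartbeats, now))
# {'scheduler-1': 'healthy', 'scheduler-2': 'unhealthy'}
```

Under that assumption, the stalled scheduler-2 is invisible in the collapsed status but shows up immediately in the per-replica view.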

Related issues

Tangentially, the feature proposed here could make it easier to implement #17191, but to be clear, that ticket describes a different feature.

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Metadata

Assignees

No one assigned

    Labels

    area:API (Airflow's REST/HTTP API), kind:feature (Feature Requests), needs-triage (label for new issues that we didn't triage yet)
