Skip to content

Add whitelist for k8s managed container for health_checker#25730

Open
FengPan-Frank wants to merge 6 commits intosonic-net:masterfrom
FengPan-Frank:health_checker_k8s
Open

Add whitelist for k8s managed container for health_checker#25730
FengPan-Frank wants to merge 6 commits intosonic-net:masterfrom
FengPan-Frank:health_checker_k8s

Conversation

@FengPan-Frank
Copy link
Contributor

Why I did it

Work item tracking
  • Microsoft ADO (number only):

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Copilot AI review requested due to automatic review settings February 27, 2026 10:02
@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds a whitelist mechanism to exclude certain Kubernetes-managed containers (telemetry, acms, restapi) from health checking in the system health checker. These containers are excluded from both expected and current running container sets, preventing health check failures when Kubernetes manages them independently.

Changes:

  • Introduces CONTAINER_K8S_WHITELIST constant containing telemetry, acms, and restapi
  • Modifies get_expected_running_containers() and get_current_running_containers() to skip whitelisted containers
  • Updates existing tests and adds comprehensive test coverage for the whitelist functionality

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
src/system-health/health_checker/service_checker.py Adds CONTAINER_K8S_WHITELIST and implements filtering logic in get_expected_running_containers() and get_current_running_containers()
src/system-health/tests/test_system_health.py Updates test_service_checker_k8s_containers, test_service_checker_mixed_containers, and adds new test_service_checker_k8s_whitelist test

container_list = []
for container_name in feature_table.keys():
# Skip containers in the whitelist
if container_name in ServiceChecker.CONTAINER_K8S_WHITELIST:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will skip below if container_name == "telemetry" branch completely. Is it intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes intentional for skipping all kubesonic managed container.

@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@make1980 make1980 self-requested a review February 27, 2026 22:18
dtype = labels.get("io.kubernetes.docker.type")
kname = labels.get("io.kubernetes.container.name")

if ns == "sonic":

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as long as ns is not empty we can skip the container

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

qiluo-msft
qiluo-msft previously approved these changes Feb 27, 2026
@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment on lines +518 to +521
def test_service_checker_k8s_whitelist(mock_config_db, mock_run, mock_docker_client):
"""Test that containers in CONTAINER_K8S_WHITELIST are excluded from both
expected running containers and current running containers.
"""
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test asserts that whitelisted containers are excluded even when they are regular Docker containers (no K8s labels). That seems at odds with the PR goal/title of handling “k8s managed” containers, and it would mask regressions where classic Docker deployments stop monitoring telemetry/restapi. Consider updating the test scenario to model K8s-managed containers (set io.kubernetes.* labels) and add/keep an assertion that non-K8s containers with these names are still monitored when K8s is not in use.

Copilot uses AI. Check for mistakes.
Comment on lines +54 to +56
# Whitelist of containers which are managed by KubeSonic to bypass health checking entirely.
# These containers will be excluded from both expected and running container sets.
CONTAINER_K8S_WHITELIST = {'telemetry', 'acms', 'restapi'}
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CONTAINER_K8S_WHITELIST is applied unconditionally, which will bypass health checking for telemetry/restapi even on non-Kubernetes SONiC images where these are standard Docker containers (e.g., rules/docker-telemetry.mk and rules/docker-restapi.mk define container names telemetry and restapi). This weakens system-health coverage by design outside KubeSonic. Consider gating this whitelist on a positive KubeSonic/K8s signal (config knob, platform flag, or presence of io.kubernetes.* labels) so classic Docker deployments still monitor these containers.

Copilot uses AI. Check for mistakes.
@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants