Add whitelist for k8s managed container for health_checker #25730
FengPan-Frank wants to merge 6 commits into sonic-net:master
Conversation
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
This pull request adds a whitelist mechanism to exclude certain Kubernetes-managed containers (telemetry, acms, restapi) from health checking in the system health checker. These containers are excluded from both expected and current running container sets, preventing health check failures when Kubernetes manages them independently.
Changes:
- Introduces CONTAINER_K8S_WHITELIST constant containing telemetry, acms, and restapi
- Modifies get_expected_running_containers() and get_current_running_containers() to skip whitelisted containers
- Updates existing tests and adds comprehensive test coverage for the whitelist functionality
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/system-health/health_checker/service_checker.py | Adds CONTAINER_K8S_WHITELIST and implements filtering logic in get_expected_running_containers() and get_current_running_containers() |
| src/system-health/tests/test_system_health.py | Updates test_service_checker_k8s_containers, test_service_checker_mixed_containers, and adds new test_service_checker_k8s_whitelist test |
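
The filtering described in the overview amounts to dropping whitelisted names from both container sets before they are compared. A minimal sketch, assuming plain sets of container names; the helper filter_whitelisted and the sample names are hypothetical and are not taken from service_checker.py:

```python
# Whitelist values as listed in the PR overview.
CONTAINER_K8S_WHITELIST = {"telemetry", "acms", "restapi"}


def filter_whitelisted(container_names, whitelist=CONTAINER_K8S_WHITELIST):
    """Return the container names that are not in the whitelist."""
    return {name for name in container_names if name not in whitelist}


# Hypothetical inputs: what the feature table expects vs. what is running.
expected = filter_whitelisted({"swss", "syncd", "telemetry", "restapi"})
running = filter_whitelisted({"swss", "syncd", "acms"})

# Whitelisted containers affect neither side of the comparison.
missing = expected - running
unexpected = running - expected
```

Because both sets are filtered the same way, a whitelisted container can be neither reported as missing nor reported as unexpected.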
    container_list = []
    for container_name in feature_table.keys():
        # Skip containers in the whitelist
        if container_name in ServiceChecker.CONTAINER_K8S_WHITELIST:

This will completely skip the if container_name == "telemetry" branch below. Is that intentional?

Yes, it is intentional: all KubeSonic-managed containers are skipped.
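
The effect discussed here follows from loop ordering: a continue placed before a name-specific branch means that branch never runs for a whitelisted name. A simplified sketch; the handled_specially list only stands in for whatever the telemetry branch does and is purely illustrative:

```python
CONTAINER_K8S_WHITELIST = {"telemetry", "acms", "restapi"}


def expected_containers(feature_names):
    """Collect expected containers, skipping whitelisted ones first."""
    container_list = []
    handled_specially = []  # illustrative stand-in for the telemetry branch
    for container_name in feature_names:
        # Skip containers in the whitelist
        if container_name in CONTAINER_K8S_WHITELIST:
            continue
        # Never reached for "telemetry" because of the early continue above.
        if container_name == "telemetry":
            handled_specially.append(container_name)
        container_list.append(container_name)
    return container_list, handled_specially


containers, special = expected_containers(["swss", "telemetry", "bgp"])
```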
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
    dtype = labels.get("io.kubernetes.docker.type")
    kname = labels.get("io.kubernetes.container.name")

    if ns == "sonic":

I think we can skip the container as long as ns is not empty.
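
The suggestion above relaxes the check from one specific namespace to any non-empty namespace. A sketch of both variants, assuming ns is read from the io.kubernetes.pod.namespace label (the diff excerpt does not show where ns comes from) and using hypothetical helper names:

```python
# Assumed label key for the pod namespace; not shown in the diff excerpt.
NAMESPACE_LABEL = "io.kubernetes.pod.namespace"


def is_sonic_namespace(labels):
    """Strict variant: only containers in the "sonic" namespace match."""
    return labels.get(NAMESPACE_LABEL) == "sonic"


def has_k8s_namespace(labels):
    """Relaxed variant: any non-empty namespace label marks a K8s container."""
    # A missing or empty value counts as "not K8s-managed".
    return bool(labels.get(NAMESPACE_LABEL))


# Hypothetical label dictionaries for three kinds of container.
plain_docker = {}
sonic_pod = {NAMESPACE_LABEL: "sonic"}
other_pod = {NAMESPACE_LABEL: "kube-system"}
```

The two variants differ only for pods outside the "sonic" namespace: the strict check keeps monitoring them, while the relaxed check skips them.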
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
…/sonic-buildimage into health_checker_k8s
    def test_service_checker_k8s_whitelist(mock_config_db, mock_run, mock_docker_client):
        """Test that containers in CONTAINER_K8S_WHITELIST are excluded from both
        expected running containers and current running containers.
        """

This test asserts that whitelisted containers are excluded even when they are regular Docker containers (no K8s labels). That seems at odds with the PR goal/title of handling “k8s managed” containers, and it would mask regressions where classic Docker deployments stop monitoring telemetry/restapi. Consider updating the test scenario to model K8s-managed containers (set io.kubernetes.* labels) and add/keep an assertion that non-K8s containers with these names are still monitored when K8s is not in use.
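
One way to structure the two scenarios this comment asks for, sketched with plain stand-in objects rather than the test suite's mock_docker_client fixture. The helper monitored_names and its label-based skip rule are assumptions for illustration, not the behavior of the PR as written:

```python
from types import SimpleNamespace

CONTAINER_K8S_WHITELIST = {"telemetry", "acms", "restapi"}


def monitored_names(containers):
    """Names still health-checked: skip whitelisted names only if K8s-labelled."""
    result = set()
    for c in containers:
        k8s_labelled = any(key.startswith("io.kubernetes.") for key in c.labels)
        if c.name in CONTAINER_K8S_WHITELIST and k8s_labelled:
            continue
        result.add(c.name)
    return result


def test_k8s_managed_telemetry_is_skipped():
    k8s = SimpleNamespace(
        name="telemetry",
        labels={"io.kubernetes.container.name": "telemetry"},
    )
    assert monitored_names([k8s]) == set()


def test_plain_docker_telemetry_is_still_monitored():
    docker = SimpleNamespace(name="telemetry", labels={})
    assert monitored_names([docker]) == {"telemetry"}
```

The second test is the regression guard: it fails if a classic Docker deployment silently stops monitoring telemetry.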
    # Whitelist of containers which are managed by KubeSonic to bypass health checking entirely.
    # These containers will be excluded from both expected and running container sets.
    CONTAINER_K8S_WHITELIST = {'telemetry', 'acms', 'restapi'}

CONTAINER_K8S_WHITELIST is applied unconditionally, which will bypass health checking for telemetry/restapi even on non-Kubernetes SONiC images where these are standard Docker containers (e.g., rules/docker-telemetry.mk and rules/docker-restapi.mk define container names telemetry and restapi). This weakens system-health coverage by design outside KubeSonic. Consider gating this whitelist on a positive KubeSonic/K8s signal (config knob, platform flag, or presence of io.kubernetes.* labels) so classic Docker deployments still monitor these containers.
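
A minimal sketch of the gating idea, using an explicit flag as the positive KubeSonic signal. The k8s_enabled parameter and the effective_whitelist helper are hypothetical; the PR does not define how such a signal would be obtained:

```python
CONTAINER_K8S_WHITELIST = frozenset({"telemetry", "acms", "restapi"})


def effective_whitelist(k8s_enabled):
    """Apply the whitelist only when a KubeSonic/K8s signal is present."""
    return CONTAINER_K8S_WHITELIST if k8s_enabled else frozenset()


def expected_containers(feature_names, k8s_enabled):
    """Expected containers after applying the gated whitelist."""
    skip = effective_whitelist(k8s_enabled)
    return [name for name in feature_names if name not in skip]


# Hypothetical feature list evaluated with and without the K8s signal.
features = ["swss", "telemetry", "restapi"]
classic = expected_containers(features, k8s_enabled=False)
kubesonic = expected_containers(features, k8s_enabled=True)
```

With the flag off, telemetry and restapi remain in the expected set, so classic Docker images keep their existing coverage.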
…/sonic-buildimage into health_checker_k8s
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).