
[Metrics scraper] Metrics scraper pod overloads and crashes #8015

Closed

maciaszczykm opened this issue Jul 7, 2023 · 4 comments
Labels
kind/feature, lifecycle/rotten

Comments

@maciaszczykm
Member

maciaszczykm commented Jul 7, 2023

We have a cluster here that has 34 nodes and 2497 pods. The metrics scraper seemed to reach 5000m of CPU and 6.7G of memory before eventually crashing.

Dashboard version v2.0.5
Metrics scraper version v1.0.6
The metrics scraper produces roughly 500000 log lines per hour (about 140 requests per second), and they look like this:

Jan 27 20:07:24 dashboard-metrics-scraper-5cccbddcc-fpr6k dashboard-metrics-scraper 172.30.160.74 - - [27/Jan/2021:18:07:24 +0000] "GET /api/v1/dashboard/nodes//metrics/cpu/usage_rate HTTP/1.1" 200 874 "" "dashboard/v2.0.5"
Jan 27 20:07:24 dashboard-metrics-scraper-5cccbddcc-fpr6k dashboard-metrics-scraper 172.30.160.74 - - [27/Jan/2021:18:07:24 +0000] "GET /api/v1/dashboard/nodes//metrics/cpu/usage_rate HTTP/1.1" 200 875 "" "dashboard/v2.0.5"
Jan 27 20:07:24 dashboard-metrics-scraper-5cccbddcc-fpr6k dashboard-metrics-scraper 172.30.160.74 - - [27/Jan/2021:18:07:24 +0000] "GET /api/v1/dashboard/nodes//metrics/cpu/usage_rate HTTP/1.1" 200 878 "" "dashboard/v2.0.5"
Jan 27 20:07:24 dashboard-metrics-scraper-5cccbddcc-fpr6k dashboard-metrics-scraper 172.30.160.74 - - [27/Jan/2021:18:07:24 +0000] "GET /api/v1/dashboard/nodes//metrics/cpu/usage_rate HTTP/1.1" 200 888 "" "dashboard/v2.0.5"
Jan 27 20:07:24 dashboard-metrics-scraper-5cccbddcc-fpr6k dashboard-metrics-scraper 172.30.160.74 - - [27/Jan/2021:18:07:24 +0000] "GET /api/v1/dashboard/nodes//metrics/cpu/usage_rate HTTP/1.1" 200 892 "" "dashboard/v2.0.5"
Jan 27 20:07:24 dashboard-metrics-scraper-5cccbddcc-fpr6k dashboard-metrics-scraper 172.30.160.74 - - [27/Jan/2021:18:07:24 +0000] "GET /api/v1/dashboard/nodes//metrics/cpu/usage_rate HTTP/1.1" 200 891 "" "dashboard/v2.0.5"
It seems like it's handling the requests as it should; it's just getting overloaded and can't cope with the volume. I don't think adding a CPU and memory limit would help much, because hitting the limit would likely just make the pod keep crashing anyway.
This is about as much information as I have for this cluster. The user did delete the pod, but it came back, overloaded, and crashed again.
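For concreteness, such a cap could be applied to the scraper as sketched below. This is a hypothetical mitigation with illustrative values, not a recommendation: the deployment and namespace names assume the stock recommended.yaml, and, per the report above, a limit would likely just turn steady overload into a crash loop.

# Hypothetical mitigation sketch (illustrative values; deployment and
# namespace names assume the stock recommended.yaml). As noted above,
# hitting the memory limit would likely OOMKill the pod repeatedly
# rather than fix the underlying load.
kubectl -n kubernetes-dashboard set resources deployment dashboard-metrics-scraper \
  --requests=cpu=250m,memory=512Mi \
  --limits=cpu=1,memory=1Gi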

Opened by @Joseph-Goergen.

See kubernetes-sigs/dashboard-metrics-scraper#38 for more details.
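For anyone trying to reproduce the request pattern, the endpoint seen in the logs can be exercised directly. The sketch below is a hypothetical in-cluster call: the service name, namespace, and port are assumptions based on the stock manifests, not confirmed by this issue. Note that the logged URLs contain an empty node segment (nodes//).

# Hypothetical reproduction of one logged request (service name, namespace,
# and port 8000 are assumptions based on the stock manifests):
curl -s "http://dashboard-metrics-scraper.kubernetes-dashboard.svc:8000/api/v1/dashboard/nodes//metrics/cpu/usage_rate"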

@maciaszczykm added the kind/feature label Jul 7, 2023
@maciaszczykm changed the title from "Metrics scraper pod overloads and crashes" to "[Metrics scraper] Metrics scraper pod overloads and crashes" Jul 7, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jan 24, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Feb 23, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot closed this as not planned Mar 24, 2024