Improvements to redfish exporter v2 for Dell systems #1549

jovial · 2025-02-28T18:15:01Z

No description provided.

- Compatibility with Dell servers
- Added health round-up panels
- Added variables for different groups, e.g. overcloud vs compute

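Without this change, Prometheus regards a metric as stale once its most recent sample is older than the query lookback delta (5 minutes by default), so metrics from exporters with long scrape intervals won't show up in Grafana.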
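A simplified model of why this happens: an instant query at time `t` returns the newest sample that falls within the lookback window ending at `t`. The sketch below is illustrative only. It ignores explicit staleness markers, treats the window as closed at both ends (exact boundary handling differs between Prometheus versions), and uses made-up timestamps for a hypothetical exporter scraped every 10 minutes.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def instant_value(samples: Sequence[Sample], t: float, lookback: float) -> Optional[float]:
    """Newest value whose timestamp lies in [t - lookback, t], or None if there is none.

    Simplified model of Prometheus instant-query lookback; ignores staleness markers.
    `samples` is assumed to be sorted by timestamp.
    """
    in_window = [value for ts, value in samples if t - lookback <= ts <= t]
    return in_window[-1] if in_window else None


# Hypothetical exporter scraped every 600 s (10 minutes).
samples = [(0, 1.0), (600, 1.0), (1200, 1.0)]

print(instant_value(samples, 400, 300))  # None: last sample is 400 s old, outside a 5m lookback
print(instant_value(samples, 400, 930))  # 1.0: a 15m30s lookback still sees the t=0 sample
```

With the default 5 minute lookback, the series vanishes for roughly half of every 10 minute scrape interval, which shows up as gaps in Grafana panels.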

jovial · 2025-02-28T18:15:53Z

etc/kayobe/kolla/globals.yml

@@ -26,7 +26,7 @@ kolla_image_tags:
 # Monitoring and alerting related settings

 opensearch_heap_size: 8g
-prometheus_cmdline_extras: "--storage.tsdb.retention.time=30d"
+prometheus_cmdline_extras: "--storage.tsdb.retention.time=30d --query.lookback-delta=15m"


We could do something fancier, like twice the longest scrape interval.
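The "twice the longest scrape interval" idea can be sketched as below. The interval values are hypothetical examples, all in seconds; the function name is illustrative and not part of the PR.

```python
from typing import Sequence


def lookback_delta_seconds(scrape_intervals: Sequence[int], factor: int = 2) -> int:
    """Lookback delta as a multiple of the longest scrape interval (all values in seconds)."""
    return factor * max(scrape_intervals)


# Hypothetical intervals: 60 s default jobs, a 300 s capacity exporter, a 600 s Redfish exporter.
print(f"--query.lookback-delta={lookback_delta_seconds([60, 300, 600])}s")
# --query.lookback-delta=1200s
```

A factor of 2 roughly tolerates one missed or failed scrape before a series drops out of instant queries, at the cost of stale values lingering longer after a target genuinely disappears.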

The Redfish dashboard was updated to use a new UUID; however, this caused errors in the Grafana logs and Grafana didn't update the dashboard. Reverting to the old UUID fixes the issue.
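Grafana's dashboard JSON model identifies a dashboard by its top-level `uid` field (presumably the "UUID" referred to here). A hypothetical guard against repeating this regression, not something in this PR, is to check that the `uid` is unchanged between the old and new versions of the dashboard file. The JSON strings below are placeholders.

```python
import json


def dashboard_uid_unchanged(old_json: str, new_json: str) -> bool:
    """True if both dashboard JSON documents carry the same, non-empty top-level "uid"."""
    old_uid = json.loads(old_json).get("uid")
    new_uid = json.loads(new_json).get("uid")
    return old_uid is not None and old_uid == new_uid


# Placeholder dashboard contents; in practice these would be read from the old and new files.
old = '{"uid": "redfish-abc", "title": "Redfish"}'
new = '{"uid": "redfish-xyz", "title": "Redfish"}'
print(dashboard_uid_unchanged(old, new))  # False: this change would alter the dashboard's identity
```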

JohnGarbutt · 2025-03-10T16:18:59Z

etc/kayobe/stackhpc-monitoring.yml

+###############################################################################
+# Prometheus server configuration
+
+stackhpc_prometheus_query_lookback_delta: "{{ [redfish_exporter_scrape_interval, stackhpc_os_capacity_scrape_interval, 300] | max }}s"


Do you need to add a few seconds here to avoid some of the edge effects you saw in testing?

I also wonder whether the OpenStack exporter threshold should be included here as well. For larger clouds, it's probably a bit too aggressive by default.

I've added 30 seconds to give some leeway. I've also added an OpenStack exporter scrape interval variable and included it in the max calculation.
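The resulting calculation can be mirrored in Python to sanity-check the value rendered into `--query.lookback-delta`. This is a sketch under assumptions: the 30 seconds is added after taking the maximum, the 300 second floor from the original expression is kept, all intervals are integer seconds, and the parameter names (notably `openstack_exporter_interval`) are hypothetical stand-ins for the Kayobe variables.

```python
def query_lookback_delta(redfish_interval: int, os_capacity_interval: int,
                         openstack_exporter_interval: int, floor: int = 300,
                         leeway: int = 30) -> str:
    """Longest scrape interval (never below `floor`) plus `leeway`, as a Prometheus duration."""
    longest = max(redfish_interval, os_capacity_interval, openstack_exporter_interval, floor)
    return f"{longest + leeway}s"


# Hypothetical inputs: Redfish at 480 s (60 BMCs x 8 s), capacity at 300 s, OpenStack exporter at 60 s.
print(query_lookback_delta(480, 300, 60))  # 510s
```

With every exporter at or below 300 seconds, the result settles at `330s`, slightly above the Prometheus default of 5 minutes.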

JohnGarbutt · 2025-03-10T16:19:54Z

etc/kayobe/kolla/globals.yml

@@ -26,7 +26,7 @@ kolla_image_tags:
 # Monitoring and alerting related settings

 opensearch_heap_size: 8g
-prometheus_cmdline_extras: "--storage.tsdb.retention.time=30d"
+prometheus_cmdline_extras: "--storage.tsdb.retention.time=30d --query.lookback-delta={{ stackhpc_prometheus_query_lookback_delta }}"


FWIW, I am +1 on tweaking this, given we regularly increase the interval for various exporters.


Alex-Welsh · 2025-04-01T08:03:56Z

releasenotes/notes/bumps-redfish-exporter-to-v2-11032fb9dde36283.yaml

+  - |
+    Sets the prometheus server side option ``query.lookback-delta`` to
+    the largest scrape interval so that metrics are not from exporters
+    with large scrape intervals are not marked stale before the next scrape.


Suggested change

-  - |
-    Sets the prometheus server side option ``query.lookback-delta`` to
-    the largest scrape interval so that metrics are not from exporters
-    with large scrape intervals are not marked stale before the next scrape.
+  - |
+    Sets the prometheus server side option ``query.lookback-delta`` to
+    the largest scrape interval so that metrics from exporters with large
+    scrape intervals are not marked stale before the next scrape.

Alex-Welsh · 2025-04-01T08:05:56Z

etc/kayobe/stackhpc-monitoring.yml

 # Whether the redfish exporter is enabled.
 stackhpc_enable_redfish_exporter: false

+# How often to scrape the BMCs in seconds.
+stackhpc_redfish_exporter_scrape_interval: "{{ [8 * groups['redfish_exporter_targets'] | length, 300] | max }}"


Where does this get applied?
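For reference, the proposed default scales the Redfish scrape interval with the size of the `redfish_exporter_targets` group: 8 seconds per BMC, with a floor of 300 seconds. In Jinja, filters bind more tightly than `*`, so the expression evaluates as 8 × (number of targets). The Python mirror below is a sketch; the function name is hypothetical and the group size is passed in directly rather than read from the Ansible inventory.

```python
def redfish_scrape_interval(num_bmcs: int, seconds_per_bmc: int = 8, floor: int = 300) -> int:
    """Mirror of the proposed default: 8 s per BMC target, never less than 300 s."""
    return max(seconds_per_bmc * num_bmcs, floor)


for n in (10, 38, 100):
    print(n, redfish_scrape_interval(n))
# 10 300
# 38 304
# 100 800
```

The floor dominates up to 37 BMCs (8 × 37 = 296); from 38 BMCs upward the interval grows linearly, which in turn raises the lookback delta computed from it.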

jovial added 2 commits February 28, 2025 18:09

Update redfish dashboard

0459eab


Increase query lookback delta

7fcabe2


jovial requested a review from a team as a code owner February 28, 2025 18:15

product-auto-label bot added the size: l and monitoring labels Feb 28, 2025

jovial commented Feb 28, 2025


jovial marked this pull request as draft February 28, 2025 18:16

jovial and others added 3 commits February 28, 2025 18:25

Add raw tags

d4b2607

Re-tweak Redfish dashboard for Lenovo

b86b28d

Restore old UUID of the redfish dashboard

15e2cba


m-bull force-pushed the bugfix/edfish-exporter-2.0/dell branch from cc2c5e5 to 15e2cba March 7, 2025 15:58

jovial added 2 commits March 10, 2025 16:04

Make query lookback delta setting smarter

a36b667

Actually set lookback delta

68b8ea1

JohnGarbutt reviewed Mar 10, 2025


Adds openstack exporter scrape interval

cf085b9

jovial requested a review from JohnGarbutt March 10, 2025 17:53

jovial marked this pull request as ready for review March 10, 2025 17:53

jovial added 4 commits March 11, 2025 17:17

Add some raw tags

27dfdc4

Slightly less offensive variant on raw tags

8f3ac48

Refresh release note

a73dd62

Update dashboard

ba7934a

product-auto-label bot added size: xl and removed size: l labels Mar 12, 2025

Alex-Welsh reviewed Apr 1, 2025

