Improvements to redfish exporter v2 for Dell systems #1549

Open
wants to merge 12 commits into
base: redfish-exporter-2.0
from

jovial
Contributor

@jovial jovial commented Feb 28, 2025

No description provided.

- Compatibility with Dell servers
- Added health roundup panels
- Add variables for different groups, e.g. overcloud vs compute
Prometheus will regard metrics collected over this period as stale; as
such, they won't show up in Grafana.
@jovial jovial requested a review from a team as a code owner February 28, 2025 18:15
@product-auto-label product-auto-label bot added size: l monitoring All things related to observability & telemetry labels Feb 28, 2025
@@ -26,7 +26,7 @@ kolla_image_tags:
# Monitoring and alerting related settings

opensearch_heap_size: 8g
-prometheus_cmdline_extras: "--storage.tsdb.retention.time=30d"
+prometheus_cmdline_extras: "--storage.tsdb.retention.time=30d --query.lookback-delta=15m"
Contributor Author

We could do something fancier, like twice the longest scrape interval.
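
To make the "twice the longest scrape interval" idea concrete, here is a minimal Python sketch. The function name `lookback_delta_seconds` and the example intervals are hypothetical and do not come from the PR; it only illustrates the arithmetic, not how the value would be wired into kolla.

```python
def lookback_delta_seconds(scrape_intervals, factor=2):
    """Return `factor` times the longest scrape interval, in seconds."""
    return factor * max(scrape_intervals)


# Hypothetical scrape intervals (seconds) for three exporters.
intervals = [60, 300, 800]

# Build the flag in the same style as prometheus_cmdline_extras.
flag = f"--query.lookback-delta={lookback_delta_seconds(intervals)}s"
print(flag)  # --query.lookback-delta=1600s
```

A factor of two would tolerate one missed scrape before the series drops out of query results, at the cost of showing stale values for longer.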

@jovial jovial marked this pull request as draft February 28, 2025 18:16
jovial and others added 3 commits February 28, 2025 18:25
The redfish dashboard was updated to use a new UUID; however, this caused
errors in the Grafana logs and Grafana didn't update the dashboard.
Reverting this to the old UUID fixes the issue.
@m-bull m-bull force-pushed the bugfix/edfish-exporter-2.0/dell branch from cc2c5e5 to 15e2cba Compare March 7, 2025 15:58
###############################################################################
# Prometheus server configuration

stackhpc_prometheus_query_lookback_delta: "{{ [redfish_exporter_scrape_interval, stackhpc_os_capacity_scrape_interval, 300] | max }}s"
Member

Do you need to add a few seconds here to avoid some of the edge effects you saw in testing?

Member

I am also wondering whether we need to include the openstack exporter threshold in here as well. For larger clouds, it's probably a bit too aggressive by default.

Contributor Author

I've added 30 seconds to give some leeway. Have also added an openstack exporter scrape interval variable and added it to the max calculation.
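
As a rough illustration of the calculation described in this thread, the Python sketch below takes the largest of the three scrape intervals and a 300 second floor, then adds the 30 seconds of leeway. The function name and the point at which the leeway is added are assumptions; the updated Jinja expression is not shown on this page. The 300 second floor matches Prometheus's default lookback delta of 5 minutes.

```python
FLOOR_SECONDS = 300   # floor used in the max calculation (Prometheus default is 5m)
LEEWAY_SECONDS = 30   # margin mentioned in the comment above


def query_lookback_delta(redfish, os_capacity, openstack_exporter):
    """Return a lookback delta string such as '830s'.

    Assumes all scrape intervals are integers in seconds and that the
    leeway is added after taking the maximum.
    """
    longest = max(redfish, os_capacity, openstack_exporter, FLOOR_SECONDS)
    return f"{longest + LEEWAY_SECONDS}s"


# Hypothetical values: an 800 s redfish interval dominates the result.
print(query_lookback_delta(redfish=800, os_capacity=60, openstack_exporter=120))
```

With short intervals everywhere, the floor applies and the result is `330s`.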

@@ -26,7 +26,7 @@ kolla_image_tags:
# Monitoring and alerting related settings

opensearch_heap_size: 8g
-prometheus_cmdline_extras: "--storage.tsdb.retention.time=30d"
+prometheus_cmdline_extras: "--storage.tsdb.retention.time=30d --query.lookback-delta={{ stackhpc_prometheus_query_lookback_delta }}"
Member

FWIW, I am +1 on tweaking this, given we regularly increase the interval for various exporters.

@jovial jovial requested a review from JohnGarbutt March 10, 2025 17:53
@jovial jovial marked this pull request as ready for review March 10, 2025 17:53
Comment on lines +11 to +14
- |
Sets the prometheus server side option ``query.lookback-delta`` to
the largest scrape interval so that metrics are not from exporters
with large scrape intervals are not marked stale before the next scrape.
Member

Suggested change
-- |
-Sets the prometheus server side option ``query.lookback-delta`` to
-the largest scrape interval so that metrics are not from exporters
-with large scrape intervals are not marked stale before the next scrape.
+- |
+Sets the prometheus server side option ``query.lookback-delta`` to
+the largest scrape interval so that metrics from exporters with large
+scrape intervals are not marked stale before the next scrape.
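
The effect described in the release note can be sketched with a simplified model in Python: an instant query returns the most recent sample only if it is no older than the lookback delta. The function `is_visible` is hypothetical, and the model ignores staleness markers and exact boundary handling in Prometheus.

```python
def is_visible(sample_time, query_time, lookback_delta):
    """Return True if a sample is recent enough to appear in an instant query.

    Simplified model: all arguments are in seconds, and the sample is
    visible when its age is within the lookback delta.
    """
    age = query_time - sample_time
    return 0 <= age <= lookback_delta


# Hypothetical exporter scraped every 800 s; the last scrape was at t=0.
# Querying at t=600 falls between two scrapes.
print(is_visible(0, 600, lookback_delta=300))  # False: hidden with a 5m lookback
print(is_visible(0, 600, lookback_delta=830))  # True: visible with a longer lookback
```

In this model, a lookback delta shorter than the scrape interval leaves a gap before each new scrape, which would match the missing Grafana data described earlier in the PR.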

# Whether the redfish exporter is enabled.
stackhpc_enable_redfish_exporter: false

# How often to scrape the BMCs in seconds.
stackhpc_redfish_exporter_scrape_interval: "{{ [8 * groups['redfish_exporter_targets'] | length, 300] | max }}"
Member

Where does this get applied?
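
For readers unfamiliar with Jinja precedence, the scrape interval expression in the diff above applies the `length` filter before multiplying, so it reads as "8 times the number of targets, but at least 300". A minimal Python sketch of the same arithmetic follows; the constant names, and the reading of 8 as a per-target allowance in seconds, are interpretations rather than anything stated in the PR.

```python
SECONDS_PER_TARGET = 8      # assumed meaning of the factor 8 in the expression
MIN_INTERVAL_SECONDS = 300  # lower bound from the expression


def redfish_scrape_interval(target_count):
    """Return the scrape interval in seconds for a given number of BMC targets."""
    return max(SECONDS_PER_TARGET * target_count, MIN_INTERVAL_SECONDS)


# Small deployments stay at the 300 s floor; larger ones scale with target count.
print(redfish_scrape_interval(10))   # 300
print(redfish_scrape_interval(100))  # 800
```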

Labels
monitoring All things related to observability & telemetry size: xl
Projects
None yet
5 participants