Improvements to redfish exporter v2 for Dell systems #1549
base: redfish-exporter-2.0
Conversation
- Compatibility with Dell servers
- Added health round-up panels
- Add variables for different groups, e.g. overcloud vs compute
Prometheus will regard metrics collected over this period as stale, and as such they won't show up in Grafana.
etc/kayobe/kolla/globals.yml
@@ -26,7 +26,7 @@ kolla_image_tags:
 # Monitoring and alerting related settings

 opensearch_heap_size: 8g
-prometheus_cmdline_extras: "--storage.tsdb.retention.time=30d"
+prometheus_cmdline_extras: "--storage.tsdb.retention.time=30d --query.lookback-delta=15m"
We could do something fancier, like twice the longest scrape interval.
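For illustration only, a sketch of what doubling the longest scrape interval could look like, reusing the interval variables that appear later in this change (not part of the actual diff):

```yaml
# Sketch only: twice the longest scrape interval, so every sample has at
# least one successor inside the lookback window even if a scrape is missed.
stackhpc_prometheus_query_lookback_delta: "{{ 2 * ([redfish_exporter_scrape_interval | int, stackhpc_os_capacity_scrape_interval | int] | max) }}s"
```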
The redfish dashboard was updated to use a new UUID; however, this caused errors in the Grafana logs and Grafana didn't update the dashboard. Reverting this to the old UUID fixes the issue.
cc2c5e5 to 15e2cba
etc/kayobe/stackhpc-monitoring.yml
###############################################################################
# Prometheus server configuration

stackhpc_prometheus_query_lookback_delta: "{{ [redfish_exporter_scrape_interval, stackhpc_os_capacity_scrape_interval, 300] | max }}s"
Do you need to add a few seconds here to avoid some of the edge effects you saw in testing?
I am also wondering if we need to include the openstack exporter threshold here as well. For larger clouds, it's probably a bit too aggressive by default.
I've added 30 seconds to give some leeway. I have also added an openstack exporter scrape interval variable and included it in the max calculation.
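Based on that description, and not verified against the final diff, the variable would end up looking roughly like this (the openstack exporter interval variable name is assumed):

```yaml
# Sketch only: add 30 seconds of leeway on top of the largest scrape interval.
# stackhpc_openstack_exporter_scrape_interval is an assumed variable name.
stackhpc_prometheus_query_lookback_delta: "{{ ([redfish_exporter_scrape_interval | int, stackhpc_os_capacity_scrape_interval | int, stackhpc_openstack_exporter_scrape_interval | int, 300] | max) + 30 }}s"
```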
@@ -26,7 +26,7 @@ kolla_image_tags:
 # Monitoring and alerting related settings

 opensearch_heap_size: 8g
-prometheus_cmdline_extras: "--storage.tsdb.retention.time=30d"
+prometheus_cmdline_extras: "--storage.tsdb.retention.time=30d --query.lookback-delta={{ stackhpc_prometheus_query_lookback_delta }}"
FWIW, I am +1 on tweaking this, given we regularly increase the interval for various exporters.
- |
  Sets the prometheus server side option ``query.lookback-delta`` to
  the largest scrape interval so that metrics are not from exporters
  with large scrape intervals are not marked stale before the next scrape.
Suggested change:
- |
  Sets the prometheus server side option ``query.lookback-delta`` to
  the largest scrape interval so that metrics from exporters with large
  scrape intervals are not marked stale before the next scrape.
# Whether the redfish exporter is enabled.
stackhpc_enable_redfish_exporter: false

# How often to scrape the BMCs in seconds.
stackhpc_redfish_exporter_scrape_interval: "{{ [8 * groups['redfish_exporter_targets'] | length, 300] | max }}"
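As a worked example of that expression (target counts are illustrative): in Jinja2 the `length` filter binds tighter than `*`, so it is equivalent to the parenthesised form below.

```yaml
# Equivalent form with explicit parentheses, plus two example evaluations:
#   20 BMC targets: max(8 * 20, 300) = 300 -> scrape every 300 seconds
#   50 BMC targets: max(8 * 50, 300) = 400 -> scrape every 400 seconds
stackhpc_redfish_exporter_scrape_interval: "{{ [8 * (groups['redfish_exporter_targets'] | length), 300] | max }}"
```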
Where does this get applied?