Configuration for the StackHPC fork of Redfish Exporter by m-bull · Pull Request #1530 · stackhpc/stackhpc-kayobe-config

m-bull · 2025-02-22T09:37:25Z

Use updated container image
Update scrape jobs to not collect logs during frequent scrapes, then collect logs once per hour
Clean up Redfish dashboard to work with Lenovo hardware and remove deprecated panel types

Dashboard needs testing for compatibility with metrics produced by other manufacturers' Redfish implementations.

dougszumski

Very nice, many thanks for adding this.

dougszumski · 2025-02-28T11:10:50Z

etc/kayobe/kolla/config/prometheus/prometheus.yml.d/60-redfish.yml

+          env: "{{ kayobe_environment | default('openstack') }}"
+          group: "{{ hostvars[host]['redfish_exporter_scrape_group'] | default('overcloud') }}"
+{% endfor %}
+  - job_name: redfish-exporter-collectlog


I wondered if we should put this behind a redfish_exporter_collect_logs flag so we can easily disable it at sites if it causes issues. Having said that, it should be a lot more robust now it lives in a separate scrape job. Many thanks for adding it.

I think it's more nuanced than that, and I couldn't quite get my brain around it when I made this PR, but I think it's a bit clearer to me now...

There are two cases of scrape style (currently anyway!):

1. [iDRAC style] Scrape normally in a single job with collectlog not present in the job, so the defaults apply. This is what we've always done and should be the default IMO.

2. [Lenovo XCC style] Two jobs: one with collectlog=true, and a second, more frequent one with collectlog=false.

I think we should put the second style behind a feature flag as you suggest.
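
As a rough sketch of what that could look like (not what this PR implements): the flag name redfish_exporter_collect_logs is the one suggested above and does not exist yet, and the redfish-exporter job name, the intervals, the timeouts and the placeholder target are all assumptions for illustration. Only the collectlog parameter and the redfish-exporter-collectlog job name come from this PR. Target relabelling and labels are omitted for brevity.

```yaml
# Sketch only: flag name, intervals, timeouts and target are illustrative assumptions.
{% if redfish_exporter_collect_logs | default(false) | bool %}
  # [Lenovo XCC style] frequent scrape that skips log collection...
  - job_name: redfish-exporter
    scrape_interval: 5m
    scrape_timeout: 2m
    params:
      collectlog: ["false"]
    static_configs:
      - targets: ["redfish-exporter.example.org:9610"]  # placeholder
  # ...plus a separate hourly scrape that does collect logs.
  - job_name: redfish-exporter-collectlog
    scrape_interval: 1h
    scrape_timeout: 10m
    params:
      collectlog: ["true"]
    static_configs:
      - targets: ["redfish-exporter.example.org:9610"]  # placeholder
{% else %}
  # [iDRAC style] single job; collectlog is not set, so exporter defaults apply.
  - job_name: redfish-exporter
    scrape_interval: 5m
    scrape_timeout: 2m
    static_configs:
      - targets: ["redfish-exporter.example.org:9610"]  # placeholder
{% endif %}
```

With the flag unset, sites would keep the single-job behaviour, so the two-job layout stays opt-in.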

That sounds good: a limp-mode flag (2) for cases when the logs are taking too long to fetch, and (1) as the default. On SMSlab (iDRAC) I noticed that most of the logs fetched are actually records of logging into and out of the BMC, so I am hoping that once we switch to using persistent sessions, the scrape time will improve. I see about 5 minutes for an iDRAC there, which easily causes trouble.

jovial · 2025-02-28T16:00:20Z

For me a bunch of stuff doesn't work with Dell. I will try and fix up a few bits. We also have lost the health summary. Did that not work on Lenovo? That was one of the more useful bits for me.

m-bull · 2025-02-28T16:10:43Z

This is just up as a record of things that worked on Lenovo; I don't have the systems to be able to coalesce the dashboards to work on both types of hardware unfortunately :(. With the metrics and panel names not really matching up either, I didn't make any real attempt to remain compatible with the Dell metrics.

I also had to remove some bits of the dashboard because of the Angular deprecation, though I don't remember if the health summary was one of those.

product-auto-label bot added the size: xl and monitoring labels Feb 22, 2025

m-bull requested review from GregWhiteyBialas and dougszumski February 22, 2025 10:01

Configuration for the StackHPC fork of Redfish Exporter

690623c

m-bull force-pushed the redfish-exporter-2.0 branch from 79f8936 to 690623c February 22, 2025 10:11

dougszumski approved these changes Feb 28, 2025

jovial mentioned this pull request Feb 28, 2025

Fix "Max Inlet Temp" time series chart #1544

Merged

m-bull closed this Apr 18, 2025

m-bull wants to merge 1 commit into stackhpc/2024.1 from redfish-exporter-2.0
