Configuration for the StackHPC fork of Redfish Exporter #1530
m-bull wants to merge 1 commit into stackhpc/2024.1
dougszumski left a comment
Very nice, many thanks for adding this.
env: "{{ kayobe_environment | default('openstack') }}"
group: "{{ hostvars[host]['redfish_exporter_scrape_group'] | default('overcloud') }}"
{% endfor %}
- job_name: redfish-exporter-collectlog
I wondered if we should put this behind a redfish_exporter_collect_logs flag so we can easily disable it at sites if it causes issues. Having said that, it should be a lot more robust now it lives in a separate scrape job. Many thanks for adding it.
I think it's more nuanced than that, and I couldn't quite get my brain around it when I made this PR, but I think it's a bit clearer to me now...
There are two cases of scrape style (currently, anyway!):
- [iDRAC style] Scrape normally in a single job with collectlog not present in the job, just use the defaults - this is what we've always done and should be the default IMO
- [Lenovo XCC style] Two jobs, one with collectlog=true and the other more frequent with collectlog=false
I think we should put the second style behind a feature flag as you suggest.
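As a rough sketch of how the two styles above might be templated, assuming the `redfish_exporter_collect_logs` flag suggested earlier in this thread and the `collectlog` URL parameter as described. The `redfish-exporter` job name, the intervals, and the timeouts are illustrative assumptions rather than values from this PR, and targets, labels, and relabelling are omitted:

```yaml
{% if redfish_exporter_collect_logs | default(false) | bool %}
# [Lenovo XCC style] Two jobs: a frequent one without logs, an infrequent one with logs.
- job_name: redfish-exporter              # hypothetical job name
  scrape_interval: 2m                     # illustrative value
  scrape_timeout: 1m                      # illustrative value
  params:
    collectlog: ["false"]
- job_name: redfish-exporter-collectlog
  scrape_interval: 30m                    # illustrative value
  scrape_timeout: 10m                     # illustrative value
  params:
    collectlog: ["true"]
{% else %}
# [iDRAC style] Single job; collectlog is not set, so the exporter defaults apply.
- job_name: redfish-exporter              # hypothetical job name
  scrape_interval: 2m                     # illustrative value
  scrape_timeout: 1m                      # illustrative value
{% endif %}
```

Defaulting the flag to false keeps the single-job behaviour as the default, and a site would only opt in to the split jobs where log collection makes scrapes too slow.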
That sounds good. A limp mode flag (2) for cases when the logs are taking too long to fetch, and 1 as the default. On SMSlab (iDRAC) I noticed that most of the logs fetched are actually logs from logging into and out of the BMC - so I am hoping that once we switch to using persistent sessions, the scrape time will improve. I see about 5 minutes for an iDRAC there, which easily causes trouble.
For me a bunch of stuff doesn't work with Dell. I will try and fix up a few bits. We have also lost the health summary. Did that not work on Lenovo? That was one of the more useful bits for me.
This is just up as a record of things that worked on Lenovo. Unfortunately, I don't have the systems to be able to coalesce the dashboards to work on both types of hardware :(. On top of that, the metrics and panel names don't really match up, and I didn't make any real attempt to remain compatible with the Dell metrics. I also had to remove some bits of the dashboard because of the Angular deprecation, though I don't remember if the health summary was one of those.
Dashboard needs testing for compatibility with metrics produced by other manufacturers' Redfish implementations.