
Add hw-exporter #554

Open: wants to merge 3 commits into main

drigz (Contributor) commented Jul 16, 2025

This exports a list of connected PCI devices so that we can track which
GPUs and NICs are installed across the fleet. node-exporter doesn't support
anything quite like this: for example, its ethtool metrics don't cover NICs
that aren't bound to a kernel driver, and it can't identify the attached
GPU.

The new binary is 9 MB and the running process has 21 MB RSS. Most of
this seems to be TLS/protobuf dependencies, which I suspect come in via
the OpenCensus libraries. They could also come in via the PCI library,
which can fetch a PCI ID database from the internet if that is enabled
at runtime (it is not enabled in this binary).

The extra metric load should be offset by #549.

Example metrics:

# HELP pci_device_count Number of PCI devices by vendor, product, class, and driver.
# TYPE pci_device_count gauge
pci_device_count{class="Bridge",driver="",product="0x7a90",vendor="Intel Corporation"} 1
pci_device_count{class="Bridge",driver="",product="0xa700",vendor="Intel Corporation"} 1
pci_device_count{class="Bridge",driver="pcieport",product="0x7ab6",vendor="Intel Corporation"} 1
pci_device_count{class="Bridge",driver="pcieport",product="Alder Lake-S PCH PCI Express Root Port #13",vendor="Intel Corporation"} 1
pci_device_count{class="Bridge",driver="pcieport",product="Alder Lake-S PCH PCI Express Root Port #8",vendor="Intel Corporation"} 1
pci_device_count{class="Bridge",driver="pcieport",product="Raptor Lake PCI Express 5.0 Graphics Port (PEG010)",vendor="Intel Corporation"} 1
pci_device_count{class="Bridge",driver="pcieport",product="Raptor Lake PCIe 4.0 Graphics Port",vendor="Intel Corporation"} 1
pci_device_count{class="Communication controller",driver="",product="Alder Lake-S PCH Serial IO UART #0",vendor="Intel Corporation"} 1
pci_device_count{class="Communication controller",driver="mei_me",product="Alder Lake-S PCH HECI Controller #1",vendor="Intel Corporation"} 1
pci_device_count{class="Display controller",driver="nvidia",product="GA104GL [RTX A4000]",vendor="NVIDIA Corporation"} 1
pci_device_count{class="Mass storage controller",driver="ahci",product="Alder Lake-S PCH SATA Controller [AHCI Mode]",vendor="Intel Corporation"} 1
pci_device_count{class="Mass storage controller",driver="nvme",product="IX SN530 NVMe SSD (DRAM-less)",vendor="Sandisk Corp"} 3
pci_device_count{class="Memory controller",driver="",product="Alder Lake-S PCH Shared SRAM",vendor="Intel Corporation"} 1
pci_device_count{class="Multimedia controller",driver="",product="Alder Lake-S HD Audio Controller",vendor="Intel Corporation"} 1
pci_device_count{class="Multimedia controller",driver="",product="GA104 High Definition Audio Controller",vendor="NVIDIA Corporation"} 1
pci_device_count{class="Network controller",driver="atemsys_pci",product="Ethernet Connection (17) I219-LM",vendor="Intel Corporation"} 1
pci_device_count{class="Network controller",driver="igb",product="I210 Gigabit Network Connection",vendor="Intel Corporation"} 1
pci_device_count{class="Network controller",driver="intel-eth-pci",product="0x7aac",vendor="Intel Corporation"} 1
pci_device_count{class="Network controller",driver="intel-eth-pci",product="0x7aad",vendor="Intel Corporation"} 1
pci_device_count{class="Serial bus controller",driver="",product="Alder Lake-S PCH SPI Controller",vendor="Intel Corporation"} 1
pci_device_count{class="Serial bus controller",driver="",product="Alder Lake-S PCH Serial IO I2C Controller #0",vendor="Intel Corporation"} 1
pci_device_count{class="Serial bus controller",driver="",product="Alder Lake-S PCH Serial IO I2C Controller #1",vendor="Intel Corporation"} 1
pci_device_count{class="Serial bus controller",driver="",product="Alder Lake-S PCH Serial IO I2C Controller #2",vendor="Intel Corporation"} 1
pci_device_count{class="Serial bus controller",driver="",product="Alder Lake-S PCH Serial IO I2C Controller #3",vendor="Intel Corporation"} 1
pci_device_count{class="Serial bus controller",driver="",product="Alder Lake-S PCH Serial IO SPI Controller #1",vendor="Intel Corporation"} 1
pci_device_count{class="Serial bus controller",driver="i801_smbus",product="Alder Lake-S PCH SMBus Controller",vendor="Intel Corporation"} 1
pci_device_count{class="Serial bus controller",driver="xhci_hcd",product="Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller",vendor="Intel Corporation"} 1

drigz added 2 commits July 16, 2025 12:39
drigz requested a review from Ongy July 16, 2025 13:13