
Handle thermal_zone errors gracefully #2980

@ghost

Description

Host operating system:

Linux 5.10.104-tegra #18 SMP PREEMPT aarch64 aarch64 aarch64 GNU/Linux

node_exporter version:

1.7.0

node_exporter command line flags:

--path.rootfs=/host

node_exporter log output

...
caller=collector.go:169 level=error msg="collector failed" name=thermal_zone duration_seconds=0.01870677 err="read /sys/class/thermal/thermal_zone10/temp: invalid argument"
caller=collector.go:169 level=error msg="collector failed" name=thermal_zone duration_seconds=0.001411717 err="read /sys/class/thermal/thermal_zone10/temp: invalid argument"
...

Are you running node_exporter in Docker?

Yes

What did you do that produced an error?

Running node_exporter in a docker container on a custom embedded device.

What did you expect to see?

Disabled thermal zones either ignored or optionally filtered out.

What did you see instead?

The entire thermal_zone collector fails for all thermal_zones.

Whether a thermal zone is disabled can be determined via its mode file, e.g. /sys/class/thermal/thermal_zone10/mode. It would be nice for node_exporter to handle disabled zones gracefully, whether natively or via a flag, or to allow specific files/devices to be filtered out manually instead of only the entire class of devices.
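
To illustrate the behavior being requested, here is a minimal shell sketch. It is not node_exporter code. The function name read_thermal_zones and its exclude-list argument are hypothetical, made up for this example. It reads each zone independently, so one unreadable temp file is skipped and reported without affecting the other zones, and it lets individual zones be excluded by name.

```shell
#!/bin/sh
# Hypothetical helper, not part of node_exporter: print "<zone> <millidegrees>"
# for every zone under the given directory that is not excluded and whose
# temp file can be read. Unreadable zones are reported on stderr and skipped.
#   $1: thermal directory, e.g. /sys/class/thermal
#   $2: optional space-separated list of zone names to exclude (assumed format)
read_thermal_zones() {
    thermal_dir=$1
    exclude=${2:-}
    for zone in "${thermal_dir}"/thermal_zone*; do
        [ -d "${zone}" ] || continue
        name=$(basename "${zone}")
        # Skip zones named in the exclude list
        case " ${exclude} " in
            *" ${name} "*) continue ;;
        esac
        # A failed read affects only this zone, not the whole loop
        if temp=$(cat "${zone}/temp" 2>/dev/null); then
            echo "${name} ${temp}"
        else
            echo "skipping ${name}: temp not readable" >&2
        fi
    done
}
```

For example, `read_thermal_zones /sys/class/thermal "thermal_zone10"` would list every readable zone except thermal_zone10.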

My temporary workaround has been to use the Pushgateway with a curl container in my Docker Compose file, like so:

  pushgateway:
    image: prom/pushgateway
    container_name: pushgateway
    restart: unless-stopped
    networks:
      - metrics
  curl_thermals:
    image: curlimages/curl
    container_name: curl_thermals
    command: '/bin/sh /pushgateway-thermal-zones.sh'
    pid: host
    restart: unless-stopped
    volumes:
      - /:/host:ro,rslave
      - ./pushgateway-thermal-zones.sh:/pushgateway-thermal-zones.sh:ro,rslave
    networks:
      - metrics

With this pushgateway-thermal-zones.sh script:

while true
do 
    output="# TYPE thermal_zone gauge\n# HELP thermal_zone Thermal zone temperatures in Celsius\n"

    # Loop through each thermal zone directory in /host/sys/class/thermal
    for zone in /host/sys/class/thermal/thermal_zone*; do
        # Check if the thermal zone is enabled by reading the mode file
        mode=$(cat "${zone}/mode")
        if [ "${mode}" = "enabled" ]; then
            zone_number=$(basename "${zone}" | sed 's/thermal_zone//')
            zone_type=$(cat "${zone}/type")
            # Skip this zone if its temperature cannot be read
            zone_temp=$(cat "${zone}/temp" 2>/dev/null) || continue
            zone_temp_scaled=$(echo "scale=2; ${zone_temp} / 1000.0" | bc)

            # Append the details to the output variable
            output="${output}thermal_zone{zone=\"${zone_number}\", type=\"${zone_type}\"} ${zone_temp_scaled}\n"
        fi
    done

    # printf '%b' expands the \n escapes portably; quoting keeps the text intact
    printf '%b' "${output}" | curl -s --data-binary @- http://pushgateway:9091/metrics/job/thermal_zones/
    sleep 3
done
