Commit 3755709

docs: add per-collector metric documentation

New documentation structure under docs/collectors/:
- Template for documenting additional collectors
- cpu: metrics, labels, configuration flags, data sources
- cpufreq: frequency scaling metrics with kernel docs links
- diskstats: I/O statistics with device filtering options
- meminfo: memory statistics with field mappings
- netstat: network statistics with protocol breakdowns

Each collector doc includes:
- Supported platforms
- Configuration flags (where applicable)
- Data sources with kernel documentation links
- Metrics table with types, labels, descriptions

Signed-off-by: Willian Paixao <willian@ufpa.br>

1 parent f09a706 commit 3755709

8 files changed: +493 -0 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -76,6 +76,8 @@ On some systems, the `timex` collector requires an additional Docker flag,
 There is varying support for collectors on each operating system. The tables
 below list all existing collectors and the supported systems.
 
+For detailed per-collector documentation including metrics, labels, and configuration flags, see [docs/collectors/](./docs/collectors/). Currently documented: [cpu](./docs/collectors/cpu.md), [cpufreq](./docs/collectors/cpufreq.md), [diskstats](./docs/collectors/diskstats.md), [meminfo](./docs/collectors/meminfo.md), [netstat](./docs/collectors/netstat.md).
+
 Collectors are enabled by providing a `--collector.<name>` flag.
 Collectors that are enabled by default can be disabled by providing a `--no-collector.<name>` flag.
 To enable only some specific collector(s), use `--collector.disable-defaults --collector.<name> ...`.
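
The three flag patterns above can be sketched as a small helper that assembles an argument list. The function name `collector_args` and its parameters are hypothetical and exist only for this illustration; only the flag spellings come from the README text.

```python
def collector_args(enable=(), disable=(), only=False):
    """Build collector flags for a node_exporter command line (hypothetical helper).

    enable:  collector names passed as --collector.<name>
    disable: collector names passed as --no-collector.<name>
    only:    if True, prepend --collector.disable-defaults so that only
             the collectors named in `enable` are active
    """
    args = []
    if only:
        args.append("--collector.disable-defaults")
    args.extend(f"--collector.{name}" for name in enable)
    args.extend(f"--no-collector.{name}" for name in disable)
    return args


if __name__ == "__main__":
    # Run only the cpu and meminfo collectors.
    print(" ".join(collector_args(enable=["cpu", "meminfo"], only=True)))
```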

docs/collectors/README.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
# Collector Documentation

Per-collector metric documentation. Each file documents one collector.

## Available Documentation

- [cpu](cpu.md) - CPU time statistics and metadata
- [cpufreq](cpufreq.md) - CPU frequency scaling statistics
- [diskstats](diskstats.md) - Disk I/O statistics
- [meminfo](meminfo.md) - Memory statistics
- [netstat](netstat.md) - Network statistics

## Structure

See [_TEMPLATE.md](_TEMPLATE.md) for the documentation template.

## Naming

Files are named `<collector_name>.md` matching the collector registration name (e.g., `cpu.md`, `filesystem.md`).

## Contributing

When adding or modifying a collector:
1. Update or create the corresponding documentation file
2. Ensure all metrics are listed with correct types and labels
3. Document any configuration flags
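
The second checklist item could be partly automated. The sketch below is not part of the repository: `parse_metric_rows`, `check_metric_rows`, and `VALID_TYPES` are hypothetical names, and the rule that counters end in `_total` is the usual Prometheus naming convention, assumed here rather than taken from this commit.

```python
VALID_TYPES = {"counter", "gauge", "histogram", "summary"}


def parse_metric_rows(markdown):
    """Return one dict per data row of every table that has Metric and Type columns."""
    rows = []
    header = None
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            header = None  # any non-table line ends the current table
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = [c.lower() for c in cells]  # first table line is the header
            continue
        if all(set(c) <= set("-: ") for c in cells):
            continue  # separator row such as |------|------|
        if "metric" in header and "type" in header:
            rows.append(dict(zip(header, cells)))
    return rows


def check_metric_rows(rows):
    """Return a list of human-readable problems found in the parsed rows."""
    problems = []
    for row in rows:
        name = row["metric"].strip("`")
        if row["type"] not in VALID_TYPES:
            problems.append(f"{name}: unknown type {row['type']!r}")
        elif row["type"] == "counter" and not name.endswith("_total"):
            # Assumed convention, not a rule stated in this commit.
            problems.append(f"{name}: counter name should end in _total")
    return problems
```

Running `check_metric_rows(parse_metric_rows(text))` over a collector file returns an empty list when every listed type is recognized.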

docs/collectors/_TEMPLATE.md

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
# collector_name

Brief description of what this collector exposes.

Status: enabled|disabled by default

## Platforms

- Linux
- Darwin
- FreeBSD
- ...

## Configuration

```
--collector.name.flag-name     Description (default: value)
--collector.name.other-flag    Description (default: value)
```

Omit this section if the collector has no flags.

## Data Sources

| Source | Description |
|--------|-------------|
| `/proc/example` | Brief description |
| `/sys/class/example` | Brief description |
| `syscall(2)` | Brief description |

## Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_example_total` | counter | `label1`, `label2` | Description |
| `node_example_bytes` | gauge | | Description |
| `node_example_info` | gauge | `key`, `value` | Info metric, always 1 |

For collectors with dynamic metrics (e.g., meminfo), use:

Metrics are derived from `/proc/meminfo`. Each field `FieldName` becomes `node_memory_FieldName_bytes`.

## Labels

| Label | Description |
|-------|-------------|
| `device` | Device name |
| `mountpoint` | Mount path |

Omit this section if metrics have no labels or labels are self-explanatory.

## Notes

- Special behaviors, caveats, kernel version requirements
- Known issues or limitations
- Related collectors

Omit this section if not applicable.

docs/collectors/cpu.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
# cpu

Exposes CPU time statistics from `/proc/stat` and CPU metadata from `/proc/cpuinfo` and sysfs.

Status: enabled by default

## Platforms

- Linux
- Darwin
- Dragonfly
- FreeBSD
- NetBSD
- OpenBSD
- Solaris
- AIX

## Configuration

```
--collector.cpu.guest                 Enable node_cpu_guest_seconds_total metric (default: true)
--collector.cpu.info                  Enable node_cpu_info metric (default: false)
--collector.cpu.info.flags-include    Regex filter for CPU flags to include in node_cpu_flag_info
--collector.cpu.info.bugs-include     Regex filter for CPU bugs to include in node_cpu_bug_info
```

Setting `--collector.cpu.info.flags-include` or `--collector.cpu.info.bugs-include` implicitly enables `--collector.cpu.info`.

## Data Sources

| Source | Description |
|--------|-------------|
| `/proc/stat` | CPU time counters per core and mode |
| `/proc/cpuinfo` | CPU metadata (vendor, model, flags, bugs) |
| `/sys/devices/system/cpu/cpu*/topology/` | Physical package and core IDs |
| `/sys/devices/system/cpu/cpu*/thermal_throttle/` | Thermal throttling counters |
| `/sys/devices/system/cpu/cpu*/online` | CPU online status |
| `/sys/devices/system/cpu/isolated` | Isolated CPUs list |

## Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_cpu_seconds_total` | counter | `cpu`, `mode` | Seconds the CPUs spent in each mode |
| `node_cpu_guest_seconds_total` | counter | `cpu`, `mode` | Seconds the CPUs spent in guest (VM) mode |
| `node_cpu_info` | gauge | `package`, `core`, `cpu`, `vendor`, `family`, `model`, `model_name`, `microcode`, `stepping`, `cachesize` | CPU metadata, always 1 |
| `node_cpu_frequency_hertz` | gauge | `package`, `core`, `cpu` | CPU frequency from `/proc/cpuinfo` (only when the `cpufreq` collector is disabled) |
| `node_cpu_flag_info` | gauge | `flag` | CPU flag presence from first core, always 1 |
| `node_cpu_bug_info` | gauge | `bug` | CPU bug presence from first core, always 1 |
| `node_cpu_core_throttles_total` | counter | `package`, `core` | Thermal throttle events per core |
| `node_cpu_package_throttles_total` | counter | `package` | Thermal throttle events per package |
| `node_cpu_isolated` | gauge | `cpu` | CPU isolation status (1 if isolated) |
| `node_cpu_online` | gauge | `cpu` | CPU online status (1 if online) |

## Labels

| Label | Description |
|-------|-------------|
| `cpu` | Logical CPU number (0-indexed) |
| `mode` | CPU time mode: `user`, `nice`, `system`, `idle`, `iowait`, `irq`, `softirq`, `steal` |
| `package` | Physical CPU package ID |
| `core` | Physical core ID within package |

## Notes

- `node_cpu_guest_seconds_total` values are also included in `node_cpu_seconds_total` (in the `user` and `nice` modes), so do not add the two together
- Counter values may jump backwards on CPU hotplug events; the collector handles this by resetting the stats for a CPU when its idle counter jumps back by more than 3 seconds
- `node_cpu_flag_info` and `node_cpu_bug_info` report the flags and bugs of the first CPU core only
- `node_cpu_frequency_hertz` is only exposed when the `cpufreq` collector is disabled, to avoid duplicate metrics
- Linux-specific metrics: throttle counters, isolated, online status
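
The hotplug note can be illustrated with a simplified sketch. This is not the collector's code: `update_cpu_stats` and `JUMP_BACK_SECONDS` are hypothetical names, the strict `>` comparison follows the wording "more than 3 seconds", and keeping the larger of the cached and new value for small backwards steps is an assumption made so that the exposed counters never decrease.

```python
JUMP_BACK_SECONDS = 3.0  # threshold quoted in the note above


def update_cpu_stats(cached, new):
    """Merge a fresh per-mode reading for one CPU into the cached values.

    cached, new: dicts mapping mode name (e.g. "idle", "user") to seconds.
    Returns the values to expose after this scrape.
    """
    if cached.get("idle", 0.0) - new["idle"] > JUMP_BACK_SECONDS:
        # Idle moved far backwards: treat it as a hotplug event and
        # discard everything cached for this CPU.
        cached = {}
    # Assumption: small backwards steps are ignored so counters stay monotonic.
    return {mode: max(cached.get(mode, 0.0), value) for mode, value in new.items()}
```

With a cached idle value of 100 s, a new reading of 99 s leaves the exposed value at 100 s, while a new reading of 2 s resets the CPU and exposes 2 s.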

docs/collectors/cpufreq.md

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
# cpufreq

Exposes CPU frequency scaling statistics from sysfs.

Status: enabled by default

## Platforms

- Linux
- Solaris

## Data Sources

| Source | Description |
|--------|-------------|
| `/sys/devices/system/cpu/cpu*/cpufreq/` | Per-CPU frequency scaling data |

Kernel documentation:
- https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt
- https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt

## Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_cpu_frequency_hertz` | gauge | `cpu` | Current CPU thread frequency in hertz |
| `node_cpu_frequency_min_hertz` | gauge | `cpu` | Minimum CPU thread frequency in hertz |
| `node_cpu_frequency_max_hertz` | gauge | `cpu` | Maximum CPU thread frequency in hertz |
| `node_cpu_scaling_frequency_hertz` | gauge | `cpu` | Current scaled CPU thread frequency in hertz |
| `node_cpu_scaling_frequency_min_hertz` | gauge | `cpu` | Minimum scaled CPU thread frequency in hertz |
| `node_cpu_scaling_frequency_max_hertz` | gauge | `cpu` | Maximum scaled CPU thread frequency in hertz |
| `node_cpu_scaling_governor` | gauge | `cpu`, `governor` | Current CPU frequency governor (1 if active, 0 otherwise) |

## Labels

| Label | Description |
|-------|-------------|
| `cpu` | Logical CPU number taken from the sysfs directory name (e.g., `0` for `cpu0`) |
| `governor` | Frequency governor name (e.g., `performance`, `powersave`, `ondemand`) |

## Notes

- Sysfs values are in kHz; the collector converts them to Hz
- `node_cpu_frequency_*` metrics are read from the sysfs `cpuinfo_*` attributes and reflect hardware limits; `node_cpu_scaling_frequency_*` metrics are read from the `scaling_*` attributes and reflect the limits of the current governor policy
- `node_cpu_scaling_governor` emits one metric per available governor per CPU, with value 1 for the active governor
- When this collector is enabled, the `cpu` collector does not expose `node_cpu_frequency_hertz` to avoid duplication
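
The unit conversion and the one-sample-per-governor encoding described in the notes can be sketched as follows. Both function names are hypothetical, and the `(labels, value)` tuple shape is chosen for illustration only; it is not how the collector represents samples.

```python
def khz_to_hz(khz):
    """Convert a sysfs cpufreq value (kHz) to the exposed unit (Hz)."""
    return khz * 1000.0


def governor_samples(cpu, available, active):
    """Return one (labels, value) pair per available governor for one CPU.

    The active governor gets value 1.0 and every other governor 0.0.
    """
    return [
        ({"cpu": cpu, "governor": name}, 1.0 if name == active else 0.0)
        for name in available
    ]
```

Because every available governor produces a sample, selecting the rows whose value is 1 yields exactly one governor per CPU.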

docs/collectors/diskstats.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
# diskstats

Exposes disk I/O statistics from `/proc/diskstats` and block device metadata from sysfs and udev.

Status: enabled by default

## Platforms

- Linux
- Darwin
- OpenBSD
- AIX

## Configuration

```
--collector.diskstats.device-include    Regexp of devices to include (mutually exclusive with device-exclude)
--collector.diskstats.device-exclude    Regexp of devices to exclude (default: ^(z?ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$)
```
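
To see which device names the default exclude pattern removes, the sketch below applies it with Python's `re` module. The collector itself is written in Go; this assumes the pattern behaves the same in both regex engines, which should hold for a pattern this simple. `DEFAULT_EXCLUDE` and `is_excluded` are hypothetical names, and the device names are illustrative.

```python
import re

# Default value of --collector.diskstats.device-exclude, copied from the flag text above.
DEFAULT_EXCLUDE = re.compile(r"^(z?ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$")


def is_excluded(device):
    """Return True if the default pattern would drop this device name."""
    return DEFAULT_EXCLUDE.match(device) is not None


if __name__ == "__main__":
    for dev in ["sda", "sda1", "nvme0n1", "nvme0n1p1", "loop0", "zram0", "dm-0"]:
        print(f"{dev}: {'excluded' if is_excluded(dev) else 'kept'}")
```

Whole disks such as `sda` and `nvme0n1` do not match because the pattern requires a trailing partition or unit number after the listed prefixes.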

## Data Sources

| Source | Description |
|--------|-------------|
| `/proc/diskstats` | Disk I/O statistics |
| `/sys/block/<device>/` | Block device attributes |
| `/sys/block/<device>/queue/` | Block device queue stats |
| `/run/udev/data/b<major>:<minor>` | Udev device properties |

Kernel documentation: https://www.kernel.org/doc/Documentation/iostats.txt

## Metrics

### I/O Statistics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_disk_reads_completed_total` | counter | `device` | Total number of reads completed successfully |
| `node_disk_reads_merged_total` | counter | `device` | Total number of reads merged |
| `node_disk_read_bytes_total` | counter | `device` | Total number of bytes read successfully |
| `node_disk_read_time_seconds_total` | counter | `device` | Total seconds spent by all reads |
| `node_disk_writes_completed_total` | counter | `device` | Total number of writes completed successfully |
| `node_disk_writes_merged_total` | counter | `device` | Total number of writes merged |
| `node_disk_written_bytes_total` | counter | `device` | Total number of bytes written successfully |
| `node_disk_write_time_seconds_total` | counter | `device` | Total seconds spent by all writes |
| `node_disk_io_now` | gauge | `device` | Number of I/Os currently in progress |
| `node_disk_io_time_seconds_total` | counter | `device` | Total seconds spent doing I/Os |
| `node_disk_io_time_weighted_seconds_total` | counter | `device` | Weighted seconds spent doing I/Os |

### Discard Statistics (Linux 4.18+)

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_disk_discards_completed_total` | counter | `device` | Total number of discards completed successfully |
| `node_disk_discards_merged_total` | counter | `device` | Total number of discards merged |
| `node_disk_discarded_sectors_total` | counter | `device` | Total number of sectors discarded successfully |
| `node_disk_discard_time_seconds_total` | counter | `device` | Total seconds spent by all discards |

### Flush Statistics (Linux 5.5+)

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_disk_flush_requests_total` | counter | `device` | Total number of flush requests completed successfully |
| `node_disk_flush_requests_time_seconds_total` | counter | `device` | Total seconds spent by all flush requests |

### Device Info

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_disk_info` | gauge | `device`, `major`, `minor`, `path`, `wwn`, `model`, `serial`, `revision`, `rotational` | Block device info, always 1 |
| `node_disk_filesystem_info` | gauge | `device`, `type`, `usage`, `uuid`, `version` | Filesystem info from udev, always 1 |
| `node_disk_device_mapper_info` | gauge | `device`, `name`, `uuid`, `vg_name`, `lv_name`, `lv_layer` | Device mapper info, always 1 |

### ATA Device Attributes

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_disk_ata_write_cache` | gauge | `device` | ATA disk has a write cache (1 if true) |
| `node_disk_ata_write_cache_enabled` | gauge | `device` | ATA disk write cache is enabled (1 if true) |
| `node_disk_ata_rotation_rate_rpm` | gauge | `device` | ATA disk rotation rate in RPM (0 for SSDs) |

## Notes

- Sector counts in `/proc/diskstats` are always in 512-byte units, regardless of the device's actual sector size; the byte counters are the sector counts multiplied by 512
- Time values in the kernel are in milliseconds; the collector converts them to seconds
- Udev info metrics require a readable `/run/udev/data/` directory
- Availability of the discard and flush metrics depends on the kernel version (see the section headings above)
- The default exclude pattern filters out partitions as well as RAM, zram, loop, and floppy (`fd`) devices
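
The unit conversions in the notes can be sketched by parsing one `/proc/diskstats` line by hand. This is an illustration, not the collector's implementation: `parse_diskstats_line` is a hypothetical name, the sample numbers are made up, and only the first eleven statistics fields (the classic layout, without discard or flush fields) are handled.

```python
SECTOR_SIZE = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats_line(line):
    """Convert one /proc/diskstats line into (device, {metric_name: value})."""
    fields = line.split()
    device = fields[2]  # fields[0] and fields[1] are the major and minor numbers
    (reads_completed, reads_merged, sectors_read, read_ms,
     writes_completed, writes_merged, sectors_written, write_ms,
     io_now, io_ms, weighted_io_ms) = (int(v) for v in fields[3:14])
    return device, {
        "node_disk_reads_completed_total": float(reads_completed),
        "node_disk_reads_merged_total": float(reads_merged),
        "node_disk_read_bytes_total": float(sectors_read * SECTOR_SIZE),
        "node_disk_read_time_seconds_total": read_ms / 1000.0,
        "node_disk_writes_completed_total": float(writes_completed),
        "node_disk_writes_merged_total": float(writes_merged),
        "node_disk_written_bytes_total": float(sectors_written * SECTOR_SIZE),
        "node_disk_write_time_seconds_total": write_ms / 1000.0,
        "node_disk_io_now": float(io_now),
        "node_disk_io_time_seconds_total": io_ms / 1000.0,
        "node_disk_io_time_weighted_seconds_total": weighted_io_ms / 1000.0,
    }


if __name__ == "__main__":
    sample = "   8       0 sda 1000 20 4096 1500 500 10 2048 2500 0 3000 4000"
    device, metrics = parse_diskstats_line(sample)
    print(device, metrics["node_disk_read_bytes_total"])
```

In the sample line, 4096 sectors read become 2097152 bytes and 1500 ms of read time become 1.5 seconds.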

docs/collectors/meminfo.md

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
# meminfo

Exposes memory statistics from `/proc/meminfo`.

Status: enabled by default

## Platforms

- Linux
- Darwin
- OpenBSD
- NetBSD
- AIX

## Data Sources

| Source | Description |
|--------|-------------|
| `/proc/meminfo` | Memory statistics |

Kernel documentation: https://www.kernel.org/doc/Documentation/filesystems/proc.txt (search for "meminfo")

## Metrics

Metrics are dynamically generated from `/proc/meminfo` fields. Each field `FieldName` with a value in kB becomes `node_memory_FieldName_bytes` (converted to bytes). Fields without a unit, such as the HugePages counts, keep their raw value and get no `_bytes` suffix.
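
A minimal sketch of this field-to-metric mapping, assuming the usual `Name: value [kB]` line layout. `meminfo_metrics` is a hypothetical name, and the handling of parenthesized fields (`Active(anon)` becoming `Active_anon`) is inferred from the metric names listed in this document rather than copied from the collector.

```python
import re


def meminfo_metrics(text):
    """Map /proc/meminfo text to a dict of metric name -> value."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # "Active(anon):" -> "Active_anon" (assumed rule, see lead-in)
        name = re.sub(r"\((.*)\)", r"_\1", parts[0].rstrip(":"))
        value = float(parts[1])
        if len(parts) == 3 and parts[2] == "kB":
            # kB values are converted to bytes and get the _bytes suffix.
            metrics[f"node_memory_{name}_bytes"] = value * 1024
        else:
            # Unitless fields (e.g. HugePages counts) are exposed as-is.
            metrics[f"node_memory_{name}"] = value
    return metrics


if __name__ == "__main__":
    sample = "MemTotal:       16384256 kB\nActive(anon):     204800 kB\nHugePages_Total:       0\n"
    for metric, value in meminfo_metrics(sample).items():
        print(metric, value)
```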

### Common Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `node_memory_MemTotal_bytes` | gauge | Total usable RAM |
| `node_memory_MemFree_bytes` | gauge | Free RAM |
| `node_memory_MemAvailable_bytes` | gauge | Estimate of memory available for starting new applications without swapping |
| `node_memory_Buffers_bytes` | gauge | Memory used by kernel buffers |
| `node_memory_Cached_bytes` | gauge | Memory used by the page cache (excluding `SwapCached`) |
| `node_memory_SwapTotal_bytes` | gauge | Total swap space |
| `node_memory_SwapFree_bytes` | gauge | Free swap space |
| `node_memory_SwapCached_bytes` | gauge | Swap space cached in RAM |

### Active/Inactive Memory

| Metric | Type | Description |
|--------|------|-------------|
| `node_memory_Active_bytes` | gauge | Memory recently used |
| `node_memory_Inactive_bytes` | gauge | Memory not recently used |
| `node_memory_Active_anon_bytes` | gauge | Active anonymous memory |
| `node_memory_Inactive_anon_bytes` | gauge | Inactive anonymous memory |
| `node_memory_Active_file_bytes` | gauge | Active file-backed memory |
| `node_memory_Inactive_file_bytes` | gauge | Inactive file-backed memory |

### Slab Memory

| Metric | Type | Description |
|--------|------|-------------|
| `node_memory_Slab_bytes` | gauge | Kernel slab memory |
| `node_memory_SReclaimable_bytes` | gauge | Reclaimable slab memory |
| `node_memory_SUnreclaim_bytes` | gauge | Unreclaimable slab memory |

### Huge Pages

| Metric | Type | Description |
|--------|------|-------------|
| `node_memory_HugePages_Total` | gauge | Total huge pages (count, not bytes) |
| `node_memory_HugePages_Free` | gauge | Free huge pages (count) |
| `node_memory_HugePages_Rsvd` | gauge | Reserved huge pages (count) |
| `node_memory_HugePages_Surp` | gauge | Surplus huge pages (count) |
| `node_memory_Hugepagesize_bytes` | gauge | Size of each huge page |

### Virtual Memory

| Metric | Type | Description |
|--------|------|-------------|
| `node_memory_VmallocTotal_bytes` | gauge | Total vmalloc address space |
| `node_memory_VmallocUsed_bytes` | gauge | Used vmalloc address space |
| `node_memory_VmallocChunk_bytes` | gauge | Largest contiguous vmalloc block |

### Other

| Metric | Type | Description |
|--------|------|-------------|
| `node_memory_Dirty_bytes` | gauge | Memory waiting to be written to disk |
| `node_memory_Writeback_bytes` | gauge | Memory being written to disk |
| `node_memory_Mapped_bytes` | gauge | Files mapped into memory |
| `node_memory_Shmem_bytes` | gauge | Shared memory |
| `node_memory_KernelStack_bytes` | gauge | Kernel stack memory |
| `node_memory_PageTables_bytes` | gauge | Page table memory |
| `node_memory_CommitLimit_bytes` | gauge | Memory that can currently be allocated, based on the overcommit settings |
| `node_memory_Committed_AS_bytes` | gauge | Memory currently allocated (committed address space) |

## Notes

- Available metrics vary by kernel version and configuration
- `MemAvailable` requires Linux 3.14+
- HugePages metrics are counts, not byte values
- Metrics whose name ends in lowercase `_total` are exposed as counters; all others are gauges. `HugePages_Total` does not match because of its capitalization, so it stays a gauge
- Darwin, OpenBSD, NetBSD, and AIX have platform-specific implementations with different available metrics
