Releases: jmuehlig/perf-cpp
v0.12.2
- Metric Functions: Metrics now support built-in functions such as `ratio(A, B)` and `sum(A, B, C, ...)`, enabling more expressive and reusable formulas (see the documentation).
- Optimized Compile-time Event Injection: The generated runtime event registration class is now only created if it does not already exist, reducing unnecessary recompilation.
- Improved Live Event Accuracy: Live event values now account for partial runtime durations via time scaling, improving accuracy when counters were not active for the full measurement window.
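
The time scaling mentioned above can be illustrated with the extrapolation commonly applied to multiplexed `perf_event_open` counters: the raw value is multiplied by the ratio of the time the event was enabled to the time it was actually running on a hardware counter. The following is a minimal sketch of that idea, not the library's implementation; the helper name `scale_counter_value` and its parameters are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of perf-cpp): extrapolates a raw counter value
// to the full measurement window when the counter was only scheduled onto the
// hardware for part of that window.
double scale_counter_value(std::uint64_t raw_value, std::uint64_t time_enabled, std::uint64_t time_running)
{
    // A counter cannot have been running longer than it was enabled.
    assert(time_running <= time_enabled);

    if (time_running == 0U) {
        return 0.0; // The counter was never scheduled; there is nothing to extrapolate.
    }

    return static_cast<double>(raw_value) * static_cast<double>(time_enabled) /
           static_cast<double>(time_running);
}
```

For example, a counter that recorded 500 events while running for only a quarter of the enabled time would be reported as roughly 2000 events.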
v0.12.1
- Automatic Event Discovery on ARM: Hardware event types are now automatically detected on ARM architectures when initializing a `perf::CounterDefinition` instance.
- Hardware Counter Introspection: The number of available physical performance counters per logical core, along with the number of events each counter can multiplex, is now determined automatically when creating a `perf::EventCounter`.
- Recursive and Scientific Metrics: Metric expressions can now reference other metrics recursively. Support for scientific notation (e.g., `1e5`) in formula-based metrics has also been added.
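
To illustrate what recursive metric references and scientific notation mean in practice, here is a small, self-contained sketch. It does not use the perf-cpp API: the `RatioMetric` type and the `evaluate` function are hypothetical, and a metric is simplified to a ratio of two named operands. Each operand may be a raw counter, another metric, or a numeric literal such as `1e5`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical metric type: a ratio of two operands, each named by a string.
struct RatioMetric {
    std::string numerator;
    std::string denominator;
};

// Resolves an operand to a value. Counters are looked up directly, metrics are
// evaluated recursively, and anything else is parsed as a numeric literal.
double evaluate(const std::string& name,
                const std::map<std::string, double>& counters,
                const std::map<std::string, RatioMetric>& metrics,
                int depth = 0)
{
    assert(!name.empty());
    if (depth > 16) {
        throw std::runtime_error("metric nesting too deep (possible cycle): " + name);
    }

    const auto counter = counters.find(name);
    if (counter != counters.end()) {
        return counter->second;
    }

    const auto metric = metrics.find(name);
    if (metric != metrics.end()) {
        return evaluate(metric->second.numerator, counters, metrics, depth + 1) /
               evaluate(metric->second.denominator, counters, metrics, depth + 1);
    }

    // std::stod accepts scientific notation such as "1e5".
    std::size_t parsed = 0;
    const double literal = std::stod(name, &parsed);
    if (parsed != name.size()) {
        throw std::invalid_argument("unknown operand: " + name);
    }
    return literal;
}
```

With counters `instructions = 3000` and `cycles = 1000`, a metric `ipc` defined as `instructions / cycles` evaluates to 3, and a second metric defined as `ipc / 1e5` resolves `ipc` first and then divides by 100000.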
v0.12.0
This release expands symbolic analysis capabilities, introduces FlameGraph generation, and improves hardware event management through both runtime and compile-time support.
- Symbol Resolution: Instruction pointers captured during sampling can now be resolved to function names using `perf::SymbolResolver` (see the documentation).
- FlameGraph Export: Sampling data can be converted into formats compatible with visualization tools such as Brendan Gregg's FlameGraph, Speedscope, and flamegraph.com using `perf::analyzer::FlameGraphGenerator` (see the documentation).
- Built-in Event Definitions: A set of `x86`-specific hardware events is now bundled in `events/x86` and can be loaded at runtime using `perf::CounterDefinition`. This serves as an alternative to the `make perf-list` target.
- Compile-time Event Injection: Processor-specific event definitions can now be embedded directly at build time by configuring CMake with `-DGEN_PROCESSOR_EVENTS=1`. These are immediately available via `perf::CounterDefinition` (see the documentation).
- Automatic Event Discovery: Additional event types (including RAPL energy counters and AMD IO MMU events) are now automatically detected during the creation of a `perf::CounterDefinition` instance (issue #6).
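
For readers unfamiliar with the input that FlameGraph-style tools consume, the sketch below shows the widely used "folded stacks" text format: one line per unique call stack, frames joined by `;` from root to leaf, followed by a space and the number of samples. It is an illustration only and does not reflect how `perf::analyzer::FlameGraphGenerator` is implemented; the function `fold_stacks` and its input type are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: aggregates sampled call stacks (frames ordered root
// first) into folded-stack lines of the form "root;child;leaf <count>".
std::vector<std::string> fold_stacks(const std::vector<std::vector<std::string>>& stacks)
{
    std::map<std::string, std::uint64_t> counts;

    for (const auto& stack : stacks) {
        if (stack.empty()) {
            continue; // A sample without resolved frames contributes nothing.
        }

        std::string key;
        for (std::size_t i = 0; i < stack.size(); ++i) {
            // Frame names must not contain the separator used by the format.
            assert(stack[i].find(';') == std::string::npos);
            if (i > 0) {
                key += ';';
            }
            key += stack[i];
        }
        ++counts[key];
    }

    // std::map iterates in key order, so the output is sorted by stack.
    std::vector<std::string> lines;
    for (const auto& entry : counts) {
        lines.push_back(entry.first + " " + std::to_string(entry.second));
    }
    return lines;
}
```

Writing the returned lines to a file, one per line, yields input that folded-stack consumers can read.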
v0.11.1
v0.11.0
This version rolls out a redesigned sampling API.
Recorded data are now grouped into dedicated sub-structures (such as `Metadata`, `InstructionExecution`, and `DataAccess`) inside `perf::Sample` (see the sampling documentation).
The previous flat API is still available but deprecated and will be removed in `v0.12`.
- New Sampling Interface: Work with clearly separated sample sections, exposing additional AMD IBS fields that are not surfaced by the `perf_event_open` records.
- Explicit Latency Attributes: Vendor-specific latency signals (cache-access on Intel and cache-miss on AMD) are now surfaced as distinct fields.
- Heterogeneous-core Support: Sampling can target multiple PMU domains (e.g., cpu_core and cpu_atom) on hybrid Intel processors.
v0.10.0
- New feature: The auxiliary event is added automatically if the (Intel) hardware requires it (see the documentation).
- New feature: The Memory Access Analyzer allows the description of complex data objects and maps sampled memory addresses in order to report latency and access information (see the documentation).
- The number of pages for the sampling buffer is now adjusted automatically if it is not configured properly. A valid size is a power of two plus one page for the header.
- New feature: Copy sampled data from the mmap-ed perf buffer into the application-level buffer whenever the buffer comes close to full (see the documentation).
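
The buffer-size rule above (a power-of-two number of data pages plus one header page) can be sketched as follows. This is an assumption-laden illustration, not the library's code: the function names are hypothetical, and the choice to round an invalid request up (rather than down) to the next valid size is a guess made for the example.

```cpp
#include <cassert>
#include <cstdint>

bool is_power_of_two(std::uint64_t value)
{
    return value != 0U && (value & (value - 1U)) == 0U;
}

// A page count is valid if it consists of 2^n data pages plus one header page.
bool is_valid_buffer_pages(std::uint64_t pages)
{
    return pages > 1U && is_power_of_two(pages - 1U);
}

// Hypothetical helper: returns the requested page count if it is already
// valid; otherwise treats the request as the desired number of data pages,
// rounds it up to a power of two, and adds the header page.
// Assumes the request is small enough that the shift does not overflow.
std::uint64_t align_buffer_pages(std::uint64_t pages)
{
    if (is_valid_buffer_pages(pages)) {
        return pages;
    }

    std::uint64_t data_pages = 1U;
    while (data_pages < pages) {
        data_pages <<= 1U;
    }

    const std::uint64_t aligned = data_pages + 1U;
    assert(is_valid_buffer_pages(aligned));
    return aligned;
}
```

Under these assumptions, a request for 8 pages (header forgotten) becomes 9, a request for 9 stays 9, and a request for 10 becomes 17.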
v0.9.0
- Removed the deprecation warnings for the old sampling interface, along with the old sampling interface itself.
- New feature: Access interim results from counters without stopping the counter using live counters.
- New feature: Sampling the user stack.
- New feature: Create custom metrics using expressions, e.g., `"instructions/cycles"`.
- New feature: Use metrics when sampling counter values.
- New feature: Control scheduling of events to physical hardware counters.
- New feature: Added time events (e.g., `seconds`, `milliseconds`, etc.) as virtual counters.