Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify profile stack trace representation #615

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

felixge
Copy link
Member

@felixge felixge commented Jan 9, 2025

  • Introduce a first-class Stack message type and lookup table.
  • Replace location index range based stack trace encoding on Sample with a single stack_index reference.
  • Remove the location_indices lookup table.

The primary motivation is laying the ground work for timestamp based profiling where the same stack trace needs to be referenced much more frequently compared to aggregation based on low
cardinality attributes.

Timestamp based profiling is also expected to be used with the upcoming Off-CPU profiling feature in the eBPF profiler. Off-CPU stack traces have a different distribution compared to CPU stack traces. In particular Off-CPU stack traces are much more repetitive because they typically occur at special leaf functions such as leaf (async preemption being a notable exception). For the same reason it is also uncommon to see a stack trace that are a root-prefix of a previously observed stack trace.

We might need to revisit the previous previous benchmarks to confirm these claims, as the previous analysis seems to have shown that the location range based encoding is always either comparable or better than the mechanism proposed here.

The secondary motivation is simplicitly. Arguably the proposed change here will make it easier to write exporters, processors as well as receivers.

It seems like we had rough consensus around this change in previous SIG meetings, and it seems like a good incremental step to make progress on the timestamp proposal.

- Introduce a first-class Stack message type and lookup table.
- Replace location index range based stack trace encoding on Sample with
  a single stack_index reference.
- Remove the location_indices lookup table.

The primary motivation is laying the ground work for [timestamp based
profiling][timestamp proposal] where the same stack trace needs to be
referenced much more frequently compared to aggregation based on low
cardinality attributes.

Timestamp based profiling is also expected to be used with the the
upcoming [Off-CPU profiling][off-cpu pr] feature in the eBPF profiler.
Off-CPU stack traces have a different distribution compared to CPU
samples. In particular stack traces are much more repetitive because
they only occur at call sites such as syscalls. For the same reason it
is also uncommon to see a stack trace are a root-prefix of a previously
observed stack trace.

We might need to revisit the previous [previous benchmarks][benchmarks]
to confirm these claims.

The secondary motivation is simplicitly. Arguably the proposed change
here will make it easier to write exporters, processors as well as
receivers.

It seems like we had rough consensus around this change in previous SIG
meetings, and it seems like a good incremental step to make progress on
the timestamp proposal.

[timestamp proposal]: #594
[off-cpu pr]: open-telemetry/opentelemetry-ebpf-profiler#196
[benchmarks]: https://docs.google.com/spreadsheets/d/1Q-6MlegV8xLYdz5WD5iPxQU2tsfodX1-CDV1WeGzyQ0/edit?gid=2069300294#gid=2069300294
// Supersedes location_index.
int32 locations_length = 2;
// Reference to stack in Profile.stack_table.
int32 stack_index = 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have an optional int64 instruction_ip property here that would take the common stack at stack_index, but also add the instruction_ip to it. This would allow for common on-CPU callstacks to be respresented cheaply that are in the same function but hit different parts of it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be just int64 since that won't play well with symbolization and inlines, would have to be a location ID.

Also, this optimization would only be most useful to profile types like CPU sampling where the leaf frame is indeed very random. For profiles like heap profiling it would not be as valuable since there even the leaf is a callsite, not a sampled IP.

And if we are looking at such "encode the tree more efficiently" optimizations then we should probably look at using the actual tree encoding as discussed a bit here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, it's only useful for on-CPU sampling with skids, etc. So I don't think it should be required, but definitely useful when needed as an option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants