Add APM Server known issue for TBS #4862

Open: wants to merge 1 commit into base: 8.x

11 changes: 11 additions & 0 deletions docs/en/observability/apm/known-issues.asciidoc
@@ -21,6 +21,17 @@ _Versions: XX.XX.XX, YY.YY.YY, ZZ.ZZ.ZZ_
// If applicable, link to fix
////

[discrete]
== Tail Sampling may not compact / expired TTLs as quickly as desired, causing increased storage usage.

_Elastic Stack versions: 8.0.0+ < 9.0**_
Contributor

This might be clearer:

Suggested change
- _Elastic Stack versions: 8.0.0+ < 9.0**_
+ _Elastic Stack versions: All 8.x versions_


There are some issues with the Tail Sampling implementation in versions 8.0.0+ < 9.0 that may cause the buffered traces to not be compacted or expired as quickly as desired. This can lead to increased storage usage for longer than the default 30m TTL.
Contributor

Suggested change
- There are some issues with the Tail Sampling implementation in versions 8.0.0+ < 9.0 that may cause the buffered traces to not be compacted or expired as quickly as desired. This can lead to increased storage usage for longer than the default 30m TTL.
+ There are some issues with the tail sampling implementation in all 8.x versions that may prevent buffered traces from being compacted or expired as quickly as expected. This can lead to increased storage usage for longer than the default 30m TTL.


This may manifest in two ways: increased value log (vlog) file size and increased SST (LSM) file size. LSM growth and late compaction are particularly troublesome given how the underlying K/V database performs compactions on its layers. There is noticeable LSM growth for use cases where traces are under 1KB in size, since they are written to the LSM layer directly.

This issue is fixed in 9.0.0, due to a re-implementation of how the underlying tail sampling databases are used. The new implementation uses a more efficient partitioning scheme, allowing more efficient expiration of traces.
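
As an illustration of the size threshold described above, the following minimal Go sketch models how a store that separates keys from values might route a write. It is not APM Server or database code: `inlineThreshold` and `placement` are hypothetical names, and the 1024-byte cutoff is only an assumption taken from the "under 1KB" figure in the text.

```go
package main

import "fmt"

// inlineThreshold is an assumed cutoff based on the "under 1KB" figure in the
// known-issue text; a real store's threshold depends on its configuration.
const inlineThreshold = 1024

// placement reports where a hypothetical key-value store with key/value
// separation would write a value of the given size: small values go inline
// into the LSM tree, larger ones go to the value log (vlog).
func placement(valueSize int) string {
	if valueSize < inlineThreshold {
		return "lsm"
	}
	return "vlog"
}

func main() {
	// Print the placement for a few example trace sizes in bytes.
	for _, size := range []int{200, 1023, 1024, 4096} {
		fmt.Printf("%5d bytes -> %s\n", size, placement(size))
	}
}
```

Under this model, a workload made up mostly of small traces adds data almost entirely to the LSM files, which is consistent with the LSM growth noted above.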

[discrete]
== APM Server v8.6.x and prior with Elasticsearch v8.15.x and later has broken APM UI
