
KAFKA-20664: Clarify docs on max compaction lag, segment.ms, and segment.bytes for active segment rolling#22489

Merged
mjsax merged 3 commits into apache:trunk from alanlau28:KAFKA-20664
Jun 19, 2026

alanlau28 (Contributor) commented Jun 5, 2026

Improves the documentation for segment.bytes, segment.ms, and
max.compaction.lag.ms with respect to active segment rolling.

Reviewers: Lucy Liu <lucliu@confluent.io>, Matthias J. Sax <matthias@confluent.io>

github-actions bot added the triage, clients, and small labels on Jun 5, 2026

alanlau28 changed the title on Jun 5, 2026
from: KAFKA-20664: Improve docs about "max compaction lag", "segment.ms" and "segment.size" with regard to active segment rolling
to: KAFKA-20664: Improve docs about "max compaction lag", "segment.ms" and "segment.bytes" with regard to active segment rolling

Comment thread docs/design/design.md Outdated
- This can be used to prevent log with low produce rate from remaining ineligible for compaction for an unbounded duration. If not set, logs that do not exceed min.cleanable.dirty.ratio are not compacted. Note that this compaction deadline is not a hard guarantee since it is still subjected to the availability of log cleaner threads and the actual compaction time. You will want to monitor the uncleanable-partitions-count, max-clean-time-secs and max-compaction-delay-secs metrics.
+ This can be used to prevent log with low produce rate from remaining ineligible for compaction for an unbounded duration. If not set, logs that do not exceed min.cleanable.dirty.ratio are not compacted.
+
+ Because the active segment is never compacted (as noted above), records become eligible for compaction only through active segment rolling. For a compacted topic the active segment is rolled when the first of these is reached: it grows to segment.bytes, or its age reaches the smaller of segment.ms and max.compaction.lag.ms. So max.compaction.lag.ms governs two distinct things. First, it triggers active segment rolling by lowering the effective time-based roll deadline to the smaller of segment.ms and max.compaction.lag.ms, moving older records out of the active segment. This active segment rolling is evaluated when records are appended, so a partition that has stopped receiving writes will not roll its active segment until the next append. Second, max.compaction.lag.ms then makes the rolled records eligible for compaction even when the log does not exceed min.cleanable.dirty.ratio.
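
The rolling rule described in this paragraph can be sketched in a few lines of Java. This is only an illustration of the wording in the diff, not Kafka's implementation: the class and method names are hypothetical, and the sketch ignores roll jitter and the exact size comparison the broker performs. The values in `main` are the default segment.bytes and segment.ms plus an assumed one-hour max.compaction.lag.ms.

```java
final class RollThresholdSketch {
    // Hypothetical helper: for a compacted topic the time-based roll threshold
    // is the smaller of segment.ms and max.compaction.lag.ms; for any other
    // topic it is segment.ms alone.
    static long effectiveRollMs(boolean compacted, long segmentMs, long maxCompactionLagMs) {
        return compacted ? Math.min(segmentMs, maxCompactionLagMs) : segmentMs;
    }

    // The active segment rolls when the first limit is reached:
    // size (segment.bytes) or age (the effective time-based threshold).
    static boolean shouldRoll(boolean compacted, long segmentBytes, long segmentMs,
                              long maxCompactionLagMs, long activeSizeBytes, long activeAgeMs) {
        return activeSizeBytes >= segmentBytes
            || activeAgeMs >= effectiveRollMs(compacted, segmentMs, maxCompactionLagMs);
    }

    public static void main(String[] args) {
        long segmentBytes = 1_073_741_824L; // 1 GiB
        long segmentMs = 604_800_000L;      // 7 days
        long maxLagMs = 3_600_000L;         // assumed for illustration: 1 hour
        long twoHoursMs = 7_200_000L;       // age of a small active segment

        // Same small, two-hour-old segment on a compacted and a non-compacted topic.
        System.out.println(shouldRoll(true, segmentBytes, segmentMs, maxLagMs, 1_024L, twoHoursMs));
        System.out.println(shouldRoll(false, segmentBytes, segmentMs, maxLagMs, 1_024L, twoHoursMs));
    }
}
```

With these values, the two-hour-old segment is due to roll on the compacted topic because the one-hour lag is the smaller threshold, while the same segment on a non-compacted topic is not.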

> This active segment rolling is evaluated when records are appended

The size/time roll check in maybeRoll is indeed only invoked from the append path, but the active segment can still be rolled from the retention path without an append. Illustrating all of these situations might be too much for the doc.

We could consider dropping the sentence, or narrowing it to the operator-facing consequence, e.g. "lowering max.compaction.lag.ms won't force-roll an idle partition; a new produce is needed before the dirty records become eligible for compaction."
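
The operator-facing consequence can be illustrated with a toy model in which the time-based roll check runs only inside `append`. Everything here is hypothetical (the class, its fields, and the timestamps), and the model deliberately ignores the retention-path roll mentioned above; it only shows why lowering the threshold does nothing for an idle partition until the next produce.

```java
final class IdlePartitionSketch {
    private long rollThresholdMs;      // stands in for min(segment.ms, max.compaction.lag.ms)
    private long firstAppendMs = -1L;  // -1 means the active segment is empty
    private int rolledSegments = 0;

    IdlePartitionSketch(long rollThresholdMs) {
        this.rollThresholdMs = rollThresholdMs;
    }

    // Simulates lowering max.compaction.lag.ms. No roll check runs here.
    void setRollThresholdMs(long rollThresholdMs) {
        this.rollThresholdMs = rollThresholdMs;
    }

    // The time-based roll check runs only as part of an append.
    void append(long nowMs) {
        if (firstAppendMs >= 0 && nowMs - firstAppendMs >= rollThresholdMs) {
            rolledSegments++;     // the old active segment becomes a rolled segment
            firstAppendMs = -1L;  // a fresh, empty active segment takes its place
        }
        if (firstAppendMs < 0) {
            firstAppendMs = nowMs; // first record in the active segment
        }
    }

    int rolledSegments() {
        return rolledSegments;
    }

    public static void main(String[] args) {
        IdlePartitionSketch partition = new IdlePartitionSketch(604_800_000L);
        partition.append(0L);                   // one record, then the partition goes idle
        partition.setRollThresholdMs(60_000L);  // threshold lowered while idle
        System.out.println(partition.rolledSegments());
        partition.append(3_600_000L);           // next produce, one hour later
        System.out.println(partition.rolledSegments());
    }
}
```

In this model the rolled-segment count stays at zero after the threshold is lowered and only changes once the second append arrives.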

"ineligible for compaction in the log. Only applicable for logs that are being compacted.";
"ineligible for compaction in the log. Only applicable for logs that are being compacted. " +
"Because the active segment is never compacted, for compacted topics this value also drives " +
"active segment rolling: the effective time-based roll deadline is the smaller of segment.ms " +

> the effective time-based roll deadline is

"threshold" might be more suitable than "deadline" here.

Also, could you wrap each config name in <code> tags, like <code>segment.ms</code>?

lucliu1108 (Contributor) left a comment

@alanlau28 Thanks for the PR! Overall, LGTM.

github-actions bot removed the triage label on Jun 6, 2026
Comment thread docs/design/design.md Outdated
- This can be used to prevent log with low produce rate from remaining ineligible for compaction for an unbounded duration. If not set, logs that do not exceed min.cleanable.dirty.ratio are not compacted. Note that this compaction deadline is not a hard guarantee since it is still subjected to the availability of log cleaner threads and the actual compaction time. You will want to monitor the uncleanable-partitions-count, max-clean-time-secs and max-compaction-delay-secs metrics.
+ This can be used to prevent log with low produce rate from remaining ineligible for compaction for an unbounded duration. If not set, logs that do not exceed min.cleanable.dirty.ratio are not compacted.
+
+ Because the active segment is never compacted (as noted above), records become eligible for compaction only through active segment rolling. For a compacted topic the active segment is rolled when the first of these is reached: it grows to segment.bytes, or its age reaches the smaller of segment.ms and max.compaction.lag.ms. So max.compaction.lag.ms governs two distinct things. First, it triggers active segment rolling by lowering the effective time-based roll threshold to the smaller of segment.ms and max.compaction.lag.ms, moving older records out of the active segment. Second, max.compaction.lag.ms then makes the rolled records eligible for compaction even when the log does not exceed min.cleanable.dirty.ratio.

Nit: the last sentence could be changed to:

> even when the log's dirty ratio is below min.cleanable.dirty.ratio
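
The second role of max.compaction.lag.ms, as worded in this thread, can be sketched as a simple predicate. The names are hypothetical and the strict comparisons are an assumption made for illustration; the real log cleaner also depends on cleaner thread availability, so this is not a model of when compaction actually runs.

```java
final class CompactionEligibilitySketch {
    // A log is treated as eligible for compaction when its dirty ratio exceeds
    // min.cleanable.dirty.ratio, or when its oldest rolled dirty record is
    // older than max.compaction.lag.ms. Pass a negative age when the log has
    // no rolled dirty records (records in the active segment never count).
    static boolean eligible(double dirtyRatio, double minCleanableDirtyRatio,
                            long oldestRolledDirtyAgeMs, long maxCompactionLagMs) {
        boolean ratioExceeded = dirtyRatio > minCleanableDirtyRatio;
        boolean lagExceeded = oldestRolledDirtyAgeMs > maxCompactionLagMs;
        return ratioExceeded || lagExceeded;
    }

    public static void main(String[] args) {
        double minRatio = 0.5;       // default min.cleanable.dirty.ratio
        long maxLagMs = 3_600_000L;  // assumed for illustration: 1 hour

        // Low dirty ratio, but the rolled records are two hours old.
        System.out.println(eligible(0.1, minRatio, 7_200_000L, maxLagMs));
        // Same low dirty ratio, rolled records only thirty minutes old.
        System.out.println(eligible(0.1, minRatio, 1_800_000L, maxLagMs));
    }
}
```

The first call shows the case the nit is about: the dirty ratio is below min.cleanable.dirty.ratio, yet the rolled records have exceeded the lag, so the log still qualifies.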

lucliu1108 (Contributor) left a comment

LGTM

mjsax added the docs and core labels and removed the clients label on Jun 9, 2026

alanlau28 changed the title on Jun 10, 2026
from: KAFKA-20664: Improve docs about "max compaction lag", "segment.ms" and "segment.bytes" with regard to active segment rolling
to: KAFKA-20664: Clarify docs on max compaction lag, segment.ms, and segment.bytes for active segment rolling

mjsax merged commit cd5ce52 into apache:trunk on Jun 19, 2026
28 checks passed
mjsax (Member) commented Jun 19, 2026

Thanks for the PR. Merged to trunk.
