KAFKA-20664: Clarify docs on max compaction lag, segment.ms, and segment.bytes for active segment rolling #22489
…and max.compaction.lag.ms
This can be used to prevent log with low produce rate from remaining ineligible for compaction for an unbounded duration. If not set, logs that do not exceed min.cleanable.dirty.ratio are not compacted. Note that this compaction deadline is not a hard guarantee since it is still subjected to the availability of log cleaner threads and the actual compaction time. You will want to monitor the uncleanable-partitions-count, max-clean-time-secs and max-compaction-delay-secs metrics.
This can be used to prevent log with low produce rate from remaining ineligible for compaction for an unbounded duration. If not set, logs that do not exceed min.cleanable.dirty.ratio are not compacted.
Because the active segment is never compacted (as noted above), records become eligible for compaction only through active segment rolling. For a compacted topic the active segment is rolled when the first of these is reached: it grows to segment.bytes, or its age reaches the smaller of segment.ms and max.compaction.lag.ms. So max.compaction.lag.ms governs two distinct things. First, it triggers active segment rolling by lowering the effective time-based roll deadline to the smaller of segment.ms and max.compaction.lag.ms, moving older records out of the active segment. This active segment rolling is evaluated when records are appended, so a partition that has stopped receiving writes will not roll its active segment until the next append. Second, max.compaction.lag.ms then makes the rolled records eligible for compaction even when the log does not exceed min.cleanable.dirty.ratio.
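To make the roll rule in the paragraph above concrete, here is a minimal Java sketch of the roll decision for the active segment. It is not the broker's code: the class and method names are hypothetical, and details the real check has (segment.jitter.ms, index-full and offset-overflow rolls) are left out. The size test assumes the "next batch would no longer fit within segment.bytes" form, and the time test uses the smaller of segment.ms and max.compaction.lag.ms for compacted topics only, in line with "Only applicable for logs that are being compacted."

```java
/**
 * Hypothetical, simplified model of the active segment roll decision.
 * Not the broker implementation: jitter, index-full and offset-overflow
 * rolls are deliberately omitted.
 */
public final class ActiveSegmentRollSketch {

    private ActiveSegmentRollSketch() {}

    /** Effective time-based roll threshold: min(segment.ms, max.compaction.lag.ms) for compacted topics. */
    public static long effectiveRollMs(boolean compacted, long segmentMs, long maxCompactionLagMs) {
        return compacted ? Math.min(segmentMs, maxCompactionLagMs) : segmentMs;
    }

    /**
     * Returns true if the active segment should be rolled before appending a batch.
     *
     * @param activeSizeBytes current size of the active segment
     * @param batchSizeBytes  size of the batch about to be appended
     * @param activeAgeMs     age of the active segment (time since its first record)
     */
    public static boolean shouldRoll(boolean compacted,
                                     long activeSizeBytes,
                                     long batchSizeBytes,
                                     long activeAgeMs,
                                     long segmentBytes,
                                     long segmentMs,
                                     long maxCompactionLagMs) {
        // Size-based roll (assumed form): the incoming batch would push the segment past segment.bytes.
        boolean sizeReached = activeSizeBytes + batchSizeBytes > segmentBytes;
        // Time-based roll: age reaches the effective threshold.
        boolean timeReached = activeAgeMs >= effectiveRollMs(compacted, segmentMs, maxCompactionLagMs);
        return sizeReached || timeReached;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        long sevenDays = 7L * 24 * 60 * 60 * 1000;
        long oneHour = 60L * 60 * 1000;
        // Compacted topic, small 2-hour-old segment: segment.ms (7 days) alone would not roll it,
        // but max.compaction.lag.ms (1 hour) lowers the threshold, so it rolls.
        System.out.println(shouldRoll(true, 1024, 100, 2 * oneHour, oneGiB, sevenDays, oneHour));  // true
        // Same segment on a non-compacted topic: max.compaction.lag.ms does not apply.
        System.out.println(shouldRoll(false, 1024, 100, 2 * oneHour, oneGiB, sevenDays, oneHour)); // false
    }
}
```

Running main prints true and then false: the 1-hour max.compaction.lag.ms forces the roll on the compacted topic, while the same segment on a non-compacted topic keeps waiting for the 7-day segment.ms.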
This active segment rolling is evaluated when records are appended
The size/time roll check in maybeRoll is indeed only invoked from the append path, but the active segment can still be rolled from the retention path without an append. Illustrating all these situations might be too much for the doc.
We could consider dropping the sentence, or narrowing it to the operator-facing consequence, e.g. "lowering max.compaction.lag.ms won't force-roll an idle partition; a new produce is needed before the dirty records become eligible for compaction."
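The operator-facing consequence suggested here can be shown with a toy model in which the time-based roll check runs only inside append. This is a hypothetical sketch (AppendDrivenRollSketch is an illustrative name, not a Kafka class), and, per the comment above, it deliberately does not model rolls from the retention path.

```java
/**
 * Hypothetical illustration: if the time-based roll check only runs on append,
 * an idle partition keeps its dirty records in the active segment (out of the
 * cleaner's reach) until the next produce. Retention-path rolls are not modeled.
 */
public final class AppendDrivenRollSketch {

    private final long rollMs;          // effective threshold, e.g. min(segment.ms, max.compaction.lag.ms)
    private long activeSegmentStartMs;  // time the current active segment started
    private int rolledSegments;         // segments that are no longer active

    public AppendDrivenRollSketch(long rollMs, long startMs) {
        this.rollMs = rollMs;
        this.activeSegmentStartMs = startMs;
    }

    /** Appends a record at nowMs; in this model the roll check happens only here. */
    public void append(long nowMs) {
        if (nowMs - activeSegmentStartMs >= rollMs) {
            rolledSegments++;             // the old active segment becomes a rolled segment
            activeSegmentStartMs = nowMs; // a new active segment starts with this record
        }
    }

    public int rolledSegments() {
        return rolledSegments;
    }

    public static void main(String[] args) {
        long oneHour = 60L * 60 * 1000;
        AppendDrivenRollSketch log = new AppendDrivenRollSketch(oneHour, 0L);
        log.append(0L);
        // Three hours pass with no produce: nothing invokes the roll check.
        System.out.println(log.rolledSegments()); // 0
        log.append(3 * oneHour);                  // the next produce triggers the roll
        System.out.println(log.rolledSegments()); // 1
    }
}
```

main prints 0 and then 1: three idle hours past the threshold do not roll the segment on their own; only the next append does.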
"ineligible for compaction in the log. Only applicable for logs that are being compacted.";
"ineligible for compaction in the log. Only applicable for logs that are being compacted. " +
"Because the active segment is never compacted, for compacted topics this value also drives " +
"active segment rolling: the effective time-based roll deadline is the smaller of segment.ms " +
the effective time-based roll deadline is
"threshold" might be more suitable than "deadline" here.
Also, could you wrap each config name in <code> tags, like <code>segment.ms</code>?
lucliu1108 left a comment
@alanlau28 Thanks for the PR! Overall LGTM.
This can be used to prevent log with low produce rate from remaining ineligible for compaction for an unbounded duration. If not set, logs that do not exceed min.cleanable.dirty.ratio are not compacted. Note that this compaction deadline is not a hard guarantee since it is still subjected to the availability of log cleaner threads and the actual compaction time. You will want to monitor the uncleanable-partitions-count, max-clean-time-secs and max-compaction-delay-secs metrics.
This can be used to prevent log with low produce rate from remaining ineligible for compaction for an unbounded duration. If not set, logs that do not exceed min.cleanable.dirty.ratio are not compacted.
Because the active segment is never compacted (as noted above), records become eligible for compaction only through active segment rolling. For a compacted topic the active segment is rolled when the first of these is reached: it grows to segment.bytes, or its age reaches the smaller of segment.ms and max.compaction.lag.ms. So max.compaction.lag.ms governs two distinct things. First, it triggers active segment rolling by lowering the effective time-based roll threshold to the smaller of segment.ms and max.compaction.lag.ms, moving older records out of the active segment. Second, max.compaction.lag.ms then makes the rolled records eligible for compaction even when the log does not exceed min.cleanable.dirty.ratio.
Nit: the last sentence could be changed to:
even when the log's dirty ratio is below min.cleanable.dirty.ratio
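A minimal sketch of the eligibility side, using the wording from this nit. Assumptions: the class is hypothetical, the dirty ratio is taken as dirty bytes over total bytes outside the active segment, and min.compaction.lag.ms, cleaner-thread availability and the broker's exact age computation are ignored. The strict > comparisons follow the doc's "do not exceed" wording.

```java
/**
 * Hypothetical, simplified model of compaction eligibility. Ignores
 * min.compaction.lag.ms, cleaner-thread availability and how the broker
 * actually measures record age; it only shows how max.compaction.lag.ms can
 * make a log eligible even when its dirty ratio is below min.cleanable.dirty.ratio.
 */
public final class CompactionEligibilitySketch {

    private CompactionEligibilitySketch() {}

    /**
     * @param dirtyBytesInRolledSegments dirty bytes outside the active segment (the active segment is never cleaned)
     * @param totalBytes                 clean plus dirty bytes used for the dirty ratio
     * @param oldestDirtyRecordAgeMs     age of the oldest dirty record in a rolled segment
     */
    public static boolean isEligible(long dirtyBytesInRolledSegments,
                                     long totalBytes,
                                     long oldestDirtyRecordAgeMs,
                                     double minCleanableDirtyRatio,
                                     long maxCompactionLagMs) {
        if (dirtyBytesInRolledSegments <= 0 || totalBytes <= 0) {
            return false; // nothing the cleaner could work on
        }
        double dirtyRatio = (double) dirtyBytesInRolledSegments / totalBytes;
        boolean ratioExceeded = dirtyRatio > minCleanableDirtyRatio;
        boolean lagExceeded = oldestDirtyRecordAgeMs > maxCompactionLagMs;
        return ratioExceeded || lagExceeded;
    }

    public static void main(String[] args) {
        long oneHour = 60L * 60 * 1000;
        // Dirty ratio 0.1 is below 0.5 (the min.cleanable.dirty.ratio default), but the oldest
        // dirty record is 2h old and max.compaction.lag.ms is 1h, so the log is eligible anyway.
        System.out.println(isEligible(100, 1000, 2 * oneHour, 0.5, oneHour));        // true
        // Same log with max.compaction.lag.ms left at its default (Long.MAX_VALUE): not eligible.
        System.out.println(isEligible(100, 1000, 2 * oneHour, 0.5, Long.MAX_VALUE)); // false
    }
}
```

The two calls print true and false: the same 0.1 dirty ratio becomes eligible only once the oldest dirty record in a rolled segment has exceeded max.compaction.lag.ms.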
Thanks for the PR. Merged to
Improves the documentation for segment.bytes, segment.ms, and max.compaction.lag.ms with respect to active segment rolling.

Reviewers: Lucy Liu <lucliu@confluent.io>, Matthias J. Sax <matthias@confluent.io>