
coordinated compaction improvements #29486

Merged
bashtanov merged 3 commits into redpanda-data:dev from bashtanov:coco-improvements on Feb 4, 2026

Conversation

@bashtanov (Contributor) commented Feb 2, 2026

https://redpandadata.atlassian.net/browse/CORE-15445

An update to group offsets may be lost due to leadership transfers.

We previously tolerated this: partitions are typically written to and compacted, so coordination offsets keep moving and a new update eventually gets through.

However, that makes the behaviour difficult to test. Fix the functionality and the test.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

  • none

Copilot AI review requested due to automatic review settings February 2, 2026 15:00
Copilot AI (Contributor) left a comment

Pull request overview

This PR fixes a bug where group offset updates in coordinated compaction could be lost during leadership transfers. The core issue was that the compaction coordinator did not properly handle leadership changes, potentially losing MTRO/MXRO updates that were in flight when leadership transferred.

Changes:

  • Modified leadership change handling to track the leadership term and force replication of group offsets when leadership changes
  • Updated test to handle transient leadership transfer failures and retry replication/leadership operations
  • Removed the flaky = True flag from the test build configuration after fixing the underlying race condition
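
For orientation, here is a minimal, self-contained sketch of the term-tracking approach described in the first two bullets above. The member names _leader_term_id and _need_force_update appear in the diff hunks quoted later in this conversation, but the class, method names, and types here are simplified stand-ins, not the actual redpanda implementation.

#include <cstdint>
#include <optional>

using term_id = int64_t; // stand-in for model::term_id

class coordinator_sketch {
public:
    // Called when this replica gains leadership in `new_term`, or loses it
    // (empty optional).
    void on_leadership_change(std::optional<term_id> new_term) {
        if (new_term) {
            // New term: force one distribution of group offsets even if they
            // have not moved, so an update lost in flight during the transfer
            // is always re-sent by the new leader.
            _need_force_update = true;
            _leader_term_id = new_term;
        } else {
            _leader_term_id.reset(); // no longer leader, stop distributing
        }
    }

    // Called when recalculated group offsets are available for distribution.
    void maybe_distribute_group_offsets(bool mtro_updated, bool mxro_updated) {
        if (
          _leader_term_id
          && (mtro_updated || mxro_updated || _need_force_update)) {
            // replicate group offsets to followers (omitted in this sketch)
            _need_force_update = false;
        }
    }

private:
    std::optional<term_id> _leader_term_id; // set only while leader
    bool _need_force_update{false};         // true until first send in a term
};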

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/v/raft/compaction_coordinator.h: Added term tracking and force update flag for leadership changes
  • src/v/raft/compaction_coordinator.cc: Implemented term-based leadership tracking and forced offset distribution on leadership change
  • src/v/raft/consensus.cc: Updated leadership notification to pass term ID to compaction coordinator
  • src/v/raft/tests/coordinated_compaction_test.cc: Enhanced test robustness with retry logic for leadership transfers and replication
  • src/v/raft/tests/BUILD: Removed flaky test flag

// we will retry later anyway. Retries may be cancelled
// a) on leadership change, where the flag will be reset back to true;
// or
// b) on a further call to `on_group_offsets_update` due to
Copilot AI commented Feb 2, 2026

Corrected the reference to 'on_group_offsets_update' (which has been removed) to 'update_group_offsets' in the comment.

Suggested change
// b) on a further call to `on_group_offsets_update` due to
// b) on a further call to `update_group_offsets` due to

"compaction coordinator planning to distribute group offsets in {}",
group_offsets_send_delay);
Copilot AI commented Feb 2, 2026

The log message is missing a unit for the delay. Consider changing to 'planning to distribute group offsets in {} seconds' or using appropriate formatting for the duration.

Suggested change
"compaction coordinator planning to distribute group offsets in {}",
group_offsets_send_delay);
"compaction coordinator planning to distribute group offsets in {} ms",
std::chrono::duration_cast<std::chrono::milliseconds>(
group_offsets_send_delay)
.count());

Contributor Author

fmt prints (or rather should print) durations in human-readable form
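
For reference, a small standalone example of what this reply relies on: with fmt's chrono support, a std::chrono duration is printed with its unit suffix, so the original '{}' placeholder already yields a readable value. This is plain {fmt} usage, not the redpanda logging macro.

#include <chrono>
#include <fmt/chrono.h> // enables formatting of std::chrono durations
#include <fmt/core.h>

int main() {
    using namespace std::chrono_literals;
    // Durations are rendered with a unit suffix, e.g. "30s" and "1500ms",
    // so no manual duration_cast is needed for the log line.
    fmt::print("compaction coordinator planning to distribute group offsets in {}\n", 30s);
    fmt::print("compaction coordinator planning to distribute group offsets in {}\n", 1500ms);
}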

@vbotbuildovich (Collaborator) commented Feb 2, 2026

CI test results

test results on build #79952

  WriteCachingFailureInjectionE2ETest.test_crash_all {"use_transactions": false} (integration)
    status: FLAKY (passed 17/21). Test PASSES after retries. No significant increase in flaky rate
      (baseline=0.0920, p0=0.2777, reject_threshold=0.0100; adj_baseline=0.2514, p1=0.2210, trust_threshold=0.5000)
    job: https://buildkite.com/redpanda/redpanda/builds/79952#019c1efa-b4dd-438f-9021-44504ffa0e92
    history: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all

test results on build #79978

  QuotaManagementUpgradeTest.test_upgrade (integration)
    status: FLAKY (passed 10/11). Test PASSES after retries. No significant increase in flaky rate
      (baseline=0.0632, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.1778, p1=0.1413, trust_threshold=0.5000)
    job: https://buildkite.com/redpanda/redpanda/builds/79978#019c2071-3851-442e-8d2a-b214023a8c78
    history: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=QuotaManagementUpgradeTest&test_method=test_upgrade

test results on build #80040

  ConsumerGroupBalancingTest.test_coordinator_nodes_balance (integration)
    status: FLAKY (passed 10/11). Test PASSES after retries. No significant increase in flaky rate
      (baseline=0.0049, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000)
    job: https://buildkite.com/redpanda/redpanda/builds/80040#019c2576-fee5-4d6e-982f-144b23009242
    history: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ConsumerGroupBalancingTest&test_method=test_coordinator_nodes_balance

  RedpandaNodeOperationsSmokeTest.test_node_ops_smoke_test {"cloud_storage_type": 1, "mixed_versions": true} (integration)
    status: FLAKY (passed 10/11). Test PASSES after retries. No significant increase in flaky rate
      (baseline=0.0057, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000)
    job: https://buildkite.com/redpanda/redpanda/builds/80040#019c252d-3f4a-423e-9fee-0b11ae2e87b0
    history: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test

  ScalingUpTest.test_scaling_up_with_recovered_topic (integration)
    status: FLAKY (passed 10/11). Test PASSES after retries. No significant increase in flaky rate
      (baseline=0.0157, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000)
    job: https://buildkite.com/redpanda/redpanda/builds/80040#019c2576-fee5-4d6e-982f-144b23009242
    history: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ScalingUpTest&test_method=test_scaling_up_with_recovered_topic

  WriteCachingFailureInjectionE2ETest.test_crash_all {"use_transactions": false} (integration)
    status: FLAKY (passed 9/11). Test PASSES after retries. No significant increase in flaky rate
      (baseline=0.1005, p0=0.6532, reject_threshold=0.0100; adj_baseline=0.2722, p1=0.1977, trust_threshold=0.5000)
    job: https://buildkite.com/redpanda/redpanda/builds/80040#019c2576-fee5-4d6e-982f-144b23009242
    history: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all

void compaction_coordinator::on_group_offsets_update() {
if (_is_leader) {
if (
_leader_term_id && (mtro_updated || mxro_updated || _need_force_update)) {
Contributor

How could this get called with the right branch of the 'and' false? I.e., can this be called with mtro_updated false, mxro_updated false, and _need_force_update false?

Contributor Author

E.g. in the following situation. We are the leader and have been for a while in the current term. The timer triggered collect_all_replica_offsets, which called update_local_replica_offsets. Local offsets have increased, so it called recalculate_group_offsets. However, the local offsets were not the lowest, as some replica is holding back compaction, so new_mtro and new_mxro remained the same. update_group_offsets is then called with the same values as at some point before: mtro_updated and mxro_updated are both false, and _need_force_update is false as well, as this is not the first update in the current term.
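
Condensed into code, the scenario described in this reply looks roughly like the sketch below. It assumes, as the reply implies, that the group offsets are taken as the minimum across per-replica offsets; the names and structure are illustrative stand-ins, not the real compaction_coordinator code.

#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

struct offsets {
    int64_t mtro;
    int64_t mxro;
};

struct coordinator_scenario {
    std::vector<offsets> replica_offsets; // one entry per replica
    offsets last_sent{};                  // distributed earlier in this term
    bool need_force_update{false};        // false: not the first update this term

    void on_timer() {
        // Recompute the group offsets as the minimum across replicas. The
        // local replica has advanced, but a replica that is holding back
        // compaction still provides the minimum, so the result equals what
        // was already distributed.
        offsets recalculated{
          std::numeric_limits<int64_t>::max(),
          std::numeric_limits<int64_t>::max()};
        for (const auto& r : replica_offsets) {
            recalculated.mtro = std::min(recalculated.mtro, r.mtro);
            recalculated.mxro = std::min(recalculated.mxro, r.mxro);
        }
        bool mtro_updated = recalculated.mtro != last_sent.mtro;
        bool mxro_updated = recalculated.mxro != last_sent.mxro;
        // All three conditions can be false here, which is exactly the case
        // the review question asks about; the update is then skipped.
        if (mtro_updated || mxro_updated || need_force_update) {
            last_sent = recalculated; // replicate to followers (omitted)
        }
    }
};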

auto last_data_offset = r.value().last_offset;
last_data_offsets.push_back(last_data_offset);
model::offset last_data_offset;
while (true) {
Contributor

nit: while true without a loop-global timeout makes me nervous

Contributor Author

I'll change it into a for (int attempt = 0; attempt < 5; ++attempt)
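
As a standalone illustration of the bounded-retry shape proposed here (five attempts instead of while (true)): the real test runs on seastar futures and retries on raft::errc::not_leader, so this plain-C++ helper is only a sketch.

#include <chrono>
#include <thread>

// Retry an operation a fixed number of times with a small backoff, returning
// an empty result if every attempt fails so the caller can assert on it.
// `attempt_once` is expected to return a std::optional-like value.
template <typename Func>
auto retry_bounded(
  Func attempt_once,
  int max_attempts = 5,
  std::chrono::milliseconds backoff = std::chrono::milliseconds{500})
  -> decltype(attempt_once()) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (auto r = attempt_once()) {
            return r; // e.g. replication succeeded on the current leader
        }
        // e.g. the request hit a node that just lost leadership; back off
        // and let a new leader settle before retrying.
        std::this_thread::sleep_for(backoff);
    }
    return {}; // all attempts failed; the caller decides how to fail the test
}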

node_id,
committed_offset);
return committed_offset == last_data_offset;
return committed_offset >= last_data_offset;
Contributor

What is the reason behind this one?

Contributor Author

Due to a leadership transfer, a configuration batch may have been replicated after the data.
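
In other words, the test's wait condition has to be the relaxed comparison shown in the diff above. A tiny standalone illustration, with int64_t standing in for model::offset:

#include <cstdint>

// After a leadership transfer the new leader appends a configuration batch,
// so the committed offset can legitimately land past the last data offset;
// waiting for strict equality could therefore wait forever.
bool caught_up(int64_t committed_offset, int64_t last_data_offset) {
    return committed_offset >= last_data_offset;
}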

joe-redpanda previously approved these changes Feb 2, 2026

joe-redpanda (Contributor) left a comment

Some questions, some nits; overall it looks good.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

ASSERT_FALSE_CORO(r.has_error());
auto last_data_offset = r.value().last_offset;
last_data_offsets.push_back(last_data_offset);
model::offset last_data_offset;
Copilot AI commented Feb 2, 2026

The variable last_data_offset is uninitialized and will remain uninitialized if all 5 retry attempts fail (line 55-70). After the loop completes, it's used unconditionally at lines 71-75 and 99, which will result in undefined behavior. Initialize it to a sentinel value or add an assertion after the retry loop to ensure it was set.
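
A self-contained sketch of the defensive pattern this comment suggests, with int64_t standing in for model::offset and try_replicate as a hypothetical stand-in for a single replication attempt; it is not the test's actual code.

#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>

// Keep the offset in an optional so a retry loop that never succeeds is
// caught by an assertion instead of silently reading an uninitialized value.
int64_t replicate_and_get_last_offset(
  const std::function<std::optional<int64_t>()>& try_replicate) {
    std::optional<int64_t> last_data_offset; // empty until an attempt succeeds
    for (int attempt = 0; attempt < 5 && !last_data_offset; ++attempt) {
        last_data_offset = try_replicate();
    }
    assert(last_data_offset.has_value() && "replication never succeeded");
    return *last_data_offset;
}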

_need_force_update = true;
}
_leader_term_id = {new_term};
arm_timer_if_needed(true);
Contributor

Is there a risk of arming the timer twice if the same replica is elected leader back to back?

An update to group offsets may be lost due to leadership transfers. We
would previously tolerate it: partitions are typically written to and
compacted, so coordination offsets often move and a new update will get
through. Change it so that such updates are never lost:
- Store the current leadership term.
- In each term, have the leader distribute group offsets to followers
regardless of whether they changed.
- When replica offsets are updated together with group updates, do not
recalculate group offsets. This avoids pointless updates.
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

void compaction_coordinator::recalculate_group_offsets() {
vassert(
_is_leader, "only leader can recalculate max tombstone remove offset");
vassert(_leader_term_id, "only leader can recalculate group offsets");
Copilot AI commented Feb 3, 2026

Using _leader_term_id (an std::optional<model::term_id>) directly in a boolean context is incorrect. The assertion should check _leader_term_id.has_value() to properly verify whether there is a valid leader term ID.

Suggested change
vassert(_leader_term_id, "only leader can recalculate group offsets");
vassert(
_leader_term_id.has_value(),
"only leader can recalculate group offsets");

Contributor Author

No, it's the same.
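
The reply is right as far as the language is concerned: std::optional's operator bool is specified to return has_value(), so the two spellings check the same thing. A minimal standalone demonstration:

#include <optional>

int main() {
    std::optional<int> leader_term_id; // stand-in for _leader_term_id
    bool in_boolean_context = static_cast<bool>(leader_term_id); // what the vassert condition evaluates
    bool with_has_value = leader_term_id.has_value();
    // Both are false here, and they are equal for any state of the optional.
    return in_boolean_context == with_has_value ? 0 : 1;
}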

joe-redpanda previously approved these changes Feb 3, 2026
bharathv previously approved these changes Feb 3, 2026
Make the test resilient to leader transfers, which may happen under load:
- increase timeouts;
- check for the current leader;
- retry on errc::not_leader;
- allow for configuration batch offsets.

Also add a test for no lost updates after a leadership transfer.
bashtanov merged commit 774fa2c into redpanda-data:dev on Feb 4, 2026
19 checks passed
@vbotbuildovich (Collaborator)

/backport v25.3.x

@vbotbuildovich (Collaborator)

Failed to create a backport PR to v25.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-29486-v25.3.x-725 remotes/upstream/v25.3.x
git cherry-pick -x 386a4885f4 e6fc8df2af 00b2afb3e3

Workflow run logs.
