
Conversation

Contributor

@Lazin Lazin commented Feb 5, 2026

This PR enables cloud topics cluster recovery to bootstrap partitions with the correct start offset and term from the L1 metastore. When a cloud topics cluster is recovered, partitions now start at their known offsets rather than offset 0, ensuring data consistency without requiring cloud access during partition creation.

Changes

Core Infrastructure
New Controller Command: set_partition_bootstrap_params_cmd

  • Adds a new controller command to set bootstrap parameters (start_offset, initial_term) for partitions before topic creation; a sketch of the payload follows below
  • Parameters are stored in the topic_table._pending_bootstrap_params map, keyed by NTP
  • topics_frontend::set_bootstrap_params() provides the API for setting these parameters
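For reference, the data carried by the new command looks roughly like this (type and field names follow the PR description and the review quotes further down; the serde plumbing is omitted and the exact declarations in cluster/types.h are assumptions):

#include "model/fundamental.h"
#include <absl/container/btree_map.h>

// Sketch: bootstrap parameters recorded per partition.
struct partition_bootstrap_params {
    model::offset start_offset;   // first offset the recovered partition serves
    model::term_id initial_term;  // raft term recorded in the L1 metastore
};

// Sketch: payload of set_partition_bootstrap_params_cmd, one topic with
// per-partition parameters.
struct set_partition_bootstrap_params_cmd_data {
    model::topic_namespace tp_ns;
    absl::btree_map<model::partition_id, partition_bootstrap_params> params;
};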

Partition Bootstrap Flow

  • controller_backend fetches bootstrap params from the topic_table when creating partitions
  • partition_manager::manage() uses the bootstrap params to initialize partition state via bootstrap_partition_state()
  • The new raft::bootstrap_partition_state() function creates initial raft state with the known offset/term; see the sketch after this list
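A minimal sketch of that path, assuming signatures based on the summary above (the actual declarations in consensus_utils.h and partition_manager.h may differ):

#include <seastar/core/coroutine.hh>
// Redpanda project headers (storage::, raft::, cluster:: types) assumed.

// Sketch: create_partition() and start_raft_group() are hypothetical
// stand-ins for the controller_backend / partition_manager plumbing.
ss::future<> create_partition(
  storage::ntp_config ntp_cfg,
  std::optional<cluster::partition_bootstrap_params> bootstrap_params) {
    if (bootstrap_params.has_value()) {
        // Write the initial raft snapshot and storage metadata so the log
        // starts at the known offset/term instead of offset 0.
        co_await raft::bootstrap_partition_state(
          ntp_cfg,
          bootstrap_params->start_offset,
          bootstrap_params->initial_term);
    }
    co_await start_raft_group(std::move(ntp_cfg)); // normal startup path
}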

Cluster Recovery Integration

  • Modified cluster_recovery_backend.cc to query the L1 metastore for each cloud topic partition's start_offset and term
  • Calls set_bootstrap_params() before creating the recovered cloud topics
  • Partitions are created with the correct offsets from the metastore, avoiding cloud access during partition creation (sketched below)
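Roughly, the recovery-side sequence is as follows (metastore_client and get_partition_meta() are invented names for illustration; only set_bootstrap_params() is named by this PR):

// Sketch: fetch offsets/terms from the L1 metastore and stage them in the
// controller before the recovered topics are created.
for (const auto& topic : recovered_cloud_topics) {
    absl::btree_map<model::partition_id, cluster::partition_bootstrap_params>
      params;
    for (auto pid : topic.partition_ids) {
        // Query the L1 metastore instead of the cloud object store.
        auto meta = co_await metastore_client.get_partition_meta(
          topic.tp_ns, pid);
        params[pid] = {
          .start_offset = meta.start_offset, .initial_term = meta.term};
    }
    co_await topics_frontend.set_bootstrap_params(
      topic.tp_ns, std::move(params));
    // Topic creation follows; controller_backend reads the params back out
    // of the topic_table when materializing each partition.
}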

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

  • none

Copilot AI review requested due to automatic review settings February 5, 2026 23:11
Contributor

Copilot AI left a comment

Pull request overview

This PR implements L0 recovery functionality by adding the ability to bootstrap partitions with custom initial offsets and terms. This enables programmatic partition creation for cluster recovery scenarios where partitions need to start at specific known offsets.

Changes:

  • Added bootstrap_partition_state function to create partition state with custom offset/term
  • Introduced partition_bootstrap_params to store and propagate bootstrap parameters through the system
  • Added commands and handlers to set bootstrap parameters on existing topics before partition materialization

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

File Description
src/v/raft/consensus_utils.h Declares new bootstrap_partition_state function
src/v/raft/consensus_utils.cc Implements bootstrap_partition_state to create Raft snapshot and set storage metadata
src/v/raft/tests/bootstrap_partition_state_test.cc Tests bootstrap_partition_state functionality
src/v/raft/tests/BUILD Adds build configuration for bootstrap test
src/v/cluster/types.h Defines partition_bootstrap_params and related command data structures
src/v/cluster/types.cc Implements output operator for partition_bootstrap_params
src/v/cluster/topics_frontend.h Declares set_bootstrap_params frontend method
src/v/cluster/topics_frontend.cc Implements set_bootstrap_params to replicate bootstrap command
src/v/cluster/topic_updates_dispatcher.h Declares apply method for bootstrap params command
src/v/cluster/topic_updates_dispatcher.cc Implements command dispatch for bootstrap params
src/v/cluster/topic_table.h Adds bootstrap_params field to partition metadata
src/v/cluster/topic_table.cc Implements storage and retrieval of bootstrap params
src/v/cluster/tests/topic_table_test.cc Tests bootstrap params command application
src/v/cluster/partition_manager.h Adds bootstrap_params parameter to manage method
src/v/cluster/partition_manager.cc Implements bootstrap logic using bootstrap_partition_state
src/v/cluster/controller_snapshot.h Updates snapshot schema to include bootstrap_params
src/v/cluster/controller_backend.h Adds bootstrap_params parameter to create_partition
src/v/cluster/controller_backend.cc Passes bootstrap params through partition creation pipeline
src/v/cluster/commands.h Defines set_partition_bootstrap_params_cmd

@Lazin Lazin force-pushed the ct/ctp-recovery branch 2 times, most recently from 2c63bce to b85c5ce on February 6, 2026 20:07
@redpanda-data redpanda-data deleted a comment from Copilot AI Feb 6, 2026
Add infrastructure to support bootstrapping partitions with custom
initial offset and term. This is used for programmatic partition
creation during cluster recovery.

- Add partition_bootstrap_params struct in types.h
- Add get_partition_bootstrap_params() API to topic_table
- Pass bootstrap_params through controller_backend to partition_manager
- Add bootstrap_existing_log() helpers in raft/consensus_utils

Signed-off-by: Evgeny Lazin <4lazin@gmail.com>
Lazin added 4 commits February 6, 2026 16:28
Add a controller command to set bootstrap parameters for partitions
in an existing topic. This enables cluster recovery to:
1. Create topics without remote_topic_properties
2. Set bootstrap params via this command
3. Let controller_backend create partitions with known offsets

- Add set_partition_bootstrap_params_cmd_data struct
- Add set_partition_bootstrap_params_cmd command type
- Add apply() method in topic_table and topic_updates_dispatcher
- Add set_bootstrap_params() method in topics_frontend
- Add unit tests

Signed-off-by: Evgeny Lazin <4lazin@gmail.com>
Signed-off-by: Evgeny Lazin <4lazin@gmail.com>
parameters propagation and use. The test registers partition bootstrap
parameters (start offset and term id) and then creates the topic. The
partition for which the bootstrap parameters were registered is then
validated to have the right starting offset.

Signed-off-by: Evgeny Lazin <4lazin@gmail.com>
Signed-off-by: Evgeny Lazin <4lazin@gmail.com>
@vbotbuildovich
Collaborator

Retry command for Build#80344

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/quota_management_test.py::QuotaManagementUpgradeTest.test_upgrade

@vbotbuildovich
Collaborator

vbotbuildovich commented Feb 6, 2026

CI test results

test results on build#80344
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
invalid_describe_configs_test bad_describe_config_response unit https://buildkite.com/redpanda/redpanda/builds/80344#019c34db-bcef-4c46-b1cf-c07b47eca920 FAIL 0/1
QuotaManagementUpgradeTest test_upgrade null integration https://buildkite.com/redpanda/redpanda/builds/80344#019c34ef-3174-45ef-b9bd-74f11cb38d80 FLAKY 9/11 Test FAILS after retries. Significant increase in flaky rate (baseline=0.0000, p0=0.0000, reject_threshold=0.0100) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=QuotaManagementUpgradeTest&test_method=test_upgrade
test results on build#80380
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
NodesDecommissioningTest test_decommission_status null integration https://buildkite.com/redpanda/redpanda/builds/80380#019c39cd-077a-4078-b4e2-efc976366484 FLAKY 10/11 Test PASSES after retries. No significant increase in flaky rate (baseline=0.0491, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.1402, p1=0.2208, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodesDecommissioningTest&test_method=test_decommission_status
test results on build#80435
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
DataMigrationsApiTest test_higher_level_migration_api null integration https://buildkite.com/redpanda/redpanda/builds/80435#019c4616-42e4-449a-98a6-d3d0c4fcad3e FLAKY 10/11 Test PASSES after retries. No significant increase in flaky rate (baseline=0.0000, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=DataMigrationsApiTest&test_method=test_higher_level_migration_api
WriteCachingFailureInjectionE2ETest test_crash_all {"use_transactions": false} integration https://buildkite.com/redpanda/redpanda/builds/80435#019c4619-ef05-46a8-9e34-f7c9a4701ce0 FLAKY 5/11 Test FAILS after retries. Significant increase in flaky rate (baseline=0.1085, p0=0.0024, reject_threshold=0.0100) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all
test results on build#80448
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
ScalingUpTest test_moves_with_local_retention {"use_topic_property": true} integration https://buildkite.com/redpanda/redpanda/builds/80448#019c489a-3c3f-4e14-b91a-547c0091d87a FLAKY 10/11 Test PASSES after retries. No significant increase in flaky rate (baseline=0.0229, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ScalingUpTest&test_method=test_moves_with_local_retention

@Lazin Lazin changed the title ct: L0 recovery (WIP) ct: Cluster recovery Feb 6, 2026
@Lazin Lazin changed the title ct: Cluster recovery ct: E2e cluster recovery Feb 7, 2026
@Lazin Lazin changed the title ct: E2e cluster recovery ct: End to end cluster recovery Feb 7, 2026
@Lazin Lazin requested review from andrwng and dotnwat February 7, 2026 00:09
add_random_topic(); // invalidates iterator
BOOST_REQUIRE_THROW((void)it->first, iterator_stability_violation);
}

Contributor

  1. Create topics without remote_topic_properties

I get that we can avoid downloading them on autocreate, but I think we still want to preserve the initial revision id of the topic. Topic manifest paths (used for read replicas) include the initial revision, and if we aren't preserving it, every time we do recovery we'll start uploading manifests to a different path.

Contributor Author

but why do we need them in L0?

Contributor

This isn't just for L0, it's for recovery of the entire topic. L1 state is discoverable (e.g. through read replicas) via topic manifests.

Contributor Author

I see. L0 doesn't need it to function, but polluting the bucket with manifests is a no-go. Do we store the initial revision in the metastore?

Contributor Author

I'd love to avoid pulling in all topic manifests if possible. Given that we already have the metastore, it should be the main way to pull metadata. Using a mix of data from the metastore and the manifests feels a bit odd. The metastore is clearly superior for the recovery.

Contributor

Yeah, I don't think we need to download the manifests. But the caller of recovery (wcr) should have enough information from the controller to supply the revision id (if a remote revision is set, use it; if not, use the topic revision).

Comment on lines +109 to +110
model::topic_namespace,
absl::btree_map<model::partition_id, partition_bootstrap_params>,
Contributor

How do we guarantee that the topic creation that consumes these is actually the topic that we care about? Should we be including a revision id somewhere in the key?

Contributor Author

We don't know the revision id of the topic because the topic is created after the bootstrap params are set. This is only used during the full cluster recovery. I guess we want to clean this up after partitions are created, to avoid a situation where the topic is re-created and then picks up the old bootstrap params.

Contributor Author

I added the cleanup for that. So now the cluster recovery backend sets the bootstrap parameters, and once the recovery is completed and all partitions are reconciled and actually created on replicas, the bootstrap parameters are removed.

Comment on lines +804 to +811
// Params should still be available after topic creation
params0 = table.local().get_partition_bootstrap_params(ntp0);
BOOST_REQUIRE(params0.has_value());
BOOST_REQUIRE_EQUAL(params0->start_offset, model::offset(1000));

params1 = table.local().get_partition_bootstrap_params(ntp1);
BOOST_REQUIRE(params1.has_value());
BOOST_REQUIRE_EQUAL(params1->start_offset, model::offset(2000));
Contributor

Is this an important property to maintain? I would have thought that once we create the topic, we might want the topic table to remove the bootstrap params.

Contributor Author

Yes, but at the moment there is no command that removes them; the intention is to eventually remove them.

Contributor Author

I added the command that removes the bootstrap parameters; there is a test for that below.

Comment on lines +478 to +480
/// Data for setting bootstrap parameters on existing topic partitions.
/// Used by cluster recovery to specify known offsets for partitions.
struct set_partition_bootstrap_params_cmd_data
Contributor

If we're leaving the bootstrap params in the topics table anyway, what's the rationale for having another command for this, vs including it as an optional parameter for topic creation?

Having it be separate feels a bit odd, because e.g. cluster recovery could fail midway and we could be left with the bootstrap params in the topic table (and maybe they'd be unintentionally used by some other topic creation?)

Contributor Author

It's just a separation of concerns. The plan is to add a clean-up command that will remove bootstrap state.

Contributor

Any thoughts about the case where WCR fails and we end up leaving state in the topics table?

WDYT about storing the bootstrap params in the cluster_recovery_state (also replicated on every node on every shard through the cluster_recovery_table)? That way it's clear that these bootstrap params are only for recovery, and it becomes easy to clear state atomically with respect to WCR status (e.g. once we complete or fail WCR, the state transition could clear this map).

Contributor Author

But that should be fine: if recovery fails, the user will retry the recovery and the same set of bootstrap params will be used.
I can try to do this, but I'm a bit concerned about new dependencies.

Contributor

If the user doesn't retry recovery though, this seems like a surprise waiting to happen. It doesn't seem like an unreasonable sequence of events that a user tries to recover, it fails midway, but the user inspects the cluster, thinks it's good enough to continue their jobs, and doesn't retry.

Re: dependencies, yea it's a fair concern. I'm hoping there aren't any surprises there. I do appreciate that the cluster recovery table state updates are deterministic, but it's worth thinking about whether there are races between its updates and topic creation...

Contributor Author

@Lazin Lazin Feb 9, 2026

I don't think that there are races because we're waiting until the changes are applied to the topic table and the reconciliation loop is doing the same thing. Essentially, both commands are replicated with the same log.

Contributor Author

I think the easiest solution is to clear this state on recovery failure. I'll try this next.

Contributor Author

Also, when the new recovery starts.

// hasn't been constructed yet.
// TODO: implement a recovery primitive for cloud topics
if (!ntp_cfg.cloud_topic_enabled()) {
if (bootstrap_params.has_value()) {
Contributor

Should we also condition this on having cloud_topics_enabled set? It feels risky to be permissive here, unless you have some other user of bootstrap params in mind?

Contributor Author

The mechanism is generic; it's not tied to cloud topics and will work in any case.

assert result[0] is not None
return result[0]

def _wait_for_metastore_start_offset(
Contributor

not used

Lazin added 3 commits February 7, 2026 05:37
Signed-off-by: Evgeny Lazin <4lazin@gmail.com>
to the topics_frontend. The method replicates the command that clears
pending bootstrap state from the topic_table.

Signed-off-by: Evgeny Lazin <4lazin@gmail.com>
how the clear_partition_bootstrap_params_cmd command is handled

Signed-off-by: Evgeny Lazin <4lazin@gmail.com>
@Lazin Lazin requested a review from andrwng February 9, 2026 16:42
Lazin added 2 commits February 9, 2026 16:54
When all topics/partitions are created and reconciled, invoke the method
to remove bootstrap state.

Signed-off-by: Evgeny Lazin <4lazin@gmail.com>
When the cloud topic is recovered, the revision id has to be populated.
This is done in the cluster recovery reconciler.

Signed-off-by: Evgeny Lazin <4lazin@gmail.com>
@vbotbuildovich
Collaborator

Retry command for Build#80435

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/write_caching_fi_e2e_test.py::WriteCachingFailureInjectionE2ETest.test_crash_all@{"use_transactions":false}

Add two new commands to the offline_log_viewer tool.

Signed-off-by: Evgeny Lazin <4lazin@gmail.com>
@Lazin
Contributor Author

Lazin commented Feb 10, 2026

upd:

  • the cleanup is now called if recovery fails or the new recovery starts
  • the offline_log_viewer is updated to support two new controller commands
