@WillemKauf

Hooks the compaction scheduling group into cloud topics. Also adds functionality to compute the backlog of compacted logs in cloud topics and feeds that backlog into the compaction_controller PID, which controls the number of shares allocated to the compaction scheduling group.
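
To make the backlog-to-shares idea concrete, here is a minimal sketch in standard C++. It is proportional-only and is not the real compaction_controller: the class name, the `bytes_per_share` parameter, and the share limits are all assumptions made for illustration, and the actual PID gains and setpoint are not shown in this conversation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical, proportional-only stand-in for a backlog-driven controller.
// It is not Redpanda's compaction_controller; all names here are illustrative.
class share_controller_sketch {
public:
    using backlog_fn = std::function<int64_t()>;

    share_controller_sketch(
      backlog_fn local_backlog,
      backlog_fn cloud_backlog, // may be empty when cloud topics are disabled
      int64_t bytes_per_share,
      int64_t min_shares,
      int64_t max_shares)
      : _local(std::move(local_backlog))
      , _cloud(std::move(cloud_backlog))
      , _bytes_per_share(bytes_per_share)
      , _min(min_shares)
      , _max(max_shares) {
        assert(_local);
        assert(_bytes_per_share > 0 && _min <= _max);
    }

    // One control step: sample both backlogs and map the sum to a share count.
    int64_t update() const {
        int64_t backlog = _local() + (_cloud ? _cloud() : 0);
        return std::clamp(backlog / _bytes_per_share, _min, _max);
    }

private:
    backlog_fn _local;
    backlog_fn _cloud;
    int64_t _bytes_per_share;
    int64_t _min;
    int64_t _max;
};
```

In this sketch the cloud topics backlog is simply added to the local backlog before the share count is computed, so a growing cloud backlog raises the shares given to compaction up to the configured maximum.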

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

  • none

Copilot AI review requested due to automatic review settings February 2, 2026 20:23

Copilot AI left a comment

Pull request overview

This PR integrates the compaction scheduling group into cloud topics to enable better resource management for compaction operations. The changes add functionality to compute the compaction backlog in bytes for cloud topics and hook it into the existing compaction_controller PID system.

Changes:

  • Extended the compaction_controller to accept a callback function for computing cloud topics compaction backlog
  • Added dirty_bytes field to compaction metadata structures to track the size of dirty data
  • Modified cloud topics compaction workers to use the compaction scheduling group for CPU resource allocation
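
As a rough illustration of the dirty_bytes change, the sketch below returns a dirty ratio and a dirty byte count together and sums the bytes over a queue of logs. The struct layout, the function names, and the inputs are assumptions; the PR only states that dirty_stats carries both a ratio and bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical shape: the PR describes dirty_stats only as holding a dirty
// ratio and dirty bytes, so the field names and types here are assumed.
struct dirty_stats {
    double dirty_ratio{0.0};
    int64_t dirty_bytes{0};
};

// Computes both values from assumed inputs: bytes not yet compacted and the
// total size of the log. An empty log is reported as fully clean.
inline dirty_stats compute_dirty_stats(int64_t dirty_bytes, int64_t total_bytes) {
    assert(dirty_bytes >= 0);
    if (total_bytes <= 0) {
        return {};
    }
    double ratio = static_cast<double>(dirty_bytes)
                   / static_cast<double>(total_bytes);
    return {ratio, dirty_bytes};
}

// Sums dirty bytes over the queued logs to form a backlog in bytes.
inline int64_t total_backlog(const std::vector<dirty_stats>& queued) {
    int64_t total = 0;
    for (const auto& stats : queued) {
        total += stats.dirty_bytes;
    }
    return total;
}
```

Keeping the byte count next to the ratio lets a scheduler rank logs by ratio while a controller consumes the absolute backlog, without querying the metadata twice.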

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/v/storage/compaction_controller.h | Added backlog_fn callback parameter to support cloud topics backlog computation |
| src/v/storage/compaction_controller.cc | Implemented cloud topics backlog aggregation via cross-shard submission to shard 0 |
| src/v/redpanda/application_services.cc | Wired up cloud topics backlog callback when initializing compaction controller |
| src/v/cloud_topics/level_one/metastore/simple_metastore.h | Introduced dirty_stats struct to return both dirty ratio and bytes |
| src/v/cloud_topics/level_one/metastore/simple_metastore.cc | Refactored dirty ratio calculation to compute and return dirty bytes |
| src/v/cloud_topics/level_one/metastore/rpc_types.h | Added dirty_bytes field to RPC types with version bump |
| src/v/cloud_topics/level_one/metastore/replicated_metastore.cc | Updated to propagate dirty_bytes field from RPC responses |
| src/v/cloud_topics/level_one/metastore/metastore.h | Added dirty_bytes to compaction_info_response structure |
| src/v/cloud_topics/level_one/domain/simple_domain_manager.cc | Propagated dirty_bytes in domain manager compaction info responses |
| src/v/cloud_topics/level_one/domain/db_domain_manager.cc | Propagated dirty_bytes in database domain manager |
| src/v/cloud_topics/level_one/compaction/worker.h | Added scheduling group member to compaction worker |
| src/v/cloud_topics/level_one/compaction/worker.cc | Modified worker loop to execute within the compaction scheduling group |
| src/v/cloud_topics/level_one/compaction/worker_manager.cc | Updated to pass compaction scheduling group to workers |
| src/v/cloud_topics/level_one/compaction/scheduler.h | Added compaction_backlog() method declaration |
| src/v/cloud_topics/level_one/compaction/scheduler.cc | Implemented backlog computation summing dirty bytes from queued logs |
| src/v/cloud_topics/app.h | Added public compaction_backlog() method to app interface |
| src/v/cloud_topics/app.cc | Implemented app-level backlog accessor delegating to scheduler |
| Test files | Updated test assertions to verify dirty_bytes computation and updated test fixtures with new parameters |

Comment on lines +26 to +27
auto fn = _cloud_topics_backlog;
cloud_backlog = co_await ss::smp::submit_to(0, std::move(fn));
Copilot AI Feb 2, 2026

The local copy of _cloud_topics_backlog into fn is unnecessary. The std::move on line 27 will transfer ownership of the copy, not the member variable. Consider directly moving _cloud_topics_backlog into submit_to without the intermediate variable.
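
Whether the intermediate copy can be dropped depends on whether the callback is sampled more than once. The sketch below uses plain standard C++, with a hypothetical `run_elsewhere` helper standing in for `ss::smp::submit_to` and assuming the callback is `std::function`-like, to show the copy-then-move pattern. Moving the member itself would leave it in a valid but unspecified state for the next sample, which is worth checking before applying the suggestion.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-in for the controller; only callback handling is modeled.
struct controller_sketch {
    std::function<int64_t()> backlog_cb;

    // Stand-in for a cross-shard submit: takes the callable by value, runs it.
    static int64_t run_elsewhere(std::function<int64_t()> fn) { return fn(); }

    // Copy the member, then move the copy. The member itself is untouched,
    // so it stays callable on every later sample.
    int64_t sample_with_copy() {
        assert(backlog_cb && "callback must be set");
        auto fn = backlog_cb;
        return run_elsewhere(std::move(fn));
    }
};
```

If the callback were only ever invoked once, moving the member directly would save one copy; for a controller that samples periodically, the copy is what keeps the member usable.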

log_ptr->info_and_ts->info.dirty_bytes);
}
}
return total / static_cast<int64_t>(ss::smp::count);
Copilot AI Feb 2, 2026

Dividing by the number of shards may result in loss of precision and underreporting the backlog when total is not evenly divisible by shard count. Consider returning the total backlog without division, or document why per-shard reporting is necessary and how rounding affects the PID controller behavior.

Suggested change
- return total / static_cast<int64_t>(ss::smp::count);
+ auto shards = static_cast<int64_t>(ss::smp::count);
+ if (shards <= 1) {
+     return total;
+ }
+ // Use ceiling division to avoid systematic under-reporting of backlog
+ return (total + shards - 1) / shards;
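
For a sense of scale, the two roundings can be compared in isolation. This is a standalone sketch with hypothetical function names; it assumes a non-negative total and at least one shard, and it does not model how the PID controller reacts to the difference.

```cpp
#include <cassert>
#include <cstdint>

// Floor division, as in the original line. Assumes total >= 0, shards >= 1.
inline int64_t per_shard_floor(int64_t total, int64_t shards) {
    assert(total >= 0 && shards >= 1);
    return total / shards;
}

// Ceiling division, as in the suggested change. Same assumptions.
inline int64_t per_shard_ceil(int64_t total, int64_t shards) {
    assert(total >= 0 && shards >= 1);
    if (shards <= 1) {
        return total;
    }
    return (total + shards - 1) / shards;
}

// Examples:
//   per_shard_floor(10, 4) == 2    per_shard_ceil(10, 4) == 3
//   per_shard_floor(3, 8)  == 0    per_shard_ceil(3, 8)  == 1
//   per_shard_floor(16, 4) == 4    per_shard_ceil(16, 4) == 4
```

The two results differ by at most one byte per shard. The case where the difference is most visible is a total smaller than the shard count, where floor division reports zero backlog.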

@vbotbuildovich

CI test results

test results on build#79972
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WriteCachingFailureInjectionE2ETest | test_crash_all | {"use_transactions": false} | integration | https://buildkite.com/redpanda/redpanda/builds/79972#019c2021-f1dc-4a14-97bc-b46fea0e916e | FLAKY | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0917, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.2506, p1=0.0559, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all |
