ct: Remove disk cache from L0 data layer #29464
Lazin wants to merge 12 commits into redpanda-data:dev
Conversation
Pull request overview
This PR removes the cloud storage disk cache from the L0 data layer read path and replaces it with a memory-based record batch cache for storing hydrated L0 objects.
Changes:
- Replaced disk-based cache operations with memory-based hydrated cache using the batch_cache infrastructure
- Added per-partition hydrated cache tracking with epoch-based organization
- Modified L0 object format to include a footer mapping partition data locations
- Updated all L0 read/write request handling to include topic_id and topic_id_partition parameters
Reviewed changes
Copilot reviewed 67 out of 67 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/v/cloud_topics/batch_cache/hydrated_cache_api.h | New API interface for per-partition hydrated L0 object caching |
| src/v/cloud_topics/batch_cache/hydrated_object_index.h | New index structure mapping L0 extents to synthetic offsets per epoch |
| src/v/cloud_topics/batch_cache/batch_cache.cc | Implementation of hydrated cache operations using batch_cache |
| src/v/cloud_topics/level_zero/common/object.h | New L0 object footer structure for partition data locations |
| src/v/cloud_topics/level_zero/batcher/aggregator.cc | Updated to write partition data with footer to L0 objects |
| src/v/cloud_topics/level_zero/reader/materialized_extent_reader.cc | Refactored to download full L0 objects and cache all partitions |
| src/v/cloud_topics/level_zero/pipeline/write_request.h | Added topic_id field to write requests |
| src/v/cloud_topics/level_zero/pipeline/read_request.h | Added topic_id_partition field to read requests |
| src/v/cloud_topics/data_plane_impl.cc | Removed cloud_io::cache dependency |
| Multiple test files | Updated to pass topic_id/topic_id_partition parameters |
CI test results: test results on build#79862 · test results on build#79907
```diff
 template<class Clock>
-iobuf aggregator<Clock>::get_stream() {
+ss::future<iobuf> aggregator<Clock>::get_stream(l0::footer& footer_out) {
```
This can return iobuf. Doesn't have to be async.
```cpp
// Serialize the footer and append it to the payload
iobuf footer_buf;
co_await serde::write_async(footer_buf, footer_out.copy());
```
No need to use write_async. It is possible to serialize using serde::to_iobuf.
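A minimal sketch of the suggested replacement, assuming `footer` is serde-serializable as the diff above implies:

```cpp
// Serialize the footer synchronously and append it to the payload.
// serde::to_iobuf returns the buffer directly, so no co_await is needed
// and the enclosing function no longer has to be a coroutine.
iobuf footer_buf = serde::to_iobuf(footer_out.copy());
```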
```diff
 template<class Clock>
-aggregator<Clock>::L0_object aggregator<Clock>::prepare(object_id id) {
+ss::future<typename aggregator<Clock>::L0_object>
```
This should return an object of type L0_object instead of the future.
```diff
-iobuf get_stream();
+/// Returns the payload with footer appended and populates the footer
+/// struct.
+ss::future<iobuf> get_stream(l0::footer& footer_out);
```
Should return iobuf. Fundamentally, there is nothing in the implementation that makes this function async; all the data is available in memory.
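A hedged sketch of the synchronous shape this suggests; the payload assembly is elided and `make_payload` is a hypothetical placeholder, not the PR's code:

```cpp
// Sketch: all data is already in memory, so the footer can be appended
// to the payload and returned without involving a future.
template<class Clock>
iobuf aggregator<Clock>::get_stream(l0::footer& footer_out) {
    iobuf payload = make_payload(footer_out); // hypothetical helper
    payload.append(serde::to_iobuf(footer_out.copy()));
    return payload;
}
```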
```cpp
//   result = co_await l0::footer::read(std::move(missing));
//   return std::get<l0::footer>(result);
// ```
static ss::future<std::variant<footer, size_t>> read(iobuf);
```
Shouldn't be async. The footer should be possible to serialize/deserialize using from_iobuf/to_iobuf. The size of the footer is limited (cardinality is capped at 1000 by default), so it's impossible to hit reactor stalls.
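A minimal sketch of the synchronous API shape this suggests, assuming `footer` participates in serde (the method names are illustrative):

```cpp
// The footer's cardinality is capped (1000 entries by default), so
// decoding it synchronously cannot stall the reactor.
static footer from_iobuf(iobuf b) {
    return serde::from_iobuf<footer>(std::move(b));
}
iobuf to_iobuf() const {
    return serde::to_iobuf(copy());
}
```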
```cpp
virtual ~partition_hydrated_cache_api() = default;

/// Check if hydrated data for this extent is cached for the partition.
virtual bool has(
```
```cpp
    return &new_it->second;
}

std::optional<model::offset> partition_hydrated_index::put_extent(
```
The batch_cache_probe should be updated here as well as in batch_cache.cc.
```cpp
  const object_id& id, first_byte_offset_t byte_offset) const;

/// Check if this specific extent is cached.
bool has_extent(const object_id& id, first_byte_offset_t byte_offset) const;
```
The partition_hydrated_index should work slightly differently. The read path can add the full extent to the cache, but it will later ask for the extent with the same id and a byte_offset that is greater than or equal to the one used to add the entry (see the sketch after the example).
Example:
- put: id=Foo, byte_offset=0, size_bytes=1024
- has_extent: id=Foo, byte_offset=100, size_bytes=200 -> returns true
- get: id=Foo, byte_offset=100, size_bytes=200 -> returns the actual byte range (full offset is 0 + 100, size = 200)
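A small sketch of the containment rule described above; `cached_extent` and its field names are hypothetical, not the PR's types:

```cpp
#include <cstddef>

// Hypothetical view of a cached extent covering bytes [base, base + size).
struct cached_extent {
    size_t base; // byte_offset passed to put_extent
    size_t size; // size_bytes of the cached payload
};

// A query for [off, off + len) hits iff it is fully contained in the
// extent, so {base=0, size=1024} satisfies off=100, len=200.
bool contains(const cached_extent& e, size_t off, size_t len) {
    return off >= e.base && off + len <= e.base + e.size;
}

// On a hit, get reads len bytes starting at (off - e.base) within the
// cached payload, matching "full offset is 0 + 100, size = 200" above.
```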
```cpp
/// Get the batch_cache_index for a specific epoch.
/// Returns nullptr if the epoch doesn't exist.
storage::batch_cache_index*
get_batch_cache_index(cluster_epoch epoch) const;
```
This shouldn't be exposed. There should be a get_extent method that wraps together get_synthetic_offset and get_batch_cache_index. It should take into account that the caller may query for a subset of the extent (the full extent is [1000, 2000] and the query asks for [1200, 1400]).
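A rough sketch of the wrapper shape this implies; `extent_hit` and the exact signature are assumptions, not the PR's actual API:

```cpp
// Hypothetical lookup result: where to read, in which epoch's index.
struct extent_hit {
    model::offset synthetic_offset;    // resolved via get_synthetic_offset
    storage::batch_cache_index* index; // resolved via get_batch_cache_index
};

// Wraps both lookups and handles sub-range queries: for a cached extent
// [1000, 2000], a query for [1200, 1400] should hit and be offset by 200
// into the cached payload.
std::optional<extent_hit>
get_extent(const object_id& id, first_byte_offset_t byte_offset, size_t size_bytes);
```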
Commits
- The index is a translation layer used to store hydrated but not materialized batches in the record batch cache. Signed-off-by: Evgeny Lazin <[email protected]>
- Add hydrated L0 object caching to the batch_cache. Signed-off-by: Evgeny Lazin <[email protected]>
- Use hydrated object cache in the read path. When the object is downloaded by the read path, its footer is analyzed and payloads that belong to different partitions are disseminated to the corresponding caches. The dependency on the cloud storage cache is removed. Signed-off-by: Evgeny Lazin <[email protected]>
- The cloud storage cache is no longer used anywhere in the L0. Signed-off-by: Evgeny Lazin <[email protected]>
Force-pushed from 9a7d457 to 3f03104.
Subj.
The PR removes the cloud storage cache from the read path. Instead, it uses the record batch cache to store hydrated L0 objects.
Backports Required
Release Notes