Use a concurrent buffer deque in FragmentConsolidation. #5700
bekadavis9 wants to merge 18 commits into main
Conversation
Force-pushed from c3a3e3a to 358d1b3
Force-pushed from e70da9b to 47ea0fc
rroelke left a comment
You need to add new tests for some of the edge cases of the new implementation before we can merge this.
Force-pushed from 77c355d to 2a28600
Force-pushed from d4c4f49 to 0e6611a
rroelke left a comment
Still more to do. Aside from the comments -
Performance
As mentioned in the sync, we do need to observe that this is not worse for representative customer data. You've clearly seen that it is much faster for the unit tests, which is excellent; but that may not be reflected in production.
I would like to see some kind of repeatable program we can run, so that we can re-use this benchmark later to compare this result against the new consolidation we will implement.
Add a way to configure the initial buffer size so that we can play with it a bit to see how it affects performance.
Initial buffer size
We had discussed prototyping with a fixed number like 10M. I didn't catch that in this review. But this probably is not what we want to use anyway. The consolidator knows the average var cell size, and can compute the fixed part size of the cells. This should be used to inform the initial buffer size in some fashion. For example, if 10M only fits 1 average cell, then it is probably not a good choice for buffer size.
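To make that suggestion concrete, here is a minimal sketch of deriving the initial buffer size from the cell-size information the consolidator already has. All names (`compute_initial_buffer_size`, `min_cells_per_buffer`, the parameters) are hypothetical and illustrative, not TileDB's actual API; the heuristic is only one possible policy.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch: size the initial buffer so it holds a reasonable
// number of average cells, instead of a fixed constant like 10M that
// might fit only a single cell. Names are illustrative only.
uint64_t compute_initial_buffer_size(
    uint64_t avg_var_cell_size,   // known to the consolidator
    uint64_t fixed_cell_size,     // computable from the schema
    uint64_t buffer_budget) {
  // Aim to hold at least this many average cells per buffer so a
  // buffer is never degenerate.
  constexpr uint64_t min_cells_per_buffer = 1024;
  const uint64_t per_cell = avg_var_cell_size + fixed_cell_size;
  const uint64_t size = per_cell * min_cells_per_buffer;
  // Never exceed the overall budget.
  return std::min(size, buffer_budget);
}
```

The constant and the min-cells policy are placeholders; the point is only that the schema-derived per-cell size, not an arbitrary fixed number, drives the choice.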
Testing
Testing is mostly using the existing tests, and then we'll see the performance testing from above. The new tests intend to force the reader to wait for memory to become available. We need tests (new or existing) which in some way assert that this is actually happening. See the review comments about waiting - I have some doubt about its correctness. If we merge #5725 that will help, but that's not the only thing you can do.
6fa8594 to
9419e0d
Compare
9419e0d to
89ab72d
Compare
// Allow use of deprecated param `config_.buffer_size_`
// Allow the buffer to grow 3 times
uint64_t initial_buffer_size = config_.buffer_size_ != 0 ?
    buffer_budget :
I spent some time analyzing this and it's hard to know if it is correct. First you have to recognize that FragmentConsolidationWorkspace also uses this config parameter to override its constructor argument; then you have to follow through whether the memory accounting uses the expected or actual buffer size; and so on. This deprecated parameter makes things messy. Plus, the line immediately following this might mess it up.
Even if it is deprecated we do have to acknowledge that someone out there might be depending on it, so my current line of thinking is that copy_array should probably just do the old thing whenever this parameter is nonzero.
That's the intent here. Use the old behavior if buffer_size_ is set (non-zero) and the new behavior otherwise.
On main, the initial consolidation workspace is set to size buffer_budget by the fragment consolidator (ref). The workspace itself still checks the config_.buffer_size_ internally, as the behavior of resize_buffers has not changed, only migrated to the constructor.
I vote for removing the config parameters that were marked as deprecated here 2(!) years ago. Especially if this makes our code simpler, safer and more readable.
Please do it in a separate PR though and rebase this branch once merged.
uint64_t initial_buffer_size = config_.buffer_size_ != 0 ?
    buffer_budget :
    config_.initial_buffer_size_;
initial_buffer_size = std::min(initial_buffer_size, buffer_budget / 8);
It's probably a good idea to use the new "what level was this config set at?" facility that Agis added here. If initial_buffer_size is the configuration default, then the user did not set it, and we have some freedom to adjust it down, and/or determine a size using properties of the array schema as I had suggested. But if they did specify it in some way then we should use that (just taking the min with the budget). This is useful as an override for performance testing, for example.
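A sketch of that policy follows, assuming a boolean obtained from the config-provenance facility mentioned above (the real API and its name may differ; `choose_initial_buffer_size` and its parameters are hypothetical):

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch: respect an explicitly user-set initial buffer
// size (min'd with the budget); if the value is just the config default,
// we are free to shrink it, e.g. cap at 1/8 of the budget as in the
// diff above. Names are illustrative only.
uint64_t choose_initial_buffer_size(
    bool user_set,             // from the "what level was this set at?" facility
    uint64_t configured_size,  // config_.initial_buffer_size_
    uint64_t buffer_budget) {
  if (user_set)
    return std::min(configured_size, buffer_budget);
  // Default was used: free to adjust down heuristically.
  return std::min(configured_size, buffer_budget / 8);
}
```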
// Deque which stores the buffers passed between the reader and writer.
// Total size of enqueued buffers may not exceed `max_queue_size`.
// The reader will enqueue until that limit, so adjust `buffer_size`
// via `Config::initial_buffer_size` to allow concurrrent in-flight buffers.
- // via `Config::initial_buffer_size` to allow concurrrent in-flight buffers.
+ // via `Config::initial_buffer_size` to allow concurrent in-flight buffers.
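The contract described in that comment, a size-capped queue that the reader blocks on until the writer drains a buffer, can be sketched with a standard mutex/condition-variable bounded queue. This is illustrative only, not TileDB's actual `ProducerConsumerQueue`; the class and member names are made up for the sketch.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Illustrative bounded buffer queue: the producer (reader) enqueues until
// `max_queue_size` entries are in flight, then blocks until the consumer
// (writer) pops one. Not TileDB's implementation.
class BoundedBufferQueue {
 public:
  explicit BoundedBufferQueue(std::size_t max_queue_size)
      : max_queue_size_(max_queue_size) {}

  void push(std::vector<uint8_t> buf) {
    std::unique_lock<std::mutex> lk(m_);
    // Block while the queue is at capacity.
    not_full_.wait(lk, [&] { return q_.size() < max_queue_size_; });
    q_.push_back(std::move(buf));
    not_empty_.notify_one();
  }

  std::vector<uint8_t> pop() {
    std::unique_lock<std::mutex> lk(m_);
    // Block while the queue is empty.
    not_empty_.wait(lk, [&] { return !q_.empty(); });
    auto buf = std::move(q_.front());
    q_.pop_front();
    not_full_.notify_one();
    return buf;
  }

 private:
  std::size_t max_queue_size_;
  std::deque<std::vector<uint8_t>> q_;
  std::mutex m_;
  std::condition_variable not_empty_, not_full_;
};
```

Note that in this shape the cap is a count of buffers, whereas the comment above speaks of total enqueued bytes; a byte-based cap would track a running size sum in the same predicates.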
Query* query_r,
Query* query_w,
const ArraySchema& reader_array_schema_latest,
std::unordered_map<std::string, uint64_t> average_var_cell_sizes,
- std::unordered_map<std::string, uint64_t> average_var_cell_sizes,
+ const std::unordered_map<std::string, uint64_t>& average_var_cell_sizes,
I think we can avoid copying the entire map by value on every call.
The current implementation of `FragmentConsolidation` always sets up a very large buffer space (~10GB) and performs consolidation within that workspace. This work aims to reduce the memory footprint and latency of fragment consolidation through use of a `ProducerConsumerQueue` for concurrent reads/writes. At present, the buffers are (somewhat arbitrarily) sized at 10MB, and the queue is capped at size 10. As such, there are quantitatively more allocations, but the overall size (and runtime) is drastically reduced, as small operations need not construct/destruct the full workspace.

The following test saw an improvement in runtime of 16.500s -> 0.130s:

./test/tiledb_unit --durations yes --vfs=native "C++ API: Test consolidation that respects the current domain"

TYPE: IMPROVEMENT
DESC: Use a concurrent buffer deque in `FragmentConsolidation`.

Resolves CORE-411.