
Conversation

@EricHayter
Contributor

This commit adds chunking functionality for the load/save operations of bloom filters. Additional information is added to the serialization of each filter: when saving a filter, its total size is written first, followed by the filter data in chunks (max 64 MB per chunk).

Addresses #5314.
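
A minimal sketch of that framing, assuming a toy length encoding and a plain string sink (AppendLen, SaveFilterChunked, and kMaxChunkSize are illustrative names, not Dragonfly's serializer API):

#include <algorithm>
#include <cstdint>
#include <string>
#include <string_view>

// Sketch of the chunked save framing: total filter size first, then the
// payload in chunks of at most 64 MB. AppendLen's fixed-width encoding is
// a toy stand-in for the real RDB length encoding.
constexpr size_t kMaxChunkSize = 64ull * 1024 * 1024;  // 64 MB per chunk

void AppendLen(std::string* out, uint64_t len) {
  out->append(reinterpret_cast<const char*>(&len), sizeof(len));
}

void SaveFilterChunked(std::string_view filter, std::string* out) {
  AppendLen(out, filter.size());  // total size of the filter
  for (size_t pos = 0; pos < filter.size(); pos += kMaxChunkSize) {
    size_t n = std::min(kMaxChunkSize, filter.size() - pos);
    AppendLen(out, n);                   // length of this chunk
    out->append(filter.substr(pos, n));  // chunk payload
  }
}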

@EricHayter EricHayter force-pushed the large-sbf-serialization branch from 40e6b5b to 0fbd8dd Compare October 23, 2025 01:04
@EricHayter
Contributor Author

I've done some manual testing to determine what capacity and error rate would induce chunking on filters. However, when I run the same sequence of commands in a unit test, specifically BF.RESERVE for a filter that large (> 64 MB), the RUN call fails:
E20251022 20:57:24.968197 8972 rdb_load.cc:2152] Error while calling HandleAux(): Out of memory, or used memory is too high

What would be a good way of testing the functionality in the unit tests?
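
(For scale: the textbook sizing formula m = -n * ln(p) / (ln 2)^2 bits gives a rough idea of when a classic bloom filter crosses the 64 MB chunk boundary; Dragonfly's SBF layout may differ, so the numbers this prints are only illustrative.)

#include <cmath>
#include <cstdio>

// Rough check of which capacity/error-rate combinations push a classic
// bloom filter past one 64 MB chunk, using m = -n * ln(p) / (ln 2)^2 bits.
int main() {
  const double kChunkBytes = 64.0 * 1024 * 1024;
  const double p = 0.001;  // error rate
  for (double n = 1e7; n <= 8e7; n *= 2) {  // capacity in elements
    double bits = -n * std::log(p) / (std::log(2) * std::log(2));
    std::printf("capacity %.0f -> %.1f MB%s\n", n, bits / 8 / 1024 / 1024,
                bits / 8 > kChunkBytes ? " (chunked)" : "");
  }
}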

@romange
Collaborator

romange commented Oct 23, 2025

I see the tests pass; can you please push the test that fails so I can advise?
I think explicitly overriding max_memory_limit might help...
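
A hedged sketch of what that could look like; BaseFamilyTest, Run(), and max_memory_limit being an assignable global are assumptions about the test scaffolding rather than confirmed API:

// Raise the memory limit before reserving a filter big enough (> 64 MB)
// to force the chunked save path.
class SbfChunkingTest : public BaseFamilyTest {  // assumed fixture name
 protected:
  void SetUp() override {
    max_memory_limit = 2ull << 30;  // 2 GB; assumed global memory cap
    BaseFamilyTest::SetUp();
  }
};

TEST_F(SbfChunkingTest, SaveLoadLargeFilter) {
  // ~171 MB of filter bits at this capacity/error rate, well past one
  // 64 MB chunk.
  Run({"BF.RESERVE", "big", "0.001", "100000000"});
  // ... snapshot, reload, and verify BF.EXISTS results survive the round trip.
}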

@EricHayter
Contributor Author

Got the test working. max_memory_limit fixed it. Thanks.

@EricHayter EricHayter marked this pull request as ready for review October 23, 2025 14:45
@romange
Collaborator

romange commented Nov 4, 2025

Hi @EricHayter, thanks for doing this. The problem with this PR is that it breaks compatibility with previous versions: newer versions of Dragonfly won't be able to load the "old" format during replication or snapshotting.

A possible way to address this:

  1. Introduce RDB_TYPE_SBF2.
  2. On the loading side, support both the previous format and the new one.
  3. On the saving side, add an rdb_sbf_chunked flag that defaults to false (see the sketch after this comment). In the unit test that you added, set it to true. Above the flag, add a comment: "flip its value to 'true' in March 2026", so that enough released versions can already load the new format before we switch. This prevents broken rollbacks: if someone updates Dragonfly and then wants to roll back, they couldn't if we flipped to the new format now.

This way, production will keep using the old encoding for 5-6 months while we accumulate released versions that support the new format; then we can flip the flag and slowly retire the old format.

Yes, wire formats are complicated to change.
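
A minimal sketch of the save-side switch from step 3; the flag name and the absl::GetFlag usage match the diff later in this thread, while SaveSbfV1/SaveSbfV2 and the RdbSerializer wiring are illustrative stand-ins:

#include "absl/flags/flag.h"

// Flip its value to 'true' in March 2026, once enough released versions
// can load the new format (prevents broken rollbacks).
ABSL_FLAG(bool, rdb_sbf_chunked, false,
          "Use the chunked SBF encoding (RDB_TYPE_SBF2) when saving.");

void SaveSbf(const SBF& sbf, RdbSerializer* ser) {
  if (absl::GetFlag(FLAGS_rdb_sbf_chunked)) {
    ser->WriteOpcode(RDB_TYPE_SBF2);  // new chunked format
    SaveSbfV2(sbf, ser);
  } else {
    ser->WriteOpcode(RDB_TYPE_SBF);   // legacy single-blob format
    SaveSbfV1(sbf, ser);
  }
}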

@romange romange self-requested a review November 4, 2025 08:27
@EricHayter EricHayter force-pushed the large-sbf-serialization branch 2 times, most recently from 61939aa to 5fef132 Compare November 6, 2025 01:46
Added a new flag `rdb_sbf_chunked` which determines the save format of
SBFs. Also, separate functions for saving SBFs were added.

Signed-off-by: Eric <[email protected]>
@EricHayter EricHayter force-pushed the large-sbf-serialization branch from 5fef132 to e83b84f Compare November 6, 2025 01:55
SET_OR_UNEXPECT(LoadLen(nullptr), hash_cnt);
SET_OR_UNEXPECT(FetchGenericString(), filter_data);

if (absl::GetFlag(FLAGS_rdb_sbf_chunked)) {
@romange
Collaborator

I do not think you need to use FLAGS_rdb_sbf_chunked in the loader.
The load part should only rely on RDB_TYPE_SBF2 and RDB_TYPE_SBF to decide how to parse; it should be stateless in this sense.
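
A minimal sketch of that stateless load path, with illustrative function and type names (ReadSbfChunked/ReadSbfLegacy are stand-ins, not the actual loader interface):

// Pick the parser from the RDB opcode alone, never from
// FLAGS_rdb_sbf_chunked.
io::Result<std::string> ReadSbfData(RdbLoaderBase* loader, int rdb_type) {
  if (rdb_type == RDB_TYPE_SBF2)
    return loader->ReadSbfChunked();  // total size, then <= 64 MB chunks
  return loader->ReadSbfLegacy();     // single FetchGenericString() blob
}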
