fix: Implement chunking for large SBF filters #5944
base: main
Conversation
This commit adds chunking to the load/save operations of bloom filters. Additional information is added to the serialization of each filter: when saving, the total size of the filter is written first, followed by the filter data in chunks of at most 64 MB each. Signed-off-by: Eric <[email protected]>
Force-pushed from 40e6b5b to 0fbd8dd
I've done some manual testing to determine what capacity and error rate induce chunking on filters. However, when I run the same sequence of commands in a unit test, the test fails. What would be a good way of testing this functionality in the unit tests?
I see the tests pass; can you please push the test that fails so I can advise?
Got the test working.
Hi @EricHayter, thanks for doing this. The problem with this PR is that it breaks compatibility with previous versions. A possible way to address this is to keep the old encoding as the default behind a flag, and introduce a new RDB type for the chunked format so the loader can distinguish the two.
This way, in production we will still use the old encoding for 5-6 months, and we will have enough released versions that support the new format. Then we can flip the flag in the future and slowly retire the old format. Yes, wire formats are complicated to change.
Force-pushed from 61939aa to 5fef132
Added a new flag, `rdb_sbf_chunked`, which determines the save format of SBFs. Separate functions for saving SBFs were also added. Signed-off-by: Eric <[email protected]>
Force-pushed from 5fef132 to e83b84f
```cpp
SET_OR_UNEXPECT(LoadLen(nullptr), hash_cnt);
SET_OR_UNEXPECT(FetchGenericString(), filter_data);
if (absl::GetFlag(FLAGS_rdb_sbf_chunked)) {
```
I do not think you need to use FLAGS_rdb_sbf_chunked in the loader.
The load part should rely only on RDB_TYPE_SBF2 vs RDB_TYPE_SBF to decide how to parse; it should be stateless in this sense.
Addresses #5314.