-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chg: [join] fix overflowing filters on join operation #18
base: master
Are you sure you want to change the base?
Conversation
@satta Do you think it would be possible to review and merge it? We are using the library for various projects and we would like to keep getting the library upstream. Thanks a lot. |
Sorry, I just didn't notice the PR before, or I would have responded earlier. I deliberately did not check the sum of the filter element counts against the target capacity because any target count calculation only correct when both filters are disjoint. If there is a large overlap between the filters, then the target number of element counts can in fact be lower than the sum of both input element counts. Hence joining such filters needs not necessarily exceed the capacity. See also the corresponding comment in the code (https://github.com/DCSO/bloom/blob/master/bloom.go#L237) stating that the new count is only to be considered an upper bound. So long story short, I would rather not disallow such joins in the library code altogether. I'd rather handle such cases in the code that uses the library, where more is known about the usage pattern. |
I understand the rationale. But I a little worried that joining some filters--mostly disjoints--may exceed their maximum capacity and become incorrect regarding their false positive rate. I am not sure I can still trust |
An alternative would be to forbid joining filters that exceed the capacity, and only allow it when a force flag |
This would be my preferred solution, but would obviously only affect the command line tool instead of the library. If that's fine with you as well, could you adjust your PR accordingly please? :) |
sure, will do when I have some free cycles ;) |
Thanks! |
bloom
does not check correctly the receiving filter's capacity before merging another one into it. For instance:result:
This PR aims at correcting this small quirck:
cheers,
jlouis