Block structured Bloom filter #690
A test failure on 1ff45af: cabal run bloomfilter-tests -- -p prop_calc_size_fpr_bits --quickcheck-replay="(SMGen 18024522305972736904 8705882024453529731,95)"
Data.BloomFilter
Classic
calculations
prop_calc_size_fpr_bits: FAIL
*** Failed! Falsified (after 1 test and 11 shrinks):
BitsPerEntry 2.336869198112799
NumEntries 1000
0.3306894755756382 /= 0.3307128865771576 and not within (abs) tolerance of 1.0e-6
Use --quickcheck-replay="(SMGen 18024522305972736904 8705882024453529731,95)" to reproduce.
1 out of 1 tests failed (0.00s)
One solution would be to increase the tolerance.
I haven't finished my review yet, but I'm posting these PR comments now so that I don't lose them before I continue the review. The comments are not polished; they're mostly draft notes, so there's no need to look at or resolve them, @dcoutts. I'll curate the comments as part of my next round of reviewing.
EDIT: I've deleted most comments in favour of re-adding them in the next review
Looking very good! I have a bunch of comments and requests for clarification, but nothing that should hold back the PR, I think.
There are also a number of comments that I left on only one of the Bloom filter types, but some of them probably also apply to the other type of Bloom filter.
It'd need 3e-5 here for this one. The calculations get particularly approximate around 2 bits per entry or less, so another approach would be to adjust the tolerance so that it's greater at the low end only.
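A minimal sketch of the second approach, assuming a simple step function. `fprTolerance`, `withinFprTolerance` and the threshold of 3 bits per entry are hypothetical names and values for illustration, not part of the test suite. The loose value is the 3e-5 mentioned above, which covers the failing case (a difference of about 2.3e-5 at roughly 2.34 bits per entry), while everywhere else keeps the original 1e-6.

```haskell
-- | Hypothetical absolute tolerance for comparing FPR calculations.
-- The closed-form approximations are weakest at low bits per entry, so
-- below the (assumed) threshold of 3 bits per entry we allow 3e-5;
-- elsewhere we keep the original 1e-6.
fprTolerance :: Double -> Double
fprTolerance bitsPerEntry
  | bitsPerEntry < 3 = 3e-5
  | otherwise        = 1e-6

-- | Do two FPR values agree within the tolerance for this bits per entry?
withinFprTolerance :: Double -> Double -> Double -> Bool
withinFprTolerance bitsPerEntry expected actual =
  abs (expected - actual) <= fprTolerance bitsPerEntry
```

A smooth interpolation between the two tolerances would also work; a step keeps it obvious which regime a failing case falls into.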
For testing the bloomfilter lib in isolation, rather than the use in the lsm-tree lib.
We don't need multiple schemes.
rather than Data.Array.Byte. It seems to be compatible with a wider range of ghc & lib versions this way.
merge Data.BloomFilter.Mutable.Internal into Data.BloomFilter.Mutable
and export a helpful construction function.
The spell example was a test suite but does not really test anything. It really is an example, not a test. Remove the Words example since it was being used as a benchmark, but we now have better benchmarks.
Calculate the optimal number of bits and hashes directly rather than via an optimisation algorithm.
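As an illustration of calculating the parameters directly, here is a sketch using the textbook formulas for a classic Bloom filter: for a target false-positive rate p, the optimal bits per entry is b = -ln p / (ln 2)^2 and the optimal number of hashes is k = b ln 2. The function names are hypothetical, and the library's own calculations may differ in details such as rounding.

```haskell
-- | Bits per entry needed for a target false-positive rate p (0 < p < 1),
-- using the textbook formula b = -ln p / (ln 2)^2.
bitsPerEntryForFpr :: Double -> Double
bitsPerEntryForFpr p = negate (log p) / (log 2 * log 2)

-- | Optimal number of hash functions, k = b * ln 2, rounded to the
-- nearest integer and clamped to at least 1.
numHashesForBits :: Double -> Int
numHashesForBits b = max 1 (round (b * log 2))

-- | Approximate false-positive rate for b bits per entry and k hashes:
-- fpr = (1 - e^(-k/b))^k.
fprForBitsAndHashes :: Double -> Int -> Double
fprForBitsAndHashes b k = (1 - exp (negate kd / b)) ** kd
  where
    kd = fromIntegral k
```

For example, a 1% target gives roughly 9.6 bits per entry and 7 hashes. Because k is rounded to an integer, feeding b and k back into the FPR formula only approximately recovers p, which is one reason a round-trip property needs a tolerance.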
Remove the prime-based approach.
It was used to calculate the table of primes that we no longer use.
Co-authored-by: Joris Dral <[email protected]>
use a newtype for NumBlocks
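A sketch of how a block-structured filter might use such a newtype, assuming 512-bit blocks (one 64-byte cache line) and two 64-bit hash values per key. `NumBlocks` wrapping an `Int`, `blockBits` and `probeBits` are assumptions for illustration, not the library's definitions. The idea shown is that all k probes for a key land inside one block, so a lookup touches a single cache line.

```haskell
import Data.Bits (shiftR, (.|.))
import Data.Word (Word64)

-- | Number of blocks in the filter (hypothetical definition).
newtype NumBlocks = NumBlocks Int
  deriving (Eq, Show)

-- | Assumed block size in bits: 512 bits = one 64-byte cache line.
blockBits :: Int
blockBits = 512

-- | Global bit indices probed for one key, given two hash values and
-- k probes. h1 selects the block; the in-block offsets use double
-- hashing, starting at h2 with an odd step taken from the top half of
-- h1. An odd step is invertible modulo 512, so up to 512 probes are
-- distinct. Requires a positive number of blocks.
probeBits :: NumBlocks -> Word64 -> Word64 -> Int -> [Int]
probeBits (NumBlocks n) h1 h2 k =
  [ blockStart + fromIntegral ((h2 + fromIntegral i * step) `mod` blockBitsW)
  | i <- [0 .. k - 1] ]
  where
    blockBitsW = fromIntegral blockBits :: Word64
    blockStart = fromIntegral (h1 `mod` fromIntegral n) * blockBits
    step       = (h1 `shiftR` 32) .|. 1
```

Wrapping the count in a newtype keeps it from being confused with other `Int`s such as bit or hash counts.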
They're the only ones with class constraints and callbacks. Since they're trivial, we just use INLINE rather than SPECIALISE.
misc minor review fixes
For the classic impl, test up to 75 bits per entry and the correspondingly low FPRs. There's no artificial limit on bits per entry or FPRs, just a limit on the overall filter size.
To make clear where the formulae come from.
To avoid accidental breakage. To ensure we bump the formatVersion if we change the format.
Long patch series to:
The result is better performance.