An Optimal Bloom Filter Replacement
Anna Pagh, Rasmus Pagh, and S. Srinivasa Rao
In Proc. ACM-SIAM Symposium on Discrete Algorithms, SODA 2005.
Abstract
This paper considers space-efficient data structures for storing an
approximation S' to a set S such that S subseteq S' and any
element not in S belongs to S' with probability at most
epsilon. The Bloom filter data structure, solving this
problem, has found widespread use. Our main result is a new RAM data
structure that improves Bloom filters in several ways:
- The time for looking up an element in S' is O(1),
independent of epsilon.
- The space usage is within a lower order term of the lower bound.
- The data structure uses explicit hash function families.
- The data structure supports insertions and deletions on
S in amortized expected constant time.
The main technical ingredient is a succinct representation of
dynamic multisets. We also consider three recent generalizations of
Bloom filters.