MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
Updated Jun 4, 2024 - Python
HyperLogLog and other probabilistic data structures for mining in data streams
Exploring Probabilistic Data Structures in Python: my talk from PyCon USA and PyCon Australia 2021 and PyCon MEA 2022.
UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting
A simple, time-tested family of random hash functions in Python, based on CRC32 and xxHash, affine transformations, and the Mersenne Twister. 🎲
Experiments with RedisBloom and the text from Moby Dick
Python implementations of the Flajolet-Martin, LogLog, SuperLogLog, and HyperLogLog cardinality estimation algorithms, used here to estimate the number of unique traffic violations in NYC in the 2019 fiscal year
Implementation and experimental tests of various algorithms.
Yet Another Lame Algorithm Library
Distributed Cardinality Tracking
A simple Python implementation of the HyperLogLog algorithm, a probabilistic data structure used for estimating the cardinality of a set.
This repository contains several projects completed for the Stream Processing Analytics course in IE HST's MS in Business Analytics and Big Data.
Approximate Privacy-Preserving Neighbourhood Estimations
Master's | Design & Analysis of Algorithms | Algorithms for Big Data Processing
A Python project demonstrating efficient estimation of the number of unique elements in a dataset using the HyperLogLog algorithm with parallel processing. The example applies the method to a transactional dataset and covers data cleaning, visualization, and performance comparisons for scalable cardinality estimation.
🗒️ Home Task - Design and Analysis of Algorithms (Algorithms for Big Data Processing)
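Several of the projects above implement HyperLogLog for approximate distinct counting. For orientation, here is a minimal sketch of the basic algorithm. It is not taken from any of the listed repositories. The class name `HyperLogLog`, the parameter `p`, and the choice of SHA-1 truncated to 64 bits as the hash are all assumptions made for illustration, and the sketch omits refinements such as the bias tables used by HyperLogLog++.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch with 2**p registers (hypothetical class)."""

    def __init__(self, p: int = 12) -> None:
        if not 4 <= p <= 16:
            raise ValueError("p must be between 4 and 16")
        self.p = p
        self.m = 1 << p  # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m from the HyperLogLog paper.
        if self.m == 16:
            self.alpha = 0.673
        elif self.m == 32:
            self.alpha = 0.697
        elif self.m == 64:
            self.alpha = 0.709
        else:
            self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Assumption: the first 8 bytes of SHA-1 serve as a 64-bit hash.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        index = x >> width               # top p bits select a register
        rest = x & ((1 << width) - 1)    # remaining bits
        # Rank: position of the leftmost 1-bit in the remaining bits
        # (width + 1 if they are all zero).
        rank = width - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self) -> float:
        # Raw estimate: alpha * m^2 / sum(2^-register).
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog(p=12)
for i in range(100_000):
    hll.add(f"user-{i}")
print(round(hll.count()))  # approximate; expect roughly 100,000
```

With `p = 12` the sketch keeps 4096 registers and the expected relative standard error is about 1.04 / sqrt(4096), or roughly 1.6%. Adding an item that was already seen leaves the registers unchanged, which is what makes the structure suitable for counting distinct elements in a stream.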