Metadata-Version: 2.1
Name: abloom
Version: 0.3.0
Summary: High-performance Bloom filter for Python
Author-email: Andrew Pribe <andrewpribe@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/ampribe/abloom
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: C
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Software Development :: Libraries
Classifier: Typing :: Typed
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE

<img src="assets/logo.png" alt="abloom logo" style="border-radius: 8px; box-shadow: 0 2px 8px rgba(0,0,0,0.3);"><br>

[![PyPI](https://img.shields.io/pypi/v/abloom)](https://pypi.org/project/abloom/)
[![Python](https://img.shields.io/pypi/pyversions/abloom)](https://pypi.org/project/abloom/)
[![Tests](https://img.shields.io/github/actions/workflow/status/ampribe/abloom/test.yml)](https://github.com/ampribe/abloom/actions/workflows/test.yml)

`abloom` is a high-performance Bloom filter implementation for Python, written in C.

## Why `abloom`?
- **Fast**: 2-3x faster than `rbloom` on add/update, 1.3x faster on lookup
- **Tested**: Python 3.8+ on Linux, macOS, and Windows

## Quick Start
Install with `pip install abloom`. 

```python
from abloom import BloomFilter

bf = BloomFilter(1_000_000, 0.01)  # capacity, false positive rate
bf.add(1)
bf.update(["a", "b", "c"])

1 in bf                 # True
2 in bf                 # False
bf2 = bf.copy()         # duplicate filter
combined = bf | bf2     # union of filters
bf.clear()              # reset to empty
```

## Benchmarks
| Operation | fastbloom_rs | pybloom_live | pybloomfiltermmap | rbloom | **abloom** | Speedup vs. rbloom |
|-----------|--------------|--------------|-------------------|--------|--------|---------|
| Add | 84.1ms | 1.45s | 112.2ms | 49.5ms | **16.1ms** | 3.08x |
| Lookup | 122.1ms | 1.25s | 92.2ms | 39.9ms | **30.1ms** | 1.33x |
| Update | - | - | 110.5ms | 15.3ms | **6.4ms** | 2.40x |

*1M integers, 1% FPR, Apple M2. Full results [here](https://github.com/ampribe/abloom/blob/main/BENCHMARKS.md).*
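
The Speedup column appears to be the `rbloom` time divided by the `abloom` time, which matches the figures quoted under "Why `abloom`?". The sketch below recomputes it from the rounded timings in the table; the `timings_ms` dictionary and the `speedup` helper are illustrative names, not part of `abloom`.

```python
# Timings copied from the table above, in milliseconds.
timings_ms = {
    "Add": {"rbloom": 49.5, "abloom": 16.1},
    "Lookup": {"rbloom": 39.9, "abloom": 30.1},
    "Update": {"rbloom": 15.3, "abloom": 6.4},
}


def speedup(baseline_ms, candidate_ms):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


speedups = {
    op: speedup(t["rbloom"], t["abloom"]) for op, t in timings_ms.items()
}

for op, value in speedups.items():
    print(f"{op}: {value:.2f}x")
```

The recomputed ratios agree with the table to within about 0.01; the small differences are presumably because the published figures were derived from unrounded timings.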

## Use Cases
### Database Optimization
```python
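# Assumes user_cache was populated with every user_id stored in the DB
user_cache = BloomFilter(10_000_000, 0.01)

def get_user(user_id):  # get_user and db are illustrative names
    if user_id not in user_cache:
        return None           # Definitely not in DB
    return db.query(user_id)  # Probably in DB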
```

### Web Crawling
```python
seen = BloomFilter(10_000_000, 0.001)
if url not in seen:
    seen.add(url)
    crawl(url)
```

### Spam Detection
```python
spam_filter = BloomFilter(1_000_000, 0.001)
spam_filter.update(spam_words)
if word in spam_filter:
    flag_as_potential_spam()
```

## Limitations
`abloom` relies on Python's built-in `hash()` function, so every item added must be hashable (implement `__hash__`). By default, Python randomizes the seed used for `str` and `bytes` hashes in each process, so transferring Bloom filters between processes is not possible.
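
The per-process seed can be observed with the standard library alone. The sketch below does not use `abloom`; it only illustrates the Python behavior behind the limitation. `hash_in_child` is a hypothetical helper that evaluates `hash(...)` in a fresh interpreter, optionally with the `PYTHONHASHSEED` environment variable pinned.

```python
import os
import subprocess
import sys


def hash_in_child(expr, hash_seed=None):
    """Evaluate hash(<expr>) in a fresh interpreter and return it as an int."""
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)  # fall back to the default random seed
    if hash_seed is not None:
        env["PYTHONHASHSEED"] = str(hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# str hashes depend on the per-process seed, so two default runs
# will almost certainly print different values.
print(hash_in_child("'abloom'"), hash_in_child("'abloom'"))

# Pinning the seed makes the str hash reproducible across processes.
print(hash_in_child("'abloom'", 0) == hash_in_child("'abloom'", 0))

# Small ints hash to themselves regardless of the seed.
print(hash_in_child("12345"))
```

This only shows why hash values, and therefore the bit positions derived from them, differ between processes. Whether pinning the seed is enough to share a filter depends on `abloom` internals that are not covered here.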

`abloom`'s optimizations require ~1.5-2x memory overhead compared to a standard Bloom filter implementation, which can reduce performance for extremely large workloads (high capacity, low false positive rate), though `abloom` is still faster than alternatives. See [implementation](https://github.com/ampribe/abloom/blob/main/docs/IMPLEMENTATION.md#21-memory-overhead) for more details.
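
To get a rough sense of scale, the textbook sizing formula for a classic Bloom filter, m = -n ln(p) / (ln 2)^2 bits for n items at false positive rate p, can serve as a baseline. The sketch below is an estimate only: it does not reproduce `abloom`'s actual memory layout, the function names are illustrative, and it assumes the ~1.5-2x figure can be read as a multiplier on that baseline.

```python
import math


def standard_bloom_bits(capacity, fpr):
    """Optimal bit count m = -n * ln(p) / (ln 2)^2 for a classic Bloom filter."""
    return math.ceil(-capacity * math.log(fpr) / math.log(2) ** 2)


def estimated_bytes(capacity, fpr, overhead=1.0):
    """Baseline size in bytes, scaled by an assumed overhead multiplier."""
    return math.ceil(standard_bloom_bits(capacity, fpr) * overhead / 8)


# Same parameters as the Quick Start example: 1M items at 1% FPR.
baseline = estimated_bytes(1_000_000, 0.01)
low, high = (estimated_bytes(1_000_000, 0.01, k) for k in (1.5, 2.0))

print(f"classic: {baseline / 1e6:.2f} MB")
print(f"with 1.5-2x overhead: {low / 1e6:.2f}-{high / 1e6:.2f} MB")
```

For 1M items at 1% FPR the classic baseline is about 1.2 MB, so under this reading the overhead only becomes significant at much higher capacities or lower false positive rates.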

## Development
### Testing

```bash
pip install -e . --group test
pytest tests/ --ignore=tests/test_benchmark.py -v
```

See [Testing](https://github.com/ampribe/abloom/blob/main/docs/TESTING.md) for more details.

### Benchmarking

```bash
pip install -e . --group benchmark
pytest tests/test_benchmark.py --benchmark-only
```

See [Benchmarking](https://github.com/ampribe/abloom/blob/main/docs/BENCHMARKING.md) for more details.
