3 releases
| Version | Released |
|---|---|
| 0.1.2 | Jan 26, 2026 |
| 0.1.1 | Jan 26, 2026 |
| 0.1.0 | Jan 26, 2026 |
#788 in Text processing
1,550 downloads per month
37KB
828 lines
sketchir: sketching primitives for information retrieval (IR).
This crate is intended for index-only similarity sketches used in:
- near-duplicate detection (MinHash / shingles)
- text fingerprinting (SimHash)
- approximate similarity search (LSH-style candidate generation)
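To make the first use case concrete, here is a minimal MinHash-over-shingles sketch using only the standard library. It is not sketchir's API: the names `shingles`, `minhash_signature`, and `estimate_jaccard`, the choice of FNV-1a plus a SplitMix64-style finalizer, and the character-level shingling are all assumptions made for illustration. A fixed, seedless hash is used so that signatures are reproducible across runs.

```rust
use std::collections::HashSet;

/// 64-bit FNV-1a. Fixed constants, no random seed, so output is reproducible.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// SplitMix64-style finalizer: a bijective mixer used to derive one
/// pseudo-independent hash function per signature slot.
fn mix64(mut x: u64) -> u64 {
    x ^= x >> 30;
    x = x.wrapping_mul(0xbf58476d1ce4e5b9);
    x ^= x >> 27;
    x = x.wrapping_mul(0x94d049bb133111eb);
    x ^= x >> 31;
    x
}

/// Hypothetical helper: the set of character k-shingles of `text`.
pub fn shingles(text: &str, k: usize) -> HashSet<String> {
    if k == 0 {
        return HashSet::new();
    }
    let chars: Vec<char> = text.chars().collect();
    chars
        .windows(k)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Hypothetical helper: an n-slot MinHash signature.
/// Slot i holds the minimum of the i-th hash function over all shingles;
/// an empty set yields u64::MAX in every slot.
pub fn minhash_signature(set: &HashSet<String>, n: usize) -> Vec<u64> {
    (0..n as u64)
        .map(|i| {
            let seed = mix64(i.wrapping_add(1));
            set.iter()
                .map(|s| mix64(fnv1a64(s.as_bytes()) ^ seed))
                .min()
                .unwrap_or(u64::MAX)
        })
        .collect()
}

/// Fraction of matching slots, which estimates the Jaccard similarity
/// of the two underlying shingle sets.
pub fn estimate_jaccard(a: &[u64], b: &[u64]) -> f64 {
    assert_eq!(a.len(), b.len(), "signatures must have equal length");
    if a.is_empty() {
        return 0.0;
    }
    let same = a.iter().zip(b).filter(|(x, y)| x == y).count();
    same as f64 / a.len() as f64
}
```

Calling `minhash_signature(&shingles(doc, 3), 64)` on two documents and passing both results to `estimate_jaccard` gives a similarity estimate whose accuracy improves as the number of slots grows.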
The scope is limited to primitives: signatures, basic indexing, and deterministic behavior. Higher-level workflows (crawl dedupe pipelines, content extraction, etc.) belong elsewhere.
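As an illustration of what "basic indexing" with deterministic behavior could look like, the sketch below implements banded LSH candidate generation over fixed-length `u64` signatures. It is an assumption-laden example, not the crate's actual interface: `BandedIndex`, `insert`, and `candidates` are hypothetical names, and ordered maps and sets are used so that results come back in the same order on every run.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical banded LSH index (not sketchir's API).
/// A signature of length bands * rows is split into `bands` chunks of
/// `rows` values; two items become candidates if any chunk matches exactly.
pub struct BandedIndex {
    bands: usize,
    rows: usize,
    // One table per band: band contents -> ids of items with that band.
    tables: Vec<BTreeMap<Vec<u64>, Vec<usize>>>,
}

impl BandedIndex {
    pub fn new(bands: usize, rows: usize) -> Self {
        assert!(bands > 0 && rows > 0, "bands and rows must be non-zero");
        BandedIndex {
            bands,
            rows,
            tables: vec![BTreeMap::new(); bands],
        }
    }

    /// Register `id` under each band of its signature.
    pub fn insert(&mut self, id: usize, signature: &[u64]) {
        assert_eq!(signature.len(), self.bands * self.rows);
        for (band, chunk) in signature.chunks(self.rows).enumerate() {
            self.tables[band].entry(chunk.to_vec()).or_default().push(id);
        }
    }

    /// Ids sharing at least one band with `signature`, in ascending order.
    pub fn candidates(&self, signature: &[u64]) -> BTreeSet<usize> {
        assert_eq!(signature.len(), self.bands * self.rows);
        let mut out = BTreeSet::new();
        for (band, chunk) in signature.chunks(self.rows).enumerate() {
            if let Some(ids) = self.tables[band].get(chunk) {
                out.extend(ids.iter().copied());
            }
        }
        out
    }
}
```

More rows per band makes a match stricter (fewer false candidates), while more bands gives similar items more chances to collide. Candidates would still need verification against the full signatures or the original content.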
Dependencies
~0.5–1MB
~19K SLoC