Hamming distance utilities and LSH binary projection for TypeScript. Zero dependencies. Works in Node.js, Bun, Deno, and Edge runtimes.
This is similar to pgvector but with some key differences:
Similarities:
- Both enable similarity search for AI/ML applications
- Both work with vector embeddings (like OpenAI's 1536-dimensional vectors)
- Both support approximate nearest neighbor search for performance
Differences:

| Feature | @ekaone/hamming | pgvector |
|---|---|---|
| Storage | In-memory binary codes (8 bytes) | PostgreSQL database |
| Distance Metric | Hamming distance (binary) | Cosine / L2 (Euclidean) / inner product (float) |
| Performance | XOR + popcount (extremely fast) | Float vector math |
| Memory | ~768x smaller (8-byte code vs 6,144 bytes for 1536 float32 values) | Full float vectors |
| Persistence | Application-level | Database-level |
| Scaling | Limited by RAM | Scales with database |
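
The memory row can be checked with quick arithmetic. The sketch below assumes each embedding component is stored as a 4-byte float32 (an assumption; with float64 the ratio doubles) and works out to 768x, in line with the table's figure.

```typescript
// Storage per embedding, assuming float32 components (4 bytes each).
const embeddingDims = 1536;
const bytesPerFloat32 = 4;
const floatBytes = embeddingDims * bytesPerFloat32; // 6144 bytes

// A 64-bit binary code packs into 8 bytes.
const codeBits = 64;
const codeBytes = codeBits / 8; // 8 bytes

const ratio = floatBytes / codeBytes; // 768

console.log(`${floatBytes} B vs ${codeBytes} B: ${ratio}x smaller`);
```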

Use cases:
- Semantic caching for AI applications
- Near-duplicate detection
- Embedding similarity search (OpenAI, etc.)
- Fast approximate nearest neighbor search
npm install @ekaone/hamming
pnpm install @ekaone/hamming
yarn add @ekaone/hamming

Hamming distance between two equal-length strings. Counts positions where characters differ.
import { hammingString } from "@ekaone/hamming";
hammingString("karolin", "kathrin"); // → 3
hammingString("1011101", "1001001"); // → 2
hammingString("abc", "abc"); // → 0

Throws RangeError if strings have different lengths.
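
For readers curious about what the string functions compute, here is a minimal sketch of the idea. It is not the library's source: `hammingStringSketch` and `hammingStringNormSketch` are hypothetical names, and the sketch compares UTF-16 code units, which may differ from how the library treats characters outside the Basic Multilingual Plane.

```typescript
// Sketch only: hypothetical helpers, not the library's implementation.
function hammingStringSketch(a: string, b: string): number {
  if (a.length !== b.length) {
    throw new RangeError("Strings must have the same length");
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    // Count every position where the two strings disagree.
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

function hammingStringNormSketch(a: string, b: string): number {
  const distance = hammingStringSketch(a, b); // throws on length mismatch
  // Two empty strings are treated as identical to avoid dividing by zero.
  return a.length === 0 ? 0 : distance / a.length;
}
```

Dividing by the string length puts the result on a 0 to 1 scale, so thresholds can be shared across strings of different sizes.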
Normalized distance between two equal-length strings. Returns 0.0 (identical) to 1.0 (completely different).
import { hammingStringNorm } from "@ekaone/hamming";
hammingStringNorm("karolin", "kathrin"); // → 0.428...
hammingStringNorm("abc", "abc"); // → 0
hammingStringNorm("abc", "xyz"); // → 1

Hamming distance between two 32-bit integers via XOR + popcount.
import { hammingBits } from "@ekaone/hamming";
hammingBits(0b1011101, 0b1001001); // → 2
hammingBits(0x00000000, 0xffffffff); // → 32

Hamming distance between two BigInt values. Useful for 64-bit or wider binary codes.
import { hammingBigInt } from "@ekaone/hamming";
hammingBigInt(0b1011101n, 0b1001001n); // → 2
hammingBigInt(0xffffffffffffffffn, 0x0n); // → 64

Hamming distance between two Uint8Array buffers of equal length. Counts differing bits across all bytes.
import { hammingBuffer } from "@ekaone/hamming";
const a = new Uint8Array([0b11111111, 0b00000000]);
const b = new Uint8Array([0b00000000, 0b11111111]);
hammingBuffer(a, b); // → 16

Throws RangeError if buffers have different lengths.
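
The XOR + popcount technique behind the integer and buffer functions can be sketched as follows. This is an illustration under assumptions, not the library's code: `popcount32`, `hammingBitsSketch`, and `hammingBufferSketch` are hypothetical names, and the library may use a lookup table or a different bit-counting routine.

```typescript
// Count set bits in a 32-bit integer using the classic SWAR bit trick.
function popcount32(x: number): number {
  x = x >>> 0; // treat the input as unsigned 32-bit
  x = x - ((x >>> 1) & 0x55555555);
  x = (x & 0x33333333) + ((x >>> 2) & 0x33333333);
  x = (x + (x >>> 4)) & 0x0f0f0f0f;
  return Math.imul(x, 0x01010101) >>> 24;
}

// XOR leaves a 1 wherever the inputs differ; popcount counts those 1s.
function hammingBitsSketch(a: number, b: number): number {
  return popcount32(a ^ b);
}

function hammingBufferSketch(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) {
    throw new RangeError("Buffers must have the same length");
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    distance += popcount32(a[i] ^ b[i]);
  }
  return distance;
}
```

Because both steps are a handful of integer operations, comparing two codes costs far less than a float dot product over the original vectors.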
Normalized Hamming distance between two Uint8Array buffers. Returns 0.0 to 1.0.
import { hammingBufferNorm } from "@ekaone/hamming";
const a = new Uint8Array([0xff]);
const b = new Uint8Array([0x00]);
hammingBufferNorm(a, b); // → 1.0

Generate a random projection matrix for LSH (locality-sensitive hashing). Call this once and reuse the result: the same planes must be used for all toBinaryCode calls within a single context.
import { generatePlanes } from "@ekaone/hamming";
// 64 output bits, 1536-dimensional input (OpenAI embeddings)
const planes = generatePlanes(64, 1536);

| Parameter | Description |
|---|---|
| `dims` | Number of output bits (e.g. 64) |
| `inputDims` | Dimensionality of the input float vector (e.g. 1536) |
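
To make the idea of a random projection matrix concrete, here is a minimal sketch of one common way to build it: one random vector per output bit, with components drawn from a standard normal distribution. The sampling scheme and return type are assumptions, since this README does not say how the library generates or stores its planes; `gaussian` and `generatePlanesSketch` are hypothetical names.

```typescript
// Standard normal sample via the Box-Muller transform.
function gaussian(): number {
  const u = 1 - Math.random(); // in (0, 1], avoids Math.log(0)
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// One random hyperplane (a normal vector) per output bit.
// Assumption: Gaussian components; the library may sample differently.
function generatePlanesSketch(bits: number, inputDims: number): Float32Array[] {
  const planes: Float32Array[] = [];
  for (let i = 0; i < bits; i++) {
    const plane = new Float32Array(inputDims);
    for (let j = 0; j < inputDims; j++) {
      plane[j] = gaussian();
    }
    planes.push(plane);
  }
  return planes;
}
```

Since the planes are random, codes produced with different plane sets are not comparable, which is why a single set has to be generated once and reused.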
Project a float embedding vector onto random hyperplanes, producing a compact binary code (Uint8Array). Each bit is the sign of the dot product with one plane.
import { generatePlanes, toBinaryCode } from "@ekaone/hamming";
const planes = generatePlanes(64, 1536);
const embedding = await getEmbedding("How do I reset my password?");
const code = toBinaryCode(embedding, planes);
// → Uint8Array(8): 64 bits packed into 8 bytes

The binary code approximates cosine similarity: vectors that are close in embedding space will have similar binary codes and a low Hamming distance between them.
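
The projection step can be sketched in a few lines: take the dot product with each plane and record its sign as one bit. The function name `toBinaryCodeSketch`, the MSB-first bit order, and the choice to map a zero dot product to 1 are all assumptions made for illustration; the library's actual layout may differ.

```typescript
// Sketch only: hypothetical helper, not the library's implementation.
function toBinaryCodeSketch(vector: number[], planes: number[][]): Uint8Array {
  const code = new Uint8Array(Math.ceil(planes.length / 8));
  planes.forEach((plane, i) => {
    let dot = 0;
    for (let j = 0; j < vector.length; j++) {
      dot += vector[j] * plane[j];
    }
    // Set bit i when the vector lies on the non-negative side of plane i.
    // Bit order within a byte (MSB first) is an arbitrary choice here.
    if (dot >= 0) code[i >> 3] |= 0x80 >> (i & 7);
  });
  return code;
}

// Four hand-picked 2-D planes keep the example deterministic.
const demoPlanes = [[1, 0], [0, 1], [1, 1], [1, -1]];
const demoCodeA = toBinaryCodeSketch([1, 0.2], demoPlanes);   // bits 1111
const demoCodeB = toBinaryCodeSketch([-1, -0.2], demoPlanes); // bits 0000
```

The two demo vectors point in opposite directions, so they fall on opposite sides of every plane and their codes differ in all four bits.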
Hamming distance between two binary codes produced by toBinaryCode. Lower = more similar.
import { generatePlanes, toBinaryCode, binaryDistance } from "@ekaone/hamming";
const planes = generatePlanes(64, 1536);
const codeA = toBinaryCode(embeddingA, planes);
const codeB = toBinaryCode(embeddingB, planes);
binaryDistance(codeA, codeB); // → 0 (identical) to 64 (opposite)

import { hammingStringNorm } from "@ekaone/hamming";
const commands = ["init", "build", "publish", "release"];
const input = "publich"; // user typo
const suggestions = commands
  .filter(cmd => cmd.length === input.length) // Hamming distance needs equal lengths
  // hammingStringNorm is a distance (0 = identical), so invert it into a similarity score
  .map(cmd => ({ cmd, score: 1 - hammingStringNorm(cmd, input) }))
  .filter(({ score }) => score >= 0.7)
  .sort((a, b) => b.score - a.score);
if (suggestions.length > 0) {
  console.log(`Did you mean: ${suggestions[0].cmd}?`);
}
// → Did you mean: publish?

import { hammingStringNorm } from "@ekaone/hamming";
const flags = ["enable_cache", "enable_cashe", "disable_cache", "feature_x"];
const deduped = flags.reduce<string[]>((acc, flag) => {
  // Hamming distance only compares equal-length strings, so other lengths never match.
  const isDup = acc.some(existing =>
    existing.length === flag.length &&
    hammingStringNorm(existing, flag) < 0.15 // distance: 0 = identical
  );
  return isDup ? acc : [...acc, flag];
}, []);
// → ["enable_cache", "disable_cache", "feature_x"]

import { generatePlanes, toBinaryCode, binaryDistance } from "@ekaone/hamming";
const planes = generatePlanes(64, 1536);
const cache = new Map<string, string>();
async function cachedAnswer(embedding: number[], prompt: string) {
  const code = toBinaryCode(embedding, planes);
  const key = code.join(",");
  // Linear scan: return the first cached response within 4 bits of the query.
  for (const [cachedKey, response] of cache) {
    const cachedCode = new Uint8Array(cachedKey.split(",").map(Number));
    if (binaryDistance(code, cachedCode) <= 4) {
      return response; // cache hit
    }
  }
  // callLLM is a placeholder for your own model call.
  const response = await callLLM(prompt);
  cache.set(key, response);
  return response;
}

For a production-ready semantic cache with LRU/LFU eviction and TTL, see @ekaone/semantic-cache.
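
The cache example returns the first entry within the threshold. For nearest neighbor search you would usually rank candidates instead. Below is a minimal brute-force sketch of that variant; `Entry`, `bitDistance`, and `nearestCodes` are hypothetical names written for this example so it runs without the library, and a linear scan like this is only practical while the code set fits comfortably in memory.

```typescript
type Entry = { id: string; code: Uint8Array };

// Bit-level Hamming distance between two equal-length byte arrays.
function bitDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) {
    throw new RangeError("Codes must have the same length");
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    let diff = a[i] ^ b[i];
    while (diff !== 0) {
      diff &= diff - 1; // clear the lowest set bit
      distance++;
    }
  }
  return distance;
}

// Rank every stored code by distance to the query and keep the k closest.
function nearestCodes(query: Uint8Array, entries: Entry[], k: number) {
  return entries
    .map(({ id, code }) => ({ id, distance: bitDistance(query, code) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}
```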
toBinaryCode uses random hyperplane projection (a form of locality-sensitive hashing). For each random plane, it checks which side of the plane the input vector falls on — producing a 1 or 0. With 64 planes you get a 64-bit fingerprint.
The key property: vectors that are close in the original high-dimensional space (high cosine similarity) will tend to land on the same side of most planes, resulting in similar binary codes with a low Hamming distance. This lets you approximate expensive float vector comparisons with a fast XOR + popcount.
"forgot my password" → [0.21, -0.84, ...] → 0b10110010...
"reset my password" → [0.22, -0.83, ...] → 0b10110110...
Hamming distance = 2 ✓
"cancel subscription" → [-0.60, 0.41, ...] → 0b01001101...
Hamming distance = 29 ✗
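
A Hamming distance can be mapped back to an approximate cosine similarity. For random hyperplanes drawn from a spherically symmetric distribution, the probability that one bit differs equals θ/π, where θ is the angle between the two vectors, so θ ≈ π · distance / bits. The sketch below applies that relation; it assumes the planes are sampled that way, which this README does not state, and `estimateCosine` is a hypothetical helper rather than part of the library.

```typescript
// Estimate cosine similarity from the Hamming distance between two codes.
// Assumption: planes are sampled from a spherically symmetric distribution.
function estimateCosine(distance: number, bits: number): number {
  const angle = (Math.PI * distance) / bits; // estimated angle in radians
  return Math.cos(angle);
}

console.log(estimateCosine(2, 64).toFixed(3));  // close to 1: near-identical direction
console.log(estimateCosine(29, 64).toFixed(3)); // close to 0: nearly orthogonal
```

With only 64 bits the estimate is coarse, so it is better suited to picking a distance threshold than to reproducing exact similarity scores.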
MIT © Eka Prasetia
⭐ If this library helps you, please consider giving it a star on GitHub!