A fast, embedded key-value store for the JVM (pure Java), tuned for IO-bound workloads with submillisecond reads. Keys live in an off-heap in-memory index; values in append-only log files — so a read is at most one disk seek. A maintained fork of Yahoo's unmaintained HaloDB, modernized for JDK 22+ (off-heap via the Foreign Function & Memory API) and extended with arbitrary-length keys and optional prefix/range scans.
libraryDependencies += "com.outr" % "halodb-revive" % "0.7.0"

<dependency>
    <groupId>com.outr</groupId>
    <artifactId>halodb-revive</artifactId>
    <version>0.7.0</version>
</dependency>

Requires JDK 22+; run with --enable-native-access=ALL-UNNAMED to silence the off-heap layer's restricted-method warning.
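Because the library needs JDK 22 or newer, an application may want to fail fast with a clear message before opening a database. The following is a minimal sketch of such a check using only the standard `Runtime.version()` API; the class name `JdkCheck` and its methods are hypothetical and not part of HaloDB.

```java
final class JdkCheck {
    /** Minimum JDK feature release stated in this README. */
    static final int MIN_FEATURE = 22;

    /** Returns true if the given feature release (e.g. 21 or 22) meets the minimum. */
    static boolean isSupported(int feature) {
        return feature >= MIN_FEATURE;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() is the major release number, e.g. 22 for JDK 22.0.1.
        int feature = Runtime.version().feature();
        if (isSupported(feature)) {
            System.out.println("JDK " + feature + " is new enough");
        } else {
            System.out.println("JDK " + MIN_FEATURE + "+ required, but running on JDK " + feature);
        }
    }
}
```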
Keys and values are byte[]:
HaloDBOptions options = new HaloDBOptions();
options.setMaxFileSize(1024 * 1024 * 1024); // 1 GB data files
options.setCompactionThresholdPerFile(0.5); // compact at 50% stale (write amplification ≈ 2)
HaloDB db = HaloDB.open(new File("/tmp/halodb"), options); // created/reopened; index rebuilt from disk
db.put("hello".getBytes(), "world".getBytes());
byte[] value = db.get("hello".getBytes()); // "world"
db.delete("hello".getBytes());
HaloDBIterator iterator = db.newIterator(); // iterate all records (unordered)
while (iterator.hasNext()) {
    Record record = iterator.next(); // record.getKey() / record.getValue()
}
db.close();

HaloDBOptions exposes every tuning knob:
HaloDBOptions options = new HaloDBOptions();
// Size of each data file (1 GB here).
options.setMaxFileSize(1024 * 1024 * 1024);
// Size of each tombstone file (64 MB here). Larger files mean fewer files but slower db open; too
// small results in a large number of tombstone files in the db folder.
options.setMaxTombstoneFileSize(64 * 1024 * 1024);
// Number of threads used to scan index and tombstone files in parallel to build the in-memory index
// on open. Must be positive and <= Runtime.getRuntime().availableProcessors(). Speeds up db open.
options.setBuildIndexThreads(8);
// Threshold at which the page cache is synced to disk. Data is durable only once flushed, so more
// data is lost on power loss if this is set too high; too low may hurt read/write performance.
options.setFlushDataSizeBytes(10 * 1024 * 1024);
// Fraction of stale data in a data file (0.7 = 70%) at which it will be compacted. This (with
// compactionJobRate) is the most important tuning knob: it controls write vs space amplification.
// If set to x, write amplification is approximately 1/x. Increasing it reduces write amplification
// but increases space amplification.
options.setCompactionThresholdPerFile(0.7);
// How fast the compaction job runs — the amount of data the compaction thread copies per second.
// The optimal value depends on compactionThresholdPerFile.
options.setCompactionJobRate(50 * 1024 * 1024);
// Preallocates enough memory for the off-heap index; if too low the db may need to rehash. For a db
// expected to hold n records, set this to 2*n.
options.setNumberOfRecords(100_000_000);
// A delete writes a tombstone record; the tombstone can be removed only once all previous versions of
// that key have been removed by compaction. Enabling this deletes, during startup, all tombstone
// records whose previous versions were already removed from the data file.
options.setCleanUpTombstonesDuringOpen(true);
// HaloDB allocates native memory for the in-memory index. Enabling this releases all allocated memory
// back to the kernel when the db is closed. Not needed if the JVM is shut down on close (the kernel
// reclaims it automatically). Without the memory pool, this can be slow as _free_ is called per record.
options.setCleanUpInMemoryIndexOnClose(false);
// ** memory pool settings ** — a lower-footprint, lower-fragmentation index using fixed-size slots.
options.setUseMemoryPool(true);
// The hash table (like Java 7's ConcurrentHashMap) is split into segments — twice the number of CPU
// cores — each managing its own native memory, further divided into chunks of this size.
options.setMemoryPoolChunkSize(2 * 1024 * 1024);
// With a memory pool, fixedKeySize declares the inline key size of each slot. Keys up to this size
// occupy a single slot; longer keys overflow into additional chained slots, so keys of any length are
// supported (set this to your typical key size for best density).
options.setFixedKeySize(8);
// Enables prefix/range scans (see below). Requires fixed-length keys.
options.setUseOrderedIndex(true);

Enabling the ordered index keeps an off-heap adaptive radix tree alongside the hash index, adding ascending prefix scans. It needs fixed-length keys and leaves point-read latency unchanged.
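To make "ascending prefix scan" concrete, here is a minimal sketch of the semantics using a plain on-heap `TreeMap` as a stand-in for the ordered index. It is not HaloDB's radix tree and calls no HaloDB API; the class `PrefixScanSketch`, its methods, and the choice of unsigned lexicographic byte order are assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class PrefixScanSketch {
    // Assumption: keys sort as unsigned bytes, lexicographically.
    private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    // Stand-in for the ordered index: a sorted map from key to value.
    private final TreeMap<byte[], byte[]> index = new TreeMap<>(UNSIGNED);

    void put(byte[] key, byte[] value) {
        index.put(key.clone(), value.clone());
    }

    /** Returns every key that starts with the prefix, in ascending order. */
    List<byte[]> prefixScan(byte[] prefix) {
        List<byte[]> matches = new ArrayList<>();
        // Start at the first key >= prefix; matching keys are contiguous from there.
        for (byte[] key : index.tailMap(prefix, true).keySet()) {
            if (!startsWith(key, prefix)) {
                break; // first non-matching key ends the run
            }
            matches.add(key);
        }
        return matches;
    }

    static boolean startsWith(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
                && Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length);
    }

    public static void main(String[] args) {
        PrefixScanSketch db = new PrefixScanSketch();
        // All keys are 8 bytes, matching the fixed-length requirement.
        for (String k : new String[] {"user:101", "item:001", "user:002", "user:001"}) {
            db.put(k.getBytes(StandardCharsets.UTF_8), new byte[0]);
        }
        for (byte[] key : db.prefixScan("user:0".getBytes(StandardCharsets.UTF_8))) {
            System.out.println(new String(key, StandardCharsets.UTF_8)); // user:001, then user:002
        }
    }
}
```

The HaloDB call itself is shown next.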
HaloDBOptions options = new HaloDBOptions();
options.setUseOrderedIndex(true);
options.setFixedKeySize(8);
HaloDB db = HaloDB.open(new File("/tmp/halodb-ordered"), options);
Iterator<Record> matches = db.prefixScan("user:001".getBytes()); // ascending key order
db.close();

- Read amplification of 1: submillisecond point reads; the off-heap index keeps the JVM heap small.
- Keys may be any length; the ordered index is the exception (fixed-length only).
- The WAL is the database: on restart only the actively-written files are repaired. Writes hit the page cache and are flushed at flushDataSizeBytes, so unflushed data is lost on power loss.
- On-disk format is version 1 (4-byte key length); pre-0.7 databases must be rebuilt.
- Benchmarks vs RocksDB (reads, writes, prefix scans, key-size scaling).
- Why HaloDB — design and motivation.
- Originally by Arjun Mannaly at Yahoo; fork maintained by OUTR. Apache License 2.0.
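
The compactionThresholdPerFile comment in the options listing says that a threshold of x gives a write amplification of roughly 1/x, and that raising it costs space. The sketch below tabulates that trade-off. The 1/x figure comes from this README; the space figure 1/(1 - x) is an assumption from a simple worst-case model in which every data file sits just below the threshold, and is not a documented HaloDB guarantee. The class `CompactionTradeoff` is hypothetical.

```java
import java.util.Locale;

final class CompactionTradeoff {
    /** Approximate write amplification for a stale-data threshold x, per the README: 1/x. */
    static double writeAmplification(double threshold) {
        requireFraction(threshold);
        return 1.0 / threshold;
    }

    /**
     * Assumed worst-case space amplification: if every file is just under the threshold,
     * only (1 - x) of the bytes on disk are live, so disk usage is 1/(1 - x) times live data.
     */
    static double worstCaseSpaceAmplification(double threshold) {
        requireFraction(threshold);
        return 1.0 / (1.0 - threshold);
    }

    private static void requireFraction(double threshold) {
        if (!(threshold > 0.0 && threshold < 1.0)) {
            throw new IllegalArgumentException("threshold must be in (0, 1): " + threshold);
        }
    }

    public static void main(String[] args) {
        for (double x : new double[] {0.5, 0.7, 0.9}) {
            System.out.println(String.format(Locale.ROOT,
                    "threshold %.2f -> write amp ~%.2f, worst-case space amp ~%.2f",
                    x, writeAmplification(x), worstCaseSpaceAmplification(x)));
        }
    }
}
```

Under this model a threshold of 0.5 costs about 2x on both axes, while higher thresholds trade fewer rewrites for more stale bytes left on disk.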