Releases: facebook/rocksdb

v10.7.5

22 Oct 21:41

10.7.5 (10/20/2025)

Bug Fixes

  • Fix a bug in Page unpinning in MultiScan

10.7.4 (10/14/2025)

Public API Changes

  • The MultiScan API contract is updated. After a set of scan ranges has been prepared with the Prepare API call, subsequent seeks must target the start of each prepared scan range, in order. In addition, when a limit is set, the upper bound must be set to the same value as the limit before each seek.
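
The contract above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: PreparedRange, SeekCall and FollowsMultiScanContract are hypothetical names invented for illustration and are not RocksDB types or APIs. The sketch also assumes that seeking only a prefix of the prepared ranges is acceptable.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of one prepared scan range (not a RocksDB type).
struct PreparedRange {
  std::string start;
  std::optional<std::string> limit;  // end of the range, if one was set
};

// Hypothetical model of one seek issued after Prepare.
struct SeekCall {
  std::string target;
  std::optional<std::string> upper_bound;  // upper bound in effect at seek time
};

// Returns true when the seeks follow the contract described above: the i-th
// seek targets the start of the i-th prepared range, and whenever that range
// has a limit, the upper bound at seek time equals it.
// Assumption: stopping early (seeking only a prefix of the ranges) is allowed.
bool FollowsMultiScanContract(const std::vector<PreparedRange>& prepared,
                              const std::vector<SeekCall>& seeks) {
  if (seeks.size() > prepared.size()) return false;
  for (std::size_t i = 0; i < seeks.size(); ++i) {
    if (seeks[i].target != prepared[i].start) return false;
    if (prepared[i].limit && seeks[i].upper_bound != prepared[i].limit) {
      return false;
    }
  }
  return true;
}
```

In this model, seeking the second range first, or seeking with an upper bound that differs from the prepared limit, is reported as a contract violation.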

10.7.3 (10/06/2025)

Bug Fixes

  • Fix a few bugs in MultiScan

10.7.2 (09/30/2025)

Bug Fixes

  • Fix an incorrect MultiScan seek error status caused by bugs in handling a range limit that falls between the key ranges of adjacent SST files.

Performance Improvements

  • Fixed a performance regression in LZ4 compression that started in version 10.6.0

10.7.0 (09/24/2025)

New Features

  • Add the fail_if_no_udi_on_open flag in BlockBasedTableOptions to control whether a missing user defined index block in an SST file is a hard error or not.
  • A new flag memtable_veirfy_per_key_checksum_on_seek is added to AdvancedColumnFamilyOptions. When enabled, it validates key checksums along the binary search path of a skiplist-based memtable during seek operations.
  • Introduce option MultiScanArgs::use_async_io to enable asynchronous I/O during MultiScan, instead of waiting for I/O to be done in Prepare().
  • Add new option MultiScanArgs::max_prefetch_size that limits the memory usage of per file pinning of prefetched blocks.
  • Improved sst_dump by allowing standalone file and directory arguments without --file=. Also added new options and better output for sst_dump --command=recompress. See sst_dump --help

Public API Changes

  • HyperClockCache with no estimated_entry_charge is now production-ready and is the preferred block cache implementation vs. LRUCache. Please consider updating your code to minimize the risk of hitting performance bottlenecks or anomalies from LRUCache. See cache.h for more detail.
  • RocksDB now requires a C++20 compatible compiler (GCC >= 11, Clang >= 10, Visual Studio >= 2019), including for any code using RocksDB headers.
  • MultiScanArgs used to have a default constructor with default parameter of BytewiseComparator. Now it always requires Comparator in its constructor.

Behavior Changes

  • The default provided block cache implementation is now HyperClockCache instead of LRUCache, when block_cache is nullptr (default) and no_block_cache==false (default). We recommend explicitly creating a HyperClockCache block cache based on memory budget and sharing it across all column families and even DB instances. This change could expose previously hidden memory or resource leaks.
  • Allow UDIs with a non BytewiseComparator

Bug Fixes

  • Reported numbers for compaction and flush CPU usage now include time spent by parallel compression worker threads. As a result, reported compaction/flush CPU usage can exceed the wall clock time.
  • Fix a race condition in FIFO size-based compaction where concurrent threads could select the same non-L0 file, causing assertion failures in debug builds or "Cannot delete table file from LSM tree" errors in release builds.
  • Fix a bug in RocksDB MultiScan with UDI when one of the scan ranges is determined to be empty by the UDI, which causes incorrect results.
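
As a worked illustration of why the reported compaction/flush CPU usage can exceed wall clock time once compression worker threads are counted, here is a minimal arithmetic sketch. TotalCpuMicros is a hypothetical helper, not a RocksDB function, and the numbers in the usage note are made up.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: CPU time attributed to one compaction or flush job,
// counting the main job thread plus every parallel compression worker.
std::uint64_t TotalCpuMicros(std::uint64_t main_thread_micros,
                             const std::vector<std::uint64_t>& worker_micros) {
  return std::accumulate(worker_micros.begin(), worker_micros.end(),
                         main_thread_micros);
}
```

For example, a job that runs for 1,000,000 us of wall clock time, with 800,000 us of CPU on the main thread and three workers at 700,000 us each, would be reported as 2,900,000 us of CPU time in this model.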

Performance Improvements

  • Add a new table property "rocksdb.key.smallest.seqno" which records the smallest sequence number of all keys in file. It makes ingesting DB generated files faster by avoiding scanning the whole file to find the smallest sequence number.
  • Add a new experimental PerKeyPointLockManager to improve efficiency under high lock contention. PointLockManager is inefficient under high write contention on the same key because it uses a single condition variable per lock stripe. PerKeyPointLockManager uses a per-thread condition variable and supports FIFO order. Because this feature is experimental, it is disabled by default. A new boolean flag TransactionDBOptions::use_per_key_point_lock_mgr is added to optionally enable it. Search for the flag in the code for more info.
    In addition, a new configuration TransactionOptions::deadlock_timeout_us is added, which lets a transaction wait for a short period before performing deadlock detection. When the workload has low lock contention, deadlock_timeout_us can be set slightly higher than the average transaction execution time, so that a transaction waiting for a lock is likely to acquire it before deadlock detection runs. This reduces the CPU cost of deadlock detection, which can be expensive. When the workload has high lock contention, deadlock_timeout_us can be set to 0 so that the transaction performs deadlock detection immediately. The default value is 0, which keeps the previous behavior.
  • Greatly improved CPU efficiency and scalability of parallel compression (CompressionOptions::parallel_threads > 1), though this efficiency improvement makes parallel compression currently incompatible with UserDefinedIndex and with the old setting of decouple_partitioned_filters=false. Parallel compression is now considered a production-ready feature. Maximum performance is available with -DROCKSDB_USE_STD_SEMAPHORES at compile time, but this is not currently recommended because of reported bugs in implementations of std::counting_semaphore/binary_semaphore.
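
The tuning guidance for deadlock_timeout_us above can be sketched as follows. Both functions are hypothetical helpers written for illustration, not RocksDB APIs. The 10% margin used for "slightly higher" is an assumption, and the waiting rule is only a simplified model of the behavior the entry describes.

```cpp
#include <cstdint>

// Hypothetical helper, not part of RocksDB: suggests a value for
// TransactionOptions::deadlock_timeout_us following the guidance above.
std::int64_t SuggestDeadlockTimeoutUs(std::int64_t avg_txn_exec_us,
                                      bool high_lock_contention) {
  if (high_lock_contention) return 0;  // detect deadlocks immediately
  // "Slightly higher" is modeled as +10%; the margin is an assumption.
  return avg_txn_exec_us + avg_txn_exec_us / 10;
}

// Simplified model of the waiting rule: deadlock detection runs only once the
// waiter has been blocked for at least deadlock_timeout_us (0 = right away).
bool ShouldRunDeadlockDetection(std::int64_t waited_us,
                                std::int64_t deadlock_timeout_us) {
  return waited_us >= deadlock_timeout_us;
}
```

With an average transaction time of 2000 us and low contention, this sketch suggests 2200 us, so a waiter that acquires the lock within that window never pays for deadlock detection.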

v10.6.2

22 Oct 17:48

10.6.2 (09/15/2025)

Bug Fixes

  • Fix a race condition in FIFO size-based compaction where concurrent threads could select the same non-L0 file, causing assertion failures in debug builds or "Cannot delete table file from LSM tree" errors in release builds.

10.6.1 (09/05/2025)

New Features

  • Add the fail_if_no_udi_on_open flag in BlockBasedTableOptions to control whether a missing user defined index block in an SST file is a hard error or not.
  • Add new option MultiScanArgs::max_prefetch_size that limits the memory usage of per file pinning of prefetched blocks.

10.6.0 (08/22/2025)

New Features

  • Introduce column family option cf_allow_ingest_behind. This option aims to replace DBOptions::allow_ingest_behind to enable ingest behind at the per-CF level. DBOptions::allow_ingest_behind is deprecated.
  • Introduce MultiScanArgs::io_coalesce_threshold to allow a configurable IO coalescing threshold.

Public API Changes

  • IngestExternalFileOptions::allow_db_generated_files now allows ingestion of any DB-generated SST file, instead of only the ones with all keys having sequence number 0.
  • decouple_partitioned_filters = true is now the default in BlockBasedTableOptions.
  • GetTtl() API is now available in TTL DB
  • Minimum supported version of LZ4 library is now 1.7.0 (r129 from 2015)
  • Some changes to experimental Compressor and CompressionManager APIs
  • A new FileSystem::SyncFile function is added for syncing a file that was already written, such as on file ingestion. The default implementation matches previous RocksDB behavior: re-open the file for read-write, sync it, and close it. We recommend overriding it for FileSystems that do not require syncing for crash recovery or do not handle re-opening for writes well.

Behavior Changes

  • When allow_ingest_behind is enabled, compaction will no longer drop tombstones based on the absence of underlying data. Tombstones will be preserved to apply to ingested files.

Bug Fixes

  • Files in a dropped column family are no longer returned to the caller upon successful, offline MANIFEST iteration in GetFileChecksumsFromCurrentManifest.
  • Fix a bug in MultiScan that causes it to fall back to a normal scan when dictionary compression is enabled.
  • Fix a crash in iterator Prepare() when fill_cache=false
  • Fix a bug in MultiScan where incorrect results can be returned when a Scan's range is across multiple files.
  • Fixed a bug in remote compaction that may mistakenly delete live SST file(s) during the cleanup phase when no keys survive the compaction (all expired)
  • Allow a user defined index to be configured from a string.
  • Make the User Defined Index interface consistently use the user key format, fixing the previous mixed usage of internal and user key.

Performance Improvements

  • Small improvement to CPU efficiency of compression using built-in algorithms, and a dramatic efficiency improvement for LZ4HC, based on reusing data structures between invocations.

v10.5.1

11 Aug 15:38

10.5.1 (08/04/2025)

Bug Fixes

  • Fixed a bug in remote compaction that may mistakenly delete live SST file(s) during the cleanup phase when no keys survive the compaction (all expired)

10.5.0 (07/21/2025)

Public API Changes

  • DB option skip_checking_sst_file_sizes_on_db_open is deprecated in favor of validating file sizes in parallel in a thread pool when the DB is opened. With paranoid checks enabled, a file with the wrong size fails the DB open. With paranoid checks disabled, the DB open succeeds; the column family with the corrupted file cannot be read or written, while the other healthy column families can be read and written normally. When the max_open_files option is not set to -1, only a subset of the files is opened and checked at DB open. The rest are opened and checked when they are accessed.
  • GetTtl() API is now available in TTL DB

Behavior Changes

  • PessimisticTransaction::GetWaitingTxns now returns waiting transaction information even if the current transaction has timed out. This allows the information to be surfaced to users for debugging purposes once it is known that the timeout has occurred.
  • A new API GetFileSize is added to the FSRandomAccessFile interface class. The Posix implementation uses fstat instead of stat, which is more efficient, so callers can use it to get the file size faster. This function might be required in the future for FileSystem implementations outside of the RocksDB code base.
  • RocksDB now triggers eligible compactions every 12 hours when periodic compaction is configured. This solves a limitation of the compaction trigger mechanism, which would only trigger compaction after specific events like flush, compaction, or SetOptions.

Bug Fixes

  • Fix a bug in BackupEngine that can crash backup due to a null FSWritableFile passed to WritableFileWriter.
  • Fix DB::NewMultiScan iterator to respect the scan upper bound specified in ScanOptions

Performance Improvements

  • Optimized MultiScan using BlockBasedTable to coalesce I/Os and prefetch all data blocks.

v10.4.2

11 Jul 00:32

10.4.2 (07/09/2025)

Bug Fixes

  • Fix a race condition between concurrent DB::Open sharing the same SstFileManager instance.

10.4.1 (07/01/2025)

Behavior Changes

  • RocksDB now triggers eligible compactions every 12 hours when periodic compaction is configured. This solves a limitation of the compaction trigger mechanism, which would only trigger compaction after specific events like flush, compaction, or SetOptions.

Bug Fixes

  • Fix a bug in BackupEngine that can crash backup due to a null FSWritableFile passed to WritableFileWriter.

10.4.0 (06/20/2025)

New Features

  • Add a new CF option memtable_avg_op_scan_flush_trigger that supports triggering memtable flush when an iterator scans through an expensive range of keys, with the average number of skipped keys from the active memtable exceeding the threshold.
  • Vector based memtable now supports concurrent writers (DBOptions::allow_concurrent_memtable_write) #13675.
  • Add new experimental TransactionOptions::large_txn_commit_optimize_byte_threshold to enable optimizations for large transaction commit by transaction batch data size.
  • Add a new option CompactionOptionsUniversal::reduce_file_locking. When it is true, automatic universal compaction picking adjusts to minimize locking of input files when bottom-priority compactions are waiting to run. This can increase the likelihood of existing L0 files being selected for compaction, thereby reducing write stalls and read regressions.
  • Add new format_version=7 to aid experimental support of custom compression algorithms with CompressionManager and block-based table. This format version includes changing the format of TableProperties::compression_name.

Public API Changes

  • Change NewExternalTableFactory to return a unique_ptr instead of shared_ptr.
  • Add an optional min file size requirement for deletion triggered compaction. It can be specified when creating CompactOnDeletionCollectorFactory.

Behavior Changes

  • TransactionOptions::large_txn_commit_optimize_threshold now has default value 0 for disabled. TransactionDBOptions::txn_commit_bypass_memtable_threshold now has no effect on transactions.

Bug Fixes

  • Fix a bug where CreateColumnFamilyWithImport() could miss the SST file for the memtable flush it triggered. The exported CF then may not contain the updates in the memtable when CreateColumnFamilyWithImport() is called.
  • Fix iterator operations returning NotImplemented status if disallow_memtable_writes and paranoid_memory_checks CF options are both set.
  • Fixed handling of file checksums in IngestExternalFile() to allow providing checksums using recognized but not necessarily the DB's preferred checksum function, to ease migration between checksum functions.

v10.2.1

29 Apr 21:06

10.2.1 (2025-04-24)

Bug Fixes

  • Fix improper initialization of ExternalTableOptions

10.2.0 (2025-04-21)

New Features

  • Provide histogram stats COMPACTION_PREFETCH_BYTES to measure number of bytes for RocksDB's prefetching (as opposed to file system's prefetch) on SST file during compaction read
  • A new API DB::GetNewestUserDefinedTimestamp is added to return the newest user defined timestamp seen in a column family
  • Introduce API IngestWriteBatchWithIndex() for ingesting updates into DB while bypassing memtable writes. This improves performance when writing a large write batch to the DB.
  • Add a new CF option memtable_op_scan_flush_trigger that triggers a flush of the memtable if an iterator's Seek()/Next() scans over a certain number of invisible entries from the memtable.

Public API Changes

  • AdvancedColumnFamilyOptions.max_write_buffer_number_to_maintain is removed. It has been deprecated since the introduction of the better option max_write_buffer_size_to_maintain in RocksDB 6.5.0.
  • Deprecated API DB::MaxMemCompactionLevel().
  • Deprecated ReadOptions::ignore_range_deletions.
  • Deprecated API experimental::PromoteL0().
  • Added an arbitrary string map for additional options to be overridden for remote compactions
  • The fail_if_options_file_error option in DBOptions has been removed. The behavior now is to always return failure in any API that fails to persist the OPTIONS file.

Behavior Changes

  • Make stats PREFETCH_BYTES_USEFUL, PREFETCH_HITS, PREFETCH_BYTES only account for prefetching during user initiated scan

Bug Fixes

  • Fix a bug in the Posix file system where the FSWritableFile created via FileSystem::ReopenWritableFile does not internally track the correct file size.
  • Fix a bug where tail size of remote compaction output is not persisted in primary db's manifest

v10.1.3

14 Apr 20:50

10.1.3 (2025-04-09)

Bug Fixes

  • Fix a bug where a full_history_ts_low resurrected from a previous session that enabled UDT is used by the current session, which disables UDT.

10.1.2 (2025-04-07)

Bug Fixes

  • Fix a bug where tail size of remote compaction output is not persisted in primary db's manifest

10.1.0 (2025-03-24)

New Features

  • Added a new DBOptions.calculate_sst_write_lifetime_hint_set setting that customizes which compaction styles SST write lifetime hint calculation is allowed on. Today RocksDB supports only two modes: kCompactionStyleLevel and kCompactionStyleUniversal.
  • Add a new field num_l0_files in CompactionJobInfo about the number of L0 files in the CF right before and after the compaction
  • Added per-key-placement feature in Remote Compaction
  • Implemented API DB::GetPropertiesOfTablesByLevel that retrieves table properties for files in each LSM tree level

Public API Changes

  • GetAllKeyVersions() now interprets empty slices literally, as valid keys, and uses new OptSlice type default value for extreme upper and lower range limits.
  • DeleteFilesInRanges() now takes RangeOpt which is based on OptSlice. The overload taking RangePtr is deprecated.
  • Add an unordered map of name/value pairs, ReadOptions::property_bag, to pass opaque options through to an external table when creating an Iterator.
  • Introduced CompactionServiceJobStatus::kAborted to allow handling aborted scenario in Schedule(), Wait() or OnInstallation() APIs in Remote Compactions.
  • format_version < 2 in BlockBasedTableOptions is no longer supported for writing new files. Support for reading such files is deprecated and might be removed in the future. CompressedSecondaryCacheOptions::compress_format_version == 1 is also deprecated.
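
The distinction drawn above between an empty slice (a real key) and a default, absent limit (an unbounded end of the range) can be modeled with std::optional. This is a sketch only: OptKey and InRange are hypothetical names, std::optional merely stands in for the idea behind OptSlice, and the half-open [begin, end) range with bytewise ordering is an assumption made for the example.

```cpp
#include <optional>
#include <string_view>

// Stand-in for an optional range limit. std::optional is used only to model
// the idea; it is not the actual OptSlice type.
using OptKey = std::optional<std::string_view>;

// Range check over [begin, end) with bytewise ordering (both assumptions).
// An absent limit means "unbounded" on that side, while an empty string is a
// literal key that sorts before every non-empty key.
bool InRange(std::string_view key, const OptKey& begin, const OptKey& end) {
  if (begin && key < *begin) return false;
  if (end && key >= *end) return false;
  return true;
}
```

In this model, InRange("abc", std::nullopt, std::nullopt) accepts every key, whereas passing an empty string as the end limit accepts none, because no key sorts before the empty key.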

Behavior Changes

  • ldb now returns an error if the specified --compression_type is not supported in the build.
  • MultiGet with snapshot and ReadOptions::read_tier = kPersistedTier will now read a consistent view across CFs (instead of potentially reading some CF before and some CF after a flush).
  • CreateColumnFamily() is no longer allowed on a read-only DB (OpenForReadOnly())

Bug Fixes

  • Fixed stats for Tiered Storage with preclude_last_level feature

RocksDB 9.11.2 Release

29 Mar 21:45

9.11.2 (2025-03-29)

Bump patch version to fix a mistake in the previous 9.11 release tag

9.11.1 (2025-02-19)

New Features

  • Added the ability to plug-in a custom table reader implementation. See include/rocksdb/external_table_reader.h for more details.

9.11.0 (2025-01-17)

New Features

  • Introduce CancelAwaitingJobs() in CompactionService interface which will allow users to implement cancellation of running remote compactions from the primary instance
  • Experimental feature: RocksDB now supports defining secondary indices, which are automatically maintained by the storage engine. Secondary indices provide a new customization point: applications can provide their own by implementing the new SecondaryIndex interface. See the SecondaryIndex API comments for more details. Note: this feature is currently only available in conjunction with write-committed pessimistic transactions, and Merge is not yet supported.
  • Provide a new option track_and_verify_wals to track and verify various information about WAL during WAL recovery. This is intended to be a better replacement for track_and_verify_wals_in_manifest.

Public API Changes

  • Add io_buffer_size to BackupEngineOptions to enable optimal configuration of IO size
  • Clean up all the references to random_access_max_buffer_size, related rules and all the clients wrappers. This option has been officially deprecated in 5.4.0.
  • Add file_ingestion_nanos and file_ingestion_blocking_live_writes_nanos in PerfContext to observe file ingestions
  • Offer new DB::Open and variants that use std::unique_ptr<DB>* output parameters and deprecate the old versions that use DB** output parameters.
  • The DB::DeleteFile API is officially deprecated.

Behavior Changes

  • For leveled compaction, manual compaction (CompactRange()) will be more strict about keeping compaction size under max_compaction_bytes. This prevents overly large compactions in some cases (#13306).
  • Experimental tiering options preclude_last_level_data_seconds and preserve_internal_time_seconds are now mutable with SetOptions(). Some changes to handling of these features along with long-lived snapshots and range deletes made this possible.

Bug Fixes

  • Fix a longstanding major bug with SetOptions() in which setting changes can be quietly reverted.

RocksDB 10.0.1 Release

31 Mar 18:29

10.0.1 (2025-03-05)

Public API Changes

  • Add an unordered map of name/value pairs, ReadOptions::property_bag, to pass opaque options through to an external table when creating an Iterator.
  • Introduced CompactionServiceJobStatus::kAborted to allow handling aborted scenario in Schedule(), Wait() or OnInstallation() APIs in Remote Compactions.
  • Added a column family option disallow_memtable_writes to safely fail any attempts to write to a non-default column family. This can be used for column families that are ingest only.

10.0.0 (2025-02-21)

New Features

  • Introduced new auto_refresh_iterator_with_snapshot opt-in knob that (when enabled) will periodically release obsolete memory and storage resources for as long as the iterator is making progress and its supplied read_options.snapshot was initialized with non-nullptr value.
  • Added the ability to plug-in a custom table reader implementation. See include/rocksdb/external_table_reader.h for more details.
  • Experimental feature: RocksDB now supports FAISS inverted file based indices via the secondary indexing framework. Applications can use FAISS secondary indices to automatically quantize embeddings and perform K-nearest-neighbors similarity searches. See FaissIVFIndex and SecondaryIndex for more details. Note: the FAISS integration currently requires using the BUCK build.
  • Add new DB property num_running_compaction_sorted_runs that tracks the number of sorted runs being processed by currently running compactions
  • Experimental feature: added support for simple secondary indices that index the specified column as-is. See SimpleSecondaryIndex and SecondaryIndex for more details.
  • Added new TransactionDBOptions::txn_commit_bypass_memtable_threshold, which enables optimized transaction commit (see TransactionOptions::commit_bypass_memtable) when the transaction size exceeds a configured threshold.

Public API Changes

  • Updated the query API of the experimental secondary indexing feature by removing the earlier SecondaryIndex::NewIterator virtual and adding a SecondaryIndexIterator class that can be utilized by applications to find the primary keys for a given search target.
  • Added back the ability to leverage the primary key when building secondary index entries. This involved changes to the signatures of SecondaryIndex::GetSecondary{KeyPrefix,Value} as well as the addition of a new method SecondaryIndex::FinalizeSecondaryKeyPrefix. See the API comments for more details.
  • Minimum supported version of ZSTD is now 1.4.0, for code simplification. Obsolete CompressionType kZSTDNotFinalCompression is also removed.

Behavior Changes

  • VerifyBackup in verify_with_checksum=true mode now evaluates checksums in parallel. As a result, unlike the original implementation, the API no longer bails out on the first corruption or mismatch; instead, it iterates over all the backup files, logging success / degree_of_failure for each.
  • Reversed the order of updates to the same key in WriteBatchWithIndex. This means if there are multiple updates to the same key, the most recent update is ordered first. This affects the output of WBWIIterator. When WriteBatchWithIndex is created with overwrite_key=true, this affects the output only if Merge is used (#13387).
  • Added support for Merge operations in transactions using option TransactionOptions::commit_bypass_memtable.
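
The reversed ordering of updates to the same key in WriteBatchWithIndex, described above, can be pictured with a small standalone model. Update and IndexOrder are hypothetical names used for illustration; this is not how WriteBatchWithIndex is implemented, only a sketch of the resulting iteration order.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one indexed batch entry (not a RocksDB type).
struct Update {
  std::string key;
  std::string value;
  std::size_t seq;  // position at which the update was added to the batch
};

// Orders entries by key and, for equal keys, puts the most recent update
// first, which is the ordering the entry above describes.
std::vector<Update> IndexOrder(std::vector<Update> updates) {
  std::sort(updates.begin(), updates.end(),
            [](const Update& a, const Update& b) {
              if (a.key != b.key) return a.key < b.key;
              return a.seq > b.seq;  // newer update to the same key comes first
            });
  return updates;
}
```

For a batch that writes k=v1, then a=x, then k=v2, this model iterates a=x, k=v2, k=v1.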

Bug Fixes

  • Fixed GetMergeOperands() API in ReadOnlyDB and SecondaryDB
  • Fix a bug in GetMergeOperands() that can return incorrect status (MergeInProgress) and incorrect number of merge operands. This can happen when GetMergeOperandsOptions::continue_cb is set, both active and immutable memtables have merge operands and the callback stops the look up at the immutable memtable.

RocksDB 9.10.0 Release

02 Jan 18:31

9.10.0 (2024-12-12)

New Features

  • Introduce TransactionOptions::commit_bypass_memtable to enable transaction commit to bypass memtable insertions. This can be beneficial for transactions with many operations, as it reduces commit time that is mostly spent on memtable insertion.

Public API Changes

  • Deprecated Remote Compaction APIs (StartV2, WaitForCompleteV2) are completely removed from the codebase

Behavior Changes

  • DB::KeyMayExist() now follows its function comment, which means value parameter can be null, and it will be set only if value_found is passed in.
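
The parameter contract described above (value may be null, and it is set only when value_found is passed in) can be sketched with a toy lookup. KeyMayExistModel is a hypothetical stand-in, not the RocksDB implementation: a std::map plays the role of data that can be found cheaply, and a miss in the map is treated as "definitely absent", which is a simplification.

```cpp
#include <map>
#include <string>

// Toy stand-in for the lookup described above. Not the RocksDB implementation.
bool KeyMayExistModel(const std::map<std::string, std::string>& cached,
                      const std::string& key, std::string* value,
                      bool* value_found = nullptr) {
  if (value_found != nullptr) *value_found = false;
  auto it = cached.find(key);
  if (it == cached.end()) return false;  // definitely absent in this toy model
  // The value is filled in only when the caller passed value_found, and only
  // when a value pointer was actually supplied.
  if (value_found != nullptr && value != nullptr) {
    *value = it->second;
    *value_found = true;
  }
  return true;
}
```

Calling the model with value == nullptr is valid and simply answers the existence question. Calling it with a value pointer but without value_found leaves the value untouched.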

Bug Fixes

  • Fix the issue where compaction incorrectly drops a key when there is a snapshot with a sequence number of zero.
  • Honor ConfigOptions.ignore_unknown_options in ParseStruct()

Performance Improvements

  • Enable reuse of file system allocated buffer for synchronous prefetching.
  • In buffered IO mode, try to align writes on power of 2 if checksum handoff is not enabled for the file type being written.

RocksDB release 9.9.3

17 Dec 18:06

9.9.3 (2024-12-03)

Performance Improvements

  • In buffered IO mode, try to align writes on power of 2 if checksum handoff is not enabled for the file type being written.

9.9.2 (2024-11-22)

Bug Fixes

  • Honor ConfigOptions.ignore_unknown_options in ParseStruct()

9.9.1 (2024-11-30)

Behavior Changes

  • Updates the hidden hook RocksDbThreadYieldAndCheckAbort() so that MySQL can abort long-running queries.

9.9.0 (2024-11-18)

New Features

  • Multi-Column-Family-Iterator (CoalescingIterator/AttributeGroupIterator) is no longer marked as experimental
  • Adds a new table property "rocksdb.newest.key.time" which records the unix timestamp of the newest key. Uses this table property for FIFO TTL and temperature change compaction.

Public API Changes

  • Added a new API Transaction::GetAttributeGroupIterator that can be used to create a multi-column-family attribute group iterator over the specified column families, including the data from both the transaction and the underlying database. This API is currently supported for optimistic and write-committed pessimistic transactions.
  • Added a new API Transaction::GetCoalescingIterator that can be used to create a multi-column-family coalescing iterator over the specified column families, including the data from both the transaction and the underlying database. This API is currently supported for optimistic and write-committed pessimistic transactions.

Behavior Changes

  • BaseDeltaIterator now honors the read option allow_unprepared_value.

Bug Fixes

  • BaseDeltaIterator now calls PrepareValue on the base iterator in case it has been created with the allow_unprepared_value read option set. Earlier, such base iterators could lead to incorrect values being exposed from BaseDeltaIterator.
  • Fix a leak of obsolete blob files left open until DB::Close(). This bug was introduced in version 9.4.0.
  • Fix missing cases of corruption retry during DB open and read API processing.
  • Fix a bug for transaction db with 2pc where an old WAL may be retained longer than needed (#13127).
  • Fix leaks of some open SST files (until DB::Close()) that are written but never become live due to various failures. (We now have a check for such leaks with no outstanding issues.)
  • Fix a bug for replaying WALs for WriteCommitted transaction DB when its user-defined timestamps setting is toggled on/off between DB sessions.

Performance Improvements

  • Fix regression in issue #12038 due to Options::compaction_readahead_size greater than max_sectors_kb (i.e., the largest I/O size that the OS issues to a block device, as defined in Linux)