Releases: facebook/rocksdb
v10.7.5
10.7.5 (10/20/2025)
Bug Fixes
- Fix a bug in Page unpinning in MultiScan
10.7.4 (10/14/2025)
Public API Changes
- The MultiScan API contract is updated. After scan ranges have been prepared with a Prepare() call, subsequent seeks must target the start of each prepared scan range, in order. In addition, when a limit is set, the upper bound must be set to the same value as the limit before each seek.
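The contract above can be pictured with a small standalone model. This is only a sketch and does not use the RocksDB API: `ScanRange`, `PreparedScanChecker` and `Seek` are hypothetical names invented for illustration, and the checker simply rejects any seek that is out of order or whose upper bound differs from the prepared limit.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one prepared scan range (not a RocksDB type).
struct ScanRange {
  std::string start;
  std::optional<std::string> limit;  // std::nullopt means "no limit set"
};

// Hypothetical checker that mimics the contract described in the note.
class PreparedScanChecker {
 public:
  explicit PreparedScanChecker(std::vector<ScanRange> ranges)
      : ranges_(std::move(ranges)) {}

  // Returns true if the seek obeys the contract: it targets the start of the
  // next prepared range, and when that range has a limit, the upper bound in
  // effect equals that limit.
  bool Seek(const std::string& target,
            const std::optional<std::string>& upper_bound) {
    if (next_ >= ranges_.size()) return false;  // no prepared range left
    const ScanRange& range = ranges_[next_];
    if (target != range.start) return false;    // wrong key or wrong order
    if (range.limit && upper_bound != range.limit) return false;
    ++next_;                                     // advance to the next range
    return true;
  }

 private:
  std::vector<ScanRange> ranges_;
  std::size_t next_ = 0;
};
```

In this model, seeking to the second range before the first, or seeking with an upper bound that differs from the prepared limit, is reported as a contract violation.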
10.7.3 (10/06/2025)
Bug Fixes
- Fix a few bugs in MultiScan
10.7.2 (09/30/2025)
Bug Fixes
- Fix an incorrect MultiScan seek error status caused by bugs in handling a range limit that falls between the key ranges of adjacent SST files.
Performance Improvements
- Fixed a performance regression in LZ4 compression that started in version 10.6.0
10.7.0 (09/24/2025)
New Features
- Add the fail_if_no_udi_on_open flag in BlockBasedTableOptions to control whether a missing user defined index block in an SST is a hard error or not.
- A new flag memtable_veirfy_per_key_checksum_on_seek is added to AdvancedColumnFamilyOptions. When it is enabled, it will validate key checksum along the binary search path on skiplist based memtable during seek operation.
- Introduce option MultiScanArgs::use_async_io to enable asynchronous I/O during MultiScan, instead of waiting for I/O to be done in Prepare().
- Add new option MultiScanArgs::max_prefetch_size that limits the memory usage of per file pinning of prefetched blocks.
- Improved sst_dump by allowing standalone file and directory arguments without --file=. Also added new options and better output for sst_dump --command=recompress. See sst_dump --help
Public API Changes
- HyperClockCache with no estimated_entry_charge is now production-ready and is the preferred block cache implementation vs. LRUCache. Please consider updating your code to minimize the risk of hitting performance bottlenecks or anomalies from LRUCache. See cache.h for more detail.
- RocksDB now requires a C++20 compatible compiler (GCC >= 11, Clang >= 10, Visual Studio >= 2019), including for any code using RocksDB headers.
- MultiScanArgs used to have a default constructor with default parameter of BytewiseComparator. Now it always requires Comparator in its constructor.
Behavior Changes
- The default provided block cache implementation is now HyperClockCache instead of LRUCache, when block_cache is nullptr (default) and no_block_cache==false (default). We recommend explicitly creating a HyperClockCache block cache based on memory budget and sharing it across all column families and even DB instances. This change could expose previously hidden memory or resource leaks.
- Allow UDIs with a non BytewiseComparator
Bug Fixes
- Reported numbers for compaction and flush CPU usage now include time spent by parallel compression worker threads. This now means compaction/flush CPU usage could exceed the wall clock time.
- Fix a race condition in FIFO size-based compaction where concurrent threads could select the same non-L0 file, causing assertion failures in debug builds or "Cannot delete table file from LSM tree" errors in release builds.
- Fix a bug in RocksDB MultiScan with UDI when one of the scan ranges is determined to be empty by the UDI, which causes incorrect results.
Performance Improvements
- Add a new table property "rocksdb.key.smallest.seqno" which records the smallest sequence number of all keys in file. It makes ingesting DB generated files faster by avoiding scanning the whole file to find the smallest sequence number.
- Add a new experimental PerKeyPointLockManager to improve efficiency under high lock contention. PointLockManager was not efficient under high write contention on the same key, as it uses a single condition variable per lock stripe. PerKeyPointLockManager uses a per-thread condition variable and supports FIFO order. Because this is an experimental feature, it is disabled by default. A new boolean flag TransactionDBOptions::use_per_key_point_lock_mgr is added to optionally enable it. Search for the flag in the code for more info.
Together with this, a new configuration TransactionOptions::deadlock_timeout_us is added, which allows the transaction to wait for a short period before performing deadlock detection. When the workload has low lock contention, deadlock_timeout_us can be set slightly higher than the average transaction execution time, so that a waiting transaction is likely to take the lock before deadlock detection is performed. This reduces the CPU cost of deadlock detection, which can be expensive. When the workload has high lock contention, deadlock_timeout_us can be set to 0, so that the transaction performs deadlock detection immediately. The default value is 0, which keeps the behavior the same as before.
- Majorly improved CPU efficiency and scalability of parallel compression (CompressionOptions::parallel_threads > 1), though this efficiency improvement makes parallel compression currently incompatible with UserDefinedIndex and with old setting of decouple_partitioned_filters=false. Parallel compression is now considered a production-ready feature. Maximum performance is available with -DROCKSDB_USE_STD_SEMAPHORES at compile time, but this is not currently recommended because of reported bugs in implementations of std::counting_semaphore/binary_semaphore.
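The tuning guidance for deadlock_timeout_us in the notes above can be illustrated with a toy model. This is a sketch under stated assumptions, not RocksDB code: `SimulateLockWait`, `WaitOutcome` and the parameter `holder_releases_after_us` are hypothetical names, and the model only captures the idea that a waiter skips deadlock detection when the lock frees up within the grace period.

```cpp
#include <cassert>
#include <cstdint>

// Result of one simulated lock wait in this toy model.
struct WaitOutcome {
  bool acquired_in_grace_period;  // lock obtained before any detection ran
  bool ran_deadlock_detection;    // the potentially CPU-expensive check ran
};

// Toy model, not RocksDB code. `holder_releases_after_us` is how long the
// current holder keeps the lock; `deadlock_timeout_us` is the grace period a
// waiter allows before it starts deadlock detection.
WaitOutcome SimulateLockWait(std::int64_t holder_releases_after_us,
                             std::int64_t deadlock_timeout_us) {
  if (holder_releases_after_us <= deadlock_timeout_us) {
    // The lock became available within the grace period: detection skipped.
    return {true, false};
  }
  // Still blocked after the grace period (or the period is 0): detect now.
  return {false, true};
}
```

With a grace period of 500 us and a holder that releases after 200 us, the model never runs detection; with a grace period of 0, any blocked wait triggers detection immediately, which matches the default behavior described above.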
v10.6.2
10.6.2 (09/15/2025)
Bug Fixes
- Fix a race condition in FIFO size-based compaction where concurrent threads could select the same non-L0 file, causing assertion failures in debug builds or "Cannot delete table file from LSM tree" errors in release builds.
10.6.1 (09/05/2025)
New Features
- Add the fail_if_no_udi_on_open flag in BlockBasedTableOptions to control whether a missing user defined index block in an SST is a hard error or not.
- Add new option MultiScanArgs::max_prefetch_size that limits the memory usage of per file pinning of prefetched blocks.
10.6.0 (08/22/2025)
New Features
- Introduce column family option cf_allow_ingest_behind. This option aims to replace DBOptions::allow_ingest_behind to enable ingest behind at the per-CF level. DBOptions::allow_ingest_behind is deprecated.
- Introduce MultiScanArgs::io_coalesce_threshold to allow a configurable IO coalescing threshold.
Public API Changes
- IngestExternalFileOptions::allow_db_generated_files now allows ingestion of any DB generated SST file, instead of only the ones with all keys having sequence number 0.
- decouple_partitioned_filters = true is now the default in BlockBasedTableOptions.
- GetTtl() API is now available in TTL DB
- Minimum supported version of LZ4 library is now 1.7.0 (r129 from 2015)
- Some changes to experimental Compressor and CompressionManager APIs
- A new FileSystem::SyncFile function is added for syncing a file that was already written, such as on file ingestion. The default implementation matches previous RocksDB behavior: re-open the file for read-write, sync it, and close it. We recommend overriding it for FileSystems that do not require syncing for crash recovery or that do not handle re-opening for writes well.
Behavior Changes
- When allow_ingest_behind is enabled, compaction will no longer drop tombstones based on the absence of underlying data. Tombstones will be preserved to apply to ingested files.
Bug Fixes
- Files in dropped column family won't be returned to the caller upon successful, offline MANIFEST iteration in GetFileChecksumsFromCurrentManifest.
- Fix a bug in MultiScan that causes it to fall back to a normal scan when dictionary compression is enabled.
- Fix a crash in iterator Prepare() when fill_cache=false
- Fix a bug in MultiScan where incorrect results can be returned when a Scan's range is across multiple files.
- Fixed a bug in remote compaction that may mistakenly delete live SST file(s) during the cleanup phase when no keys survive the compaction (all expired)
- Allow a user defined index to be configured from a string.
- Make the User Defined Index interface consistently use the user key format, fixing the previous mixed usage of internal and user key.
Performance Improvements
- Small improvement to CPU efficiency of compression using built-in algorithms, and a dramatic efficiency improvement for LZ4HC, based on reusing data structures between invocations.
v10.5.1
10.5.1 (08/04/2025)
Bug Fixes
- Fixed a bug in remote compaction that may mistakenly delete live SST file(s) during the cleanup phase when no keys survive the compaction (all expired)
10.5.0 (07/21/2025)
Public API Changes
- DB option skip_checking_sst_file_sizes_on_db_open is deprecated in favor of validating file sizes in parallel in a thread pool when the DB is opened. With paranoid checks enabled, a file with the wrong size fails the DB open. With paranoid checks disabled, the DB open succeeds; the column family with the corrupted file cannot be read or written, while the other healthy column families can be read and written normally. When the max_open_files option is not set to -1, only a subset of the files is opened and checked at DB open. The rest are opened and checked when they are accessed.
- GetTtl() API is now available in TTL DB
Behavior Changes
- PessimisticTransaction::GetWaitingTxns now returns waiting transaction information even if the current transaction has timed out. This allows the information to be surfaced to users for debugging purposes once it is known that the timeout has occurred.
- A new API GetFileSize is added to FSRandomAccessFile interface class. It uses fstat vs stat on the posix implementation which is more efficient. Caller could use it to get file size faster. This function might be required in the future for FileSystem implementation outside of the RocksDB code base.
- RocksDB now triggers eligible compactions every 12 hours when periodic compaction is configured. This solves a limitation of the compaction trigger mechanism, which would only trigger compaction after specific events like flush, compaction, or SetOptions.
Bug Fixes
- Fix a bug in BackupEngine that can crash backup due to a null FSWritableFile passed to WritableFileWriter.
- Fix DB::NewMultiScan iterator to respect the scan upper bound specified in ScanOptions
Performance Improvements
- Optimized MultiScan using BlockBasedTable to coalesce I/Os and prefetch all data blocks.
v10.4.2
10.4.2 (07/09/2025)
Bug Fixes
- Fix a race condition between concurrent DB::Open sharing the same SstFileManager instance.
10.4.1 (07/01/2025)
Behavior Changes
- RocksDB now triggers eligible compactions every 12 hours when periodic compaction is configured. This solves a limitation of the compaction trigger mechanism, which would only trigger compaction after specific events like flush, compaction, or SetOptions.
Bug Fixes
- Fix a bug in BackupEngine that can crash backup due to a null FSWritableFile passed to WritableFileWriter.
10.4.0 (06/20/2025)
New Features
- Add a new CF option memtable_avg_op_scan_flush_trigger that supports triggering memtable flush when an iterator scans through an expensive range of keys, with the average number of skipped keys from the active memtable exceeding the threshold.
- Vector based memtable now supports concurrent writers (DBOptions::allow_concurrent_memtable_write) #13675.
- Add new experimental TransactionOptions::large_txn_commit_optimize_byte_threshold to enable optimizations for large transaction commit by transaction batch data size.
- Add a new option CompactionOptionsUniversal::reduce_file_locking and if it's true, auto universal compaction picking will adjust to minimize locking of input files when bottom priority compactions are waiting to run. This can increase the likelihood of existing L0s being selected for compaction, thereby improving write stall and reducing read regression.
- Add new format_version=7 to aid experimental support of custom compression algorithms with CompressionManager and block-based table. This format version includes changing the format of TableProperties::compression_name.
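The idea behind memtable_avg_op_scan_flush_trigger (flush when the average number of skipped keys per operation exceeds a threshold) can be sketched as follows. This is an illustrative model only: `AvgSkipTracker` and its methods are hypothetical, the treatment of a zero threshold as "disabled" is an assumption of the sketch, and the real option may apply additional conditions not described in the note.

```cpp
#include <cassert>
#include <cstdint>

// Toy tracker, not RocksDB code: records how many active-memtable entries
// each iterator operation skipped and compares the average to a threshold.
class AvgSkipTracker {
 public:
  explicit AvgSkipTracker(std::uint64_t threshold) : threshold_(threshold) {}

  // Record one iterator operation and the number of entries it skipped.
  void RecordOp(std::uint64_t skipped_entries) {
    total_skipped_ += skipped_entries;
    ++num_ops_;
  }

  // True when the average skipped entries per operation exceeds the
  // threshold. A threshold of 0 is treated as "disabled" in this sketch.
  bool ShouldRequestFlush() const {
    if (threshold_ == 0 || num_ops_ == 0) return false;
    // Compare total > threshold * ops to avoid integer-division rounding.
    return total_skipped_ > threshold_ * num_ops_;
  }

 private:
  std::uint64_t threshold_;
  std::uint64_t total_skipped_ = 0;
  std::uint64_t num_ops_ = 0;
};
```

For example, with a threshold of 10, two operations that skip 5 and 30 entries average 17.5 skipped entries, so the tracker asks for a flush.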
Public API Changes
- Change NewExternalTableFactory to return a unique_ptr instead of shared_ptr.
- Add an optional min file size requirement for deletion triggered compaction. It can be specified when creating CompactOnDeletionCollectorFactory.
Behavior Changes
- TransactionOptions::large_txn_commit_optimize_threshold now has default value 0 for disabled.
- TransactionDBOptions::txn_commit_bypass_memtable_threshold now has no effect on transactions.
Bug Fixes
- Fix a bug where CreateColumnFamilyWithImport() could miss the SST file for the memtable flush it triggered. The exported CF then may not contain the updates in the memtable when CreateColumnFamilyWithImport() is called.
- Fix iterator operations returning NotImplemented status if disallow_memtable_writes and paranoid_memory_checks CF options are both set.
- Fixed handling of file checksums in IngestExternalFile() to allow providing checksums using recognized but not necessarily the DB's preferred checksum function, to ease migration between checksum functions.
v10.2.1
10.2.1 (2025-04-24)
Bug Fixes
- Fix improper initialization of ExternalTableOptions
10.2.0 (2025-04-21)
New Features
- Provide histogram stats COMPACTION_PREFETCH_BYTES to measure number of bytes for RocksDB's prefetching (as opposed to file system's prefetch) on SST file during compaction read
- A new API DB::GetNewestUserDefinedTimestamp is added to return the newest user defined timestamp seen in a column family
- Introduce API IngestWriteBatchWithIndex() for ingesting updates into DB while bypassing memtable writes. This improves performance when writing a large write batch to the DB.
- Add a new CF option memtable_op_scan_flush_trigger that triggers a flush of the memtable if an iterator's Seek()/Next() scans over a certain number of invisible entries from the memtable.
Public API Changes
- AdvancedColumnFamilyOptions.max_write_buffer_number_to_maintain is deleted. It's deprecated since introduction of a better option max_write_buffer_size_to_maintain since RocksDB 6.5.0.
- Deprecated API DB::MaxMemCompactionLevel().
- Deprecated ReadOptions::ignore_range_deletions.
- Deprecated API experimental::PromoteL0().
- Added arbitrary string map for additional options to be overridden for remote compactions
- The fail_if_options_file_error option in DBOptions has been removed. The behavior now is to always return failure in any API that fails to persist the OPTIONS file.
Behavior Changes
- Make stats PREFETCH_BYTES_USEFUL, PREFETCH_HITS, PREFETCH_BYTES only account for prefetching during user initiated scan
Bug Fixes
- Fix a bug in Posix file system that the FSWritableFile created via FileSystem::ReopenWritableFile internally does not track the correct file size.
- Fix a bug where tail size of remote compaction output is not persisted in primary db's manifest
v10.1.3
10.1.3 (2025-04-09)
Bug Fixes
- Fix a bug where resurrected full_history_ts_low from a previous session that enables UDT is used by this session that disables UDT.
10.1.2 (2025-04-07)
Bug Fixes
- Fix a bug where tail size of remote compaction output is not persisted in primary db's manifest
10.1.0 (2025-03-24)
New Features
- Added a new DBOptions.calculate_sst_write_lifetime_hint_set setting that allows customizing which compaction styles SST write lifetime hint calculation is allowed on. Today RocksDB supports only two modes, kCompactionStyleLevel and kCompactionStyleUniversal.
- Add a new field num_l0_files in CompactionJobInfo about the number of L0 files in the CF right before and after the compaction
- Added per-key-placement feature in Remote Compaction
- Implemented API DB::GetPropertiesOfTablesByLevel that retrieves table properties for files in each LSM tree level
Public API Changes
- GetAllKeyVersions() now interprets empty slices literally, as valid keys, and uses new OptSlice type default value for extreme upper and lower range limits.
- DeleteFilesInRanges() now takes RangeOpt which is based on OptSlice. The overload taking RangePtr is deprecated.
- Add an unordered map of name/value pairs, ReadOptions::property_bag, to pass opaque options through to an external table when creating an Iterator.
- Introduced CompactionServiceJobStatus::kAborted to allow handling aborted scenario in Schedule(), Wait() or OnInstallation() APIs in Remote Compactions.
- format_version < 2 in BlockBasedTableOptions is no longer supported for writing new files. Support for reading such files is deprecated and might be removed in the future. CompressedSecondaryCacheOptions::compress_format_version == 1 is also deprecated.
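The distinction drawn above between an empty slice (a valid key) and an absent bound (an unbounded range limit) can be modeled with `std::optional`. The following is a sketch of the concept only, not the OptSlice type itself: `OptKey` and `InRange` are hypothetical names, and the half-open interval [begin, end) is an assumption of this sketch rather than a statement about any RocksDB API.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Stand-in for the idea behind an optional slice: "no value" means the bound
// is unbounded, while an empty string is an ordinary, valid key.
using OptKey = std::optional<std::string_view>;

// True if `key` lies in [begin, end), where a missing bound is unbounded.
// The half-open interval is a choice made for this sketch.
bool InRange(std::string_view key, const OptKey& begin, const OptKey& end) {
  if (begin && key < *begin) return false;  // below an explicit lower bound
  if (end && key >= *end) return false;     // at or above an explicit upper bound
  return true;
}
```

Under this model, an absent upper bound admits every key, whereas an explicit empty upper bound admits none, which is why the two cases have to be distinguishable.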
Behavior Changes
- ldb now returns an error if the specified --compression_type is not supported in the build.
- MultiGet with snapshot and ReadOptions::read_tier = kPersistedTier will now read a consistent view across CFs (instead of potentially reading some CF before and some CF after a flush).
- CreateColumnFamily() is no longer allowed on a read-only DB (OpenForReadOnly())
Bug Fixes
- Fixed stats for Tiered Storage with preclude_last_level feature
RocksDB 9.11.2 Release
9.11.2 (2025-03-29)
Bump patch version to fix a mistake in the previous 9.11 release tag
9.11.1 (2025-02-19)
New Features
- Added the ability to plug-in a custom table reader implementation. See include/rocksdb/external_table_reader.h for more details.
9.11.0 (2025-01-17)
New Features
- Introduce CancelAwaitingJobs() in CompactionService interface which will allow users to implement cancellation of running remote compactions from the primary instance
- Experimental feature: RocksDB now supports defining secondary indices, which are automatically maintained by the storage engine. Secondary indices provide a new customization point: applications can provide their own by implementing the new SecondaryIndex interface. See the SecondaryIndex API comments for more details. Note: this feature is currently only available in conjunction with write-committed pessimistic transactions, and Merge is not yet supported.
- Provide a new option track_and_verify_wals to track and verify various information about WAL during WAL recovery. This is intended to be a better replacement to track_and_verify_wals_in_manifest.
Public API Changes
- Add io_buffer_size to BackupEngineOptions to enable optimal configuration of IO size
- Clean up all the references to random_access_max_buffer_size, related rules and all the clients wrappers. This option has been officially deprecated in 5.4.0.
- Add file_ingestion_nanos and file_ingestion_blocking_live_writes_nanos in PerfContext to observe file ingestions
- Offer new DB::Open and variants that use std::unique_ptr<DB>* output parameters and deprecate the old versions that use DB** output parameters.
- The DB::DeleteFile API is officially deprecated.
Behavior Changes
- For leveled compaction, manual compaction (CompactRange()) will be more strict about keeping compaction size under max_compaction_bytes. This prevents overly large compactions in some cases (#13306).
- Experimental tiering options preclude_last_level_data_seconds and preserve_internal_time_seconds are now mutable with SetOptions(). Some changes to handling of these features along with long-lived snapshots and range deletes made this possible.
Bug Fixes
- Fix a longstanding major bug with SetOptions() in which setting changes can be quietly reverted.
RocksDB 10.0.1 Release
10.0.1 (2025-03-05)
Public API Changes
- Add an unordered map of name/value pairs, ReadOptions::property_bag, to pass opaque options through to an external table when creating an Iterator.
- Introduced CompactionServiceJobStatus::kAborted to allow handling aborted scenario in Schedule(), Wait() or OnInstallation() APIs in Remote Compactions.
- Added a column family option disallow_memtable_writes to safely fail any attempts to write to a non-default column family. This can be used for column families that are ingest only.
10.0.0 (2025-02-21)
New Features
- Introduced new auto_refresh_iterator_with_snapshot opt-in knob that (when enabled) will periodically release obsolete memory and storage resources for as long as the iterator is making progress and its supplied read_options.snapshot was initialized with non-nullptr value.
- Added the ability to plug-in a custom table reader implementation. See include/rocksdb/external_table_reader.h for more details.
- Experimental feature: RocksDB now supports FAISS inverted file based indices via the secondary indexing framework. Applications can use FAISS secondary indices to automatically quantize embeddings and perform K-nearest-neighbors similarity searches. See FaissIVFIndex and SecondaryIndex for more details. Note: the FAISS integration currently requires using the BUCK build.
- Add new DB property num_running_compaction_sorted_runs that tracks the number of sorted runs being processed by currently running compactions
- Experimental feature: added support for simple secondary indices that index the specified column as-is. See SimpleSecondaryIndex and SecondaryIndex for more details.
- Added new TransactionDBOptions::txn_commit_bypass_memtable_threshold, which enables optimized transaction commit (see TransactionOptions::commit_bypass_memtable) when the transaction size exceeds a configured threshold.
Public API Changes
- Updated the query API of the experimental secondary indexing feature by removing the earlier SecondaryIndex::NewIterator virtual and adding a SecondaryIndexIterator class that can be utilized by applications to find the primary keys for a given search target.
- Added back the ability to leverage the primary key when building secondary index entries. This involved changes to the signatures of SecondaryIndex::GetSecondary{KeyPrefix,Value} as well as the addition of a new method SecondaryIndex::FinalizeSecondaryKeyPrefix. See the API comments for more details.
- Minimum supported version of ZSTD is now 1.4.0, for code simplification. Obsolete CompressionType kZSTDNotFinalCompression is also removed.
Behavior Changes
- VerifyBackup in verify_with_checksum=true mode will now evaluate checksums in parallel. As a result, unlike in case of original implementation, the API won't bail out on a very first corruption / mismatch and instead will iterate over all the backup files logging success / degree_of_failure for each.
- Reversed the order of updates to the same key in WriteBatchWithIndex. This means if there are multiple updates to the same key, the most recent update is ordered first. This affects the output of WBWIIterator. When WriteBatchWithIndex is created with overwrite_key=true, this affects the output only if Merge is used (#13387).
- Added support for Merge operations in transactions using option TransactionOptions::commit_bypass_memtable.
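The new ordering of same-key updates in WriteBatchWithIndex (most recent update first) can be pictured with a plain sort. This is a standalone sketch, not the WriteBatchWithIndex implementation: `IndexedUpdate` and `OrderMostRecentFirst` are hypothetical names, and `seq` simply stands for the position of an update within the batch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one update in a batch (not a RocksDB type).
struct IndexedUpdate {
  std::string key;
  std::size_t seq;    // position within the batch; 0 is the first update
  std::string value;
};

// Orders updates by key ascending; for equal keys, the most recent update
// (largest seq) comes first, mirroring the behavior described in the note.
std::vector<IndexedUpdate> OrderMostRecentFirst(
    std::vector<IndexedUpdate> updates) {
  std::sort(updates.begin(), updates.end(),
            [](const IndexedUpdate& a, const IndexedUpdate& b) {
              if (a.key != b.key) return a.key < b.key;
              return a.seq > b.seq;  // newer update sorts ahead of older
            });
  return updates;
}
```

Code that iterates over a batch and assumes the oldest update to a key appears first would, in this model, see the newest one first instead.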
Bug Fixes
- Fixed GetMergeOperands() API in ReadOnlyDB and SecondaryDB
- Fix a bug in GetMergeOperands() that can return incorrect status (MergeInProgress) and incorrect number of merge operands. This can happen when GetMergeOperandsOptions::continue_cb is set, both active and immutable memtables have merge operands and the callback stops the look up at the immutable memtable.
RocksDB 9.10.0 Release
9.10.0 (2024-12-12)
New Features
- Introduce TransactionOptions::commit_bypass_memtable to enable transaction commit to bypass memtable insertions. This can be beneficial for transactions with many operations, as it reduces commit time that is mostly spent on memtable insertion.
Public API Changes
- Deprecated Remote Compaction APIs (StartV2, WaitForCompleteV2) are completely removed from the codebase
Behavior Changes
- DB::KeyMayExist() now follows its function comment, which means value parameter can be null, and it will be set only if value_found is passed in.
Bug Fixes
- Fix the issue where compaction incorrectly drops a key when there is a snapshot with a sequence number of zero.
- Honor ConfigOptions.ignore_unknown_options in ParseStruct()
Performance Improvements
- Enable reuse of file system allocated buffer for synchronous prefetching.
- In buffered IO mode, try to align writes on power of 2 if checksum handoff is not enabled for the file type being written.
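The note above does not say how the alignment is performed. One common way to align a write size on a power of two is to mask off the low bits, sketched below. All names here (`IsPowerOfTwo`, `AlignDown`, `SplitForAlignedWrite`, `WriteSplit`) are hypothetical helpers for illustration and are not taken from RocksDB.

```cpp
#include <cassert>
#include <cstdint>

// True if x is a power of two (1, 2, 4, ...). Zero is not a power of two.
inline bool IsPowerOfTwo(std::uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Largest multiple of `alignment` that is <= n.
// Precondition: `alignment` is a power of two.
inline std::uint64_t AlignDown(std::uint64_t n, std::uint64_t alignment) {
  return n & ~(alignment - 1);
}

// How many pending bytes to write now so the write ends on an aligned
// boundary, and how many bytes to keep buffered for a later write.
struct WriteSplit {
  std::uint64_t write_now;
  std::uint64_t keep_buffered;
};

inline WriteSplit SplitForAlignedWrite(std::uint64_t pending_bytes,
                                       std::uint64_t alignment) {
  const std::uint64_t aligned = AlignDown(pending_bytes, alignment);
  return {aligned, pending_bytes - aligned};
}
```

For example, with 10000 pending bytes and a 4096-byte alignment, this sketch writes 8192 bytes now and keeps the remaining 1808 bytes buffered.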
RocksDB release 9.9.3
9.9.3 (2024-12-03)
Performance Improvements
- In buffered IO mode, try to align writes on power of 2 if checksum handoff is not enabled for the file type being written.
9.9.2 (2024-11-22)
Bug Fixes
- Honor ConfigOptions.ignore_unknown_options in ParseStruct()
9.9.1 (2024-11-30)
Behavior Changes
- Updates the hidden hook RocksDbThreadYieldAndCheckAbort() to support MySQL to abort long-running query.
9.9.0 (2024-11-18)
New Features
- Multi-Column-Family-Iterator (CoalescingIterator/AttributeGroupIterator) is no longer marked as experimental
- Adds a new table property "rocksdb.newest.key.time" which records the unix timestamp of the newest key. Uses this table property for FIFO TTL and temperature change compaction.
Public API Changes
- Added a new API Transaction::GetAttributeGroupIterator that can be used to create a multi-column-family attribute group iterator over the specified column families, including the data from both the transaction and the underlying database. This API is currently supported for optimistic and write-committed pessimistic transactions.
- Added a new API Transaction::GetCoalescingIterator that can be used to create a multi-column-family coalescing iterator over the specified column families, including the data from both the transaction and the underlying database. This API is currently supported for optimistic and write-committed pessimistic transactions.
Behavior Changes
- BaseDeltaIterator now honors the read option allow_unprepared_value.
Bug Fixes
- BaseDeltaIterator now calls PrepareValue on the base iterator in case it has been created with the allow_unprepared_value read option set. Earlier, such base iterators could lead to incorrect values being exposed from BaseDeltaIterator.
- Fix a leak of obsolete blob files left open until DB::Close(). This bug was introduced in version 9.4.0.
- Fix missing cases of corruption retry during DB open and read API processing.
- Fix a bug for transaction db with 2pc where an old WAL may be retained longer than needed (#13127).
- Fix leaks of some open SST files (until DB::Close()) that are written but never become live due to various failures. (We now have a check for such leaks with no outstanding issues.)
- Fix a bug for replaying WALs for WriteCommitted transaction DB when its user-defined timestamps setting is toggled on/off between DB sessions.
Performance Improvements
- Fix regression in issue #12038 due to Options::compaction_readahead_size greater than max_sectors_kb (i.e., largest I/O size that the OS issues to a block device defined in linux)