Problem
OrbitDB's append-only oplog has no mechanism to expire or compact entries whose referenced blocks are permanently unavailable. Once an orphaned entry enters the oplog (e.g. from a corrupt write, a blockstore wipe, or the helia v6 streaming blockstore bug), it replicates to every peer forever.
Every peer that receives the entry attempts to load the referenced identity block via bitswap, fails (because no peer has it), and retries on every sync cycle — indefinitely. This fills logs with LoadBlockFailedError / Want was aborted errors and wastes network resources.
How orphaned entries get created
- Helia v6 streaming blockstore incompatibility: helia v6 changed blockstore.get() to return AsyncGenerator<Uint8Array> instead of Promise<Uint8Array>. OrbitDB v3.0.2 expects the old API. When the streaming response is consumed incorrectly, identity blocks get written with garbled bytes. The CID is valid but the content doesn't match. These entries then replicate to all peers.
- Disk-full or I/O error during write: if the blockstore write fails partway through (disk full, I/O error), the block may be partially written. A subsequent integrity check or restart detects the corruption and removes the block locally, but the oplog entry referencing it has already been replicated to peers.
- Blockstore wipe after integrity check: the application detects corrupt blocks on startup and wipes the blockstore to recover. The oplog entries that referenced those blocks now reference CIDs that no longer exist anywhere on the network.
In all cases, the oplog entry is valid CBOR and has a valid structure — it just references a block (typically an identity block) that doesn't exist on any peer. Since the oplog is append-only with no expiry, these poison entries persist forever.
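To illustrate the API mismatch behind the first cause, here is a minimal sketch of an adapter that tolerates both return shapes. It is not the OrbitDB or Helia fix: readBlockBytes is a hypothetical helper, and it assumes the streaming form yields Uint8Array chunks that concatenate to the full block, as described above.

```javascript
// Hypothetical helper: normalise the result of blockstore.get() to a single
// Uint8Array, whether it is a Promise<Uint8Array> (old shape) or an
// (async) iterable of Uint8Array chunks (streaming shape).
async function readBlockBytes(result) {
  // Awaiting resolves a Promise and passes non-thenables through unchanged.
  const value = await result;

  // Old shape: the whole block is already in memory.
  if (value instanceof Uint8Array) {
    return value;
  }

  const isIterable =
    value != null &&
    (typeof value[Symbol.asyncIterator] === 'function' ||
      typeof value[Symbol.iterator] === 'function');
  if (!isIterable) {
    throw new TypeError('Unexpected blockstore.get() result');
  }

  // Streaming shape: collect every chunk, then concatenate in order.
  const chunks = [];
  let total = 0;
  for await (const chunk of value) {
    chunks.push(chunk);
    total += chunk.byteLength;
  }

  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

A caller would wrap each read, for example `const bytes = await readBlockBytes(blockstore.get(cid))`, so that a generator object is never treated as block bytes.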
Current impact
We run OrbitDB as part of a distributed environmental sensor network. We're currently in testing with 3-4 peers but expect this to grow to many thousands or more. The current testing databases are small (node registry, trust list — ~10 entries total across 3 databases). Despite this, we see dozens of LoadBlockFailedError messages on every peer after each restart, and they continue on a 15-minute retry cycle indefinitely.
We've implemented application-level workarounds:
- A permanent block blacklist (after N failed fetches, stop retrying that CID forever, persist to disk)
- A canAppend patch that accepts entries with unverifiable identities (since we use write: ["*"])
- Write-ahead verification on put() to catch partial writes before they create new orphaned references
These suppress the symptoms but don't fix the root cause — the entries still replicate between peers, consuming bandwidth and triggering the fetch-fail-blacklist cycle on every new peer that joins. At scale, every new peer joining the network will have to discover and blacklist every orphaned entry independently.
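The blacklist workaround can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not our production code: BlockBlacklist and fetchWithBlacklist are hypothetical names, fetchBlock stands in for whatever function actually loads a block by CID, and persistence is reduced to toJSON/fromJSON so the caller decides where the file lives.

```javascript
// Hypothetical sketch: stop retrying a CID after N failed fetches.
class BlockBlacklist {
  constructor(maxFailures = 3) {
    this.maxFailures = maxFailures;
    this.failures = new Map(); // CID string -> number of failed fetches
  }

  // Record one failed fetch and return the updated failure count.
  recordFailure(cid) {
    const count = (this.failures.get(cid) ?? 0) + 1;
    this.failures.set(cid, count);
    return count;
  }

  isBlacklisted(cid) {
    return (this.failures.get(cid) ?? 0) >= this.maxFailures;
  }

  // Plain-object form, suitable for JSON.stringify and writing to disk.
  toJSON() {
    return {
      maxFailures: this.maxFailures,
      failures: [...this.failures.entries()],
    };
  }

  static fromJSON(data) {
    const list = new BlockBlacklist(data.maxFailures);
    list.failures = new Map(data.failures);
    return list;
  }
}

// Returns the block bytes, or null if the CID is blacklisted or the fetch fails.
async function fetchWithBlacklist(cid, fetchBlock, blacklist) {
  if (blacklist.isBlacklisted(cid)) {
    return null; // skip the network entirely
  }
  try {
    return await fetchBlock(cid);
  } catch (err) {
    blacklist.recordFailure(cid);
    return null;
  }
}
```

Because the failure counts are local to each peer, this is exactly the part that does not scale: a new peer starts with an empty map and has to rediscover every orphaned CID on its own.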
Proposed solutions
Any of these would help:
- Oplog entry TTL / expiry: allow entries older than a configurable age to be dropped during sync. For many use cases (node registries, state tracking), only recent entries matter.
- Oplog compaction: a mechanism to compact the oplog by removing entries whose referenced blocks are known to be unavailable (e.g. after N failed fetch attempts across all peers).
- Head-only sync mode: for databases that only care about current state (key-value stores), sync only the current heads and their direct dependencies rather than the full oplog history.
- Entry validation during sync receive: before accepting an entry from a peer, verify that its referenced blocks (identity, payload) are either available locally or fetchable. Reject entries that reference unreachable blocks rather than accepting them into the local oplog.
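As a rough illustration of the last option, validation on receive could look something like the sketch below. Everything here is hypothetical: canAcceptEntry and withTimeout are illustrative names, not OrbitDB APIs, referencedCids is whatever list of CIDs the entry points at, and loadBlock stands in for a function that resolves with the block if it is available locally or fetchable and rejects otherwise.

```javascript
// Reject if the promise has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('timed out after ' + ms + ' ms')), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical check run before an incoming entry is added to the local oplog.
// Returns true only if every referenced block can be loaded in time.
async function canAcceptEntry(referencedCids, loadBlock, timeoutMs = 5000) {
  for (const cid of referencedCids) {
    try {
      await withTimeout(loadBlock(cid), timeoutMs);
    } catch (err) {
      return false; // unreachable block: do not accept the entry
    }
  }
  return true;
}
```

The trade-off is that a block which is only temporarily unreachable would also cause a rejection, so a real implementation would probably need to let the entry be offered again on a later sync rather than discarding it permanently.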
Environment
- OrbitDB: v3.0.2
- Helia: v6
- Node.js: 22
- Databases: keyvalue (3 databases, ~10 entries total)
- Peers: 3-4 nodes (testing, expected to scale to thousands+)
- Access control: write: ["*"] (permissive)
Related issues