# The Road to 1.0

by @haadcode and @aphelionz

Big things are coming to OrbitDB. In this document, we describe our proposal for getting OrbitDB from alpha to 1.0.

As always, our roadmap combines the long-term vision for OrbitDB, features that users have been asking for, issues that the community wants to address or that the core developers have discussed separately, and bugs encountered while using OrbitDB. Please note that inclusion in the roadmap is not a promise of delivery!

Things here are subject to alteration or deletion. As of this writing, these should be considered proposals and open conversations and anybody should feel welcome and able to provide feedback in the form of questions, comments, or suggestions.

As always, we welcome contributions from the community and would be happy to help to land any of the discussed features or fixes.

If you would like to financially support OrbitDB, we now have an [OpenCollective](https://opencollective.com/orbitdb) where you can contribute. Anything helps, and we are forever grateful for your support, monetary or otherwise.

In general, the features and improvements proposed revolve around three categories: Performance and Resource Consumption, User Experience, and Encryption. Without further ado, let's look at the specific items.

## Checklist

- [Non-breaking changes](#non-breaking-changes)
    - [x] [Replicator Refactoring](#replicator-refactoring)
    - [x] [BTree Indexing for KVStore and DocStore](#btree-indexing-for-kvstore-and-docstore)
    - [ ] [Snapshots](#snapshots)
    - [ ] [User Experience Improvements](#user-experience-improvements)
    - [ ] [Developer Experience: The Publish Dance](#developer-experience-the-publish-dance)
    - [ ] [Community efforts](#community-efforts)
- [Breaking Changes](#breaking-changes)
    - [ ] [Oplog Watermarks](#oplog-watermarks)
    - [ ] [Database Encryption](#database-encryption)
    - [ ] [Async Iterators / Streaming](#async-iterators--streaming)
    - [ ] [Hot / Cold Data Separation](#hotcold-data-separation-in-memory-vs-on-disk-data)
    - [ ] [Misc. Cleanup](#misc-cleanup)
- [ ] [Potential Rust integration](#potential-rust-integration)

## Non-breaking changes

The changes in this section should be implementable without breaking backwards compatibility or public-facing APIs. Certain application-level details might change and need to be addressed, but by and large these changes should not require a new major version.

### Replicator Refactoring

> Use Case: I have a database that has been replicated locally. I want to get the current state of the db as fast as possible when opening the db (in order to return the first query as fast as possible).

As of right now, the store replicator uses the `next` field in a log entry to replicate, whereas it could use the new `refs` field, as loading now does.

This is probably the most effective improvement we can make, and perhaps the one most often requested and discussed in the community.

There are other possible ways to address the initial query and loading performance that we may want to take up.
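
As a rough illustration of why following `refs` can help, the sketch below simulates fetching a toy log in parallel rounds. The layout of `refs` (pointers to entries 2, 4, 8, ... steps back), the helper names `buildLog` and `countRounds`, and the round-based fetch model are all assumptions made for this example, not the actual ipfs-log or replicator implementation.

```javascript
// Build a toy log of `size` entries. Entry i points to its predecessor via
// `next` and (as an assumption for this sketch) to older entries at
// distances 2, 4, 8, ... via `refs`.
function buildLog(size) {
  const entries = new Map();
  for (let i = 0; i < size; i++) {
    const refs = [];
    for (let d = 2; i - d >= 0; d *= 2) refs.push(`entry-${i - d}`);
    entries.set(`entry-${i}`, {
      hash: `entry-${i}`,
      next: i > 0 ? [`entry-${i - 1}`] : [],
      refs,
    });
  }
  return entries;
}

// Walk the log from `head`. Every hash that is known but not yet fetched is
// fetched in one parallel "round"; the function returns the number of rounds.
function countRounds(entries, head, useRefs) {
  const fetched = new Set();
  let frontier = [head];
  let rounds = 0;
  while (frontier.length > 0) {
    rounds++;
    const discovered = new Set();
    for (const hash of frontier) {
      fetched.add(hash);
      const entry = entries.get(hash);
      const pointers = useRefs ? [...entry.next, ...entry.refs] : entry.next;
      for (const pointer of pointers) discovered.add(pointer);
    }
    frontier = [...discovered].filter((hash) => !fetched.has(hash));
  }
  return { rounds, fetched: fetched.size };
}

const log = buildLog(64);
// Following `next` only needs one sequential round per entry.
console.log(countRounds(log, "entry-63", false).rounds); // 64
// Following `refs` as well discovers many entries per round.
console.log(countRounds(log, "entry-63", true).rounds);
```

In this model, the number of sequential round trips is what dominates load time, so discovering several older entries per fetched entry shortens the critical path even though the same number of entries is transferred.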

Further discussion: https://github.com/orbitdb/SCPs/pull/3
Work is happening here: https://github.com/orbitdb/orbit-db-store/pull/100

### BTree Indexing for KVStore and DocStore

As it stands, all keys from the database index are kept in memory. This works well for most cases, but becomes a limitation once a database reaches the order of 1M keys or more. The memory footprint can be reduced with the use of B-Trees.

@vasa-develop and @vaultec81 utilized this technique in [AvionDB](https://github.com/orbitdb/orbit-db-docstore/issues/38).

This is closely connected to ["Hot/Cold Data Separation (in-memory vs. on-disk data)"](#hotcold-data-separation-in-memory-vs-on-disk-data).

### Snapshots

> Use Case: I have a database that has been replicated locally. I want to get the current state of the db as fast as possible when opening the db (in order to return the first query as fast as possible).

A snapshot is the current state of the database, i.e. only the current data without the database oplog (history). The snapshot of the current state could potentially be a log db itself.
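
A minimal sketch of the idea, assuming a key-value oplog made of `PUT` and `DEL` operations: the snapshot stores the reduced state plus the number of oplog entries it covers, so opening the database only has to replay entries written after the snapshot. The function names and the JSON snapshot format are hypothetical and not part of OrbitDB.

```javascript
// Reduce a key-value oplog into the current state, optionally starting
// from an existing state.
function reduceOplog(oplog, state = new Map()) {
  for (const { op, key, value } of oplog) {
    if (op === "PUT") state.set(key, value);
    else if (op === "DEL") state.delete(key);
  }
  return state;
}

// Serialize the current state together with the number of entries it covers.
function createSnapshot(oplog) {
  const state = reduceOplog(oplog);
  return JSON.stringify({ length: oplog.length, state: [...state] });
}

// Restore the state from a snapshot and replay only the newer entries.
function openFromSnapshot(serialized, oplog) {
  const { length, state } = JSON.parse(serialized);
  return reduceOplog(oplog.slice(length), new Map(state));
}

const oplog = [
  { op: "PUT", key: "a", value: 1 },
  { op: "PUT", key: "b", value: 2 },
  { op: "DEL", key: "a" },
];
const snapshot = createSnapshot(oplog);
oplog.push({ op: "PUT", key: "c", value: 3 });
console.log([...openFromSnapshot(snapshot, oplog)]); // [ [ 'b', 2 ], [ 'c', 3 ] ]
```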

### User Experience Improvements

A collection of "small" items that would improve UX for the OrbitDB user.

- Add a "merge fields" option to `DocStore.put` to merge the fields of the current doc and updated doc
- Remove the need for a database name and just use the CID as the address. Move everything else to the manifest (which already contains the name of the database).
- Remove the need for a separate `load()` call (but keep it available) and provide a one-liner to start, e.g. `OrbitDB.open(<address>)` performs the instantiation of the orbitdb object, the opening of the database, and what currently happens in `load()`.
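
To make the first item concrete, here is one way a "merge fields" option could behave. This is only a sketch: `putDoc`, the `merge` option name, and the use of a `Map` keyed by `_id` as a stand-in for the docstore index are assumptions for illustration, not the current `DocStore.put` signature.

```javascript
// Write a document into a toy index keyed by `_id`. With `merge: true`, the
// fields of the stored document are kept unless the update overrides them
// (a shallow merge); without it, the document is replaced as a whole.
function putDoc(index, doc, { merge = false } = {}) {
  const current = index.get(doc._id);
  const next = merge && current ? { ...current, ...doc } : doc;
  index.set(doc._id, next);
  return next;
}

const index = new Map();
putDoc(index, { _id: "user-1", name: "Ada", visits: 1 });
putDoc(index, { _id: "user-1", visits: 2 }, { merge: true });
console.log(index.get("user-1")); // { _id: 'user-1', name: 'Ada', visits: 2 }
```

A shallow merge is the simplest choice; whether nested objects should be merged recursively is left open here.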

### Developer Experience: The Publish Dance

One of the biggest hurdles to releases is what the contributors call the "publish dance": a coordinated effort of publishing around 20 different npm modules that together constitute an OrbitDB release. There's no need to enumerate them here, but the process generally starts from `ipfs-log` and moves upward to the top-level `orbit-db`.

The community has discussed solving this at the tooling level, such as using Lerna for module management, but a better alternative would be to address this at the architecture / implementation level by:

- removing the inheritance of stores and injecting the Store module into stores
- removing the ipfs-log dependency from Store and injecting it from OrbitDB
- generally switching all inheritance to dependency injection (e.g. feedstore *takes in as a parameter* an eventstore instead of inheriting from it)

All these would make it possible to configure the dependencies on the main package level, in `orbit-db`, giving the users more flexibility in choosing which modules and versions they use.
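
The difference between inheriting from a store and taking one in as a parameter can be sketched as follows. The classes are deliberately tiny stand-ins with hypothetical method names; they do not mirror the real `orbit-db-eventstore` or `orbit-db-feedstore` APIs.

```javascript
// Stand-in for an event store: an append-only list of entries.
class EventStore {
  constructor() {
    this.entries = [];
  }
  add(payload) {
    this.entries.push({ payload, removed: false });
    return this.entries.length - 1; // position of the entry, used as its id
  }
  all() {
    return this.entries;
  }
}

// A feed store that *takes in* an event store instead of extending it.
// Which EventStore implementation (and version) is used is decided by the
// caller, for example the top-level package.
class FeedStore {
  constructor(eventStore) {
    this.events = eventStore;
  }
  add(payload) {
    return this.events.add(payload);
  }
  remove(id) {
    this.events.all()[id].removed = true;
  }
  all() {
    return this.events
      .all()
      .filter((entry) => !entry.removed)
      .map((entry) => entry.payload);
  }
}

const feed = new FeedStore(new EventStore());
const id = feed.add("hello");
feed.add("world");
feed.remove(id);
console.log(feed.all()); // [ 'world' ]
```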

### Community efforts

There are a number of community efforts that we'd like to focus on getting merged, for two reasons. First, we value our community's input and want to further streamline their contributions. Second, we want to make sure they are merged before the breaking changes in the next section.

See the [GitHub project] for more info.

[GitHub project]: https://github.com/orgs/orbitdb/projects/3

## Breaking changes

Ok, on to the main event.

Given the scale and impact of these changes, backwards compatibility may be abandoned and we would make a new major version to signal the breaking changes.

### Oplog Watermarks

Right now, the kvstore and docstore reduce the full log any time `updateIndex` is called, which is on every write to the oplog. This is slow, grows even slower over time, and is ultimately unnecessary.

A solution to this could be to add a high/low watermark and only process new oplog entries. This is especially applicable to KVStore and DocStore.
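
A minimal sketch of the watermark idea, assuming an oplog that is only ever appended to: the index remembers how many entries it has already applied and processes only the entries beyond that point. The class name `WatermarkedIndex` and its shape are hypothetical. Entries that are merged into the middle of the log during replication would need a low watermark as well, which this sketch does not cover.

```javascript
class WatermarkedIndex {
  constructor() {
    this.state = new Map();
    this.watermark = 0; // number of oplog entries already applied
  }

  // Apply only the entries above the watermark and return how many were
  // processed, instead of reducing the full log on every call.
  updateIndex(oplog) {
    let processed = 0;
    for (let i = this.watermark; i < oplog.length; i++) {
      const { op, key, value } = oplog[i];
      if (op === "PUT") this.state.set(key, value);
      else if (op === "DEL") this.state.delete(key);
      processed++;
    }
    this.watermark = oplog.length;
    return processed;
  }
}

const index = new WatermarkedIndex();
const oplog = [
  { op: "PUT", key: "a", value: 1 },
  { op: "PUT", key: "b", value: 2 },
];
console.log(index.updateIndex(oplog)); // 2
oplog.push({ op: "DEL", key: "a" });
console.log(index.updateIndex(oplog)); // 1 (only the new entry is processed)
```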

### Database Encryption

It should be possible to encrypt the payload of an OrbitDB database. We've been pushing this back because the technologies involved were not mature enough yet and we wanted to give the user flexibility. Admittedly, "suggesting" a default encryption scheme means taking on the onus that such a suggestion implies. However, it's become increasingly apparent that something like this is necessary.

This is another change that touches and affects everything in the architecture as well as data formats, and this could be considered the beginning of the discussion.

- How many keys are there, and what is each of them used to encrypt (oplog entries vs. payloads)?
- Where are they stored?
- How does this affect AccessControllers?
- How does this tie into hot/cold data (see below)?

Many projects have rolled their own solutions, e.g. TallyLab using the `nacl-js` library, the proposal for [`dag-jose`](https://github.com/ipld/specs/pull/269) by @oed, and so on, so there are places to seek inspiration.
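
For the sake of discussion, the sketch below shows what encrypting only the payload of an oplog entry could look like with a single symmetric key, using AES-256-GCM from Node's built-in `crypto` module. The choice of cipher, the single-key model, and the `encryptPayload` / `decryptPayload` helpers are assumptions made for this example. It is not a proposed default scheme and leaves the key management questions above unanswered.

```javascript
const nodeCrypto = require("crypto");

// Encrypt a JSON-serializable payload with a 32-byte key. The random IV and
// the authentication tag are stored next to the ciphertext.
function encryptPayload(payload, key) {
  const iv = nodeCrypto.randomBytes(12);
  const cipher = nodeCrypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(payload), "utf8"),
    cipher.final(),
  ]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    ciphertext: ciphertext.toString("base64"),
  };
}

// Decrypt and authenticate a payload. Throws if the key is wrong or the
// data has been tampered with.
function decryptPayload(box, key) {
  const decipher = nodeCrypto.createDecipheriv(
    "aes-256-gcm",
    key,
    Buffer.from(box.iv, "base64")
  );
  decipher.setAuthTag(Buffer.from(box.tag, "base64"));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(box.ciphertext, "base64")),
    decipher.final(),
  ]);
  return JSON.parse(plaintext.toString("utf8"));
}

const key = nodeCrypto.randomBytes(32);
const box = encryptPayload({ op: "PUT", key: "a", value: 1 }, key);
console.log(decryptPayload(box, key)); // { op: 'PUT', key: 'a', value: 1 }
```

Because only the payload is encrypted here, the oplog structure (hashes, `next` pointers, clocks) would stay readable to peers that replicate the log without holding the key.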

### Async Iterators / Streaming

When applicable we should be using async iterators / generators (or streams) to process data and then "discard" it, allowing more real-time capabilities and the ability to return results as they become available, instead of waiting for the full log to be fetched or processed.
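
As a sketch of what this could look like, the async generator below walks a log backwards from its head and yields each entry as soon as it has been fetched, so a consumer can stop early without waiting for the rest of the log. The `fetchEntry` callback, the entry shape, and the single-parent chain are assumptions for this example.

```javascript
// Walk backwards from the head, yielding each entry as soon as it is
// fetched. Assumes a simple chain where `next[0]` is the previous entry.
async function* iterateLog(fetchEntry, headHash) {
  let hash = headHash;
  while (hash) {
    const entry = await fetchEntry(hash);
    yield entry;
    hash = entry.next[0];
  }
}

// Return the first entry matching the predicate. Returning from inside
// `for await` closes the generator, so no further entries are fetched.
async function firstMatching(iterator, predicate) {
  for await (const entry of iterator) {
    if (predicate(entry)) return entry;
  }
  return undefined;
}

// Usage with an in-memory stand-in for the network or disk.
const entries = new Map();
for (let i = 0; i < 5; i++) {
  entries.set(`h${i}`, { hash: `h${i}`, payload: i, next: i > 0 ? [`h${i - 1}`] : [] });
}
const fetchEntry = async (hash) => entries.get(hash);

firstMatching(iterateLog(fetchEntry, "h4"), (entry) => entry.payload === 3)
  .then((entry) => console.log(entry.hash)); // h3
```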

### Hot/Cold Data Separation (in-memory vs. on-disk data)

Currently the entire database (log entries and the computed state) is loaded into memory before use. This takes time and uses more memory than necessary. Changing it, however, is another massive change that affects every other part of the system.

This will also positively affect perceived performance and user experience: entries would load almost instantly, and reasoning about the state of the db replication becomes easier.

The general idea here is to 1) read and compute database state on demand (i.e. upon query), 2) cache "warm" data (data that is most likely to be used soon, or was recently used) in a configurable in-memory cache, and 3) fall back to reading from disk when the cache doesn't have the data available.
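
The three steps can be sketched as a small least-recently-used cache in front of a slower reader. `WarmCache`, its `capacity` parameter, and the `readFromDisk` callback are hypothetical names for this example; a real implementation would read asynchronously and would need a smarter notion of "warm" than recency alone.

```javascript
class WarmCache {
  constructor(capacity, readFromDisk) {
    this.capacity = capacity; // configurable number of entries kept in memory
    this.readFromDisk = readFromDisk;
    this.entries = new Map(); // Map preserves insertion order, oldest first
    this.diskReads = 0;
  }

  get(key) {
    if (this.entries.has(key)) {
      // Cache hit: re-insert the key to mark it as most recently used.
      const value = this.entries.get(key);
      this.entries.delete(key);
      this.entries.set(key, value);
      return value;
    }
    // Cache miss: fall back to reading from disk, then keep the value warm.
    const value = this.readFromDisk(key);
    this.diskReads++;
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
    return value;
  }
}

const disk = new Map([["a", 1], ["b", 2], ["c", 3]]);
const cache = new WarmCache(2, (key) => disk.get(key));
cache.get("a");
cache.get("b");
cache.get("a"); // served from memory
console.log(cache.diskReads); // 2
```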

### Misc. Cleanup

Some more items that are smaller in size / complexity.

- clean up all events and their semantics (e.g. only one "updated" event instead of "write" and "replicated"), perhaps remove some and only use a callback (e.g. "onLoadProgressCallback").
- separate identities/keys from the oplog entry, with one CID per identity/key. This cuts N bytes from each pubsub message and bitswap/ipfs/ipld block transfer.
- kvstore: keep only the keys in memory, make them point to the CID with the data, and fetch the data on query from cache/ipfs.
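
As an illustration of the first item, a single "updated" event could carry the origin of the change as data instead of using separate event names. The `ToyStore` class and the `{ source, entries }` event payload below are assumptions for this sketch, built on Node's standard `EventEmitter`.

```javascript
const { EventEmitter } = require("events");

class ToyStore {
  constructor() {
    this.events = new EventEmitter();
    this.entries = [];
  }

  // Both local writes and replicated entries go through the same path and
  // emit the same "updated" event, tagged with where the change came from.
  _apply(entries, source) {
    this.entries.push(...entries);
    this.events.emit("updated", { source, entries });
  }

  write(entry) {
    this._apply([entry], "write");
  }

  mergeReplicated(entries) {
    this._apply(entries, "replicated");
  }
}

const store = new ToyStore();
store.events.on("updated", ({ source, entries }) => {
  console.log(`${entries.length} entries from ${source}`);
});
store.write("a"); // 1 entries from write
store.mergeReplicated(["b", "c"]); // 2 entries from replicated
```

Listeners that only care about "something changed" subscribe once, while listeners that need the distinction can still branch on `source`.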

## Potential Rust Integration

During the course of this work, we also want to explore integrating Rust into the project in two potential places:
1. Specific places that can benefit from wasm performance, likely things like crypto verification and maybe CRDT calculation
2. Implementing pieces of OrbitDB as separate Rust integrations, to be used with Rust projects like [Rust IPFS](https://github.com/rs-ipfs/rust-ipfs)
 
## Conclusion

These are our plans for 2020 onward. With these features and changes implemented, we believe OrbitDB would be on par with our vision for it as well as the user needs, and would make an excellent version 1.0.

Let us know what you think, and again, if you find any of this valuable and want to help, the best way is via the [OrbitDB Open Source Community](https://github.com/orbitdb) or the [OpenCollective](https://opencollective.com/orbitdb).

