Immutable key-value database whose main focus is taking diffs between versions of data (we call them generations) and transforming one set of key-value items (a collection) into another while preserving consistency even in cases of system failure.
- Collection: lexicographically ordered set of key-value pairs
  - CollectionName: `1` to `255` bytes that represent a UTF-8 string
  - CurrentGenerationId: zero to `255` bytes
  - NextGenerationId: `NULL` or `1` to `255` bytes, must be bigger than CurrentGenerationId. Can be `NULL` in manual collections where no next generation is planned yet.
  - Manual collection: collection that allows puts only with a specified `generationId` that was previously initiated with `generation/start`
  - Non-manual collection: collection that allows puts without `generationId` and automatically commits NextGenerationId to CurrentGenerationId after some number of puts or some elapsed time
- Record: item of a collection, consists of:
  - Key: zero to `2^24-1` bytes
  - Value: `TOMBSTONE` or zero to `out of memory exception` bytes (in practice this is limited by the request body size of the HTTP methods, for example 32 megabytes for `putMany`). `TOMBSTONE` means that the record with this `key` was deleted
  - GenerationId: zero to `255` bytes. Any query always uses some `generationId` (the latest for the collection or one provided by the user). If a record's `generationId` is bigger than the query's `generationId`, the record is invisible. For each key, only the one record whose `generationId` is closest to the query's `generationId` is visible
  - PhantomId: `1` to `255` bytes. Records with a `phantomId` are visible only to queries with the same `phantomId`
- Reader: item of a collection, consists of:
  - ReaderName: non-empty UTF-8 string (TODO: add limit, issue #10)
  - CollectionName: optional CollectionName; if not specified, it means the current collection
  - GenerationId: CurrentGenerationId, a marker to some generation in the foreign collection (specified by CollectionName). It prevents garbage collection of generations of the target collection and may be used for `diff` calls as the `fromGenerationId` source
Old generations in collection X are removed only if at least one reader exists that points to collection X.
WARN: you MUST NOT query/diff a collection for generationId < minimumGenerationId or create/update readers with generationId < minimumGenerationId. Currently this is undefined behavior; later there will be a dedicated error if you try.
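The record visibility rule from the definitions above can be illustrated with a small in-memory sketch. It is not how the database is implemented: `StoredRecord`, `compareIds` and `visibleValue` are hypothetical names, ids are modeled as plain strings, "bigger" is assumed to mean lexicographic comparison, and `phantomId` is ignored.

```typescript
// Hypothetical in-memory model of a record; `null` stands for TOMBSTONE.
type StoredRecord = {
  key: string;
  value: string | null;
  generationId: string;
};

// Assumption: generation ids are ordered lexicographically.
function compareIds(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Returns the value visible for `key` at `queryGenerationId`,
// or null if the key is absent or deleted at that generation.
function visibleValue(
  records: StoredRecord[],
  key: string,
  queryGenerationId: string,
): string | null {
  let best: StoredRecord | null = null;
  for (let i = 0; i < records.length; i++) {
    const record = records[i];
    if (record.key !== key) continue;
    // Records from generations after the query's generation are invisible.
    if (compareIds(record.generationId, queryGenerationId) > 0) continue;
    // Among the rest, keep the record closest to the query's generation.
    if (best === null || compareIds(record.generationId, best.generationId) > 0) {
      best = record;
    }
  }
  return best === null ? null : best.value;
}

const sampleRecords: StoredRecord[] = [
  {key: 'a', value: '1', generationId: '001'},
  {key: 'a', value: '2', generationId: '003'},
  {key: 'a', value: null, generationId: '005'},
];
```

With this data, a query at generation `004` sees the value written in `003`, and a query at `005` sees the key as deleted.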
TODO.
Currently, you can see this example, or you can try to dive into https://github.com/anfivewer/an5wer/blob/d680fc113447bbf2c03b6ea050769b2ffcab9b5c/packages-sesuritu/logs-processing/src/main.ts#L63 .
This is version zero (or maybe even -1). It will change dramatically, since it is very inconsistent: encodings were added after the initial planning, and some methods are plain POST requests because that was easier to implement when I was first getting to know the hyper HTTP library.
Input/output parameters types are described in TypeScript-like type definitions.
- `type Request` is the definition of the JSON body of a request
- `type Response` is the definition of the JSON body of a response
- `type QueryParams` is a set of params that can be specified as query params (with a comment on how each maps from a string)
- The remaining types are helpers and can be reused between methods
// default value is 'utf8'
type Encoding = 'utf8' | 'base64';
type EncodedString = {
value: string;
encoding?: Encoding;
};
type KeyValue = {
key: EncodedString;
value: EncodedString;
};
type KeyValueUpdate = {
key: EncodedString;
ifNotPresent?: boolean;
value: EncodedString | null;
};

type Response = {
items: {
name: string;
isManual: boolean;
}[];
};
Returns list of all collections.
type Request = {
collectionName: string;
encoding?: Encoding;
} &
(
{
isManual: false
}
| {
isManual: true,
initialGenerationId: EncodedString;
}
);
type Response = {
generationId: EncodedString;
};
Creates collection. For manual collections you can specify initialGenerationId: "" (empty string).
type QueryParams = {
// as string, separated by comma
fields?: ('generationId' | 'nextGenerationId')[],
};
type Response = {
isManual: boolean;
generationId?: EncodedString;
nextGenerationId?: EncodedString;
};
By default, all fields are returned, but you can specify only needed:
GET /collections/log-lines?fields=generationId,nextGenerationId
{
"isManual": false,
"generationId": {"value": "AAAAAAAACm4=", "encoding": "base64"},
"nextGenerationId": {"value": "AAAAAAAACm8=", "encoding": "base64"}
}
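The ids in this response are base64-encoded bytes. A minimal sketch for inspecting them, assuming a runtime with global `atob` and `TextEncoder` (browsers, recent Node.js); `decodeToBytes` and `bytesToHex` are hypothetical helper names, not part of the API.

```typescript
type Encoding = 'utf8' | 'base64';
type EncodedString = {value: string; encoding?: Encoding};

// Decodes an EncodedString into raw bytes ('utf8' is the default encoding).
function decodeToBytes(s: EncodedString): Uint8Array {
  if (s.encoding === 'base64') {
    const binary = atob(s.value);
    const bytes = new Uint8Array(binary.length);
    for (let i = 0; i < binary.length; i++) {
      bytes[i] = binary.charCodeAt(i);
    }
    return bytes;
  }
  return new TextEncoder().encode(s.value);
}

// Renders bytes as lowercase hex, convenient for logging and comparing ids.
function bytesToHex(bytes: Uint8Array): string {
  let hex = '';
  for (let i = 0; i < bytes.length; i++) {
    hex += (bytes[i] < 16 ? '0' : '') + bytes[i].toString(16);
  }
  return hex;
}

const currentId: EncodedString = {value: 'AAAAAAAACm4=', encoding: 'base64'};
const nextId: EncodedString = {value: 'AAAAAAAACm8=', encoding: 'base64'};

const currentHex = bytesToHex(decodeToBytes(currentId)); // "0000000000000a6e"
const nextHex = bytesToHex(decodeToBytes(nextId)); // "0000000000000a6f"
```

In this particular response both ids decode to 8 bytes, and the next id is the current one plus one.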
Deletes the collection. Warning: this deletes it together with all its files immediately. In the future I plan to just move it aside and delete it after a week or so, to make recovery possible if the deletion was unintended.
Deletion of associated readers is not implemented yet. Issue #2.
type QueryParams = {
generationId?: string;
generationIdEncoding?: Encoding;
};
type Response = {
generationId: EncodedString;
};
If the generationId query param is not set, responds immediately with the collection's current generationId. If the param is set, responds with the updated generationId, or with the same one if 60 seconds have passed.
This long-polling is useful for waiting for commits of a non-manual collection, or for waiting for changes before running a diff on some collection.
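A long-polling loop built on this method could look roughly like the sketch below. The transport is injected as a function because the HTTP client is up to you; `PollGenerationId`, `waitForNewGeneration` and `maxAttempts` are hypothetical names, and comparing only `value` assumes the server answers in the same encoding that was sent.

```typescript
type Encoding = 'utf8' | 'base64';
type EncodedString = {value: string; encoding?: Encoding};
type PollParams = {generationId?: string; generationIdEncoding?: Encoding};
type PollResponse = {generationId: EncodedString};

// Hypothetical transport: performs the request with the given query params
// and resolves with the parsed JSON body.
type PollGenerationId = (params: PollParams) => Promise<PollResponse>;

// Polls until the collection's generationId differs from `known`,
// giving up after `maxAttempts` timed-out polls.
async function waitForNewGeneration(
  poll: PollGenerationId,
  known: EncodedString,
  maxAttempts: number,
): Promise<EncodedString | null> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await poll({
      generationId: known.value,
      generationIdEncoding: known.encoding,
    });
    // The same id comes back when the server-side wait times out.
    if (response.generationId.value !== known.value) {
      return response.generationId;
    }
  }
  return null;
}
```

Returning `null` after a bounded number of attempts lets the caller decide whether to keep waiting or do something else.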
type Request = {
key: EncodedString;
generationId?: EncodedString;
phantomId?: EncodedString;
};
type Response = {
generationId: EncodedString;
item: KeyValue | null;
};
type Request = {
key: EncodedString;
requireKeyExistance: boolean;
generationId?: EncodedString;
phantomId?: EncodedString;
};
type Response = {
generationId: EncodedString,
left: EncodedString[];
right: EncodedString[];
hasMoreOnTheLeft: boolean;
hasMoreOnTheRight: boolean;
foundKey: boolean;
};
Beware, Response['left'] is in reversed keys order. For example, if you are requesting keys around 4, left will contain [{"key": "3"}, {"key": "2"}, {"key": "1"}].
requireKeyExistance: false case is not implemented yet. Issue #3.
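Because `left` arrives in reversed order, a client usually has to flip it before combining it with `right`. A small sketch, following the `Response` type above (keys as `EncodedString`); `KeysAroundResponse` and `orderedKeysAround` are hypothetical names, and it is an assumption that the requested key itself is not included in `left` or `right`.

```typescript
type Encoding = 'utf8' | 'base64';
type EncodedString = {value: string; encoding?: Encoding};

type KeysAroundResponse = {
  generationId: EncodedString;
  left: EncodedString[];
  right: EncodedString[];
  hasMoreOnTheLeft: boolean;
  hasMoreOnTheRight: boolean;
  foundKey: boolean;
};

// Returns the neighbouring keys in ascending order, with the requested key
// placed in the middle when the server reported that it exists.
function orderedKeysAround(
  requestedKey: EncodedString,
  response: KeysAroundResponse,
): EncodedString[] {
  // `left` is nearest-first, so reverse a copy to get ascending order.
  const leftAscending = response.left.slice().reverse();
  const middle = response.foundKey ? [requestedKey] : [];
  return leftAscending.concat(middle, response.right);
}
```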
type Request = {
item: KeyValueUpdate;
generationId?: EncodedString;
phantomId?: EncodedString;
};
type Response = {
generationId: EncodedString;
wasPut?: boolean;
};
Writes a single key-value entry (or deletes it if value: null). For manual collections generationId is required and must be equal to the started generation (except when phantomId is used; then generationId can have any value, in the past or in the future).
If ifNotPresent: true and the key already exists, its value will not be overwritten and the generationId of this key will not be updated. wasPut indicates whether the value was written or not.
Warning: without ifNotPresent the key-value record will be updated even if it has the same value. For example, if you have {"key":"a", "value":"42", "generationId":"001"} stored in the database and the next generationId is 002, then after you /put {"key":"a", "value":"42"} a new record {"key":"a", "value":"42", "generationId":"002"} will be created. Vote for issue #1.
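The three kinds of put described above differ only in the request body. The sketch below builds such bodies from the type definitions; `PutRequest` is a local alias for the `Request` type of this method, `utf8` is a hypothetical helper, and the generation id used for the delete is just an example value.

```typescript
type Encoding = 'utf8' | 'base64';
type EncodedString = {value: string; encoding?: Encoding};
type KeyValueUpdate = {
  key: EncodedString;
  ifNotPresent?: boolean;
  value: EncodedString | null;
};
type PutRequest = {
  item: KeyValueUpdate;
  generationId?: EncodedString;
  phantomId?: EncodedString;
};

// Hypothetical helper: wraps a string using the default 'utf8' encoding.
function utf8(value: string): EncodedString {
  return {value};
}

// Unconditional write: creates a new record even if the value is unchanged.
const overwrite: PutRequest = {
  item: {key: utf8('a'), value: utf8('42')},
};

// Write only when the key is absent; check `wasPut` in the response.
const putIfAbsent: PutRequest = {
  item: {key: utf8('a'), ifNotPresent: true, value: utf8('42')},
};

// Delete in a manual collection: `value: null` plus the started generationId.
const remove: PutRequest = {
  item: {key: utf8('a'), value: null},
  generationId: {value: 'AAAAAAAAAAE=', encoding: 'base64'},
};
```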
type Request = {
items: KeyValueUpdate[];
generationId?: EncodedString;
phantomId?: EncodedString;
};
type Response = {
generationId: EncodedString;
};
type Response = {
items: {
readerName: string;
collectionName?: string;
generationId: EncodedString;
}[];
};
type Request = {
readerName: string;
collectionName?: string;
generationId?: EncodedString | null;
};
type Response = {};
type Response = {};
type Request = {
generationId?: EncodedString | null;
};
type Response = {};
Request parameters are broken, see issue #5.
type Request = {
toGenerationId?: EncodedString;
} & (
{
fromGenerationId: EncodedString;
}
| {
fromReader: {
readerName: string;
collectionName?: string;
};
}
);
type KeyValueDiff = {
key: EncodedString;
fromValue: EncodedString | null,
intermediateValues: (EncodedString | null)[];
toValue: EncodedString | null,
};
type DiffResponse = {
fromGenerationId: EncodedString,
toGenerationId: EncodedString;
items: KeyValueDiff[];
cursorId?: string;
};
type Response = DiffResponse
There are two ways to specify fromGenerationId:
- Manually, by providing `fromGenerationId`
- By providing `fromReader`. If specified, diff will read `readerName` from collection `collectionName` and take its `generationId`

The response can have a generationId (toGenerationId in DiffResponse) that is less than or equal to the requested toGenerationId (if it is specified, otherwise the current generationId). You should repeat diff requests until one responds with fromGenerationId == generationId.
intermediateValues currently is always an empty array. Later there will be an omitIntermediateValues: false option that will provide those values. See issue #6.
type Response = DiffResponse;
If diff/start responded with cursorId you should call this method to get the rest of output.
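Repeating diffs and draining cursors can be combined into one loop. The following is a sketch under stated assumptions, not the reference client: `DiffClient` is a hypothetical transport whose `start` and `next` wrap `diff/start` and the continuation method, the continuation request is assumed to be `{cursorId}`, and ids are compared by `value` only (assuming a consistent encoding).

```typescript
type Encoding = 'utf8' | 'base64';
type EncodedString = {value: string; encoding?: Encoding};
type KeyValueDiff = {
  key: EncodedString;
  fromValue: EncodedString | null;
  intermediateValues: (EncodedString | null)[];
  toValue: EncodedString | null;
};
type DiffResponse = {
  fromGenerationId: EncodedString;
  toGenerationId: EncodedString;
  items: KeyValueDiff[];
  cursorId?: string;
};

// Hypothetical transport; each function resolves with the parsed JSON body.
type DiffClient = {
  start(request: {fromGenerationId: EncodedString}): Promise<DiffResponse>;
  next(request: {cursorId: string}): Promise<DiffResponse>;
};

// Collects diff items starting at `from` until the collection is caught up.
// The same key may appear more than once if several rounds were needed.
async function diffUntilCaughtUp(
  client: DiffClient,
  from: EncodedString,
): Promise<{items: KeyValueDiff[]; generationId: EncodedString}> {
  const items: KeyValueDiff[] = [];
  let fromGenerationId = from;
  for (;;) {
    const first = await client.start({fromGenerationId});
    // Nothing newer than our starting point: we are done.
    if (first.fromGenerationId.value === first.toGenerationId.value) {
      return {items, generationId: first.toGenerationId};
    }
    items.push(...first.items);
    // Drain the cursor to get the rest of this diff's output.
    let cursorId = first.cursorId;
    while (cursorId !== undefined) {
      const page = await client.next({cursorId});
      items.push(...page.items);
      cursorId = page.cursorId;
    }
    // Continue from where this diff ended.
    fromGenerationId = first.toGenerationId;
  }
}
```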
Abort diff.
type Request = {
generationId?: EncodedString;
phantomId?: EncodedString;
};
type QueryResponse = {
generationId: EncodedString;
items: KeyValue[];
cursorId?: string;
};
type Response = QueryResponse
Reads all key-value records from the collection. If generationId is specified, items that were added/updated/deleted after this generation will be omitted from the result.
type Request = {
cursorId: string;
};
type Response = QueryResponse;
If query/start responded with cursorId you should call this method to get the rest of output.
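Reading a whole collection therefore means following `cursorId` until it disappears. A minimal sketch, assuming the continuation call returns the same `QueryResponse` shape as `query/start`; `QueryClient` and `queryAll` are hypothetical names and the transport is injected.

```typescript
type Encoding = 'utf8' | 'base64';
type EncodedString = {value: string; encoding?: Encoding};
type KeyValue = {key: EncodedString; value: EncodedString};
type QueryResponse = {
  generationId: EncodedString;
  items: KeyValue[];
  cursorId?: string;
};
type QueryStartRequest = {
  generationId?: EncodedString;
  phantomId?: EncodedString;
};

// Hypothetical transport; each function resolves with the parsed JSON body.
type QueryClient = {
  start(request: QueryStartRequest): Promise<QueryResponse>;
  next(request: {cursorId: string}): Promise<QueryResponse>;
};

// Reads every page of a query and returns all items together with the
// generationId reported by the first response.
async function queryAll(
  client: QueryClient,
  request: QueryStartRequest,
): Promise<{generationId: EncodedString; items: KeyValue[]}> {
  const first = await client.start(request);
  const items = first.items.slice();
  let cursorId = first.cursorId;
  while (cursorId !== undefined) {
    const page = await client.next({cursorId});
    items.push(...page.items);
    cursorId = page.cursorId;
  }
  return {generationId: first.generationId, items};
}
```

For large collections it may be preferable to process each page as it arrives instead of accumulating everything in memory.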
Aborts query.
type Request = {};
type Response = {
phantomId: EncodedString;
};
Gets a phantomId that can be used for puts. Phantoms are useful for creating "fake modifications" of some collection in the past. Records with a phantomId are visible only to query/getKeysAround calls with a specified phantomId (and only for an equal phantomId).
Phantoms are relatively short-lived entities. Currently their TTL is not specified, but in later revisions I may remove this method and bind phantoms to generations (when you start a generation you can create phantoms in some collections, and after commit the phantoms are gone).
type Request = {
generationId: EncodedString;
abortOutdated?: boolean;
};
type Response = {};
Works only on manual collections.
If abortOutdated is specified and there is a generation that is already started and its generationId is less than the provided one, all records that were added in that generation will be deleted.
type Request = {
generationId: EncodedString;
};
type Response = {};
Aborts the generation and deletes all records that were put in this generation.
type Request = {
generationId: EncodedString;
updateReaders?: {
readerName: string;
generationId: EncodedString;
}[];
};
type Response = {};
Commits generation (makes new records visible), atomically with readers updates.
For example, you need to transform collections A and B to collection C. Initialization:
- Create manual collection `C` with `generationId: {value: "AAAAAAAAAAA=", "encoding": "base64"}` (64 zero bits)
- Create reader in collection `C`: `{"readerName": "from_a", "collectionName": "A", "generationId": {value:""}}`
- Create reader in collection `C`: `{"readerName": "from_b", "collectionName": "B", "generationId": {value:""}}`

Transform iteration:
- Get current & next `C` generation ids
- Take the next generationId if it is present; if not, take the current one
- Increment it (from `AAAAAAAAAAA=` it will become `AAAAAAAAAAE=`, then `AAAAAAAAAAI=` and so on), start a generation with the incremented `generationId` and `abortOutdated: true`; we'll call this generation id `commitGenerationId`
- Execute diff on collection `A` with `readerName: 'from_a', readerCollectionName: 'C'`, remember `generationId` of the diff result as `aGenerationId`
- Execute diff on collection `B` with `readerName: 'from_b', readerCollectionName: 'C'`, remember `generationId` of the diff result as `bGenerationId`
- Process the diff, make puts to collection `C` (`generationId` should be `commitGenerationId`); you can also make gets with `commitGenerationId` to see what you have already stored under some key, in order to update it if you got new data from `A` or `B`
- Commit generation `commitGenerationId`, pass: `updateReaders: [ { readerName: 'from_a', generationId: aGenerationId }, { readerName: 'from_b', generationId: bGenerationId }, ]`

If you got any error in the steps above, abort the generation and try again/investigate your code.
Repeat the transform iteration until readers from_a and from_b are equal to the A and B generation ids respectively. Then you can watch the A and B generation ids, wait for their updates and repeat the process.
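The "take next or current, then increment" steps of the iteration can be sketched as follows. This assumes base64-encoded ids treated as big-endian byte counters, as in the example above, and a runtime with global `atob`/`btoa`; `incrementBytes` and `pickCommitGenerationId` are hypothetical helper names.

```typescript
type Encoding = 'utf8' | 'base64';
type EncodedString = {value: string; encoding?: Encoding};

function base64ToBytes(value: string): Uint8Array {
  const binary = atob(value);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

function bytesToBase64(bytes: Uint8Array): string {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Adds one to a big-endian byte counter, carrying into higher bytes.
function incrementBytes(bytes: Uint8Array): Uint8Array {
  const result = bytes.slice();
  for (let i = result.length - 1; i >= 0; i--) {
    if (result[i] < 255) {
      result[i]++;
      return result;
    }
    result[i] = 0; // 0xff rolls over, carry continues to the left
  }
  throw new Error('generationId overflow');
}

// Takes the next generationId if present, otherwise the current one,
// and returns it incremented by one (assumes base64-encoded ids).
function pickCommitGenerationId(info: {
  generationId: EncodedString;
  nextGenerationId?: EncodedString | null;
}): EncodedString {
  const base = info.nextGenerationId || info.generationId;
  const incremented = incrementBytes(base64ToBytes(base.value));
  return {value: bytesToBase64(incremented), encoding: 'base64'};
}

const firstCommitId = pickCommitGenerationId({
  generationId: {value: 'AAAAAAAAAAA=', encoding: 'base64'},
}); // value is "AAAAAAAAAAE="
```

The resulting id is what would be passed to generation start (with `abortOutdated: true`), to the puts, and to the commit.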
Install flatc (the FlatBuffers compiler), using the same version that diffbelt_protos uses.
$ rustup target add wasm32-unknown-unknown
$ cd crates/diffbelt_example_wasm && make
$ cargo test