-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create broad-leaved red-black tree version of FildeshKV #114
Comments
The tree is only 1/3 smaller in the best case (perfectly balanced, full leaves). The improvement does show up in benchmarks. I need to analyze a bit to be sure that tree rotations aren't breaking it too badly. I have a feeling that we want new rules in the insertion rotations to allow for fusing. Otherwise, leaf nodes will only be doubled up when insertion happens in the right place. |
Turns out that the tree is guaranteeing a 25% compression factor as long as keys are small enough and we don't delete any. This probably comes from the fact that a full tree has 1 more leaf node than inner nodes. If that intuition is accurate, and the savings isn't related to how well balanced the tree is, then it can probably work for deletions too! |
Hmm it doesn't actually guarantee 25% node reduction yet. But I think it could. The proof would come from a structural invariant that all subtrees have a node/entry ratio less than 75%.
This is Presburger arithmetic, so a proof by SMT solver is probably cleanest. Edit: Yep, SMT solver says this is correct. |
Due to how we can actually combine entries into leaves of a binary tree, another part of the structural invariant is that the number of entries is greater than 1 and not equal to 4. Practically, what that means is if we create a single-entry leaf via insertion, rotation, or removal, we have to consider restructuring some neighboring nodes too. In terms of red-black tree, I think the largest neighborhood is the last black node level. It probably makes sense to special-case a bunch:
To maintain that structure, our adds and removes need to shift a few entries around, but those are also the cases where we don't have to bubble up & rotate, so it's not that bad. These special cases have some good properties:
|
We can think of those in terms of the subtree halves:
Dealing with those 3 elementary subtrees seems easiest. Just remember that we have to coordinate with the sibling subtree for lots of operations. |
Actually, we can just deal with the first 2 elementary subtrees. Excluding the 3rd restricts us to the {5,6,7}-entry cases above, which keeps all the nice properties and avoids the {8,9,10,11}-entry special cases. Seems like less shuffling too. |
Insert rule:
Deletion rule:
Leaves that are too big to merge can follow the same rules.
Elementary subtree operations:
|
Only the SINGLE_LIST kind of FildeshKV cares at the moment, but oversized keys will be very important for testing "broadleaf" trees. This commit includes some other changes related to broadleaf trees without including a broadleaf tree implementation itself. Issue #114
Oh boy. The insertion cases are getting a little wild. To keep some amount of sanity, I'm enumerating them as separate functions. Other than node creation for the 7-entry -> 8-entry transformation, a dispatch function can just call the following special cases to do the hard part. Well... I guess it's pretty hard to call the right function too. /*
* Root cases are only for 1 entry.
* insert_0_root_sblack(): Standard. Might become oversized.
* insert_1_root_sblack(): Standard. Might become oversized.
*
* 2-entry subtree.
* insert_0_d2_bblack(): Standard. Left.
* insert_0_d2_sblack(): Oversized. Left.
* insert_1_d2_bblack(): Standard. Middle.
* insert_1_d2_sred(): Oversized. Middle.
* insert_2_d2_bblack(): Standard. Right. May become oversized.
* insert_2_d2_sred(): Oversized. Right.
*
* rshins_d2(): Insert on left. Shift to pop right.
* lshins_d2(): Insert on right. Shift to pop left.
*
* 3-entry subtree.
* lshins_0_d3_sblack(): Standard. Left. No-op.
* lshins_0_d3_sred(): Oversized. Left. No-op.
* rshins_0_d3_sblack(): Standard. Left. Shift everything.
* rshins_0_d3_sred(): Oversized. Left. Shift everything.
* lshins_1_d3_rred(): Standard. LMiddle.
* lshins_1_d3_sred(): Oversized. LMiddle.
* rshins_1_d3_rred(): Standard. LMiddle.
* rshins_1_d3_sred(): Oversized. LMiddle.
* lshins_2_d3_rred(): Standard. RMiddle.
* lshins_2_d3_sred(): Oversized. RMiddle.
* rshins_2_d3_rred(): Standard. RMiddle.
* rshins_2_d3_sred(): Oversized. RMiddle.
* lshins_3_d3_rred(): Standard. Right. May become oversized. Shift everything.
* lshins_3_d3_sred(): Oversized. Right. Shift everything.
* rshins_3_d3_rred(): Standard. Right. No-op.
* rshins_3_d3_sred(): Oversized. Right. No-op.
*
* lsh_d3(): Create a d2 by shifting to pop left.
* rsh_d3(): Create a d2 by shifting to pop right.
*/ |
The remove operation will come later. It took a lot to get insert working. Issue #114
See comments in src/lib/kv/brbtree.c for the final description of how insertion works. The layout I described in #114 (comment) is mostly accurate, but there are a few more special cases for small trees. As for removal, I guess it'll just happen in reverse and leverage the generic red-black tree rebalancing operations. It'll be roughly the same amount of complexity, but should be easier to implement because we already have a model for how all these subtrees should work. However, the implementation is slightly slower than normal red-black trees, so I'm pushing this to a later release where it can be paired with a hash table where space-savings matters more. |
The "broadleaf" tree can contain 2 elements in its leaf nodes. The keys have to be small enough for us to pack both into a single
size_t
. The actual 2nd key and value will occupy the left and right child node pointers, which leaves don't use by definition.The text was updated successfully, but these errors were encountered: