Skip to content

Conversation

@SuperFluffy
Copy link
Contributor

@SuperFluffy SuperFluffy commented Oct 30, 2023

Summary

Implement a RFC 6962 compliant merkle tree with flat memory representation, remove ct-merkle.

Background

We want to use a merkle tree that minimizes allocations because constructing merkle trees is a hot operation that happens every time new blocks are generated or verified. Worse, for blocks containing transactions for several distinct rollups a merkle Tree is created per rollup. Finally, Astria wants a minimal API that is tailored exactly to its needs to make audits as simple as possible.

The implementation of this merkle tree library does not allocate space for its leaves but only stores leaf and branch hashes in a flat, one dimensional array. Furthermore, it only supports sha256 hashes. For background reading refer to the following links:

Changes

  • Add an astria-merkle crate implementing a flat merkle tree.
  • Rewrite all services/crates in terms of it.
  • Simplify several utility functions by implementing APIs on the merkle tree, avoiding allocations.

Testing

  • Ensure the merkle tree passes all tests with known roots defined by https://github.com/transparency-dev/merkle
  • Add unit and doc tests for all high level APIs.
  • Ensure that all snapshot tests in astria sequencer still pass without changing the expdcted hashes

Breaking Changelist

  • Sequencer block data and sequencer/rollup blobs are changed, breaking compatibility with prior versions of sequencer, sequencer-relayer, conductor.

@github-actions github-actions bot added sequencer pertaining to the astria-sequencer crate conductor pertaining to the astria-conductor crate sequencer-relayer pertaining to the astria-sequencer-relayer crate labels Nov 1, 2023
@SuperFluffy SuperFluffy changed the title WIP: astria flat merkle tree Implement an RFC 6962 Merkle tree with flat memory representation Nov 3, 2023
@SuperFluffy SuperFluffy requested a review from a team November 3, 2023 21:07
@SuperFluffy SuperFluffy marked this pull request as ready for review November 3, 2023 21:07
@SuperFluffy SuperFluffy force-pushed the superfluffy/flat-merkle branch from 0935104 to 52f3eeb Compare November 6, 2023 11:03
@SuperFluffy SuperFluffy changed the title Implement an RFC 6962 Merkle tree with flat memory representation feat: add an RFC 6962 compliant Merkle tree with flat memory representation Nov 6, 2023
Comment on lines +523 to +485
pub fn from_leaves<I, B>(iter: I) -> Self
where
I: IntoIterator<Item = B>,
B: AsRef<[u8]>,
{
let mut tree = Self::new();
for item in iter {
tree.push(item.as_ref());
}
tree
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since push updates the tree every time it's called, it would be nice to have a from_leaves function that doesn't call push, but just constructs the tree directly from all the leaves. this way, we can do 1 bigger allocation at once since we know the final tree size, and don't have to hash intermediates unnecessarily. thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to have that, but to really benefit from this we would need to walk the tree level by level and bottom to top, and only update branches at each level.

But the math comes out a bit annoying and is IMO out of scope for this PR. I would want to spend some time thinking about how to do that.

A thought: according the indexing scheme the trailing set bits indicate the depth of the subtree. So the root for a tree of depth D has a binary representation of 011...1 with D-1 ones. This extends to every subtree. This means we should in principle be able to walk the tree by "roots-of-subtrees" by fixing the number of trailing 1s for each level.

@SuperFluffy SuperFluffy force-pushed the superfluffy/flat-merkle branch from fb7db52 to 7556b7b Compare November 7, 2023 14:57
Copy link
Contributor

@noot noot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few minor comments, approving since overall looks good and don't want to block. great work!!

@SuperFluffy SuperFluffy requested review from a team, itamarreif and sambukowski November 15, 2023 16:52
Copy link
Contributor

@itamarreif itamarreif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good. Only note would be that we should eventually write a from_leaves() method that's more efficient with the memory allocations, as @noot mentioned. But this could easily be added later.

Block data construction seems solid and largely matches our spec as described in data flow and verification. Only thing that should be updated there is the types: we no longer have InclusionProof, instead we have merkle::Proof.

Another note is that we should probably also add a document in the specs directory for how the Merkle tree implementation works. This information largely lives as docstrings in the astria-merkle crate in this PR which is also fine, though.

@SuperFluffy SuperFluffy force-pushed the superfluffy/flat-merkle branch from aff5474 to 5796f0e Compare November 17, 2023 10:42
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 17, 2023
@SuperFluffy
Copy link
Contributor Author

I have amended the spec to be in line with this patch, and created #587 to track for a more efficient from_leaves implementation, and #588 to track writing a spec doc.

@SuperFluffy SuperFluffy merged commit 5d9ab9a into main Nov 17, 2023
@SuperFluffy SuperFluffy deleted the superfluffy/flat-merkle branch November 17, 2023 10:55
steezeburger added a commit that referenced this pull request Nov 17, 2023
* main:
  feat: add an RFC 6962 compliant Merkle tree with flat memory representation (#554)
SuperFluffy added a commit that referenced this pull request Nov 21, 2023
## Summary
Enforce invariants when constructing sequencer blobs.

## Background
The sequencer blob type `SequencerNamespaceData` that was written to and
read from celestia was treated as a dumb container, delegating
consistency checks to downstream users (for example, to the
`block_verifier` module in `astria-conductor`). In the spirit of making
invalid states unrepresentable, this patch introdues a
`RawSequencerNamespaceData` as the dumb (de)serialization container,
from which a verified `SequencerNamespaceData` can be built. In
particular, the following invariants are upheld at construction,
rejecting a conversion otherwise (`SMD` stands for
`SequencerNamespaceData` and `SMD.field` refers to the `field` of the
same):

1. `SMD.header.data_hash` must be set
2. `SMD.block_hash` must match the result of `SMD.header.hash()`
3. `SMD.action_tree_root` and `SMD.action_tree_root_inclusion_proof`
must verify against `SMD.data_hash`
4. The Merkle tree hash built from `SMD.rollup_chain_ids` must match
`SMD.chain_ids_commitment`
5. The `SMD.chain_ids_commitment` and `SMD.chain_ids_commitment_proof`
must verifiy against `SMD.data_hash`

With these changes conductor's `BlockVerifier` can be simplified
significantly and is now only responsible for checking the validity of a
given sequencer blob against the remote sequencer.

## Changes
- Introduce `RawSequencerNamespaceData` type
- Make all `SequencerNamespaceData` fields private, add read-only
getters
- Change `SequencerNamespaceData` serialization to be in terms of the
`Raw*` type
- Enforce sequencer blob invariants by introducing
`SequencerNamespaceData::try_from_raw` as the only fallible constructor
- Add a `SMD::data_hash` getter for convenience which panics if the
invariants are violated
- Streamline `BlockVerifier` by removing all checks that now happen
during `SequencerNamespaceData` construction

## Testing
How are these changes tested?

## Related Issues
This is done as part of #356 

Blocked on #554 being merged
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

conductor pertaining to the astria-conductor crate documentation Improvements or additions to documentation sequencer pertaining to the astria-sequencer crate sequencer-relayer pertaining to the astria-sequencer-relayer crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants