A simple high performance bank application using command sourcing.
-
Process around
200,000write-requests per second on a singleleadernode.Result of sending 500k write-requests (deposit only) to the
leaderwith 64 grpc connections (running on a MacBook Pro, 13-inch, M1, 16 GB, 2020): -
By adding more the
followernodes, thereadthroughput can increase linearly, theoretically reachinginfinity.
NOTE: This project is slated for significant performance enhancements through the implementation of Cap'n Proto serialization (serde) and Cap'n Proto RPC, or alternatively, technologies such as RSocket.
Benchmarking results will be updated accordingly to reflect these improvements in due course.
The architecture resembles that of Lmax Architecture, but in a simplified form.
This is achieved by journaling command logs into Kafka and by omitting the use of the replicator processor.
-
cluster-app:leadernode: handles all incoming commands, queries.followernode: handles all incoming queries, replays command-logs published byleader.learnernode: replays command-logs published byleader, takes snapshot of state-machine.
-
client-appinteracts withcluster-appviagrpcprotocol, provides Restful Api. Including modules:adminuser
- All commands requested from client-apps are published into an inbound ring-buffer (command-buffer).
- The commands are then grouped into chunks and then streamed into disk (kafka - one partition) when disruptor's
EventHandlerreachesendOfBatch. - The business-logic consumer then processes all incoming commands in order to build
state-machine. - Finally, the results are published into an out-bound ring-buffer (reply-buffer) in order to reply back to
client-apps.
Before we dive in, let's go over a few preliminary notes:
- Only the
leadernode is responsible for writing thecommand-loginto the intocommand-log-storage. - Meanwhile, the
learnernode is exclusively tasked with snapshotting the state machine.
- Assume that the latest offset in kafka is
xand thelearnerreplayscommand-logup to m'th offset. - The
learnersnapshots thestate-machineinterval or everycommand-size. - Assume that the
learnersnapshots up to n'th offset.- For
optimization, thelearnersnapshots only thestatesthat have changed from the last-snapshot-offset to n'th offset.
- For
- When the
cluster(leader,followerorlearner) restart, it first loads thesnapshot, then replays thecommand-logfromn + 1'th offset to rebuild state-machine.- If there is no
snapshotstored in thedatabase, then theclusterwill replay allcommand-logfrom the beginning.
- If there is no
The snapshot trigger and logic can be found in LearnerBootstrap -> startReplayMessage() and ReplayBufferHandlerByLearner.
cluster-core: domain logic.cluster-app: framework & transport layer, implementscluster-core's interface ports.
Note: All producers(or dispatcher) and consumers(or processor) interacting with the same ring-buffer are managed as children of a buffer-channel.
- Journaling command logs.
- Replaying command logs.
- Managing state machine.
- Replicating state machine.
- Snapshotting state machine.
- Processing domain logic.
- Create balance.
- Deposit money.
- Withdraw money.
- Transfer money.
- Get balance by id.
- List all balances.
-
Admin:- Create balance.
- Deposit money.
- Withdraw money.
- Transfer money.
- List all balances.
- Get balance by id.
-
User:- Get current balance.
- Deposit money.
- Withdraw money.
- Transfer money.
cluster-core: Domain logic.cluster-app: Implementscluster-coreand provides transport-layer (ex: grpc), framework-layer.client-core: Provideslibsto interacts withcluster, providesrequest-replychannel for incoming requests.client: Interacts withcluster-app, providersapi-resource.
make help- Setup dev environment
make setup-dev- Start
leadernode - processing read and write requests
make run-leader- Start
followernode - processing read requests
make run-follower- Start
learnernode - snapshotting state machine
make run-learner- Start
adminapp - CRUD app
make run-admin- Start
userapp (Not available yet)
make run-userIn order to test grpc server, you can use portman to send message like this
We use ghz(link) as a benchmarking and load testing tool.
ghz --insecure --proto ./bank-libs/bank-cluster-proto/src/main/proto/balance.proto \
--call gc.garcol.bank.proto.BalanceQueryService/sendQuery \
-d '{"singleBalanceQuery": {"id": 1,"correlationId": "random-uuid"}}' \
-c 200 -n 100000 \
127.0.0.1:9500We use autocannon (it can produce more load than wrk and wrk2).
See BENCHMARK for more details.