# Rafka

A high-performance distributed message broker built in Rust.

Rafka is an experimental distributed asynchronous message broker inspired by Apache Kafka. Built with Rust on Tokio's async runtime, it targets high throughput and low-latency message processing, using a peer-to-peer mesh architecture and a custom in-memory storage engine for horizontal scalability.
## Features

- High-Performance Async Architecture: Built on Tokio for maximum concurrency and throughput
- gRPC Communication: Modern protocol buffers for efficient inter-service communication
- Partitioned Message Processing: Hash-based partitioning for horizontal scalability
- In-Memory Storage Engine: Custom-built storage with retention policies and metrics
- Offset Tracking: Consumer offset management for reliable message delivery
- Retention Policies: Configurable message retention based on age and size
- Real-time Metrics: Built-in monitoring and performance metrics
- Modular Design: Clean separation of concerns across multiple crates
## Architecture

```mermaid
graph TB
    subgraph "Client Layer"
        P[Producer]
        C[Consumer]
    end
    subgraph "Broker Cluster"
        B1[Broker 1<br/>Partition 0]
        B2[Broker 2<br/>Partition 1]
        B3[Broker 3<br/>Partition 2]
    end
    subgraph "Storage Layer"
        S1[In-Memory DB<br/>Partition 0]
        S2[In-Memory DB<br/>Partition 1]
        S3[In-Memory DB<br/>Partition 2]
    end
    P -->|gRPC Publish| B1
    P -->|gRPC Publish| B2
    P -->|gRPC Publish| B3
    B1 -->|Store Messages| S1
    B2 -->|Store Messages| S2
    B3 -->|Store Messages| S3
    C -->|gRPC Consume| B1
    C -->|gRPC Consume| B2
    C -->|gRPC Consume| B3
    B1 -->|Broadcast Stream| C
    B2 -->|Broadcast Stream| C
    B3 -->|Broadcast Stream| C
```
```mermaid
sequenceDiagram
    participant P as Producer
    participant B as Broker
    participant S as Storage
    participant C as Consumer
    P->>B: PublishRequest(topic, key, payload)
    B->>B: Hash key for partition
    B->>B: Check partition ownership
    B->>S: Store message with offset
    S-->>B: Return offset
    B->>B: Broadcast to subscribers
    B-->>P: PublishResponse(message_id, offset)
    C->>B: ConsumeRequest(topic)
    B->>B: Create broadcast stream
    B-->>C: ConsumeResponse stream
    loop Message Processing
        B->>C: ConsumeResponse(message)
        C->>B: AcknowledgeRequest(message_id)
        C->>B: UpdateOffsetRequest(offset)
    end
```
## Project Structure

```
rafka/
├── Cargo.toml                  # Workspace manifest
├── config/
│   └── config.yml              # Configuration file
├── scripts/                    # Demo and utility scripts
│   ├── helloworld.sh           # Basic producer-consumer demo
│   ├── partitioned_demo.sh     # Multi-broker partitioning demo
│   ├── retention_demo.sh       # Message retention demo
│   ├── offset_tracking_demo.sh # Consumer offset tracking demo
│   └── kill.sh                 # Process cleanup script
├── src/
│   └── bin/                    # Executable binaries
│       ├── start_broker.rs     # Broker server
│       ├── start_producer.rs   # Producer client
│       ├── start_consumer.rs   # Consumer client
│       └── check_metrics.rs    # Metrics monitoring
├── crates/                     # Core library crates
│   ├── core/                   # Core types and gRPC definitions
│   │   ├── src/
│   │   │   ├── lib.rs
│   │   │   ├── message.rs      # Message structures
│   │   │   └── proto/
│   │   │       └── rafka.proto # gRPC service definitions
│   │   └── build.rs            # Protocol buffer compilation
│   ├── broker/                 # Broker implementation
│   │   └── src/
│   │       ├── lib.rs
│   │       └── broker.rs       # Core broker logic
│   ├── producer/               # Producer implementation
│   │   └── src/
│   │       ├── lib.rs
│   │       └── producer.rs     # Producer client
│   ├── consumer/               # Consumer implementation
│   │   └── src/
│   │       ├── lib.rs
│   │       └── consumer.rs     # Consumer client
│   └── storage/                # Storage engine
│       └── src/
│           ├── lib.rs
│           └── db.rs           # In-memory database
├── docs/
│   └── getting_started.md      # Getting started guide
├── tasks/
│   └── Roadmap.md              # Development roadmap
├── Dockerfile                  # Container configuration
└── LICENSE                     # MIT License
```
## Prerequisites

- Rust: Latest stable version (1.70+)
- Cargo: Comes with the Rust installation
- Protocol Buffers: For gRPC compilation
## Quick Start

1. Clone the repository:

   ```sh
   git clone https://github.com/yourusername/rafka.git
   cd rafka
   ```

2. Build the project:

   ```sh
   cargo build --release
   ```

3. Run the basic demo:

   ```sh
   ./scripts/helloworld.sh
   ```

### Running Components Manually

Start a broker:

```sh
cargo run --bin start_broker -- --port 50051 --partition 0 --total-partitions 3
```

Start a consumer:

```sh
cargo run --bin start_consumer -- --port 50051
```

Send messages:

```sh
cargo run --bin start_producer -- --message "Hello, Rafka!" --key "test-key"
```

## Configuration

The broker can be configured via command-line arguments:
```sh
cargo run --bin start_broker -- \
  --port 50051 \
  --partition 0 \
  --total-partitions 3 \
  --retention-seconds 604800
```

Available options:

- `--port`: Broker listening port (default: 50051)
- `--partition`: Partition ID for this broker (default: 0)
- `--total-partitions`: Total number of partitions (default: 1)
- `--retention-seconds`: Message retention time in seconds (default: 7 days)
Edit `config/config.yml` for persistent settings:

```yaml
server:
  host: "127.0.0.1"
  port: 9092
log:
  level: "info" # debug, info, warn, error
broker:
  replication_factor: 3
  default_topic_partitions: 1
storage:
  type: "in_memory"
```

## Crates

### Core (`crates/core`)

Purpose: Defines fundamental types and gRPC service contracts.
Key Components:
- Message Structures: `Message`, `MessageAck`, `BenchmarkMetrics`
- gRPC Definitions: Protocol buffer definitions for all services
- Serialization: Serde-based serialization for message handling

Key Files:

- `message.rs`: Core message types and acknowledgment structures
- `proto/rafka.proto`: gRPC service definitions
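As a rough illustration of what a core message type can look like, here is a minimal, dependency-free sketch; the field names and the ID scheme are assumptions for illustration, not the actual definitions in `message.rs` (which uses serde and UUIDs):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical message shape; the real `message.rs` fields may differ.
#[derive(Debug, Clone)]
pub struct Message {
    pub id: String,        // unique message ID (the real crate uses UUIDs)
    pub topic: String,
    pub key: String,       // used for hash-based partitioning
    pub payload: Vec<u8>,
    pub offset: u64,       // assigned by the broker when stored
    pub timestamp_ms: u64, // epoch millis at publish time
}

impl Message {
    pub fn new(topic: &str, key: &str, payload: &[u8]) -> Self {
        let timestamp_ms = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock before Unix epoch")
            .as_millis() as u64;
        Self {
            // Illustrative ID only; the real crate generates a UUID here.
            id: format!("{key}-{timestamp_ms}"),
            topic: topic.to_string(),
            key: key.to_string(),
            payload: payload.to_vec(),
            offset: 0, // the broker overwrites this on store
            timestamp_ms,
        }
    }
}

fn main() {
    let m = Message::new("my-topic", "key-1", b"Hello, Rafka!");
    println!("{} bytes queued for {}", m.payload.len(), m.topic);
}
```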
### Broker (`crates/broker`)

Purpose: Central message routing and coordination service.
Key Features:
- Partition Management: Hash-based message partitioning
- Topic Management: Dynamic topic creation and subscription
- Broadcast Channels: Efficient message distribution to consumers
- Offset Tracking: Consumer offset management
- Retention Policies: Configurable message retention
- Metrics Collection: Real-time performance metrics
Key Operations:
- `publish()`: Accept messages from producers
- `consume()`: Stream messages to consumers
- `subscribe()`: Register consumer subscriptions
- `acknowledge()`: Process message acknowledgments
- `update_offset()`: Track consumer progress
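A single-threaded sketch of the publish path (hash the key, check partition ownership, append, return the assigned offset) may help make these operations concrete. The real broker is async over Tokio/tonic and also broadcasts to subscribers; the names and structure below are illustrative, not the crate's actual API:

```rust
use std::collections::HashMap;

/// Simplified stand-in for the broker; not the actual implementation.
struct BrokerSketch {
    partition_id: u32,
    total_partitions: u32,
    log: HashMap<String, Vec<String>>, // topic -> ordered payloads
}

impl BrokerSketch {
    /// Same byte-sum hash the broker uses for partitioning.
    fn hash_key(&self, key: &str) -> u32 {
        key.bytes().fold(0u32, |acc, b| acc.wrapping_add(b as u32))
    }

    fn owns_partition(&self, key: &str) -> bool {
        self.hash_key(key) % self.total_partitions == self.partition_id
    }

    /// Returns Some(offset) if this broker owns the key's partition.
    fn publish(&mut self, topic: &str, key: &str, payload: &str) -> Option<u64> {
        if !self.owns_partition(key) {
            return None; // a different broker in the mesh owns this key
        }
        let entries = self.log.entry(topic.to_string()).or_default();
        entries.push(payload.to_string());
        Some(entries.len() as u64 - 1) // offset = position in the log
    }
}

fn main() {
    let mut b = BrokerSketch {
        partition_id: 0,
        total_partitions: 1, // a single-partition broker owns every key
        log: HashMap::new(),
    };
    assert_eq!(b.publish("my-topic", "key-1", "first"), Some(0));
    assert_eq!(b.publish("my-topic", "key-1", "second"), Some(1));
}
```

The real `publish()` additionally pushes each stored message into a broadcast channel so subscribed consumers receive it immediately.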
### Producer (`crates/producer`)

Purpose: Client library for publishing messages to brokers.
Key Features:
- Connection Management: Automatic broker connection handling
- Message Publishing: Reliable message delivery with acknowledgments
- Error Handling: Comprehensive error reporting
- UUID Generation: Unique message identification
Usage Example:

```rust
let mut producer = Producer::new("127.0.0.1:50051").await?;
producer.publish("my-topic".to_string(), "Hello World".to_string(), "key-1".to_string()).await?;
```

### Consumer (`crates/consumer`)

Purpose: Client library for consuming messages from brokers.
Key Features:
- Subscription Management: Topic subscription handling
- Stream Processing: Asynchronous message streaming
- Automatic Acknowledgment: Built-in message acknowledgment
- Offset Tracking: Automatic offset updates
- Channel-based API: Clean async/await interface
Usage Example:

```rust
let mut consumer = Consumer::new("127.0.0.1:50051").await?;
consumer.subscribe("my-topic".to_string()).await?;
let mut rx = consumer.consume("my-topic".to_string()).await?;
while let Some(message) = rx.recv().await {
    println!("Received: {}", message);
}
```

### Storage (`crates/storage`)

Purpose: High-performance in-memory storage engine.
Key Features:
- Partition-based Storage: Separate queues per partition
- Retention Policies: Age and size-based message retention
- Offset Management: Efficient offset tracking and retrieval
- Acknowledgment Tracking: Consumer acknowledgment management
- Metrics Collection: Storage performance metrics
- Memory Optimization: Efficient memory usage with cleanup
Storage Architecture:

```mermaid
graph LR
    subgraph "Storage Engine"
        T[Topic]
        P1[Partition 0]
        P2[Partition 1]
        P3[Partition 2]
        T --> P1
        T --> P2
        T --> P3
        P1 --> Q1[Message Queue]
        P2 --> Q2[Message Queue]
        P3 --> Q3[Message Queue]
    end
```
## Message Flow

### Publish Path

1. Producer sends `PublishRequest` to the Broker
2. Broker hashes the message key to determine the partition
3. Broker checks partition ownership
4. Broker stores the message in Storage with a unique offset
5. Broker broadcasts the message to subscribed consumers
6. Broker returns `PublishResponse` with the message ID and offset

### Consume Path

1. Consumer sends `ConsumeRequest` to the Broker
2. Broker creates a broadcast stream for the topic
3. Broker streams messages via gRPC to the Consumer
4. Consumer processes each message and sends an acknowledgment
5. Consumer updates its offset to track progress
6. Storage cleans up acknowledged messages based on the retention policy
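The offset-update step at the end of the consume path amounts to per-consumer bookkeeping. A minimal sketch of that bookkeeping, with illustrative names (the real broker routes this through its storage engine), might look like:

```rust
use std::collections::HashMap;

/// Sketch of consumer-offset bookkeeping keyed by (consumer, topic).
/// Names are illustrative; the real implementation lives in the broker/storage crates.
#[derive(Default)]
struct OffsetStore {
    offsets: HashMap<(String, String), u64>, // (consumer_id, topic) -> committed offset
}

impl OffsetStore {
    /// Record that `consumer` has processed everything up to `offset`.
    fn update(&mut self, consumer: &str, topic: &str, offset: u64) {
        self.offsets
            .insert((consumer.to_string(), topic.to_string()), offset);
    }

    /// Where this consumer should resume after a restart (0 if unknown).
    fn resume_from(&self, consumer: &str, topic: &str) -> u64 {
        self.offsets
            .get(&(consumer.to_string(), topic.to_string()))
            .copied()
            .unwrap_or(0)
    }
}

fn main() {
    let mut store = OffsetStore::default();
    store.update("consumer-1", "my-topic", 42);
    // After a crash, the consumer asks where to resume.
    println!("resume at {}", store.resume_from("consumer-1", "my-topic"));
}
```

Because offsets survive consumer restarts, a reconnecting consumer can resume exactly where it left off instead of replaying the whole topic.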
## Partitioning

Rafka uses hash-based partitioning for efficient message distribution:

```rust
fn hash_key(&self, key: &str) -> u32 {
    key.bytes().fold(0u32, |acc, b| acc.wrapping_add(b as u32))
}

fn owns_partition(&self, message_key: &str) -> bool {
    let hash = self.hash_key(message_key);
    hash % self.total_partitions == self.partition_id
}
```

## Retention

Configurable message retention based on:
- Time-based: Maximum age (default: 7 days)
- Size-based: Maximum storage size (default: 1GB)
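A minimal sketch of how age- and size-based eviction can work over a single partition queue follows; the real `db.rs` also tracks acknowledgments and metrics, and the field names here are assumptions for illustration:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Illustrative single-partition queue with two retention limits.
struct Partition {
    queue: VecDeque<(Instant, Vec<u8>)>, // (stored_at, payload), oldest first
    total_bytes: usize,
    max_age: Duration,
    max_bytes: usize,
}

impl Partition {
    fn store(&mut self, payload: Vec<u8>) {
        self.total_bytes += payload.len();
        self.queue.push_back((Instant::now(), payload));
        self.enforce_retention();
    }

    /// Drop from the front (oldest first) until both limits hold.
    fn enforce_retention(&mut self) {
        while let Some((stored_at, payload)) = self.queue.front() {
            let too_old = stored_at.elapsed() > self.max_age;
            let too_big = self.total_bytes > self.max_bytes;
            if !(too_old || too_big) {
                break;
            }
            self.total_bytes -= payload.len();
            self.queue.pop_front();
        }
    }
}

fn main() {
    let mut p = Partition {
        queue: VecDeque::new(),
        total_bytes: 0,
        max_age: Duration::from_secs(7 * 24 * 3600), // 7-day default
        max_bytes: 10,                               // tiny cap for the demo
    };
    p.store(vec![0u8; 6]);
    p.store(vec![0u8; 6]); // 12 bytes > 10: the oldest message is evicted
    println!("{} messages, {} bytes retained", p.queue.len(), p.total_bytes);
}
```

Evicting from the front works because messages are appended in arrival order, so the oldest (and first to expire) message is always at the head of the queue.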
## Metrics

Built-in metrics for monitoring:
- Total messages stored
- Total bytes consumed
- Oldest message age
- Consumer offset positions
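A sketch of the kind of counters such a metrics endpoint might aggregate; the field names are illustrative and may differ from the crate's actual `BenchmarkMetrics` type:

```rust
use std::time::{Duration, Instant};

/// Illustrative storage-side counters; not the crate's actual struct.
#[derive(Debug, Default)]
struct StorageMetrics {
    total_messages: u64,
    total_bytes: u64,
    oldest_message_at: Option<Instant>,
}

impl StorageMetrics {
    /// Called by the storage engine each time a message is stored.
    fn record_store(&mut self, payload_len: usize) {
        self.total_messages += 1;
        self.total_bytes += payload_len as u64;
        // Remember when the first (oldest) message arrived.
        self.oldest_message_at.get_or_insert_with(Instant::now);
    }

    /// Age of the oldest retained message, if any.
    fn oldest_message_age(&self) -> Option<Duration> {
        self.oldest_message_at.map(|t| t.elapsed())
    }
}

fn main() {
    let mut m = StorageMetrics::default();
    m.record_store(11);
    m.record_store(5);
    println!("{} messages, {} bytes", m.total_messages, m.total_bytes);
}
```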
## Demo Scripts

- `./scripts/helloworld.sh`: Basic producer-consumer interaction demonstration
- `./scripts/partitioned_demo.sh`: Multi-broker setup with hash-based partitioning
- `./scripts/retention_demo.sh`: Demonstrates message retention policies
- `./scripts/offset_tracking_demo.sh`: Shows consumer offset management and recovery
## Development

### Building from Source

```sh
# Clone repository
git clone https://github.com/yourusername/rafka.git
cd rafka

# Build all crates
cargo build

# Run tests
cargo test

# Build release version
cargo build --release
```

### Running Tests

```sh
# Run all tests
cargo test

# Run specific crate tests
cargo test -p rafka-storage
cargo test -p rafka-broker
```

### Code Organization

The project follows Rust best practices with:
- Workspace Organization: Multiple crates in a single workspace
- Separation of Concerns: Each component in its own crate
- Async/Await: Modern async Rust with Tokio
- Error Handling: Comprehensive error types and handling
- Testing: Unit tests for all major components
## Project Status

Rafka is currently in active development. The current implementation provides:

✅ Completed Features:
- Basic message publishing and consumption
- Hash-based partitioning
- In-memory storage with retention policies
- Consumer offset tracking
- gRPC-based communication
- Metrics collection
- Demo scripts and examples
🚧 In Progress:
- Peer-to-peer mesh networking
- Distributed consensus algorithms
- Kubernetes deployment configurations
- Performance optimizations
📋 Planned Features:
- Replication across multiple brokers
- Fault tolerance and recovery
- Security and authentication
- Client SDKs for multiple languages
- Comprehensive monitoring and alerting
## Contributing

We welcome contributions! Here are some areas where you can help:
- P2P Mesh Implementation: Distributed node discovery and communication
- Consensus Algorithms: Leader election and cluster coordination
- Replication: Cross-broker message replication
- Fault Tolerance: Node failure detection and recovery
- Performance Optimization: Message batching and compression
- Security: TLS encryption and authentication
- Monitoring: Prometheus metrics and Grafana dashboards
- Documentation: API documentation and tutorials
To contribute:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request
## License

This project is licensed under the MIT License; see the LICENSE file for details.
## Acknowledgments

- Apache Kafka for inspiration on messaging systems
- Tokio for the excellent async runtime
- Tonic for gRPC implementation
- @wyattgill9 for the initial proof of concept
- The Rust community for their excellent libraries and support
Built with ❤️ in Rust