Rafka

A High-Performance Distributed Message Broker Built in Rust

Rafka is an experimental distributed asynchronous message broker inspired by Apache Kafka. Built in Rust on Tokio's async runtime, it targets low-latency, horizontally scalable message processing through partitioned brokers, a custom in-memory storage engine, and a planned peer-to-peer mesh architecture.

πŸš€ Key Features

  • High-Performance Async Architecture: Built on Tokio for maximum concurrency and throughput
  • gRPC Communication: Modern protocol buffers for efficient inter-service communication
  • Partitioned Message Processing: Hash-based partitioning for horizontal scalability
  • In-Memory Storage Engine: Custom-built storage with retention policies and metrics
  • Offset Tracking: Consumer offset management for reliable message delivery
  • Retention Policies: Configurable message retention based on age and size
  • Real-time Metrics: Built-in monitoring and performance metrics
  • Modular Design: Clean separation of concerns across multiple crates

πŸ—οΈ Architecture Overview

System Architecture Diagram

graph TB
    subgraph "Client Layer"
        P[Producer]
        C[Consumer]
    end
    
    subgraph "Broker Cluster"
        B1[Broker 1<br/>Partition 0]
        B2[Broker 2<br/>Partition 1]
        B3[Broker 3<br/>Partition 2]
    end
    
    subgraph "Storage Layer"
        S1[In-Memory DB<br/>Partition 0]
        S2[In-Memory DB<br/>Partition 1]
        S3[In-Memory DB<br/>Partition 2]
    end
    
    P -->|gRPC Publish| B1
    P -->|gRPC Publish| B2
    P -->|gRPC Publish| B3
    
    B1 -->|Store Messages| S1
    B2 -->|Store Messages| S2
    B3 -->|Store Messages| S3
    
    C -->|gRPC Consume| B1
    C -->|gRPC Consume| B2
    C -->|gRPC Consume| B3
    
    B1 -->|Broadcast Stream| C
    B2 -->|Broadcast Stream| C
    B3 -->|Broadcast Stream| C

Message Flow Sequence Diagram

sequenceDiagram
    participant P as Producer
    participant B as Broker
    participant S as Storage
    participant C as Consumer
    
    P->>B: PublishRequest(topic, key, payload)
    B->>B: Hash key for partition
    B->>B: Check partition ownership
    B->>S: Store message with offset
    S-->>B: Return offset
    B->>B: Broadcast to subscribers
    B-->>P: PublishResponse(message_id, offset)
    
    C->>B: ConsumeRequest(topic)
    B->>B: Create broadcast stream
    B-->>C: ConsumeResponse stream
    
    loop Message Processing
        B->>C: ConsumeResponse(message)
        C->>B: AcknowledgeRequest(message_id)
        C->>B: UpdateOffsetRequest(offset)
    end

πŸ“ Project Structure

rafka/
β”œβ”€β”€ Cargo.toml                 # Workspace manifest
β”œβ”€β”€ config/
β”‚   └── config.yml            # Configuration file
β”œβ”€β”€ scripts/                  # Demo and utility scripts
β”‚   β”œβ”€β”€ helloworld.sh         # Basic producer-consumer demo
β”‚   β”œβ”€β”€ partitioned_demo.sh   # Multi-broker partitioning demo
β”‚   β”œβ”€β”€ retention_demo.sh     # Message retention demo
β”‚   β”œβ”€β”€ offset_tracking_demo.sh # Consumer offset tracking demo
β”‚   └── kill.sh               # Process cleanup script
β”œβ”€β”€ src/
β”‚   └── bin/                  # Executable binaries
β”‚       β”œβ”€β”€ start_broker.rs   # Broker server
β”‚       β”œβ”€β”€ start_producer.rs # Producer client
β”‚       β”œβ”€β”€ start_consumer.rs # Consumer client
β”‚       └── check_metrics.rs  # Metrics monitoring
β”œβ”€β”€ crates/                   # Core library crates
β”‚   β”œβ”€β”€ core/                 # Core types and gRPC definitions
β”‚   β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”‚   β”œβ”€β”€ lib.rs
β”‚   β”‚   β”‚   β”œβ”€β”€ message.rs    # Message structures
β”‚   β”‚   β”‚   └── proto/
β”‚   β”‚   β”‚       └── rafka.proto # gRPC service definitions
β”‚   β”‚   └── build.rs          # Protocol buffer compilation
β”‚   β”œβ”€β”€ broker/               # Broker implementation
β”‚   β”‚   └── src/
β”‚   β”‚       β”œβ”€β”€ lib.rs
β”‚   β”‚       └── broker.rs     # Core broker logic
β”‚   β”œβ”€β”€ producer/             # Producer implementation
β”‚   β”‚   └── src/
β”‚   β”‚       β”œβ”€β”€ lib.rs
β”‚   β”‚       └── producer.rs   # Producer client
β”‚   β”œβ”€β”€ consumer/             # Consumer implementation
β”‚   β”‚   └── src/
β”‚   β”‚       β”œβ”€β”€ lib.rs
β”‚   β”‚       └── consumer.rs   # Consumer client
β”‚   └── storage/              # Storage engine
β”‚       └── src/
β”‚           β”œβ”€β”€ lib.rs
β”‚           └── db.rs         # In-memory database
β”œβ”€β”€ docs/
β”‚   └── getting_started.md    # Getting started guide
β”œβ”€β”€ tasks/
β”‚   └── Roadmap.md           # Development roadmap
β”œβ”€β”€ Dockerfile               # Container configuration
└── LICENSE                  # MIT License

πŸš€ Quick Start

Prerequisites

  • Rust: Latest stable version (1.70+)
  • Cargo: Comes with Rust installation
  • Protocol Buffers: For gRPC compilation

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/rafka.git
cd rafka
  2. Build the project:
cargo build --release
  3. Run the basic demo:
./scripts/helloworld.sh

Manual Setup

  1. Start a broker:
cargo run --bin start_broker -- --port 50051 --partition 0 --total-partitions 3
  2. Start a consumer:
cargo run --bin start_consumer -- --port 50051
  3. Send messages:
cargo run --bin start_producer -- --message "Hello, Rafka!" --key "test-key"

πŸ”§ Configuration

Broker Configuration

The broker can be configured via command-line arguments:

cargo run --bin start_broker -- \
  --port 50051 \
  --partition 0 \
  --total-partitions 3 \
  --retention-seconds 604800

Available Options:

  • --port: Broker listening port (default: 50051)
  • --partition: Partition ID for this broker (default: 0)
  • --total-partitions: Total number of partitions (default: 1)
  • --retention-seconds: Message retention time in seconds (default: 604800, i.e. 7 days)

Configuration File

Edit config/config.yml for persistent settings:

server:
  host: "127.0.0.1"
  port: 9092

log:
  level: "info"  # debug, info, warn, error

broker:
  replication_factor: 3
  default_topic_partitions: 1

storage:
  type: "in_memory"
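For reference, the same schema can be mirrored as Rust structs. The struct and field names below are illustrative (they follow the YAML keys, not Rafka's actual config types); a real build would derive serde::Deserialize and load the file with a YAML crate, which is omitted here to keep the sketch dependency-free.

```rust
// Sketch of the config.yml schema; defaults match the example file above.
#[derive(Debug)]
struct ServerConfig {
    host: String,
    port: u16,
}

#[derive(Debug)]
struct BrokerConfig {
    replication_factor: u32,
    default_topic_partitions: u32,
}

#[derive(Debug)]
struct Config {
    server: ServerConfig,
    log_level: String,
    broker: BrokerConfig,
    storage_type: String,
}

impl Default for Config {
    fn default() -> Self {
        Config {
            server: ServerConfig { host: "127.0.0.1".into(), port: 9092 },
            log_level: "info".into(),
            broker: BrokerConfig { replication_factor: 3, default_topic_partitions: 1 },
            storage_type: "in_memory".into(),
        }
    }
}
```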

πŸ›οΈ Core Components

1. Core (rafka-core)

Purpose: Defines fundamental types and gRPC service contracts.

Key Components:

  • Message Structures: Message, MessageAck, BenchmarkMetrics
  • gRPC Definitions: Protocol buffer definitions for all services
  • Serialization: Serde-based serialization for message handling

Key Files:

  • message.rs: Core message types and acknowledgment structures
  • proto/rafka.proto: gRPC service definitions

2. Broker (rafka-broker)

Purpose: Central message routing and coordination service.

Key Features:

  • Partition Management: Hash-based message partitioning
  • Topic Management: Dynamic topic creation and subscription
  • Broadcast Channels: Efficient message distribution to consumers
  • Offset Tracking: Consumer offset management
  • Retention Policies: Configurable message retention
  • Metrics Collection: Real-time performance metrics

Key Operations:

  • publish(): Accept messages from producers
  • consume(): Stream messages to consumers
  • subscribe(): Register consumer subscriptions
  • acknowledge(): Process message acknowledgments
  • update_offset(): Track consumer progress
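The fan-out behind consume() can be pictured with a toy channel: each subscriber gets its own receiver, and publishing clones the message to every subscriber. The real broker uses Tokio broadcast channels over gRPC streams; this synchronous stand-in (with hypothetical names) only illustrates the shape.

```rust
use std::sync::mpsc;

// Toy sketch of per-topic fan-out: one sender handle per subscriber,
// publish() clones the message to all of them.
struct TopicChannel {
    subscribers: Vec<mpsc::Sender<String>>,
}

impl TopicChannel {
    fn new() -> Self {
        TopicChannel { subscribers: Vec::new() }
    }

    // Each call registers a new subscriber and returns its receiving end.
    fn subscribe(&mut self) -> mpsc::Receiver<String> {
        let (tx, rx) = mpsc::channel();
        self.subscribers.push(tx);
        rx
    }

    // Deliver a copy of the message to every subscriber; disconnected
    // subscribers are silently ignored in this sketch.
    fn publish(&self, message: &str) {
        for sub in &self.subscribers {
            let _ = sub.send(message.to_string());
        }
    }
}
```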

3. Producer (rafka-producer)

Purpose: Client library for publishing messages to brokers.

Key Features:

  • Connection Management: Automatic broker connection handling
  • Message Publishing: Reliable message delivery with acknowledgments
  • Error Handling: Comprehensive error reporting
  • UUID Generation: Unique message identification

Usage Example:

let mut producer = Producer::new("127.0.0.1:50051").await?;
producer
    .publish("my-topic".to_string(), "Hello World".to_string(), "key-1".to_string())
    .await?;

4. Consumer (rafka-consumer)

Purpose: Client library for consuming messages from brokers.

Key Features:

  • Subscription Management: Topic subscription handling
  • Stream Processing: Asynchronous message streaming
  • Automatic Acknowledgment: Built-in message acknowledgment
  • Offset Tracking: Automatic offset updates
  • Channel-based API: Clean async/await interface

Usage Example:

let mut consumer = Consumer::new("127.0.0.1:50051").await?;
consumer.subscribe("my-topic".to_string()).await?;
let mut rx = consumer.consume("my-topic".to_string()).await?;
while let Some(message) = rx.recv().await {
    println!("Received: {}", message);
}

5. Storage (rafka-storage)

Purpose: High-performance in-memory storage engine.

Key Features:

  • Partition-based Storage: Separate queues per partition
  • Retention Policies: Age and size-based message retention
  • Offset Management: Efficient offset tracking and retrieval
  • Acknowledgment Tracking: Consumer acknowledgment management
  • Metrics Collection: Storage performance metrics
  • Memory Optimization: Efficient memory usage with cleanup

Storage Architecture:

graph LR
    subgraph "Storage Engine"
        T[Topic]
        P1[Partition 0]
        P2[Partition 1]
        P3[Partition 2]
        
        T --> P1
        T --> P2
        T --> P3
        
        P1 --> Q1[Message Queue]
        P2 --> Q2[Message Queue]
        P3 --> Q3[Message Queue]
    end

πŸ”„ Message Flow

Publishing Flow

  1. Producer sends PublishRequest to Broker
  2. Broker hashes the message key to determine partition
  3. Broker checks partition ownership
  4. Broker stores message in Storage with unique offset
  5. Broker broadcasts message to subscribed consumers
  6. Broker returns PublishResponse with message ID and offset
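Step 4 hinges on offsets being assigned per partition. A minimal sketch (type names are illustrative, not Rafka's storage API): each partition keeps a monotonically increasing counter, so returned offsets totally order messages within that partition, independently of other partitions.

```rust
use std::collections::HashMap;

// Per-partition append-only log with a monotonic offset counter.
#[derive(Default)]
struct PartitionLog {
    next_offset: u64,
    messages: Vec<(u64, String)>, // (offset, payload)
}

impl PartitionLog {
    fn append(&mut self, payload: String) -> u64 {
        let offset = self.next_offset;
        self.messages.push((offset, payload));
        self.next_offset += 1;
        offset
    }
}

// Storage routes each message to its partition's log and returns the
// offset the message was stored at.
#[derive(Default)]
struct Storage {
    partitions: HashMap<u32, PartitionLog>,
}

impl Storage {
    fn store(&mut self, partition: u32, payload: String) -> u64 {
        self.partitions.entry(partition).or_default().append(payload)
    }
}
```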

Consumption Flow

  1. Consumer sends ConsumeRequest to Broker
  2. Broker creates broadcast stream for the topic
  3. Broker streams messages via gRPC to Consumer
  4. Consumer processes message and sends acknowledgment
  5. Consumer updates offset to track progress
  6. Storage cleans up acknowledged messages based on retention policy

πŸ“Š Performance Features

Partitioning Strategy

Rafka uses hash-based partitioning for efficient message distribution:

fn hash_key(&self, key: &str) -> u32 {
    key.bytes().fold(0u32, |acc, b| acc.wrapping_add(b as u32))
}

fn owns_partition(&self, message_key: &str) -> bool {
    let hash = self.hash_key(message_key);
    hash % self.total_partitions == self.partition_id
}

Retention Policies

Configurable message retention based on:

  • Time-based: Maximum age (default: 7 days)
  • Size-based: Maximum storage size (default: 1GB)
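The two policies above can be sketched as a pair of checks; the type and field names here are hypothetical, not Rafka's storage API. Age expiry is decided per message, while the size budget is evaluated over a partition's total bytes (with the oldest messages evicted first once the budget is exceeded).

```rust
use std::time::{Duration, SystemTime};

struct RetentionPolicy {
    max_age: Duration,
    max_bytes: u64,
}

struct StoredMessage {
    stored_at: SystemTime,
    size_bytes: u64,
}

impl RetentionPolicy {
    // A message expires once it is older than max_age.
    fn is_expired(&self, msg: &StoredMessage, now: SystemTime) -> bool {
        now.duration_since(msg.stored_at)
            .map(|age| age > self.max_age)
            .unwrap_or(false)
    }

    // The partition is over budget when its total stored bytes exceed max_bytes.
    fn over_size_budget(&self, messages: &[StoredMessage]) -> bool {
        messages.iter().map(|m| m.size_bytes).sum::<u64>() > self.max_bytes
    }
}
```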

Metrics Collection

Built-in metrics for monitoring:

  • Total messages stored
  • Total bytes consumed
  • Oldest message age
  • Consumer offset positions
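An illustrative snapshot type covering the counters above (the names are assumptions, not Rafka's actual BenchmarkMetrics): counters are bumped on every store, and the oldest-message age is derived from the earliest stored timestamp.

```rust
use std::time::{Duration, SystemTime};

#[derive(Default)]
struct StorageMetrics {
    total_messages: u64,
    total_bytes: u64,
    oldest_message_at: Option<SystemTime>,
}

impl StorageMetrics {
    // Update counters on each stored message.
    fn record_store(&mut self, size_bytes: u64, stored_at: SystemTime) {
        self.total_messages += 1;
        self.total_bytes += size_bytes;
        if self.oldest_message_at.map_or(true, |t| stored_at < t) {
            self.oldest_message_at = Some(stored_at);
        }
    }

    // Age of the oldest retained message, if any.
    fn oldest_message_age(&self, now: SystemTime) -> Option<Duration> {
        self.oldest_message_at
            .and_then(|t| now.duration_since(t).ok())
    }
}
```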

πŸ§ͺ Demo Scripts

1. Hello World Demo

./scripts/helloworld.sh

Basic producer-consumer interaction demonstration.

2. Partitioned Demo

./scripts/partitioned_demo.sh

Multi-broker setup with hash-based partitioning.

3. Retention Demo

./scripts/retention_demo.sh

Demonstrates message retention policies.

4. Offset Tracking Demo

./scripts/offset_tracking_demo.sh

Shows consumer offset management and recovery.

πŸ› οΈ Development

Building from Source

# Clone repository
git clone https://github.com/yourusername/rafka.git
cd rafka

# Build all crates
cargo build

# Run tests
cargo test

# Build release version
cargo build --release

Running Tests

# Run all tests
cargo test

# Run specific crate tests
cargo test -p rafka-storage
cargo test -p rafka-broker

Code Structure

The project follows Rust best practices with:

  • Workspace Organization: Multiple crates in a single workspace
  • Separation of Concerns: Each component in its own crate
  • Async/Await: Modern async Rust with Tokio
  • Error Handling: Comprehensive error types and handling
  • Testing: Unit tests for all major components

🚧 Current Status

⚠️ Early Development - Not Production Ready

Rafka is currently in active development. The current implementation provides:

βœ… Completed Features:

  • Basic message publishing and consumption
  • Hash-based partitioning
  • In-memory storage with retention policies
  • Consumer offset tracking
  • gRPC-based communication
  • Metrics collection
  • Demo scripts and examples

πŸ”„ In Progress:

  • Peer-to-peer mesh networking
  • Distributed consensus algorithms
  • Kubernetes deployment configurations
  • Performance optimizations

πŸ“‹ Planned Features:

  • Replication across multiple brokers
  • Fault tolerance and recovery
  • Security and authentication
  • Client SDKs for multiple languages
  • Comprehensive monitoring and alerting

🀝 Contributing

We welcome contributions! Here are some areas where you can help:

High Priority

  • P2P Mesh Implementation: Distributed node discovery and communication
  • Consensus Algorithms: Leader election and cluster coordination
  • Replication: Cross-broker message replication
  • Fault Tolerance: Node failure detection and recovery

Medium Priority

  • Performance Optimization: Message batching and compression
  • Security: TLS encryption and authentication
  • Monitoring: Prometheus metrics and Grafana dashboards
  • Documentation: API documentation and tutorials

Getting Started

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Apache Kafka for inspiration on messaging systems
  • Tokio for the excellent async runtime
  • Tonic for gRPC implementation
  • @wyattgill9 for the initial proof of concept
  • The Rust community for their excellent libraries and support

Built with ❀️ in Rust
