This project provides a comparative implementation of stream processing using Apache Flink and Kafka Streams. It demonstrates how to perform similar data processing tasks using both frameworks, allowing developers to understand the differences, strengths, and trade-offs between them.
The project implements a stream processing pipeline that:
- 📥 Consumes click events from a Kafka topic
- 🔗 Joins these events with category data
- ⏱️ Performs windowed aggregation to count unique users per category
- 📤 Outputs the results to another Kafka topic
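The data shapes involved can be pictured roughly as follows. This is a hypothetical sketch: the field names are assumptions, and the project itself defines these types as Avro schemas (see `src/main/avro/`) rather than hand-written Java records.

```java
// Hypothetical record shapes for the pipeline; the real types are generated
// from the Avro schemas under src/main/avro/, so names here are assumptions.
record ClickEvent(String userId, String itemId, long timestampMs) {} // read from the input topic
record Category(String itemId, String categoryName) {}               // reference data for the join
record CategoryUserCount(String categoryName,
                         long windowStartMs,
                         long uniqueUsers) {}                         // written to the output topic
```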
This project uses the following technologies:
- ☕ Java 17 - Programming language
- 🧩 Kotlin - Used for build configuration
- 🐘 Gradle - Build tool with Kotlin DSL
- 📊 Apache Kafka - Distributed streaming platform
- 🌊 Apache Flink - Stream processing framework
- 📋 Apache Avro - Data serialization system
- 🗄️ Confluent Schema Registry - Schema management service
- 🧪 JUnit 5 - Testing framework
- 🐳 Testcontainers - Integration testing with containerized dependencies
```
src/
├── main/
│   ├── avro/           # Avro schema definitions
│   ├── java/           # Main source code
│   └── resources/      # Configuration files
├── test/
│   └── java/           # Unit tests
└── integrationTest/
    └── java/           # Integration tests
```

The project contains three main implementations:
- 📊 KafkaStreamsProcessor - Implementation using the Kafka Streams API
- 🌊 FlinkDataStreamProcessor - Implementation using the Flink DataStream API
- 📋 FlinkTableProcessor - Implementation using the Flink Table API
Each implementation provides the same functionality but uses different APIs and approaches.
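To make the comparison concrete, here is a minimal sketch of what the Kafka Streams variant could look like. It is not the project's actual code: the topic names, key choices, the five-minute window, and the in-memory `HashSet` accumulator are all assumptions, and the Avro serde configuration is omitted.

```java
import java.time.Duration;
import java.util.HashSet;
import java.util.Set;

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;

// Minimal sketch of a Kafka Streams topology for this pipeline. Topic names
// and value types are assumptions; serdes are omitted for brevity (the real
// project would configure Avro serdes backed by Schema Registry).
public class KafkaStreamsSketch {
    static void buildTopology(StreamsBuilder builder) {
        // Click events, assumed to be keyed by item id so they can be joined.
        KStream<String, ClickEvent> clicks = builder.stream("click-events");

        // Category reference data as a changelog-backed table, keyed by item id.
        KTable<String, Category> categories = builder.table("categories");

        clicks
            // Enrich each click with its category name.
            .join(categories, (click, category) ->
                    new EnrichedClick(click.userId(), category.categoryName()))
            // Re-key by category so the aggregation groups per category.
            .selectKey((itemId, enriched) -> enriched.categoryName())
            .groupByKey()
            // Tumbling five-minute windows (window size is an assumption).
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
            // Collect the distinct user ids seen in each window...
            .aggregate(HashSet<String>::new, (category, enriched, users) -> {
                users.add(enriched.userId());
                return users;
            })
            // ...and emit the count of unique users per category and window.
            .mapValues(Set::size)
            .toStream()
            .map((windowedCategory, count) ->
                    KeyValue.pair(windowedCategory.key(), count))
            .to("unique-users-per-category");
    }

    record ClickEvent(String userId, String itemId) {}
    record Category(String itemId, String categoryName) {}
    record EnrichedClick(String userId, String categoryName) {}
}
```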
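A corresponding sketch in the Flink DataStream API. Again this is illustrative rather than the project's code: the Kafka source and sink and the category join (which might use broadcast state) are elided, and the stream is assumed to already carry enriched clicks.

```java
import java.util.HashSet;
import java.util.Set;

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Minimal sketch of the Flink DataStream variant. Kafka connectors,
// watermarking, and the category join are elided; all names are assumptions.
public class FlinkDataStreamSketch {
    record EnrichedClick(String userId, String categoryName) {}

    static DataStream<Long> uniqueUsersPerCategory(DataStream<EnrichedClick> enrichedClicks) {
        return enrichedClicks
            // Group by category so each window counts users within one category.
            .keyBy(EnrichedClick::categoryName)
            // Tumbling five-minute event-time windows (size is an assumption).
            .window(TumblingEventTimeWindows.of(Time.minutes(5)))
            // Accumulate the distinct user ids per window, then emit the count.
            .aggregate(new AggregateFunction<EnrichedClick, Set<String>, Long>() {
                @Override public Set<String> createAccumulator() {
                    return new HashSet<>();
                }
                @Override public Set<String> add(EnrichedClick click, Set<String> users) {
                    users.add(click.userId());
                    return users;
                }
                @Override public Long getResult(Set<String> users) {
                    return (long) users.size();
                }
                @Override public Set<String> merge(Set<String> a, Set<String> b) {
                    a.addAll(b);
                    return a;
                }
            });
    }
}
```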
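Finally, a sketch of the Flink Table API flavor, where the join and the windowed distinct count can be expressed declaratively in SQL. Table and column names are assumptions, and the `CREATE TABLE` DDL that binds these tables to Kafka topics (via the Kafka connector and the Avro/Schema Registry format) is elided.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Illustrative Flink Table API sketch. Assumes `click_events` and `categories`
// were registered as Kafka-backed tables beforehand, with `event_time`
// declared as the event-time attribute of `click_events`.
public class FlinkTableSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Join clicks with categories, then count distinct users per category
        // in tumbling five-minute windows (window size is an assumption).
        tableEnv.executeSql("""
                SELECT
                  c.category_name,
                  TUMBLE_START(e.event_time, INTERVAL '5' MINUTE) AS window_start,
                  COUNT(DISTINCT e.user_id) AS unique_users
                FROM click_events AS e
                JOIN categories AS c ON e.item_id = c.item_id
                GROUP BY c.category_name, TUMBLE(e.event_time, INTERVAL '5' MINUTE)
                """).print();
    }
}
```

Side by side, the sketches show the trade-off the project explores: the Kafka Streams and DataStream variants give explicit control over keying, state, and windows, while the Table API version expresses the same pipeline more compactly at a higher level of abstraction.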