Stars
ClickHouse® is a real-time analytics database management system
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Event streaming platform for agentic AI. Continuously ingest, transform, and serve event streams in real time, at scale.
Upserts, Deletes And Incremental Processing on Big Data.
Spark code to analyze HBase Snapshots
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Remote Shuffle Service for Flink
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
The Apache Spark - Apache HBase Connector is a library that lets Spark access HBase tables as an external data source or sink.
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance …
An RPC framework leveraging the Spark RPC module
TiSpark is built for running Apache Spark on top of TiDB/TiKV
Netty project - an event-driven asynchronous network application framework
This is a library for SQL optimization and rewriting, including materialized view rewrite
TPC-DS benchmark kit with some modifications/fixes
This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvidia/sparkucx
Apache Spark - A unified analytics engine for large-scale data processing