Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Updated Sep 11, 2025 - Scala
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
High performance data store solution
WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
This project is a data processing application built with Apache Spark and Scala. It is designed to efficiently process, analyze, and transform large datasets of people data. It leverages Spark's distributed computing capabilities to handle scalable data ingestion, cleaning, and reporting. Shell scripts are included for Hadoop deployment.
Artificial intelligence project built with sbt. For the Maven build, see https://github.com/bgfurfeature/AI
An application for DNA sequence analysis, written in Scala and deployed using Apache Spark and Hadoop.
Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Average Temperature - Hadoop - Mapper - Reducer