hadoop

Here are 140 public repositories matching this topic...

apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

kubernetes sql spark hive hadoop jdbc thrift data-lake hacktoberfest spark-sql

Updated Dec 11, 2025
Scala

AbsaOSS / spline

Data Lineage Tracking And Visualization Solution

visualization tracking scala spark hadoop bigdata lineage

Updated Dec 12, 2025
Scala

smart-data-lake / smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

scala spark hive hadoop transform-data data-lake data-pipelines deltalake smart-data-lake

Updated Dec 5, 2025
Scala

archivesunleashed / aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

scala big-data spark apache-spark hadoop analysis python3 text-extraction pyspark digital-humanities dataframe big-data-analytics webarchives network-graphing

Updated Dec 5, 2025
Scala

dimajix / flowman

Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.

scala sql big-data spark apache-spark hadoop etl bigdata data-engineering flowman

Updated Dec 11, 2025
Scala

apache / carbondata

High-performance data store solution

java scala big-data spark hadoop apache data-format carbondata

Updated Nov 10, 2025
Scala

agile-lab-dev / wasp

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

elasticsearch scala kafka akka spark yarn hadoop solr jdbc hbase spark-streaming hdfs parquet

Updated Oct 28, 2025
Scala

brankowss / scala-spark-ecommerce-pipeline

An end-to-end batch data pipeline for e-commerce analytics, built with Scala, Spark, HDFS, Hive, Postgres, and Jenkins. The system is enhanced with a parallel streaming pipeline using Kafka for real-time Twitter trend analysis.

jenkins scala apache-spark hadoop stream-processing data-engineering apache-kafka batch-processing

Updated Oct 8, 2025
Scala

AbsaOSS / enceladus

Dynamic Conformance Engine

scala spark spring mongodb hadoop bigdata datalake

Updated Oct 17, 2025
Scala

hexnn / Stark

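An easy-to-use, ultra-high-performance big data governance engine built on Spark, Spark MLlib, Debezium, and Deequ. It is suited to unified batch and stream data integration and data analysis, and supports real-time CDC data capture, machine learning models, data quality validation, data labeling, sensitive data identification, data modeling, algorithm modeling, and OLAP data analysis.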

spark hadoop etl sparkml mllib kettle flink cdc debezium datax sparkmllib dataworks deequ seatunnel

Updated Aug 20, 2025
Scala

mjakubowski84 / parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

aws scala akka hadoop bigdata google-storage fs2 writer streams reader parquet akka-streams parquet-files

Updated Jul 13, 2025
Scala

velpurinagasai99 / Spark

Projects from TrendyTech

scala hive hadoop pyspark

Updated Jul 9, 2025
Scala

pavithra19 / apache_spark_people_data_processor

This project is a data processing application built with Apache Spark and Scala. It is designed to efficiently process, analyze, and transform large datasets of people data. It leverages Spark’s distributed computing capabilities to handle scalable data ingestion, cleaning, and reporting. Shell scripts are included for Hadoop deployment.

scala hadoop hdfs dataengineering apachespark

Updated Jun 11, 2025
Scala

bigchange / AI

Artificial intelligence projects built with sbt. For the Maven build, see https://github.com/bgfurfeature/AI

scala ai spark hadoop

Updated Jan 3, 2025
Scala

Adelin-Info / TP_DATACLOUD

Architecture and development of large-scale distributed systems

java scala spark yarn hadoop zookeeper map-reduce

Updated Dec 6, 2024
Scala

ZongXR / BigData-Competition

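Third-prize solution in the national big data competition and second-prize solution in the provincial round. One-click scripts install the big data environment and automatically deploy a cluster, including ZooKeeper, Hadoop, MySQL, Hive, Spark, and some base environments. Tested on real servers with excellent results; only minimal manual input, such as passwords, is needed, which saves the manual effort of installation, deployment, and configuration. Several Scala examples are also included, used with Spark for data preparation.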

mysql shell scala spark hive hadoop bigdata zookeeper hdfs wordcount

Updated Sep 26, 2024
Scala

Inmapg / data-lake-compaction

Batch process that compacts Parquet files stored in Azure Data Lake Storage, following the requirements specified in the README.

data-science scala spark hadoop azure data-lake azure-storage azure-data-lake data-lake-store

Updated Sep 25, 2024
Scala

chophannnn / etl

docker scala spark hive hadoop sbt docker-compose makefile

Updated Aug 16, 2024
Scala

k-zyra / bio-app

An application for DNA sequence analysis, written in Scala and deployed using Apache Spark and Hadoop.

java bioinformatics scala spark apache-spark hadoop sbt sbt-assembly bioinformatics-analysis dna-sequence-analysis bioinformatics-tool

Updated May 26, 2024
Scala

zhaocc1106 / big-data

hadoop, spark...

big-data spark hadoop

Updated May 22, 2024
Scala
