hadoop

Here are 566 public repositories matching this topic...

hghghgh12 / Large-Scale-Data-Pipeline-Migration

🚀 Migrate legacy mainframe data to a modern Hadoop ecosystem, automating ingestion, transformation, and validation for scalable storage and analytics.

mysql big-data spark hive hadoop oozie sqoop dataengineering etl-pipeline

Updated Dec 13, 2025
Python

treeverse / dvc-hdfs

HDFS/WebHDFS plugin for dvc

plugin hadoop hdfs dvc dvc-plugin

Updated Dec 11, 2025
Python

hoangsonww / End-to-End-Data-Pipeline

📈 A scalable, production-ready data pipeline for real-time streaming & batch processing, integrating Kafka, Spark, Airflow, AWS, Kubernetes, and MLflow. Supports end-to-end data ingestion, transformation, storage, monitoring, and AI/ML serving with CI/CD automation using Terraform & GitHub Actions.

Updated Dec 10, 2025
Python

daviddwlee84 / RaspPi-Cluster

An efficient quick-start tool for building a Raspberry Pi (or Debian-based) cluster with popular ecosystem tools such as Hadoop and Spark.

spark hadoop fabric raspberrypi cluster

Updated Dec 10, 2025
Python

ab2dridi / webhdfsmagic

webhdfsmagic is a Python package that provides IPython magic commands to interact with HDFS via WebHDFS/Knox.

python data-science big-data spark hadoop jupyter pypi cloudera python3 data-engineering hdfs databricks webhdfs knox magic-commands

Updated Dec 8, 2025
Python

konuralpb / movielens-hadoop-analyzer

A distributed Big Data analytics system using Hadoop MapReduce (Python) and a custom Tkinter GUI to process the MovieLens dataset (Z-Score, IQR, Skewness).

python ssh big-data hadoop data-analysis mapreduce paramiko hadoop-streaming tkinter-gui

Updated Dec 5, 2025
Python

Breaka84 / Spooq

big-data spark hadoop etl extract data-engineering load transform etl-pipeline

Updated Dec 3, 2025
Python

splitlog / splitlog

Utility to split aggregated logs from Apache Hadoop YARN applications into a folder hierarchy.

python hadoop

Updated Nov 29, 2025
Python

DrFarouk / big-data-analytics

A practical coursework-style project from my Master's studies in Big Data Analytics at the University of East London, showcasing hands-on use of big data tools and techniques on a real-world cyber-security dataset.

Updated Nov 18, 2025
Python

shrishtibhasin / commerceflow-data-pipeline

The project aims to design and implement an advanced data pipeline for an e-commerce platform, addressing complexities such as schema evolution, incremental updates, and data transformation to support analytics and scalability requirements.

python aws sql kafka spark hadoop etl data-engineering data-pipeline

Updated Nov 14, 2025
Python

josemarialuna / ing-datos-big-data-US

Docker-based Big Data environment with Hadoop, Hive, Trino, Kafka, and Airflow. Includes configurations, scripts, and MapReduce examples for distributed data analysis.

python docker hadoop bigdata

Updated Nov 14, 2025
Python

RAG-io / smart-city-analytics

A cloud-ready smart city analytics dashboard built with Flask, Plotly, and Python. Features air quality, crime, and ICT data visualizations with a modular architecture for Big Data and cloud integration.

python flask cloud big-data hadoop data-visualization pyspark cloudcomputing smart-city

Updated Nov 13, 2025
Python

divithraju / Large-Scale-Data-Pipeline-Migration

mysql big-data spark hive hadoop oozie sqoop dataengineering etl-pipeline data-migrat

Updated Nov 12, 2025
Python

HariSekhon / Nagios-Plugins

450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...

Updated Nov 6, 2025
Python

HariSekhon / DevOps-Python-tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Updated Nov 6, 2025
Python

eecs485staff / madoop

A lightweight MapReduce framework for education.

hadoop mapreduce

Updated Nov 3, 2025
Python

AthinaKyriakou / mrbox

An open source experimental application aiming to simplify working with remote heterogeneous analytics and storage services via the file system of the Linux operating system. Published at EDBT/ICDT 2021.

hadoop b2drop big-data-analytics

Updated Nov 3, 2025
Python

call518 / MCP-Ambari-API

🔍 Model Context Protocol (MCP) server for Apache Ambari API integration. This project provides tools for managing Hadoop clusters, including service operations, configuration management, status monitoring, and request tracking.

Updated Nov 24, 2025
Python

BHUVANESH-SSN / big-data-project

This project implements a real-time credit card fraud detection system using big data technologies. It simulates a production-grade fraud detection pipeline in which credit card transactions are streamed through Apache Kafka, classified in real time by a trained Mahout Random Forest model, and stored in separate databases according to the fraud prediction.

kafka big-data spark hadoop mahout

Updated Oct 25, 2025
Python

shivam1423 / Real-Time-Grid-Monitoring-System

Real-Time Grid Monitoring System is an end-to-end data pipeline and analytics platform that enables live monitoring, analysis, and visualization of electrical grid performance.

elasticsearch airflow kafka spark dashboard hadoop kibana-dashboard

Updated Oct 25, 2025
Python
