hadoop

Here are 547 public repositories matching this topic...

Raveesh1505 / BigData-Training

Big data training material

big-data hadoop bigdata pig mapreduce pig-latin hadoop-mapreduce apache-pig mapreduce-java hadoop-hdfs mapreduce-python

Updated Jun 29, 2023
Python

jeshwanthreddy13 / Yet_Another_Hadoop

A distributed file system that works like Hadoop, with minor changes. It is a fully working program that incorporates asynchronous file distribution and map and reduce components, and it has its own command-line interface with all the required commands.

hadoop bigdata python3 asynchronous-programming mapreduce distributed-file-system

Updated Feb 9, 2023
Python

chandrasekhar-syamala / TwitterFeedAnalysis

An academic project for the course "Principles of Big Data Management" that develops a system to store, process, analyse, and visualize Twitter data using Apache Spark

flask machine-learning hadoop pyspark

Updated Jun 3, 2023
Python

furkancets / PrescreiberPipelineSpark

An attempt at a best-case Apache Spark working environment for robust data pipelines

spark apache-spark hadoop pyspark

Updated Apr 1, 2023
Python

mariam222-cypro / Twitter_Data_Pipeline

Real-time streaming: a Twitter data pipeline using big data tools

big-data spark hive hadoop sentiment-analysis twitter-api pyspark spark-streaming hashtags hiveql

Updated May 8, 2023
Python

melvinjjoseph / Big-Data-UE21CS343AB2-Assignments

Big Data Assignments

kafka spark hadoop hadoop-mapreduce

Updated Dec 6, 2023
Python

mmingalov / kc-hadoop

spark hive hadoop hdfs

Updated Jul 20, 2022
Python

arashabe / ENEA-AnomalyDetectionPipeline

A PySpark-based pipeline for detecting anomalies in energy consumption using unsupervised models (PCA, Isolation Forest, LOF). The system processes raw JSON data, aggregates monthly features, and identifies anomalous PODIDs using an ensemble approach, ready for production deployment.

machine-learning hadoop pipeline-framework pyspark energy-consumption anomaly-detection university-of-bergamo

Updated Jul 15, 2025
Python

nanfengpo / cf-recommendations

Collaborative-filtering-based recommendation MapReduce job using Yelp's mrjob

hadoop mrjob

Updated Feb 18, 2016
Python

MyUdacityProjects / HadoopMapreduce

hadoop hadoop-mapreduce

Updated Apr 15, 2017
Python

udayshankar1306 / spark_way

Apache Spark: from installation to performing awesome operations in the Apache Spark stack

emr spark apache-spark hadoop spark-fundamentals

Updated May 8, 2017
Python

Yiyun-Liang / Forum-Posts-Analysis

MapReduce scripts for forum data analysis.

python hadoop mapreduce

Updated Feb 3, 2017
Python

anuragkh / hadoop-ec2

Hadoop EC2 scripts

hive hadoop ec2

Updated Jul 23, 2020
Python

mattavallone / Big-Data-Project

Data cleaning and profiling of NYC Open Data

big-data hadoop pyspark nyc-opendata

Updated Mar 26, 2020
Python

itsSwapnil / Pyspark_data_pipeline_with_Airflow_orchastration

This repository contains an Airflow DAG that orchestrates an incremental data pipeline using PySpark scripts. The pipeline automates daily data processing, syncs results to S3, performs housekeeping, and loops until a target date threshold is reached.

elasticsearch airflow spark hadoop etl pyspark data-engineer

Updated Aug 16, 2025
Python

Akashi23 / hadoop-problems

Hadoop training for second-year students at Narxoz

hadoop hdfs

Updated Nov 17, 2021
Python

yuliya-akchurina / Big-Data-Programming

Big Data Programming Projects

spark hadoop hadoop-mapreduce mapreduce-python

Updated Jun 22, 2022
Python

SagarFall2022 / CloudComputing

Cloud Computing Projects

spark hadoop distributed-key-value-database linux-sort

Updated May 20, 2023
Python

itsSwapnil / Data-Interpolation-with-Radial-Basis-Function

A PySpark-based solution for cleaning and interpolating battery sensor data using forward/backward fill and Radial Basis Function (RBF) spatial interpolation. Outputs a clean, fully interpolated dataset in CSV format for advanced analysis.

spark hadoop etl pyspark data-engineer data-interpolation

Updated Aug 16, 2025
Python

PriyankaKhivsara / ecommerce-multi-marketplace-analytics

A big data analytics project that integrates sales data from Flipkart, Amazon, and Meesho into a unified pipeline. Data is processed with Apache Spark, stored in MySQL, and visualized using Power BI/Tableau to uncover trends, top-selling products, and customer purchase patterns. Designed to support data-driven decision-making in e-commerce.

mysql hadoop etl analytics bigdata pyspark tableau powerbi datapipeline dataengineering datavisualization apachespark ecommerceanalytics

Updated Aug 22, 2025
Python
