Big data training material
Updated Jun 29, 2023 - Python
A distributed file system program that works like Hadoop with minor changes. A fully working implementation that incorporates asynchronous distribution of files and map and reduce components, with its own command-line interface supporting all the required commands.
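The map and reduce components such a clone provides can be sketched in plain Python. This is a minimal word-count illustration of the map/shuffle/reduce flow, not the repo's actual API; all function names here are illustrative:

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit (word, 1) pairs, as a Hadoop mapper would
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle/sort: group values by key, the step the framework
    # performs between the map and reduce phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data training", "big data tools"]
result = reduce_phase(shuffle(map_phase(lines)))
print(result)  # {'big': 2, 'data': 2, 'training': 1, 'tools': 1}
```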
A best-practice Apache Spark working environment for building robust data pipelines
A mini-Hadoop clone capable of performing all DFS functionalities through a CLI
Running MapReduce in Hadoop using Docker
Real-Time Streaming: Twitter Data Pipeline Using Big Data Tools
Step-by-step guide for Hadoop installation on Ubuntu 16.04.3, with a MapReduce example using Hadoop Streaming
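Hadoop Streaming runs MapReduce jobs whose mapper and reducer are ordinary executables reading stdin and writing stdout. A minimal Python word-count pair in that style might look like the sketch below (a local dry run, not the guide's exact scripts; the `sorted()` call simulates the sort Hadoop performs between the phases):

```python
def mapper(lines):
    # Streaming mapper: read text lines, emit "word<TAB>1" pairs
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(lines):
    # Streaming reducer: Hadoop sorts mapper output by key, so equal
    # words arrive consecutively; sum counts for each run of keys
    current, total = None, 0
    for line in lines:
        word, count = line.rstrip("\n").split("\t")
        if word == current:
            total += int(count)
        else:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, int(count)
    if current is not None:
        yield f"{current}\t{total}"

# Local dry run, simulating the shuffle/sort between the two phases
mapped = sorted(mapper(["big data", "big tools"]))
for line in reducer(mapped):
    print(line)  # big\t2, data\t1, tools\t1
```

In a real cluster these two scripts would be passed to the `hadoop-streaming` jar via its `-mapper` and `-reducer` options.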
A PySpark-based pipeline for detecting anomalies in energy consumption using unsupervised models (PCA, Isolation Forest, LOF). The system processes raw JSON data, aggregates monthly features, and identifies anomalous PODIDs using an ensemble approach, ready for production deployment.
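One member of such an ensemble, PCA-based anomaly scoring, can be sketched with NumPy alone: project rows onto the top principal components and score each row by its reconstruction error. The data below is a synthetic stand-in for monthly aggregated features, not the repo's actual pipeline or schema:

```python
import numpy as np

def pca_anomaly_scores(X, n_components=2):
    # Score each row by its reconstruction error after projecting onto
    # the top principal components: points far from the learned subspace
    # get high scores and are flagged as anomalous.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T              # principal axes as columns
    reconstructed = Xc @ V @ V.T         # project onto subspace and back
    return np.linalg.norm(Xc - reconstructed, axis=1)

# Synthetic features: normal rows live on a 2-D subspace spanned by W;
# the last row lies orthogonal to that subspace.
rng = np.random.default_rng(0)
W = np.array([[1.0, 0.0, 1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0, 0.0]])
normal = rng.normal(size=(200, 2)) @ W + 0.05 * rng.normal(size=(200, 5))
outlier = np.array([[5.0, -5.0, -5.0, 5.0, 0.0]])  # orthogonal to W's rows
X = np.vstack([normal, outlier])
scores = pca_anomaly_scores(X)
print(int(scores.argmax()))  # 200 (index of the injected outlier)
```

An ensemble like the one described would combine these scores with those of Isolation Forest and LOF (e.g. from scikit-learn) before flagging a record.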
Apache Spark: from installation to performing advanced operations across the Apache Spark stack
This project is a technology-article search engine based on Hadoop and the Flask framework. Hadoop MapReduce builds the inverted index, Flask provides a user-friendly web search interface, and HDFS stores the index and the database.
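The core data structure here, an inverted index, maps each term to the list of documents containing it; the MapReduce job emits (term, doc_id) pairs and reduces them into posting lists. A compact single-process sketch (illustrative, not the repo's code):

```python
from collections import defaultdict

def build_inverted_index(docs):
    # Map step: emit (term, doc_id) for every term occurrence;
    # Reduce step: collect the set of doc_ids per term into a
    # sorted posting list.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "Hadoop MapReduce tutorial",
    2: "Flask web search with Hadoop",
}
index = build_inverted_index(docs)
print(index["hadoop"])  # [1, 2]
```

A search frontend then answers a query by intersecting the posting lists of the query terms.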
Clickstream analytics pipeline using Apache Spark and Hadoop to process 1.5M+ events, with a 70% improvement in batch efficiency.