hadoop

arashabe / ENEA-AnomalyDetectionPipeline

A PySpark-based pipeline for detecting anomalies in energy consumption using unsupervised models (PCA, Isolation Forest, LOF). The system processes raw JSON data, aggregates monthly features, and identifies anomalous PODIDs using an ensemble approach, ready for production deployment.

machine-learning hadoop pipeline-framework pyspark energy-consumption anomaly-detection university-of-bergamo

Updated Jul 15, 2025
Python

jloaiz16 / big-data-project

Project #3 - Big Data, for the Special Topics in Telematics course

python hadoop hdfs

Updated May 25, 2018
Python

mpolatcan / hadoop-docker

Scalable Hadoop Docker image that works with Docker Compose and Kubernetes

docker kubernetes scalable hadoop docker-compose docker-image

Updated Nov 4, 2020
Python

SiddharthaShandilya / automation-using-python

We are using Python to automate Docker, Hadoop, ...

linux docker aws automation hadoop python3

Updated Sep 14, 2021
Python

p-disha / NYC-Parking-Violations

An analysis of the NYC Parking Violations dataset using PySpark, Spark SQL, and MapReduce to find useful insights.

spark hadoop analysis insights pyspark sparksql mapreduce spark-sql taxi-data nyc-taxi-dataset mapreduce-python

Updated Apr 15, 2020
Python

the-timoye / map-reduce-example

Describes the MapReduce concept as used in data processing and data engineering

spark hadoop map-reduce data-engineering

Updated Apr 28, 2021
Python

prachi220 / Cloud-Computing

Cloud Computing course projects

hadoop cloud-computing hdfs disk-virtualization

Updated Jun 7, 2020
Python

sauravpd29 / Movie-Analytics-Big-Data

Movie Analytics in Big Data Using Hadoop and Spark

hadoop pyspark

Updated Nov 18, 2021
Python

raja9283 / HadoopSCD

A data pipeline on GCP Dataproc using Sqoop, HDFS, Hive, and PySpark to implement SCD Type 2 for an e-commerce use case. Tracks customer and product changes (e.g., address, price) and their impact on sales, demonstrating scalable data warehousing and processing.

spark hive hadoop hdfs scd sqoop

Updated Mar 2, 2025
Python

ahmedhumza94 / udacity-intro-to-hadoop-and-mapreduce

Repository containing Python code for MapReduce jobs that answer questions about Udacity forum data.

python hadoop mapreduce hadoop-streaming

Updated May 26, 2018
Python

HarshitDawar55 / Multiple-Technologies-Python

This repository contains a Python script that covers multiple use cases, from OS tasks such as creating partitions and LVM to Hadoop, AWS, and Docker tasks.

python docker jenkins aws ansible devops cloud spark hadoop terraform bigdata os swap partitions lvm

Updated Nov 10, 2020
Python

joszamama / big-data

Usage of different Big Data apps

hive hadoop pig mapreduce

Updated Mar 26, 2023
Python

Here are 566 public repositories matching this topic...

glinmac / hdp-tools

maniraniyal / BigData

mariam222-cypro / Twitter_Data_Pipeline

melvinjjoseph / Big-Data-UE21CS343AB2-Assignments

yashichawla / YAH

mmingalov / kc-hadoop

galanteh / JSON2Tables

divithraju / divith-raju-PySpark-Projects

arashabe / ENEA-AnomalyDetectionPipeline
