hadoop
Here are 566 public repositories matching this topic...
Step-by-step guide to installing Hadoop on Ubuntu 16.04.3, with a MapReduce example using Hadoop Streaming
Updated Feb 16, 2018 - Python
Real-Time Streaming: Twitter Data Pipeline Using Big Data Tools
Updated May 8, 2023 - Python
A mini-Hadoop clone capable of performing all DFS functionalities through a CLI
Updated Dec 6, 2021 - Python
Updated Aug 15, 2024 - Python
A PySpark-based pipeline for detecting anomalies in energy consumption using unsupervised models (PCA, Isolation Forest, LOF). The system processes raw JSON data, aggregates monthly features, and identifies anomalous PODIDs using an ensemble approach, ready for production deployment.
Updated Jul 15, 2025 - Python
Scalable Hadoop Docker image that works with Docker Compose and Kubernetes
Updated Nov 4, 2020 - Python
Describes the MapReduce concept used in data processing and data engineering
Updated Apr 28, 2021 - Python
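For readers unfamiliar with the map-reduce concept named in the entry above, here is a minimal single-process sketch in plain Python. It is not taken from the listed repository; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real Hadoop job would distribute these phases across a cluster rather than run them in one process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data tools", "big data pipelines"]))
```

Word count is used here only because it is the customary illustration; any per-key aggregation fits the same three phases.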
Cloud Computing course projects
Updated Jun 7, 2020 - Python
A data pipeline on GCP Dataproc using Sqoop, HDFS, Hive, and PySpark to implement SCD Type 2 for an e-commerce use case. Tracks customer and product changes (e.g., address, price) and their impact on sales, demonstrating scalable data warehousing and processing.
Updated Mar 2, 2025 - Python
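The SCD Type 2 idea mentioned in the entry above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the listed repository's code: the `CustomerVersion` record, its field names, and the `apply_scd2` helper are all hypothetical, and the actual project uses Hive and PySpark rather than in-memory lists. The point shown is that a changed attribute closes the current row and appends a new one instead of overwriting history.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    """One historical version of a customer row (hypothetical schema)."""
    customer_id: int
    address: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the version is still open
    is_current: bool = True


def apply_scd2(history: List[CustomerVersion], customer_id: int,
               new_address: str, change_date: date) -> List[CustomerVersion]:
    """Apply one incoming record to the history using SCD Type 2 rules."""
    current = next(
        (row for row in history
         if row.customer_id == customer_id and row.is_current),
        None,
    )
    if current is not None:
        if current.address == new_address:
            return history  # nothing changed, keep the current version
        # Close the old version instead of overwriting it.
        current.valid_to = change_date
        current.is_current = False
    # Open a new current version that starts on the change date.
    history.append(CustomerVersion(customer_id, new_address, change_date))
    return history


if __name__ == "__main__":
    rows: List[CustomerVersion] = []
    apply_scd2(rows, 1, "Old St", date(2024, 1, 1))
    apply_scd2(rows, 1, "New Ave", date(2024, 6, 1))
    for row in rows:
        print(row)
```

Keeping both `valid_to` and `is_current` is a common convention that makes "as of a date" queries and "latest version" queries equally simple; either column alone would be enough in principle.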
Repository containing Python code for MapReduce jobs to answer questions about Udacity forum data.
Updated May 26, 2018 - Python
This repository contains a Python script that handles multiple use cases, from OS tasks such as creating partitions and LVM to Hadoop, AWS, and Docker tasks.
Updated Nov 10, 2020 - Python