Big data training material
Updated Jun 29, 2023 - Python
A distributed file system program that works like Hadoop with minor changes. A fully working implementation that incorporates asynchronous distribution of files and map and reduce components, with its own command-line interface supporting all the required commands.
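The map and reduce components such a clone provides can be sketched in plain Python. This is a minimal word-count illustration of the map/shuffle/reduce flow, not the repo's actual API; all function names here are illustrative:

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit (word, 1) pairs, as a Hadoop mapper would
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle/sort: group values by key, the step the framework
    # performs between the map and reduce phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data training", "big data tools"]
result = reduce_phase(shuffle(map_phase(lines)))
print(result)  # {'big': 2, 'data': 2, 'training': 1, 'tools': 1}
```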
A best-practice Apache Spark working environment for building robust data pipelines
A mini-Hadoop clone capable of performing all DFS functionalities through a CLI
Running MapReduce in Hadoop using Docker
Real-Time Streaming: Twitter Data Pipeline Using Big Data Tools
Step-by-step guide for Hadoop installation on Ubuntu 16.04.3, with a MapReduce example using Hadoop Streaming
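Hadoop Streaming runs MapReduce jobs whose mapper and reducer are ordinary executables reading stdin and writing stdout. A minimal Python word-count pair in that style might look like the sketch below (a local dry run, not the guide's exact scripts; the `sorted()` call simulates the sort Hadoop performs between the phases):

```python
def mapper(lines):
    # Streaming mapper: read text lines, emit "word<TAB>1" pairs
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(lines):
    # Streaming reducer: Hadoop sorts mapper output by key, so equal
    # words arrive consecutively; sum counts for each run of keys
    current, total = None, 0
    for line in lines:
        word, count = line.rstrip("\n").split("\t")
        if word == current:
            total += int(count)
        else:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, int(count)
    if current is not None:
        yield f"{current}\t{total}"

# Local dry run, simulating the shuffle/sort between the two phases
mapped = sorted(mapper(["big data", "big tools"]))
for line in reducer(mapped):
    print(line)  # big\t2, data\t1, tools\t1
```

In a real cluster these two scripts would be passed to the `hadoop-streaming` jar via its `-mapper` and `-reducer` options.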
A PySpark-based pipeline for detecting anomalies in energy consumption using unsupervised models (PCA, Isolation Forest, LOF). The system processes raw JSON data, aggregates monthly features, and identifies anomalous PODIDs using an ensemble approach, ready for production deployment.
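One member of such an ensemble, PCA-based anomaly scoring, can be sketched with NumPy alone: project rows onto the top principal components and score each row by its reconstruction error. The data below is a synthetic stand-in for monthly aggregated features, not the repo's actual pipeline or schema:

```python
import numpy as np

def pca_anomaly_scores(X, n_components=2):
    # Score each row by its reconstruction error after projecting onto
    # the top principal components: points far from the learned subspace
    # get high scores and are flagged as anomalous.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T              # principal axes as columns
    reconstructed = Xc @ V @ V.T         # project onto subspace and back
    return np.linalg.norm(Xc - reconstructed, axis=1)

# Synthetic features: normal rows live on a 2-D subspace spanned by W;
# the last row lies orthogonal to that subspace.
rng = np.random.default_rng(0)
W = np.array([[1.0, 0.0, 1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0, 0.0]])
normal = rng.normal(size=(200, 2)) @ W + 0.05 * rng.normal(size=(200, 5))
outlier = np.array([[5.0, -5.0, -5.0, 5.0, 0.0]])  # orthogonal to W's rows
X = np.vstack([normal, outlier])
scores = pca_anomaly_scores(X)
print(int(scores.argmax()))  # 200 (index of the injected outlier)
```

An ensemble like the one described would combine these scores with those of Isolation Forest and LOF (e.g. from scikit-learn) before flagging a record.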
Apache Spark: from installation to performing advanced operations across the Apache Spark stack
This project is a technology-article search engine based on Hadoop and the Flask framework. Hadoop MapReduce builds the inverted index, Flask provides a user-friendly web search interface, and HDFS stores the index and the database.
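The core data structure here, an inverted index, maps each term to the list of documents containing it; the MapReduce job emits (term, doc_id) pairs and reduces them into posting lists. A compact single-process sketch (illustrative, not the repo's code):

```python
from collections import defaultdict

def build_inverted_index(docs):
    # Map step: emit (term, doc_id) for every term occurrence;
    # Reduce step: collect the set of doc_ids per term into a
    # sorted posting list.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "Hadoop MapReduce tutorial",
    2: "Flask web search with Hadoop",
}
index = build_inverted_index(docs)
print(index["hadoop"])  # [1, 2]
```

A search frontend then answers a query by intersecting the posting lists of the query terms.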
Clickstream analytics pipeline using Apache Spark and Hadoop to process 1.5M+ events, with a 70% improvement in batch efficiency.