Running Spark (and Hadoop) on Docker
Hadoop3-HA-Docker is a production-ready, fault-tolerant Hadoop cluster deployed with Docker Compose. It automates the setup of a fully distributed Hadoop ecosystem with high-availability (HA) features and is designed for reliability, scalability, and real-world big data workloads.
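To illustrate the general compose pattern a repository like this builds on, here is a minimal sketch, assuming the apache/hadoop image and its convention of passing site configuration through an env file. The service names, ports, and config file are illustrative only, and a real HA deployment additionally needs ZooKeeper, JournalNodes, and a standby NameNode:

```yaml
version: "3"
services:
  namenode:
    image: apache/hadoop:3            # assumed image; substitute your own build
    hostname: namenode
    command: ["hdfs", "namenode"]
    ports:
      - "9870:9870"                   # NameNode web UI
    env_file:
      - ./config                      # hypothetical env file holding *-site.xml overrides
    environment:
      # assumed: the image's runner scripts format this directory on first start
      ENSURE_NAMENODE_DIR: "/tmp/hadoop-root/dfs/name"
  datanode:
    image: apache/hadoop:3
    command: ["hdfs", "datanode"]
    env_file:
      - ./config
```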
A template repository that provides a convenient Apache Hadoop instance in Dev Containers.
Local single-node HDFS container for testing
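For throwaway testing, a single NameNode can often be started with one docker run. The image tag, ports, and auto-format variable below are assumptions based on the apache/hadoop image, not details of the repository above:

```sh
# Single-node HDFS NameNode for local testing (assumed apache/hadoop image)
docker run -d --name namenode \
  -p 9870:9870 -p 9000:9000 \
  -e ENSURE_NAMENODE_DIR=/tmp/hadoop-root/dfs/name \
  apache/hadoop:3 hdfs namenode
```

The NameNode web UI is then reachable at http://localhost:9870.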
Base Docker image for all Apache Hadoop components
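A base image like that typically just unpacks a Hadoop release on top of a JRE so that component images can layer on it. A minimal sketch, with the Temurin base, Hadoop version, and paths chosen purely for illustration:

```dockerfile
# Hypothetical shared base layer for Hadoop component images
FROM eclipse-temurin:8-jre
ARG HADOOP_VERSION=3.3.6
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/* \
 && curl -fsSL "https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz" \
    | tar -xz -C /opt \
 && ln -s "/opt/hadoop-${HADOOP_VERSION}" /opt/hadoop
ENV HADOOP_HOME=/opt/hadoop
ENV PATH="${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${PATH}"
```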
Dockerfile for developing PySpark applications and libraries.
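Such an image usually pairs a Python base with a JVM, since PySpark drives Spark's JVM under the hood. A minimal sketch; the base image, Java and PySpark versions, and entry point are assumptions for illustration:

```dockerfile
FROM python:3.10-slim

# PySpark needs a JVM; a headless JRE is enough for local development
RUN apt-get update \
 && apt-get install -y --no-install-recommends openjdk-17-jre-headless \
 && rm -rf /var/lib/apt/lists/*
# Debian's amd64 JDK path; adjust for other architectures
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64

# pip-installed PySpark bundles its own Spark distribution
RUN pip install --no-cache-dir pyspark==3.5.1

WORKDIR /app
COPY . .

# Hypothetical entry point for the application under development
CMD ["python", "app.py"]
```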
Recommendations and personalization service
YARN Cluster - Hadoop, Hive, Hive Metastore and AWS Glue
Docker image for Apache Hive Metastore
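For comparison, the upstream apache/hive image documents a standalone-metastore quickstart along these lines, backed by an embedded Derby database by default; the version tag here is illustrative:

```sh
# Standalone Hive Metastore listening on the default Thrift port 9083
docker run -d --name metastore \
  -p 9083:9083 \
  -e SERVICE_NAME=metastore \
  apache/hive:4.0.0
```

For anything beyond a demo, the Derby default would be swapped for an external RDBMS via additional configuration.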