Running Spark (and Hadoop) on Docker
Hadoop3-HA-Docker is a production-ready, fault-tolerant Hadoop cluster deployed with Docker Compose. It automates the setup of a fully distributed Hadoop ecosystem with high-availability (HA) features and is designed for reliability, scalability, and real-world big data workloads.
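For orientation, here is a minimal Compose sketch of what an HA layout like this can look like. Everything in it (image tags, service names, commands) is an illustrative assumption, not taken from the repository:

```yaml
# Illustrative HA topology only -- images, names, and settings are
# assumptions, not Hadoop3-HA-Docker's actual Compose file.
services:
  zookeeper:
    image: zookeeper:3.8              # coordinates automatic NameNode failover
  journalnode:
    image: apache/hadoop:3
    command: ["hdfs", "journalnode"]  # shared edit log for the two NameNodes
  namenode-active:
    image: apache/hadoop:3
    command: ["hdfs", "namenode"]
    ports:
      - "9870:9870"                   # NameNode web UI
  namenode-standby:
    image: apache/hadoop:3
    command: ["hdfs", "namenode"]     # takes over if the active node fails
  datanode:
    image: apache/hadoop:3
    command: ["hdfs", "datanode"]
```

A real HA deployment additionally needs dfs.nameservices, failover-proxy, and ZKFC settings wired into hdfs-site.xml; the sketch only shows the container topology.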
🐳 Docker images for the Hadoop ecosystem.
A template repository that provides a convenient Apache Hadoop instance in Dev Containers.
A local single-node HDFS container for testing.
Base Docker image for all Apache Hadoop components.
Recommendations and personalization service.
A local playground for Spark and Jupyter notebooks, with Apache Iceberg support.
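As a rough idea of how such a playground can be wired up, here is a Compose sketch using the public jupyter/pyspark-notebook image; the image tag and the Iceberg runtime coordinates are assumptions for illustration:

```yaml
# Illustrative only: the image tag and Iceberg runtime version are assumptions.
services:
  notebook:
    image: jupyter/pyspark-notebook:latest  # bundles Spark, PySpark, and Jupyter
    ports:
      - "8888:8888"                         # Jupyter UI
    environment:
      # Fetch the Iceberg runtime when Spark starts (coordinates are an example).
      PYSPARK_SUBMIT_ARGS: >-
        --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0 pyspark-shell
    volumes:
      - ./notebooks:/home/jovyan/work       # keep notebooks on the host
```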
BigData Pipeline is a local testing environment for experimenting with various storage solutions (RDBMS, HDFS), query engines (Trino), schedulers (Airflow), and ETL/ELT tools (dbt). It supports MySQL, Hadoop, Hive, Kudu, and more.
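A skeleton of how those pieces can sit side by side in Compose; the images and settings below are common public defaults, not the repository's actual configuration:

```yaml
# Illustrative skeleton; images and settings are common public defaults,
# not this repository's actual Compose file.
services:
  mysql:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: example          # dev-only credential
  trino:
    image: trinodb/trino:latest
    ports:
      - "8080:8080"                         # Trino UI / JDBC endpoint
    volumes:
      - ./trino/catalog:/etc/trino/catalog  # connector configs, e.g. hive.properties
  airflow:
    image: apache/airflow:2.9.0
    command: standalone                     # all-in-one mode for local testing
    ports:
      - "8081:8080"                         # Airflow UI, remapped to avoid Trino
```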
Docker Compose for Hadoop on Ubuntu 22.04.
A Dockerfile for developing PySpark applications and libraries.
Docker image for the Apache Hive Metastore.
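For reference, a standalone metastore can be started roughly like this; it follows the SERVICE_NAME convention documented for the official apache/hive image, and the tag and mapped port here are assumptions:

```yaml
# Illustrative: follows the SERVICE_NAME convention of the apache/hive
# image; the tag and port mapping are assumptions.
services:
  metastore:
    image: apache/hive:4.0.0
    environment:
      SERVICE_NAME: metastore   # start only the metastore, not HiveServer2
    ports:
      - "9083:9083"             # Thrift endpoint for Spark/Trino/Hive clients
```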