A comprehensive starter kit for Apache Spark, featuring Docker-based setup and example applications demonstrating various Spark capabilities.
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Container configurations saved from other work-related and personal projects
A simple Apache Spark environment for development work
A robust, scalable on-premises data lake
Set up an Apache Spark cluster with Hadoop (HDFS) and Airflow on Docker
The Apache Spark with Scala 2.13 Docker image is a lightweight, easy-to-use image for running Apache Spark with Scala 2.13 support on your system.
Apache Spark cluster connected to a Jupyter Notebook instance
Set up a local Spark cluster, Hadoop (HDFS), Airflow, and PostgreSQL on Docker with ease, without any local installations
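Multi-service setups like the ones above are typically driven by a Compose file. A minimal sketch for a standalone Spark master and one worker might look like this (the image tag, ports, and environment variables follow the Bitnami Spark image conventions and are assumptions, not taken from any specific repository here):

```yaml
version: "3.8"
services:
  spark-master:
    image: bitnami/spark:3.5        # assumed image; any Spark image works
    environment:
      - SPARK_MODE=master
    ports:
      - "8080:8080"                 # master web UI
      - "7077:7077"                 # cluster manager port
  spark-worker:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
    depends_on:
      - spark-master
```

Additional services such as Airflow, HDFS, or PostgreSQL would be added as further entries under `services`, on the same Docker network.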
A small Docker-based development environment for Apache Spark
Docker image of Morpheus, the openCypher project previously known as Cypher for Apache Spark
This repository holds examples and documentation about the most used tools in the data engineering ecosystem.
Docker setup for Apache Spark and the R sparklyr package
Collection of Apache Spark docker images for OKDP
PySpark in Docker Containers
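Running PySpark in a container usually needs little more than a Python base image with a Java runtime. A minimal Dockerfile sketch (package and version choices are assumptions, and `app.py` is a hypothetical application script):

```dockerfile
FROM python:3.11-slim

# Spark requires a JRE; install one alongside pyspark.
RUN apt-get update && apt-get install -y --no-install-recommends default-jre \
    && rm -rf /var/lib/apt/lists/* \
    && pip install --no-cache-dir pyspark==3.5.1

# Run an application script copied into the image.
COPY app.py /opt/app.py
CMD ["python", "/opt/app.py"]
```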
Apache Spark was created by Matei Zaharia and first released on May 26, 2014.