Apache Spark docker image
Updated Apr 21, 2023 - Shell
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
A curated list of awesome Apache Spark packages and resources.
[PROJECT IS NO LONGER MAINTAINED] Wirbelsturm is a Vagrant- and Puppet-based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Ansible roles to install a Spark Standalone cluster (HDFS/Spark/Jupyter Notebook) or an Ambari-based Spark cluster
An image for running Jupyter notebooks and Apache Spark in the cloud on OpenShift
Easy CPU Profiling for Apache Spark applications
A .NET for Apache Spark docker image (3rdman/dotnet-spark)
Create an n-node cluster and run a Spark job on Docker
Production run of Apache Spark on Kubernetes
Sample Oozie workflow to test a Spark job. The workflow uses the Shell action to call a shell script, which in turn invokes the Spark Pi example job.
File compaction tool that runs on top of the Spark framework.
This is the material for the 2019 Silicon Valley Code Camp Session "Realish Time Predictive Analytics with Spark Structured Streaming"
📘 FIWARE 306: Real-time Processing of Context Data using Apache Spark
Sparkler Crawl Environment - a packaged, dockerized version of http://github.com/USCDataScience/sparkler.git
An implementation of Apache Spark (combined with PySpark and Jupyter Notebook) on top of a Hadoop cluster using Docker
Apache Spark to run on Kubernetes
Host files and procedure for running Fink on Kubernetes
Created by Matei Zaharia
Released May 26, 2014