# apache-spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
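
As a rough illustration of that interface, here is a minimal PySpark sketch of a distributed word count. The input path and application name are placeholders; the data parallelism and lineage-based recovery happen implicitly inside Spark.

```python
# Minimal PySpark sketch: the DataFrame operations below run in parallel across
# the cluster's executors, and failed partitions are recomputed from lineage.
# The input path is a placeholder.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()

lines = spark.read.text("hdfs:///data/sample.txt")  # placeholder path

counts = (
    lines
    .select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
    .where(F.col("word") != "")
    .groupBy("word")
    .count()
    .orderBy(F.col("count").desc())
)

counts.show(10)
spark.stop()
```

Transformations are lazy: nothing executes on the cluster until an action such as `show()` triggers a job.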

Here are 554 public repositories matching this topic...

This project is a centralized, fault-tolerant data engineering pipeline designed to ingest, process, and visualize user data generated in real time. It leverages Apache Airflow for orchestration, Kafka for message buffering, Spark Structured Streaming for high-speed processing, and Cassandra for storage. The final output is visualized in a Streamlit dashboard (see the streaming sketch below this entry).

  • Updated Dec 16, 2025
  • Python
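
A minimal sketch of the Kafka → Spark Structured Streaming → Cassandra leg described above, under stated assumptions: a JSON-encoded `users_created` topic, a `spark_streams.created_users` table, and hypothetical host names. None of these are taken from the repository, and the Kafka source and Spark Cassandra Connector packages must be on the Spark classpath.

```python
# Sketch of the Kafka -> Spark Structured Streaming -> Cassandra leg of such a
# pipeline. Topic name, JSON schema, hosts, and keyspace/table names are
# illustrative assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType

spark = (
    SparkSession.builder.appName("user-stream-sketch")
    .config("spark.cassandra.connection.host", "cassandra")  # hypothetical host
    .getOrCreate()
)

# Assumed shape of the JSON events on the topic.
schema = StructType([
    StructField("id", StringType()),
    StructField("first_name", StringType()),
    StructField("email", StringType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "users_created")               # hypothetical topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("data"))
    .select("data.*")
)

query = (
    events.writeStream
    .format("org.apache.spark.sql.cassandra")
    .option("keyspace", "spark_streams")            # hypothetical keyspace
    .option("table", "created_users")               # hypothetical table
    .option("checkpointLocation", "/tmp/checkpoints/users")
    .start()
)
query.awaitTermination()
```

The checkpoint location is what gives the stream exactly-once bookkeeping across restarts; Airflow would sit outside this job, orchestrating ingestion into the Kafka topic.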

Real-time AML transaction monitoring system processing 5M transactions/day with <3s latency. Built with Kafka, Spark Streaming, Delta Lake, AWS Glue, and Airflow. Features include rule-based detection, SCD Type 2 customer profiling, automated compliance reporting, and a comprehensive data quality framework (see the SCD Type 2 sketch below this entry).

  • Updated Dec 4, 2025
  • Python
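
As an illustration of the SCD Type 2 profiling idea, here is a hedged sketch using Delta Lake's `MERGE` with the common staged-upsert pattern. The table paths, the change-feed source, and the columns (`customer_id`, `risk_rating`, `start_date`, `end_date`, `is_current`) are assumptions for the example, not the repository's actual schema, and `delta-spark` must be installed.

```python
# Hedged sketch of SCD Type 2 customer profiling with a Delta Lake MERGE.
# Paths, column names, and the change-feed source are illustrative assumptions.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("scd2-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

updates = spark.read.parquet("/data/customer_updates")   # hypothetical change feed
dim = DeltaTable.forPath(spark, "/delta/dim_customer")   # hypothetical Delta table
current = dim.toDF().where("is_current = true")

# Customers whose tracked attribute changed need two actions: expire the old row
# and insert a new version. Stage them twice -- once keyed for the update, once
# with a NULL merge key so they fall through to the insert clause.
changed = (
    updates.alias("u")
    .join(current.alias("c"), "customer_id")
    .where("u.risk_rating <> c.risk_rating")
    .select("u.*")
)

staged = (
    updates.withColumn("merge_key", F.col("customer_id"))
    .unionByName(
        changed.withColumn(
            "merge_key",
            F.lit(None).cast(updates.schema["customer_id"].dataType),
        )
    )
)

(
    dim.alias("t")
    .merge(staged.alias("s"), "t.customer_id = s.merge_key AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.risk_rating <> s.risk_rating",
        set={"is_current": "false", "end_date": "current_date()"},
    )
    .whenNotMatchedInsert(
        values={
            "customer_id": "s.customer_id",
            "risk_rating": "s.risk_rating",
            "start_date": "current_date()",
            "end_date": "null",
            "is_current": "true",
        }
    )
    .execute()
)
```

The single merge both closes out the current version of a changed profile and inserts its new version (plus brand-new customers), which keeps the dimension's full history queryable for compliance reporting.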

Created by Matei Zaharia

Released May 26, 2014

  • Followers: 435
  • Repository: apache/spark
  • Website: github.com/topics/spark
  • Wikipedia

Related topics

  • hadoop
  • scala