apache-spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Here are 374 public repositories matching this topic...
Simple and Distributed Machine Learning
Updated Dec 15, 2025 - Scala
Apache Spark Connectors for OceanBase.
Updated Dec 12, 2025 - Scala
A library to transform Scala product types and schemas from different systems into other schemas. Any implemented type automatically gets methods to convert it into the other types and vice versa. For example, a Spark schema can be transformed into a BigQuery table.
Updated Dec 11, 2025 - Scala
This project helps us understand customer shopping habits by looking at e-commerce data. We use special tools to clean the data, find important information, and then save it for later use. This helps businesses make better decisions.
Updated Dec 8, 2025 - Scala
GraphFrames is a package for Apache Spark that provides DataFrame-based graphs.
Updated Dec 6, 2025 - Scala
Spark Connector to read and write with Pulsar
Updated Dec 5, 2025 - Scala
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Updated Dec 5, 2025 - Scala
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Updated Dec 16, 2025 - Scala
sparkmobility-scala: Scala Spark codespace for sparkmobility
Updated Nov 11, 2025 - Scala
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Updated Nov 6, 2025 - Scala
A recommender system for discovering GitHub repos, built with Apache Spark
Updated Nov 5, 2025 - Scala
An end-to-end data engineering and analysis project to process a large-scale movie dataset, derive actionable business insights using Apache Spark, and build a content-based recommendation system.
Updated Oct 31, 2025 - Scala
Student Academic Performance Analysis using Apache Spark
Updated Oct 30, 2025 - Scala
An end-to-end batch data pipeline for e-commerce analytics, built with Scala, Spark, HDFS, Hive, Postgres, and Jenkins. The system is enhanced with a parallel streaming pipeline using Kafka for real-time Twitter trend analysis.
Updated Oct 8, 2025 - Scala
Scalable data engineering platform for agricultural IoT sensor data with Kafka, Spark, S3, and Grafana.
Updated Sep 17, 2025 - Scala
Updated Sep 16, 2025 - Scala
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Updated Aug 31, 2025 - Scala
Created by Matei Zaharia
Released May 26, 2014
- Followers: 435
- Repository: apache/spark
- Website: github.com/topics/spark
- Wikipedia