Updated Nov 10, 2018 - Scala
apache-spark
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Here are 374 public repositories matching this topic...
A Twitter Stream Processing Pipeline with ingestion, processing, storage, and visualization.
Updated Jan 4, 2025 - Scala
Updated Sep 16, 2025 - Scala
Updated Dec 21, 2017 - Scala
Updated May 26, 2017 - Scala
Updated May 6, 2020 - Scala
Contains the code and examples for my article on Medium, which provides a guide to setting up and running Spark projects using Scala and sbt.
Updated Oct 25, 2024 - Scala
Replicates an Insider Attack on a graph using distributed computing
Updated Dec 13, 2023 - Scala
Alternating Least Squares Music Recommender
Updated May 18, 2017 - Scala
A movie recommendation system built using Apache Spark and Scala.
Updated Feb 24, 2017 - Scala
Yelp Image Classification Model
Updated Apr 29, 2017 - Scala
Updated Apr 23, 2017 - Scala
Utility for common use cases and bug workarounds in Apache Spark 2
Updated Apr 25, 2019 - Scala
The RandNE (Graph embedding) algorithm implemented in Apache Spark
Updated Sep 17, 2021 - Scala
This project implements a distributed pipeline for NLP model training using Apache Spark and DeepLearning4J (DL4J). The methodology utilizes a sliding window approach for data preparation, positional embeddings for token encoding, and Word2Vec model training with parallel processing. The model and training process is designed for scalability and op
Updated Aug 26, 2025 - Scala
Created by Matei Zaharia
Released May 26, 2014
- Followers: 435
- Repository: apache/spark
- Website: github.com/topics/spark
- Wikipedia