Stars
Apache Spark - A unified analytics engine for large-scale data processing
A platform to build and run apps that are elastic, agile, and resilient. SDK, libraries, and hosted environments.
PredictionIO, a machine learning server for developers and ML engineers.
CMAK is a tool for managing Apache Kafka clusters
The leader in Customer Data Infrastructure
Code, exercises, answers, and hints to go along with the book "Functional Programming in Scala"
State of the Art Natural Language Processing
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Spark: The Definitive Guide's Code Repository
REST job server for Apache Spark
Apache Spark to Apache Cassandra connector
Powerful new number types and numeric abstractions for Scala.
Base classes to use when writing tests with Spark
Code to accompany Advanced Analytics with Spark from O'Reilly Media
Purely Functional Algorithms and Data Structures in Scala
KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka f…
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
Sparkling Water provides H2O functionality inside a Spark cluster
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
A new Scala wrapper for Joda Time based on scala-time
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination…
Accompanying source code for Akka in Action
BlinkDB: Sub-Second Approximate Queries on Very Large Data.