MiguelPeralvo
85 stars written in Scala

Apache Spark - A unified analytics engine for large-scale data processing

Scala · 42,268 stars · 28,928 forks · Updated Nov 10, 2025

A platform to build and run apps that are elastic, agile, and resilient. SDK, libraries, and hosted environments.

Scala · 13,230 stars · 3,579 forks · Updated Nov 6, 2025

PredictionIO, a machine learning server for developers and ML engineers.

Scala · 12,529 stars · 1,919 forks · Updated Jan 9, 2021

CMAK is a tool for managing Apache Kafka clusters

Scala · 11,938 stars · 2,500 forks · Updated Aug 2, 2023

The leader in Customer Data Infrastructure

Scala · 6,964 stars · 1,190 forks · Updated Jun 4, 2025

Code, exercises, answers, and hints to go along with the book "Functional Programming in Scala"

Scala · 5,710 stars · 3,029 forks · Updated Dec 11, 2024

State of the Art Natural Language Processing

Scala · 4,068 stars · 733 forks · Updated Nov 10, 2025

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Scala · 3,538 stars · 573 forks · Updated Nov 4, 2025

Spark: The Definitive Guide's Code Repository

Scala · 3,055 stars · 2,876 forks · Updated Aug 26, 2020

REST job server for Apache Spark

Scala · 2,845 stars · 988 forks · Updated Jul 8, 2025

Abstract Algebra for Scala

Scala · 2,300 stars · 347 forks · Updated Aug 21, 2025

Apache Spark to Apache Cassandra connector

Scala · 1,947 stars · 929 forks · Updated Apr 29, 2025

Powerful new number types and numeric abstractions for Scala.

Scala · 1,776 stars · 241 forks · Updated Oct 17, 2025

Base classes to use when writing tests with Spark

Scala · 1,545 stars · 355 forks · Updated Oct 27, 2025

Code to accompany Advanced Analytics with Spark from O'Reilly Media

Scala · 1,529 stars · 1,023 forks · Updated Sep 25, 2024

MLeap: Deploy ML Pipelines to Production

Scala · 1,528 stars · 315 forks · Updated Nov 27, 2024

Purely Functional Algorithms and Data Structures in Scala

Scala · 1,481 stars · 318 forks · Updated Aug 14, 2023

KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka f…

Scala · 1,182 stars · 394 forks · Updated Jan 5, 2017

CSV Data Source for Apache Spark 1.x

Scala · 1,057 stars · 441 forks · Updated Dec 13, 2018

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

Scala · 1,039 stars · 316 forks · Updated Jul 12, 2025

Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster

Scala · 1,036 stars · 199 forks · Updated Nov 21, 2022

A Time Series Library for Apache Spark

Scala · 1,021 stars · 183 forks · Updated Jul 3, 2020

Sparkling Water provides H2O functionality inside Spark cluster

Scala · 977 stars · 360 forks · Updated Nov 5, 2025

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

Scala · 932 stars · 614 forks · Updated Nov 10, 2025

A new Scala wrapper for Joda Time based on scala-time

Scala · 869 stars · 78 forks · Updated Nov 4, 2025

This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination…

Scala · 795 stars · 160 forks · Updated Nov 6, 2025

Accompanying source code for akka in action

Scala · 745 stars · 418 forks · Updated Aug 19, 2022

Create Docker images directly from sbt

Scala · 735 stars · 111 forks · Updated Dec 12, 2024

BlinkDB: Sub-Second Approximate Queries on Very Large Data.

Scala · 660 stars · 120 forks · Updated Feb 6, 2014

Spark reference applications

Scala · 652 stars · 339 forks · Updated Oct 3, 2024