Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
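A minimal sketch of that programming model in Java, assuming the standard spark-core API; the local master and the toy dataset are placeholders for illustration, and the same code runs unchanged on a cluster:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;
import java.util.List;

public class SumOfSquares {
    public static void main(String[] args) {
        // Local master for illustration only; point it at a cluster to scale out.
        SparkConf conf = new SparkConf().setAppName("SumOfSquares").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
            // parallelize() partitions the collection across executors;
            // map() and reduce() then run in parallel, with lost partitions
            // recomputed automatically from lineage (fault tolerance).
            JavaRDD<Integer> numbers = sc.parallelize(data);
            int sum = numbers.map(x -> x * x).reduce(Integer::sum);
            System.out.println("Sum of squares: " + sum);
        }
    }
}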
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
REST web service for true real-time scoring (<1 ms) of Scikit-Learn, R, and Apache Spark models
A visual ETL development and debugging tool for big data
This is an archive of the SparkRDMA project. The new repository, with RDMA shuffle acceleration for Apache Spark, is here: https://github.com/Nvidia/sparkucx
Operator for managing Spark clusters on Kubernetes and OpenShift.
Spark in Action, 2nd edition - chapter 2
A converter from OSM PBF files to Parquet files
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Apache Pulsar Adapters
Spark-Transformers: library for exporting Apache Spark MLlib models for use in any Java application with no other dependencies.
REST API for Apache Spark on K8S or YARN
Spark in Action, 2nd edition - chapter 3
This project contains customizations such as custom data sources and plugins written for distributed systems like Apache Spark, Apache Ignite, etc.
Spark in Action, 2nd edition - chapter 7 - Ingestion from files
Stream processing guidelines and examples using Apache Flink and Apache Spark
Spark in Action, 2nd edition - chapter 8
Spark in Action, 2nd edition - chapter 12 - Transforming your data
Cloud-based SQL engine using Spark, where data is accessible as a JDBC/ODBC data source via the Spark Thrift Server.
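For that pattern, a minimal client-side sketch in Java using plain JDBC: the Spark Thrift Server speaks the HiveServer2 protocol, so the Hive JDBC driver (hive-jdbc on the classpath) is assumed, and the host, port, database, user, and table name are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThriftServerQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint; 10000 is the Thrift Server's default port.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "spark_user", "");
             Statement stmt = conn.createStatement();
             // Hypothetical table name, used only for illustration.
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM some_table")) {
            while (rs.next()) {
                System.out.println("row count: " + rs.getLong(1));
            }
        }
    }
}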
Apache Spark was created by Matei Zaharia and released May 26, 2014.