Coolplay Spark: Spark source code analysis, Spark libraries, and more
Updated May 18, 2022 - Scala
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Simple and Distributed Machine Learning
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
Project for James' Apache Spark with Scala course
Feathr – A scalable, unified data and AI engineering platform for enterprise
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Apache Spark Course Material
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
A Spark Atlas connector to track data lineage in Apache Atlas
Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end is now on https://github.com/apache/spark/
Apache Spark 3 - Structured Streaming Course Material
Spark in Action, 2nd edition - chapter 1 - Introduction
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
Spark Connector to read and write with Pulsar
Example Maven configuration for a Spark, Scala project
Created by Matei Zaharia
Released May 26, 2014