Notes on Apache Spark (pyspark)
Updated Mar 3, 2019
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
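To make "implicit data parallelism" concrete, here is a minimal PySpark sketch (the app name is arbitrary): Spark splits the range across partitions, runs the aggregation in parallel, and can recompute any lost partition from its lineage, which is where the fault tolerance comes from.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-notes-demo").getOrCreate()

# The range is partitioned automatically; the aggregation runs in
# parallel across those partitions with no explicit threading code.
df = spark.range(1_000_000)
total = df.select(F.sum(F.col("id") * 2).alias("total")).collect()[0]["total"]
print(total)

spark.stop()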
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Apache Spark™ and Scala Workshops
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Scalable Data Science: course sets in big data using Apache Spark on Databricks, and their mathematical, statistical, and computational foundations using SageMath.
Big Data workshop, in Spanish
Toolkit for Apache Spark ML: feature clean-up, feature-importance calculation, information-gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
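The toolkit's own API is not shown in these notes, but two of the pieces it names, feature-importance calculation and hyperparameter optimization, look roughly like this in stock Spark ML; the toy data and parameter grid below are assumptions:

from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = SparkSession.builder.appName("ml-toolkit-sketch").getOrCreate()

# Toy labeled data standing in for a real feature table.
rows = [(Vectors.dense([float(i), float(i % 3)]), float(i % 2))
        for i in range(40)]
train_df = spark.createDataFrame(rows, ["features", "label"])

rf = RandomForestClassifier(featuresCol="features", labelCol="label")
grid = (ParamGridBuilder()                 # hyperparameter search space
        .addGrid(rf.numTrees, [10, 20])
        .addGrid(rf.maxDepth, [3, 5])
        .build())
cv = CrossValidator(estimator=rf,
                    estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(),
                    numFolds=3)

model = cv.fit(train_df)                   # selects the best grid point
print(model.bestModel.featureImportances)  # per-feature importance vector
spark.stop()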
Dockerizing and consuming an Apache Livy environment
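For context, Livy is a REST service for Spark, so "consuming" an environment means driving a remote session over HTTP. A minimal sketch, assuming a Livy server on its default port 8998:

import json
import time
import requests

LIVY = "http://localhost:8998"  # assumed Livy endpoint
headers = {"Content-Type": "application/json"}

# Open an interactive PySpark session on the remote cluster.
session = requests.post(f"{LIVY}/sessions",
                        data=json.dumps({"kind": "pyspark"}),
                        headers=headers).json()
session_url = f"{LIVY}/sessions/{session['id']}"

# Wait for the session to become idle before submitting code.
while requests.get(session_url, headers=headers).json()["state"] != "idle":
    time.sleep(1)

# Run a statement remotely and poll for its result.
stmt = requests.post(f"{session_url}/statements",
                     data=json.dumps({"code": "print(spark.range(10).count())"}),
                     headers=headers).json()
stmt_url = f"{session_url}/statements/{stmt['id']}"
while True:
    result = requests.get(stmt_url, headers=headers).json()
    if result["state"] == "available":
        print(result["output"])
        break
    time.sleep(1)

requests.delete(session_url, headers=headers)  # clean up the session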
A powerful, rapid, automatic EDA and feature-engineering library with a very easy-to-use API 🌟
Serene Data Integration Platform
MLflow End-to-End Workshop at Chandigarh University
Rails application for the Archives Unleashed Cloud.
Example applications of GDELT mass media intelligence data
Demo created for the "Life is but a Stream" presentation at Spark + AI Summit 2019 (San Francisco, CA)
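This is not the demo's actual code, but the general shape of a Spark Structured Streaming job, the talk's subject, looks like the following; the built-in rate source generates synthetic rows so the sketch runs without Kafka or any other external system:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

# The rate source emits (timestamp, value) rows at a fixed pace.
events = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Windowed count over event time, a common streaming aggregation.
counts = (events
          .withWatermark("timestamp", "10 seconds")
          .groupBy(F.window("timestamp", "5 seconds"))
          .count())

query = (counts.writeStream
         .outputMode("update")
         .format("console")
         .start())
query.awaitTermination(30)  # run for ~30 seconds, then return
spark.stop()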
Time series forecasting using Prophet and Apache Spark
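The usual pattern here is to fit one Prophet model per series inside a grouped pandas UDF so that Spark parallelizes across series. A sketch under that assumption, using Spark 3's applyInPandas (earlier Spark versions used a GROUPED_MAP pandas UDF) and a synthetic single-store dataset; the prophet package was previously published as fbprophet:

import pandas as pd
from prophet import Prophet
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("prophet-sketch").getOrCreate()

# Synthetic daily series; real data would have many stores.
pdf = pd.DataFrame({
    "store": ["a"] * 60,
    "ds": pd.date_range("2019-01-01", periods=60, freq="D"),
    "y": [float(i) for i in range(60)],
})
sales_df = spark.createDataFrame(pdf)

def forecast_store(group: pd.DataFrame) -> pd.DataFrame:
    m = Prophet()                 # Prophet expects columns ds and y
    m.fit(group[["ds", "y"]])
    future = m.make_future_dataframe(periods=30)
    fc = m.predict(future)[["ds", "yhat"]]
    fc["store"] = group["store"].iloc[0]
    return fc

# One model per store, fitted in parallel across executors.
forecasts = (sales_df.groupBy("store")
             .applyInPandas(forecast_store,
                            schema="ds timestamp, yhat double, store string"))
forecasts.show(5)
spark.stop()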
A comprehensive implementation of Bitcoin address clustering that applies multiple heuristics, for blockchain-analysis and chain-analysis applications.
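The best known of those heuristics is common-input ownership: addresses spent together as inputs of a single transaction are assumed to share an owner, and union-find merges them into clusters. A plain-Python sketch of just that one heuristic on made-up data (the repository's full set of conditions is not shown here):

parent = {}

def find(a):
    parent.setdefault(a, a)
    while parent[a] != a:
        parent[a] = parent[parent[a]]  # path halving
        a = parent[a]
    return a

def union(a, b):
    parent[find(a)] = find(b)

# Each transaction lists its input addresses (synthetic data).
transactions = [
    ["addr1", "addr2"],   # addr1 and addr2 co-spend -> assumed same owner
    ["addr2", "addr3"],   # transitively links addr3 into the cluster
    ["addr4"],            # single input, forms its own cluster
]
for inputs in transactions:
    for addr in inputs:
        find(addr)        # register every address
    for addr in inputs[1:]:
        union(inputs[0], addr)

clusters = {}
for addr in parent:
    clusters.setdefault(find(addr), []).append(addr)
print(clusters)  # two clusters: {addr1, addr2, addr3} and {addr4}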
Taxi versus Uber in NYC
apache_spark_UCBerkeleyX
Distributed ML: Predicting Churn from Click Data with Apache Spark
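A minimal sketch of what such a churn model can look like in Spark ML; the click-behavior features and labels below are invented toy data, and the model is scored on its own training rows purely to keep the sketch short:

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("churn-sketch").getOrCreate()

# Hypothetical per-user click behavior; churned = 1.0 means the user left.
clicks = spark.createDataFrame(
    [(12, 3.5, 0.0), (2, 0.4, 1.0), (30, 8.1, 0.0), (1, 0.1, 1.0),
     (25, 6.0, 0.0), (3, 0.9, 1.0), (18, 4.2, 0.0), (0, 0.0, 1.0)],
    ["clicks_per_week", "hours_active", "churned"])

pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["clicks_per_week", "hours_active"],
                    outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="churned"),
])

model = pipeline.fit(clicks)
auc = BinaryClassificationEvaluator(labelCol="churned").evaluate(
    model.transform(clicks))
print(f"training AUC: {auc:.3f}")
spark.stop()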
Apache Spark was created by Matei Zaharia and released May 26, 2014.