Stars
Best Practices on Recommendation Systems
Lecture notes and example code for teaching C & C++
Material from the Big Data course at Chicago Booth
Optimization for Data Science Course
Teaching repo for Applied Data Science @ Columbia, a project-based course for data science skills (statistical thinking, machine learning, data engineering, team work, presentation, endurance of fr…
A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
Quantitative research and educational materials
🐢 bayesAB: Fast Bayesian Methods for A/B Testing
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
Portfolio and risk analytics in Python
Scalable Topic Modeling using Variational Inference in MapReduce
Uber trip data from a freedom of information request to NYC's Taxi & Limousine Commission
Import public NYC taxi and for-hire vehicle (Uber, Lyft) trip data into a PostgreSQL or ClickHouse database
Kalman Filter book using Jupyter Notebook. Focuses on building intuition and experience, not formal proofs. Includes Kalman filters, extended Kalman filters, unscented Kalman filters, particle filte…
C++ implementation of the IRLS algorithm for generalized linear models
Class materials for a distributed systems lecture series
Tutorial-style code introducing the map-reduce style of programming
Predicts League of Legends playoff games for the 2015 season
Official content for the Fall 2014 Harvard CS109 Data Science course
Lectures on scientific computing with Python, as IPython notebooks.