- The Woodlands, Texas
- https://szilard.github.io/aboutme/
Stars
scikit-learn: machine learning in Python
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K…
An implementation of the Grammar of Graphics in R
RStudio is an integrated development environment (IDE) for R
R's data.table package extends data.frame:
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning al…
useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive http://user2016.org/tutorials/10.html
Performance of various open source GBM implementations
Materials for STATS 418 - Tools in Data Science course taught in the Master of Applied Statistics at UCLA
A minimal benchmark of various tools (statistical software, databases etc.) for working with tabular data of moderately large sizes (interactive data analysis).
Some thoughts on how to use machine learning in production
Adaptive and automatic gradient boosting computations.
Quick informal survey at the Los Angeles Machine Learning meetup about tools used for machine learning.
Machine Learning #1 and #2 courses at CEU Master of Science in Business Analytics
Materials for a short introductory/intermediate Data Science course taught in the MSc in Business Analytics program at the Central European University
Advanced workshop on XGBoost with Tianqi Chen in Santa Monica, June 2, 2016
Tuning GBMs (hyperparameter tuning) and impact on out-of-sample predictions
Compare the scoring speed of several open source machine learning libraries.
GBM multicore scaling: h2o, xgboost and lightgbm on multicore and multi-socket systems
Latency numbers every data scientist should know (aka the pyramid of analytical tasks) - the order of magnitude of computational time for the most common analytical tasks (SQL-like data munging, li…