- The Woodlands, Texas
- https://szilard.github.io/aboutme/
Advanced workshop on XGBoost with Tianqi Chen in Santa Monica, June 2, 2016
Quick informal survey at the Los Angeles Machine Learning meetup about tools used for machine learning.
Advanced GBM Workshop - Budapest, Nov 2019
Data for benchm-ml, gbm-perf etc. (samples from the airline dataset)
Adaptive and automatic gradient boosting computations.
Most recent/important talks given at conferences/meetups
Szilard Pafka's short bio (to go with conference talk abstracts)
Tuning GBMs (hyperparameter tuning) and impact on out-of-sample predictions
Machine Learning #1 and #2 courses at CEU Master of Science in Business Analytics
Some thoughts on how to use machine learning in production
Materials for STATS 418 - Tools in Data Science course taught in the Master of Applied Statistics at UCLA
Performance of various open source GBM implementations
GBM multicore scaling: h2o, xgboost and lightgbm on multicore and multi-socket systems
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dependency Scheduler; for Python, R, Julia, Scala, Go, JavaScript and more
useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive http://user2016.org/tutorials/10.html
Inspired by David Donoho's "50 Years of Data Science" (2015) paper, I'm releasing here a draft proposal I wrote in 2009 for a possible course on "data science".
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning al…
Materials for a short introductory/intermediate Data Science course taught in the MSc in Business Analytics program at the Central European University
A minimal benchmark of various tools (statistical software, databases etc.) for working with tabular data of moderately large sizes (interactive data analysis).
Size of datasets used for analytics based on 10 years of surveys by KDnuggets.
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow