FMIPA Public Lecture
Building a Super Predictive Model:
Is It Possible?
Bagus Sartono
Department of Statistics, FMIPA
Collaborators:
Dr. Eng. Annisa
Gerry Alfa Dito, SSi
21 Nov 2019, Auditorium FMIPA, IPB University
Bagus Sartono
• Lecturer, Department of Statistics, FMIPA IPB University
• Coordinator, Data Mining Working Group, FMIPA IPB University
• Vice Chair of FORSTAT (Forum Penyelenggara Pendidikan Tinggi Statistika, the Forum of Statistics Higher Education Providers)
What comes to mind when you think of a "super model"?
Definitely not these ones!
Predictive Analytics
• Predictive analytics is the branch of advanced analytics which is used to make predictions about unknown future events. (PAT Research)
• Predictive analytics is the use of data, statistical algorithms and machine learning techniques to identify the likelihood of future outcomes based on historical data. (SAS)
• Predictive analytics is a category of data analytics aimed at making predictions about future outcomes based on historical data and analytics techniques such as statistical modeling and machine learning. (John Edward, cio.com)
Predictive Analytics in Business: Credit Scoring
• Scoring model to predict the risk level of debtors
• Classification model involving predictors: socio-demographic variables, payment history, other transaction records
• Scores
  • Good/Excellent Risk
  • Bad/Poor Risk
• Common algorithms:
  • Logistic Regression
  • Classification Tree
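A minimal sketch of the credit-scoring idea, assuming plain Python with no libraries: a logistic regression fitted by batch gradient descent on a small, made-up set of debtors, with the predicted probability of default mapped to the two risk classes. The features, data, learning rate and the 0.5 cut-off are illustrative assumptions, not taken from the lecture; a production scorecard would normally be built with a statistical package and a validated cut-off.

```python
import math

# Hypothetical toy data (NOT from the lecture): each debtor is described by two
# scaled predictors, [share of late payments in the last year, debt-to-income ratio].
# Label 1 = defaulted (bad risk), 0 = repaid (good risk).
X = [[0.0, 0.2], [0.1, 0.3], [0.0, 0.1], [0.2, 0.2],
     [0.8, 0.7], [0.9, 0.6], [0.7, 0.9], [1.0, 0.8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit a logistic regression by batch gradient descent on the mean log-loss."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def prob_bad(w, b, x):
    """Predicted probability that debtor x is a bad risk."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def risk_class(p, threshold=0.5):
    # The 0.5 cut-off is an assumption; in practice it is tuned to the lender's policy.
    return "Bad/Poor Risk" if p >= threshold else "Good/Excellent Risk"


w, b = fit_logistic(X, y)
print(risk_class(prob_bad(w, b, [0.05, 0.15])))  # a debtor with a clean record
print(risk_class(prob_bad(w, b, [0.95, 0.85])))  # a debtor with many late payments
```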
Predictive Analytics in Business
• Propensity model to predict the likelihood-to-buy of individuals
• Up-Sell / Cross-Sell campaign
• Selective campaign
  • High propensity: make the offer
  • Low propensity: no offer
• Common algorithms: Random Forest, Boosted Tree
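The selective-campaign rule can be sketched as below, assuming the propensity scores have already been produced by some model (for example a random forest or boosted tree). The customer IDs, scores, the 0.6 threshold and the `max_offers` budget cap are hypothetical.

```python
# Hypothetical propensity scores (likelihood-to-buy), assumed to come from an
# already-trained model such as a random forest or boosted tree.
propensity = {"C001": 0.82, "C002": 0.35, "C003": 0.67, "C004": 0.91, "C005": 0.12}


def select_targets(scores, threshold=0.6, max_offers=None):
    """Selective campaign: offer only to customers at or above the threshold,
    highest propensity first, optionally capped by a campaign budget."""
    chosen = sorted((cid for cid, p in scores.items() if p >= threshold),
                    key=lambda cid: scores[cid], reverse=True)
    return chosen if max_offers is None else chosen[:max_offers]


print(select_targets(propensity))                # high-propensity customers get the offer
print(select_targets(propensity, max_offers=2))  # same rule under a budget of two offers
```

Ranking by propensity before applying the cap means a budget-limited campaign still reaches the most likely buyers first.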
Predictive Analytics in Business: Debit/Credit Card Activation
• Identifying the probability that dormant cards become active
• Recall campaign to card holders who are likely to become active
• Common algorithm:
  • k-Nearest Neighbor
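A minimal k-Nearest Neighbor sketch for the dormant-card problem, in plain Python. The two features, the six historical cards and k = 3 are invented for illustration; in practice the predictors would be standardized first, because k-NN relies on raw distances.

```python
import math
from collections import Counter

# Hypothetical history (NOT from the lecture): [months since last transaction,
# number of other products held with the bank]; label 1 = dormant card that
# became active after a recall campaign, 0 = stayed dormant.
train_X = [[1, 3], [2, 4], [1, 2], [8, 0], [9, 1], [7, 0]]
train_y = [1, 1, 1, 0, 0, 0]


def k_nearest_labels(train_X, train_y, x, k=3):
    """Labels of the k training cards closest to x (Euclidean distance)."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    return [yi for _, yi in dists[:k]]


def activation_probability(train_X, train_y, x, k=3):
    """Estimated activation probability: share of the k neighbours that became active."""
    return sum(k_nearest_labels(train_X, train_y, x, k)) / k


def predict_class(train_X, train_y, x, k=3):
    """Majority vote among the k nearest neighbours."""
    return Counter(k_nearest_labels(train_X, train_y, x, k)).most_common(1)[0][0]


for card in ([2, 3], [8, 1]):
    print(card, activation_probability(train_X, train_y, card),
          predict_class(train_X, train_y, card))
```

Cards with a high estimated probability would be the natural targets of the recall campaign.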
Other Examples
• Predicting students' academic success
• Predicting disease risk
• Weather forecasting
Common Classification Model Algorithms
• Logistic Regression
• Classification Tree
• Support Vector Machine
• Random Forest
• Neural Network
• Bayesian Classifier
• k-Nearest Neighbor
• Boosting
The Ideal Predictive Model
• High prediction accuracy
• Simple
General Strategies
• VARIABLE SELECTION
  • Reduce the number of predictors and model parameters, avoiding overly complex models
• FEATURE ENGINEERING
  • Construct new predictors that are more predictive
• ENSEMBLE LEARNING
  • Combine the predictions of several different models/algorithms to improve prediction accuracy
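A toy illustration of the ensemble-learning point, assuming three already-trained classifiers whose predicted probabilities are simply averaged (soft voting). The probabilities are made up so that each model makes a different single mistake, which averaging then corrects; averaging only helps this way when the models' errors are not all the same.

```python
# Hypothetical predicted probabilities of class 1 from three already-trained
# models on the same four validation cases (values invented for illustration).
y_true = [1, 0, 1, 0]
model_a = [0.9, 0.6, 0.7, 0.2]   # wrong on case 2
model_b = [0.4, 0.2, 0.8, 0.3]   # wrong on case 1
model_c = [0.8, 0.3, 0.45, 0.1]  # wrong on case 3


def average_predictions(*model_probs):
    """Soft voting: average the predicted probabilities case by case."""
    return [sum(ps) / len(model_probs) for ps in zip(*model_probs)]


def accuracy(probs, labels, threshold=0.5):
    """Share of cases classified correctly at the given probability threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


ensemble = average_predictions(model_a, model_b, model_c)
for name, probs in [("A", model_a), ("B", model_b), ("C", model_c), ("ensemble", ensemble)]:
    print(name, accuracy(probs, y_true))
```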
Super Algorithm
• Comes with a range of features for producing a good model: variable selection, feature engineering, ensemble learning
• Works well even on ill-conditioned data
• Does not overfit, and predicts well on other (new) data
“Weapons” available in several predictive modeling algorithms
Algorithm                Variable Selection   Feature Engineering   Ensemble
Logistic Regression      -                    -                     -
k-Nearest Neighbor       -                    -                     -
Classification Tree      Good                 Fair                  -
Support Vector Machine   -                    Good                  -
Random Forest            Fair                 Good                  Good
Boosted Tree             Good                 Fair                  Good
Neural Network           -                    Good                  Good
The Basic Idea of the “Super Learner”
• van der Laan, M. J., Polley, E. C. and Hubbard, A. E. (2007) Super Learner. Statistical Applications in Genetics and Molecular Biology, 6, article 25.
• Polley, E. C. and van der Laan, M. J. (2010) Super Learner in Prediction. U.C. Berkeley Division of Biostatistics Working Paper Series, Paper 226.
• STACKING
  • Use the predictions of several base models as predictors for a metalearner model
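One way to sketch the metalearner step, assuming the base-model predictions on held-out data are already available: choose non-negative weights that sum to one and minimize mean squared error. This mirrors the convex-combination idea behind the Super Learner (the R SuperLearner package's default `method.NNLS` uses non-negative least squares with the weights normalized to sum to one), but the coarse grid search here is a simplification, and the data are hypothetical, constructed so the best weights are 0.7 and 0.3.

```python
from itertools import product

# Hypothetical level-one data: each row holds two base models' predictions for
# one held-out observation. The targets are built as 0.7 * model 1 + 0.3 * model 2.
Z = [[1.0, 2.0], [2.0, 0.0], [3.0, 4.0], [4.0, 1.0]]
y = [1.3, 1.4, 3.3, 3.1]


def simplex_grid(n_learners, step=0.1):
    """Yield every weight vector on a grid with non-negative entries summing to 1."""
    units = round(1 / step)
    for head in product(range(units + 1), repeat=n_learners - 1):
        if sum(head) <= units:
            yield [h / units for h in head] + [(units - sum(head)) / units]


def meta_predict(w, z):
    """Metalearner prediction: weighted sum of the base predictions z."""
    return sum(wk * zk for wk, zk in zip(w, z))


def fit_convex_metalearner(Z, y, step=0.1):
    """Pick the convex weights with the lowest mean squared error on level-one data."""
    best_w, best_risk = None, float("inf")
    for w in simplex_grid(len(Z[0]), step):
        risk = sum((yi - meta_predict(w, zi)) ** 2 for zi, yi in zip(Z, y)) / len(y)
        if risk < best_risk:
            best_w, best_risk = w, risk
    return best_w, best_risk


w_best, risk_best = fit_convex_metalearner(Z, y)
print(w_best, risk_best)
```

Restricting the weights to be non-negative and sum to one keeps the combined prediction within the range of the base predictions, which tends to make the stack more stable than an unrestricted regression on correlated base predictions.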
The Super Learner Algorithm
Figure: Super Learner workflow. DATASET → LEARNERS → BASE PREDICTIONS → META LEARNER → FINAL PREDICTION, with CROSS VALIDATION, FEATURE ENGINEERING, VARIABLE SELECTION and ENSEMBLE as built-in components.
https://cran.r-project.org/web/packages/SuperLearner/vignettes/Guide-to-SuperLearner.html
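The cross-validation step of the workflow can be sketched as follows. This assumes a regression setting, two deliberately simple base learners (an intercept-only mean and a straight-line fit) and V = 5 folds assigned round-robin; these choices and the data are illustrative only. Each observation's base predictions come from models that never saw it, and picking the learner with the lowest cross-validated risk gives what is often called the discrete Super Learner. The matrix `Z` is the level-one data a metalearner would then be trained on.

```python
# Hypothetical data with a roughly linear trend (invented for illustration).
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]


def fit_mean(x_train, y_train):
    """Base learner 1: predict the training mean everywhere."""
    m = sum(y_train) / len(y_train)
    return lambda x: m


def fit_linear(x_train, y_train):
    """Base learner 2: ordinary least-squares straight line."""
    n = len(x_train)
    mx, my = sum(x_train) / n, sum(y_train) / n
    sxx = sum((x - mx) ** 2 for x in x_train)
    slope = sum((x - mx) * (y - my) for x, y in zip(x_train, y_train)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cross_validated_predictions(xs, ys, learners, v=5):
    """Out-of-fold predictions: row i, column k holds learner k's prediction for
    observation i from a model fitted without observation i's fold."""
    folds = [i % v for i in range(len(xs))]  # round-robin fold assignment
    Z = [[0.0] * len(learners) for _ in xs]
    for fold in range(v):
        train = [i for i, f in enumerate(folds) if f != fold]
        test = [i for i, f in enumerate(folds) if f == fold]
        for k, fit in enumerate(learners):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in test:
                Z[i][k] = model(xs[i])
    return Z


def cv_risk(Z, ys):
    """Cross-validated mean squared error of each base learner."""
    n = len(ys)
    return [sum((ys[i] - Z[i][k]) ** 2 for i in range(n)) / n for k in range(len(Z[0]))]


learners = [fit_mean, fit_linear]
Z = cross_validated_predictions(xs, ys, learners, v=5)
risks = cv_risk(Z, ys)
best = min(range(len(learners)), key=lambda k: risks[k])
print(risks, "discrete Super Learner picks:", learners[best].__name__)
```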
Empirical Success Story
Average rank of prediction accuracy of various algorithms, obtained through cross-validation on eight different datasets (rank 1 = most accurate)
Algorithm              Average rank
Super Learner          1.9
Conditional Forest     4.1
Glm Boost              4.4
Random Forest          5.0
Logistic Regression    5.6
Extra Trees            5.6
Ada Boost              5.8
Naïve Bayes            8.5
Gaussian Process       9.5
Xgboost                10.5
SVM                    11.1
CART                   11.8
Conditional Tree       11.8
C50                    13.9
J48                    15.1
Evolutionary Tree      15.9
IBk                    16.3
Neural Network         16.3
OneR                   17.1
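A sketch of how an average-rank summary like this can be computed: rank the algorithms within each dataset (1 = highest accuracy), then average each algorithm's ranks over the datasets. The accuracy values below are hypothetical, and giving tied algorithms the mean of their rank positions is an assumption; the slide does not say how ties or accuracy were defined.

```python
# Hypothetical accuracies (NOT the lecture's results): one row per dataset,
# one column per algorithm.
algorithms = ["Learner A", "Learner B", "Learner C"]
accuracy_table = [
    [0.91, 0.88, 0.85],
    [0.80, 0.82, 0.78],
    [0.75, 0.70, 0.75],
]


def ranks_desc(scores):
    """Rank scores so the highest gets rank 1; tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for j in range(start, end + 1):
            ranks[order[j]] = shared
        start = end + 1
    return ranks


def average_ranks(table, names):
    """Average each algorithm's within-dataset rank over all datasets."""
    totals = [0.0] * len(names)
    for scores in table:
        for i, r in enumerate(ranks_desc(scores)):
            totals[i] += r
    return {name: t / len(table) for name, t in zip(names, totals)}


for name, avg in sorted(average_ranks(accuracy_table, algorithms).items(), key=lambda kv: kv[1]):
    print(f"{name:<10} {avg:.1f}")
```

Averaging ranks rather than raw accuracies keeps datasets where every algorithm scores high (or low) from dominating the comparison.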
Closing
• The need for prediction is everywhere
• Analysts need model-building algorithms capable of producing super predictive models
• The super learner approach can be an option, because it comes equipped with a range of “weapons”
• Give it a try!
Thank you
bagusco@apps.ipb.ac.id