Welcome to the course!
EXTREME GRADIENT BOOSTING WITH XGBOOST
Sergey Fogelson
VP of Analytics, Viacom
Before we get to XGBoost...
Need to understand the basics of
Supervised classification
Decision trees
Boosting
Supervised learning
Relies on labeled data
Have some understanding of past behavior
Supervised learning example
Does a specific image contain a person's face?
Training data: vectors of pixel values
Labels: 1 or 0
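A minimal sketch of this setup (the arrays below are randomly generated stand-ins for real image data, assuming 64x64 grayscale images):

import numpy as np

# Each row is one image flattened into a vector of 4,096 pixel values
X = np.random.randint(0, 256, size=(100, 64 * 64))
# Each label is 1 (face present) or 0 (no face)
y = np.random.randint(0, 2, size=100)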
Supervised learning: Classification
Outcome can be binary or multi-class
Binary classification example
Will a person purchase the insurance package given some quote?
Multi-class classification example
Classifying the species of a given bird
AUC: Metric for binary classification models
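A quick sketch of computing AUC with scikit-learn's roc_auc_score (the labels and predicted probabilities below are made up for illustration):

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]  # predicted probabilities for the positive class
print(roc_auc_score(y_true, y_scores))  # 0.75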
Accuracy score and confusion matrix
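Both are available in scikit-learn; a minimal sketch with made-up predictions:

from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(accuracy_score(y_true, y_pred))    # 0.8
print(confusion_matrix(y_true, y_pred))  # rows: actual class, columns: predicted class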
Supervised learning with scikit-learn
Other supervised learning considerations
Features can be either numeric or categorical
Numeric features should be scaled (Z-scored)
Categorical features should be encoded (one-hot)
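A minimal preprocessing sketch (the column names here are hypothetical):

import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"age": [25, 32, 47], "color": ["red", "blue", "red"]})

# Z-score the numeric feature: subtract the mean, divide by the standard deviation
df["age"] = StandardScaler().fit_transform(df[["age"]]).ravel()

# One-hot encode the categorical feature: one binary column per category
df = pd.get_dummies(df, columns=["color"])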
Ranking
Predicting an ordering on a set of choices
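XGBoost exposes this through its XGBRanker interface; a rough sketch with synthetic data (the group argument, which gives the number of rows belonging to each query, is the detail to verify against your installed version):

import numpy as np
import xgboost as xgb

X = np.random.rand(8, 3)                 # 8 candidate items, 3 features each
y = np.array([2, 1, 0, 0, 3, 1, 0, 2])   # relevance labels (higher = more relevant)

ranker = xgb.XGBRanker(objective="rank:pairwise", n_estimators=10)
ranker.fit(X, y, group=[4, 4])  # first 4 rows are one query, last 4 another
scores = ranker.predict(X)      # higher score = ranked earlier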
Recommendation
Recommending an item to a user
Based on consumption history and profile
Example: Netflix
Let's practice!
Introducing XGBoost
What is XGBoost?
Optimized gradient-boosting machine learning library
Originally written in C++
Has APIs in several languages:
Python
Scala
Julia
Java
What makes XGBoost so popular?
Speed and performance
Core algorithm is parallelizable
Consistently outperforms single-algorithm methods
State-of-the-art performance in many ML tasks
Using XGBoost: a quick example
import xgboost as xgb
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Load the data; the last column is the binary target
class_data = pd.read_csv("classification_data.csv")
X, y = class_data.iloc[:, :-1], class_data.iloc[:, -1]

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# Logistic-loss objective for binary classification, 10 boosting rounds
xg_cl = xgb.XGBClassifier(objective='binary:logistic', n_estimators=10, seed=123)
xg_cl.fit(X_train, y_train)

# Evaluate on the held-out test set
preds = xg_cl.predict(X_test)
accuracy = float(np.sum(preds == y_test)) / y_test.shape[0]
print("accuracy: %f" % (accuracy))

accuracy: 0.78333
Let's begin using XGBoost!
What is a decision tree?
Visualizing a decision tree
1 https://www.ibm.com/support/knowledgecenter/en/SS3RA7_15.0.0/com.ibm.spss.modeler.help/nodes_treebuilding.htm
Decision trees as base learners
Base learner - Individual learning algorithm in an ensemble algorithm
Composed of a series of binary questions
Predictions happen at the "leaves" of the tree
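A minimal decision-tree sketch with scikit-learn (using its built-in breast cancer dataset for illustration):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)

# Each internal node asks one binary question; predictions happen at the leaves
tree = DecisionTreeClassifier(max_depth=4, random_state=123)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))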
Decision trees and CART
Constructed iteratively (one decision at a time)
Until a stopping criterion is met
Individual decision trees tend to overfit
1 http://scott.fortmann-roe.com/docs/BiasVariance.html
CART: Classification and Regression Trees
Each leaf always contains a real-valued score
Can later be converted into categories
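For a binary objective, that conversion can be sketched as passing the raw leaf score through a sigmoid and thresholding the result (a simplified illustration, not XGBoost's internal code):

import numpy as np

leaf_score = 0.4  # real-valued score summed across the trees' leaves

# Map the raw score to a probability, then threshold it to get a class
probability = 1 / (1 + np.exp(-leaf_score))
predicted_class = int(probability > 0.5)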
Let's work with some decision trees!
What is Boosting?
Boosting overview
Not a specific machine learning algorithm
Concept that can be applied to a set of machine learning models
"Meta-algorithm"
Ensemble meta-algorithm used to convert many weak learners into a strong learner
Weak learners and strong learners
Weak learner: ML algorithm that is slightly better than chance
Example: Decision tree whose predictions are slightly better than 50%
Boosting converts a collection of weak learners into a strong learner
Strong learner: Any algorithm that can be tuned to achieve good performance
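A concrete weak learner: a depth-1 decision tree, or "stump", which can ask only one binary question (sketch on a synthetic dataset):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)

stump = DecisionTreeClassifier(max_depth=1, random_state=123)
stump.fit(X_train, y_train)
print(stump.score(X_test, y_test))  # better than chance, weaker than a deep tree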
How boosting is accomplished
Iteratively learning a set of weak models on subsets of the data
Weighting each weak prediction according to each weak learner's performance
Combining the weighted predictions to obtain a single prediction...
...that is much better than the individual predictions themselves!
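A hand-rolled sketch of the weighting idea (simplified; real boosting algorithms such as AdaBoost derive the weights from each learner's errors and reweight the data between rounds):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=1)
rng = np.random.RandomState(1)

learners, weights = [], []
for _ in range(5):
    # Train each weak learner on a random subset of the data
    idx = rng.choice(len(X), size=250, replace=False)
    stump = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
    learners.append(stump)
    # Weight each learner by its accuracy on the full training set
    weights.append(stump.score(X, y))

# Combine the weighted votes (predictions mapped to -1/+1) into one prediction
votes = sum(w * (2 * clf.predict(X) - 1) for w, clf in zip(weights, learners))
ensemble_preds = (votes > 0).astype(int)
print((ensemble_preds == y).mean())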
Boosting example
1 https://xgboost.readthedocs.io/en/latest/model.html
Model evaluation through cross-validation
Cross-validation: Robust method for estimating the performance of a model on unseen data
Generates many non-overlapping train/test splits on training data
Reports the average test set performance across all data splits
Cross-validation in XGBoost example
import xgboost as xgb
import pandas as pd

# Load the churn data; the last column is the target
churn_data = pd.read_csv("classification_data.csv")

# DMatrix is XGBoost's optimized internal data structure
churn_dmatrix = xgb.DMatrix(data=churn_data.iloc[:, :-1],
                            label=churn_data.month_5_still_here)

params = {"objective": "binary:logistic", "max_depth": 4}

# 4-fold cross-validation, 10 boosting rounds, classification error as the metric
cv_results = xgb.cv(dtrain=churn_dmatrix, params=params, nfold=4,
                    num_boost_round=10, metrics="error", as_pandas=True)

# Convert the final round's test error into an accuracy
print("Accuracy: %f" % ((1 - cv_results["test-error-mean"]).iloc[-1]))

Accuracy: 0.88315
Let's practice!
When should I use XGBoost?
When to use XGBoost
You have a large number of training samples
Greater than 1,000 training samples and fewer than 100 features
The number of features < number of training samples
You have a mixture of categorical and numeric features
Or just numeric features
When to NOT use XGBoost
Image recognition
Computer vision
Natural language processing and understanding problems
When the number of training samples is significantly smaller than the number of features
Let's practice!