Boosting and Mirror Descent
George Lan
A. Russell Chandler III Chair Professor
H. Milton Stewart School of Industrial & Systems Engineering
Rationale: Combination of methods
There is no single algorithm that is always the most accurate.
We can select simple "weak" classification or regression methods and combine them into a single "strong" method.
Different learners may use different algorithms, hyperparameters, representations, training sets, or subproblems.
Previously: Ensemble
[Figure: ensemble diagram — multiple learners applied to the problem(s) and combined]
Example of Weak Learners: Decision stump
Let $y \in \{\pm 1\}$.
Decision stump: $h(x; \theta_k) = \mathrm{sign}(w_k x_k + b_k)$.
Each decision stump pays attention to only a single dimension $x_k$ of the input vector.
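As a concrete illustration (not from the slides), a minimal decision stump in Python; the feature index k, sign w, and offset b play the role of the stump parameters $\theta_k$:

```python
import numpy as np

def stump_predict(x, k, w, b):
    """Decision stump h(x; theta_k) = sign(w * x_k + b): it looks only at
    the k-th coordinate of the input vector x."""
    return np.sign(w * x[..., k] + b)

# Hypothetical example: a stump that thresholds the 0th feature at 2.5.
X = np.array([[1.0, 4.0], [3.0, 0.5]])
print(stump_predict(X, k=0, w=1.0, b=-2.5))   # [-1.  1.]
```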
Boosting
Boosting: a general method for converting rough rules of thumb (weak classifiers) into a highly accurate prediction rule.
A family of methods which produce a sequence of classifiers.
Each classifier depends on the previous one and focuses on the previous one's errors.
Examples that are incorrectly predicted by the previous classifiers are chosen more often or weighted more heavily when estimating a new classifier.
Questions:
How to choose the "hardest" examples?
How to combine these classifiers?
Boosting Setup
Given a set of base classifiers $H = \{h_1, \ldots, h_n\}$, where $h_j : X \to \{1, -1\}$.
Training data (examples): $(x_i, y_i)$, $i = 1, \ldots, m$, where $x_i \in X$ and $y_i \in \{1, -1\}$.
Goal: construct
a sequence of distributions $\{w_k\}$ over the examples, and
a sequence $\{H_k\}$ of nonnegative combinations of base classifiers, such that
$H_k$ performs significantly better than any single base classifier in $H$.
Matrix notation
For notational convenience, define the feature matrix
$A_{ij} := y_i h_j(x_i)$.
Intuitively, the $(i,j)$-th entry of the matrix $A$ represents the effectiveness of the base classifier $h_j$, $j = 1, \ldots, n$, applied to the example $(x_i, y_i)$: it equals $+1$ if $h_j$ classifies the example correctly and $-1$ otherwise.
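A minimal sketch of how the feature matrix could be assembled, assuming (for illustration) that each base classifier is available as a Python callable mapping the data matrix to a vector of $\pm 1$ labels:

```python
import numpy as np

def feature_matrix(X, y, classifiers):
    """A[i, j] = y_i * h_j(x_i): +1 if h_j classifies example i correctly, -1 otherwise."""
    H = np.column_stack([h(X) for h in classifiers])  # m x n matrix of predictions
    return y[:, None] * H                             # multiply row i by the label y_i
```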
Formulation
Our goal is to choose the nonnegative weights 𝜆 to form an
improved classifier that maximizes the worst-case margin.
Margin for example i:
𝐴𝜆 # = 3(𝜆" 𝐴#" ) = 3[𝜆" 𝑦# ℎ" (𝑥# )] = 𝑦# 3[𝜆" ℎ" (𝑥# )]
" " "
Worst case margin:
𝑝 𝜆 ≔ 𝑚𝑖𝑛&'#,…,) (𝐴𝜆)& = min{𝑤 * 𝐴𝜆: 𝑤 ∈ Δ) }.
Δ) = {𝑤| ∑) &'# 𝑤& = 1, 𝑤& ≥ 0}
The optimization problem is defined as
𝑚𝑎𝑥+,- 𝑝 𝜆
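Given the matrix $A$, the per-example margins and the worst-case margin $p(\lambda)$ are direct to compute; a small sketch:

```python
import numpy as np

def worst_case_margin(A, lam):
    """p(lambda) = min_i (A lambda)_i: the smallest margin over all examples."""
    margins = A @ lam   # margin of example i: y_i * sum_j lambda_j h_j(x_i)
    return margins.min()
```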
Optimization problem
$\max_{\lambda \ge 0} p(\lambda)$, where $p(\lambda) := \min_{i=1,\ldots,m} (A\lambda)_i = \min\{w^T A\lambda : w \in \Delta_m\}$.
Note that $p(\lambda)$ is positively homogeneous, i.e., $p(a\lambda) = a\, p(\lambda)$ for $a \ge 0$.
It therefore makes sense to normalize $\lambda$ so that $\lambda \in \Delta_n = \{\lambda \mid \sum_{j=1}^n \lambda_j = 1,\; \lambda_j \ge 0\}$.
This is called a bilinear matrix game problem in optimization:
$\max_{\lambda \in \Delta_n} \{p(\lambda) := \min_{w \in \Delta_m} w^T A\lambda\}$
Saddle point formulation and duality
Original problem:
$\max_{\lambda \in \Delta_n} \{p(\lambda) := \min_{w \in \Delta_m} w^T A\lambda\}$
By duality:
$\min_{w \in \Delta_m} \{f(w) := \max_{\lambda \in \Delta_n} w^T A\lambda\}$
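As a small sanity check (not in the slides), note that $\max_{\lambda \in \Delta_n} w^T A\lambda$ is simply the largest entry of $A^T w$, so the dual problem is a linear program that can be solved directly for small instances; a sketch assuming scipy is available:

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    """Solve min_{w in Delta_m} max_j (A^T w)_j as an LP in the variables (w, t)."""
    m, n = A.shape
    c = np.r_[np.zeros(m), 1.0]                 # minimize t
    A_ub = np.c_[A.T, -np.ones(n)]              # (A^T w)_j - t <= 0 for every j
    b_ub = np.zeros(n)
    A_eq = np.r_[np.ones(m), 0.0][None, :]      # sum_i w_i = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]   # w >= 0, t free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]                 # optimal distribution w and game value
```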
How to solve the problem?
Subgradient descent
Mirror descent => Boosting
Subgradient computation --- Weak Learner
$\min_{w \in \Delta_m} \{f(w) := \max_{\lambda \in \Delta_n} w^T A\lambda\}$
Subgradient of $f$: $g(w) = A_{\cdot j^*}$ (the $j^*$-th column of $A$), where
$j^* = \mathrm{argmax}_{j \in \{1,\ldots,n\}} \sum_{i=1}^m w_i A_{ij} = \mathrm{argmax}_{j \in \{1,\ldots,n\}} \sum_{i=1}^m w_i\, y_i\, h_j(x_i)$
$\lambda(w) = e_{j^*}$, i.e., the $j^*$-th entry is 1 and the others are zero.
Weak learner W:
For any distribution $w$ on the examples,
return an index $j^*$ of the base classifier $h_{j^*}$ that does best on the weighted examples determined by $w$:
$j^* = \mathrm{argmax}_{j \in \{1,\ldots,n\}} \sum_{i=1}^m w_i\, y_i\, h_j(x_i)$
(see the sketch below).
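A minimal sketch of this weak-learner oracle when the finite family of base classifiers is encoded in the matrix $A$ (the function name is illustrative):

```python
import numpy as np

def weak_learner(A, w):
    """Return the index j* of the base classifier with the largest weighted edge
    (A^T w)_j = sum_i w_i y_i h_j(x_i), together with the subgradient column."""
    edges = A.T @ w              # weighted edge of each base classifier
    j_star = int(np.argmax(edges))
    g = A[:, j_star]             # subgradient of f at w: the j*-th column of A
    return j_star, g
```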
Subgradient descent
Subgradient descent algorithm
For $k = 1, 2, \ldots$:
$w_{k+1} = \mathrm{argmin}_{w \in \Delta_m} \|w_k - \gamma_k g(w_k) - w\|^2$
$\lambda_{k+1} = \frac{\sum_{t=1}^{k} \gamma_t \lambda(w_t)}{\sum_{t=1}^{k} \gamma_t}$
Subgradient $g(w) = (y_i h_{j^*}(x_i))_{i=1}^m$: if example $i$ is misclassified, the corresponding entry of $g(w)$ is negative, resulting in a weight increase in the next iteration.
Note: the subproblem of computing $w_{k+1}$ (a Euclidean projection onto the simplex) is relatively easy, but we do not have an explicit solution.
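A sketch of the resulting projected subgradient method. The Euclidean projection onto the simplex has no explicit formula, so the standard sort-based routine is used below (an assumption for illustration; the constant step size is also illustrative):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def subgradient_descent(A, n_iters=100, step=0.1):
    m, n = A.shape
    w = np.full(m, 1.0 / m)                 # uniform distribution over examples
    lam_sum, gamma_sum = np.zeros(n), 0.0
    for k in range(n_iters):
        j_star = int(np.argmax(A.T @ w))    # weak learner: best weighted edge
        g = A[:, j_star]                    # subgradient g(w_k)
        w = project_simplex(w - step * g)   # w_{k+1} = Proj_{Delta_m}(w_k - gamma_k g(w_k))
        lam_sum[j_star] += step             # running sum of gamma_k * lambda(w_k)
        gamma_sum += step
    return lam_sum / gamma_sum              # averaged lambda_{k+1}
```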
Mirror descent
$w_{k+1} = \mathrm{argmin}_{w \in \Delta_m} \tfrac{1}{2}\|w_k - \gamma_k g(w_k) - w\|^2 = \mathrm{argmin}_{w \in \Delta_m} \gamma_k g(w_k)^T w + \tfrac{1}{2}\|w - w_k\|^2$
We can replace the term $\tfrac{1}{2}\|w - w_k\|^2$ by a more general "distance-like" function.
In particular, since $\Delta_m$ is a simplex, we replace it by
$V(w_k, w) := v(w) - v(w_k) - v'(w_k)^T (w - w_k)$, with
$v(w) = \sum_{i=1}^m w_i \ln w_i$
Mirror Descent => Boosting
The Mirror Descent (or Boosting) Algorithm
For $k = 1, 2, \ldots$:
$w_{k+1} = \mathrm{argmin}_{w \in \Delta_m} \gamma_k g(w_k)^T w + V(w_k, w)$
$\lambda_{k+1} = \frac{\sum_{t=1}^{k} \gamma_t \lambda(w_t)}{\sum_{t=1}^{k} \gamma_t}$
How should we solve the subproblem? Up to constants it can be rewritten as
$\mathrm{argmin}_{w \in \Delta_m} \langle \gamma_k g(w_k) - v'(w_k),\, w \rangle + v(w)$, where
$v'(w_k) = [1, \ldots, 1]^T + [\ln w_{k,1}, \ldots, \ln w_{k,m}]^T$
A previous homework problem
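The entropy subproblem above has the explicit multiplicative-weights solution $w_{k+1,i} \propto w_{k,i}\exp(-\gamma_k g_i(w_k))$; this closed form is standard and is presumably what the referenced homework derives. A minimal sketch of the resulting loop, with illustrative constant step size and iteration count:

```python
import numpy as np

def mirror_descent_boosting(A, n_iters=100, step=0.5):
    """Entropy mirror descent on min_{w in Delta_m} max_j (A^T w)_j."""
    m, n = A.shape
    w = np.full(m, 1.0 / m)                    # start from the uniform distribution
    lam_sum, gamma_sum = np.zeros(n), 0.0
    for k in range(n_iters):
        j_star = int(np.argmax(A.T @ w))       # weak learner at w_k
        g = A[:, j_star]                       # subgradient: +1 correct, -1 misclassified
        w = w * np.exp(-step * g)              # misclassified examples get their weight boosted
        w /= w.sum()                           # renormalize onto the simplex
        lam_sum[j_star] += step
        gamma_sum += step
    return lam_sum / gamma_sum                 # weights of the combined classifier
```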
Base classifiers with continuous parameters
We had assumed that the base classifiers $h_j$ are given and fixed. What if they are parameterized by some continuous parameter $\theta_j$, i.e., $h_j(x_i, \theta_j)$?
It is difficult to list all possible values of $\theta_j$.
Replace the subgradient computation step by
$(j^*, \theta_{j^*}) = \mathrm{argmax}_{j \in \{1,\ldots,n\},\,\theta_j} \sum_{i=1}^m w_i\, y_i\, h_j(x_i, \theta_j)$
$g(w) = (y_i h_{j^*}(x_i, \theta_{j^*}))_{i=1}^m$
$h(w) = h_{j^*}(\cdot\,, \theta_{j^*})$
If example $i$ is misclassified, the corresponding entry of $g(w)$ is negative, resulting in a weight increase in the next iteration.
The process of finding $h(w)$ is sometimes called training the weak learner; a sketch follows below.
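For decision stumps, the sketch below trains the weak learner by scanning candidate thresholds for every feature (the midpoint enumeration is an illustrative choice, not specified in the slides):

```python
import numpy as np

def train_stump(X, y, w):
    """Find the decision stump (feature k, sign s, threshold b) maximizing the
    weighted edge sum_i w_i * y_i * h(x_i). Candidate thresholds are midpoints
    of consecutive observed feature values."""
    best = (-np.inf, None)
    for k in range(X.shape[1]):
        vals = np.unique(X[:, k])
        thresholds = (vals[:-1] + vals[1:]) / 2.0
        for b in thresholds:
            for s in (+1.0, -1.0):
                preds = s * np.sign(X[:, k] - b)
                edge = np.sum(w * y * preds)
                if edge > best[0]:
                    best = (edge, (k, s, b))
    return best[1]
```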
Adaptive Boosting (AdaBoost)
For $k = 1, 2, \ldots$:
Train a weak learner at $w_k$ to obtain $g(w_k)$ and $h(w_k)$.
$w_{k+1} = \mathrm{argmin}_{w \in \Delta_m} \gamma_k g(w_k)^T w + V(w_k, w)$
$H_{k+1} = \frac{\sum_{t=1}^{k} \gamma_t h(w_t)}{\sum_{t=1}^{k} \gamma_t}$
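Putting the pieces together, a sketch of an AdaBoost-style loop built on the train_stump helper above. The step size $\gamma_k = \tfrac12 \ln\frac{1-\epsilon_k}{\epsilon_k}$ is the standard AdaBoost choice (the slides leave $\gamma_k$ unspecified):

```python
import numpy as np

def adaboost(X, y, n_rounds=3):
    """AdaBoost with decision stumps, following the mirror-descent view:
    entropy step on w, nonnegative combination of the trained weak learners."""
    m = X.shape[0]
    w = np.full(m, 1.0 / m)                          # uniform initial distribution
    learners, gammas = [], []
    for k in range(n_rounds):
        kf, s, b = train_stump(X, y, w)              # train the weak learner at w_k
        preds = s * np.sign(X[:, kf] - b)
        eps = np.sum(w * (preds != y))               # weighted error of the weak learner
        gamma = 0.5 * np.log((1 - eps) / max(eps, 1e-12))  # standard AdaBoost step size
        w = w * np.exp(-gamma * y * preds)           # entropy mirror-descent step
        w /= w.sum()
        learners.append((kf, s, b))
        gammas.append(gamma)

    def H(Xnew):
        """Combined classifier: sign of the gamma-weighted sum of weak learners."""
        scores = sum(g * s * np.sign(Xnew[:, kf] - b)
                     for g, (kf, s, b) in zip(gammas, learners))
        return np.sign(scores)
    return H
```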
Toy Example
Weak classifiers: vertical or horizontal half-planes (decision stumps), each parameterized by one threshold value.
Let $y \in \{\pm 1\}$.
Uniform weights $w_1$ on all examples.
[Figure: the toy data set under the initial uniform distribution $w_1$]
Boosting round 1
Choose a decision stump (weak classifier) $h(w_1)$.
Some data points obtain higher weights under $w_2$ because they are classified incorrectly.
$\gamma_1 = 0.42$
[Figure: the chosen stump $h(w_1)$ and the reweighted distribution $w_2$]
Boosting round 2
Choose a new decision stump $h(w_2)$.
Reweight again: the weights of incorrectly classified examples increase in $w_3$.
$\gamma_2 = 0.65$
[Figure: the second stump $h(w_2)$ and the reweighted distribution $w_3$]
Boosting round 3
Repeat the same process.
Now we have 3 classifiers.
$\gamma_3 = 0.92$
[Figure: the third stump $h(w_3)$]
Boosting aggregate classifier
The final classifier is the weighted combination of the weak classifiers:
$H_{\mathrm{final}}(x) = \mathrm{sign}\big(0.42\, h(w_1)(x) + 0.65\, h(w_2)(x) + 0.92\, h(w_3)(x)\big)$