
CSE711

Symbolic Machine Learning I

Dr. Md. Golam Rabiul Alam


BRAC University
Lecture Outline

• Machine Learning
• Machine Learning Applications
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
• Symbolic Machine Learning
• Conclusion

Machine Learning

Machine learning is an application


of artificial intelligence (AI) that provides
systems the ability to automatically learn and
improve from experience without being
explicitly programmed.

What is Machine Learning?

Study of algorithms that


– improve their performance
– at some task
– with experience

What is Machine Learning?

From Data to Understanding …

Machine Learning in Action

Machine Learning in Action

• Decoding thoughts from brain scans

Machine Learning in Action

• Stock Market Prediction

Machine Learning in Action

• Document Classification

Machine Learning in Action

• Cars that navigate on their own

The self-driving SUV that took 1st place in the DARPA Urban Challenge
Machine Learning in Action

• Helicopters can learn aerial tricks by watching


other helicopters perform the stunts first

Machine Learning in Action

• Many, many more…


– Speech recognition, Natural language processing
– Computer vision
– Medical outcomes analysis
– Sensor networks
– Social networks
– …

ML is trending!

• Wide applicability
• Very large-scale complex systems
– Internet (billions of nodes), sensor network…
• Huge multi-dimensional data sets
– 30000 genes x 10000 drugs x 100 species x …
• Software too complex to write by hand
• Demand for self-customization to user,
environment

ML is not isolated!

Machine Learning Tasks

Broad categories:
• Supervised learning: classification, regression
• Unsupervised learning: clustering, density estimation
• Semi-supervised learning
• Online learning
• Reinforcement learning
• Many more…

Supervised Learning

Supervised Learning - Classification

Supervised Learning - Regression

Supervised Learning Problems
Supervised Learning

• Data: A set of data records (also called examples, instances, or cases) described by
– k attributes: A1, A2, …, Ak
– a class: each example is labelled with a pre-defined class
• Goal: To learn a classification/regression model from the data that can be used to predict the classes/values of new (future, or test) cases/instances (a minimal code sketch follows).
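To make this setup concrete, here is a minimal supervised-learning sketch in Python/scikit-learn (the tiny dataset is an illustrative assumption, not from the slides): a classifier is fitted on labelled records and then used to predict the classes of new instances.

```python
# A minimal supervised-learning sketch (illustrative data, not from the slides):
# records described by k attributes and a class label are used to fit a model,
# which then predicts the class of new, unseen instances.
from sklearn.tree import DecisionTreeClassifier

# Each row has k = 2 attributes (A1, A2); y holds the pre-defined class labels.
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = ["No", "No", "Yes", "Yes"]

model = DecisionTreeClassifier()
model.fit(X_train, y_train)      # learn a classification model from the data

X_test = [[1, 1], [0, 0]]        # new (future, or test) instances
print(model.predict(X_test))     # predicted classes for the new instances
```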
Unsupervised Learning

• Learning without a teacher

Unsupervised Learning - Density Estimation

Unsupervised Learning - Clustering

• Group similar things e.g. images

Unsupervised Learning - Clustering

• The data set has three natural groups of data points, i.e., 3 natural clusters.
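For illustration, a minimal k-means clustering sketch (the two-dimensional points below are made up) that groups unlabelled data into 3 clusters, mirroring the idea above:

```python
# Group unlabelled points into 3 clusters with k-means (toy data, illustrative only).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],     # natural group 1
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0],     # natural group 2
              [9.0, 0.9], [9.2, 1.1], [8.9, 1.0]])    # natural group 3

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # one centre per discovered cluster
```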
Symbolic ML

 The ML methods that represent learned knowledge in a declarative, symbolic form are called Symbolic ML. Examples: decision trees, random forests, K-NN, and RL.

 The ML methods that represent learned knowledge in a more numerically oriented statistical or neural-network form are called Non-symbolic ML. Examples: NB, SVM, NN.
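As a quick illustration of the "declarative symbolic form" point, a fitted decision tree can be printed as explicit if/then rules with scikit-learn (the toy data and feature names below are assumptions for illustration only):

```python
# Why a decision tree counts as "symbolic": the learned knowledge can be
# printed as explicit, human-readable rules (toy data assumed).
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[30, 0], [25, 1], [40, 1], [35, 0]]   # e.g. [temperature, windy]
y = ["Play", "Play", "NoPlay", "NoPlay"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["temperature", "windy"]))
# The output is a set of if/then threshold rules -- a declarative,
# symbolic representation of what was learned.
```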
Approaches of Symbolic ML
 Top-down induction based learning: Decision
trees
 Instance based learning: Categorize new examples based on their similarity to existing instances. Example: K-NN
 Rule induction based learning: learning through propositional and first-order logic.

 Reinforcement learning
What this course is about

• Covers a wide range of ML techniques


– from basic to state-of-the-art
• You will learn about the methods you heard about
– Decision trees, Random forests, Naive Bayes, logistic regression, k-nearest-neighbor, boosting, dimensionality reduction, PCA, SVMs, kernels, k-means, EM, HMMs, semi-supervised learning, graphical models, reinforcement learning.
• Covers algorithms, theory and applications

Reference Books

• Machine Learning, Tom Mitchell, McGraw


Hill, 1997
• Sebastian Raschka and Vahid Mirjalili: Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn and TensorFlow, Packt, 2017
• Stuart Russell and Peter Norvig: Artificial
Intelligence: A Modern Approach, Prentice
Hall 2010
Decision Making through Random
Forest

Md. Golam Rabiul Alam


Associate Professor, BRAC University
Random Forest
Random forest is a decision-tree-based, non-linear machine learning model for classification, regression and feature selection.
Random Forest
 The word “Random” refers to the random selection of data instances, which is known as the bootstrapping method in statistics and ML as well.

 The word “Forest” refers to the use of several decision trees in developing decision models through the bagging method.
Random Forest
GINI Impurity:

The GINI impurity of a node is the probability that a randomly chosen sample in the node would be incorrectly labeled if it were labeled according to the distribution of samples in the node.

The GINI impurity can be computed by summing the probability p_i of an item with label i being chosen times the probability Σ_{k≠i} p_k = 1 − p_i of a mistake in categorizing that item:

GINI = Σ_i p_i (1 − p_i) = 1 − Σ_i p_i^2

It reaches its minimum (zero) when all cases in the node fall into a single target category.
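A small helper (ours, not from the lecture) that computes the GINI impurity of a node directly from its class counts, using GINI = 1 − Σ p_i^2:

```python
# GINI impurity of a node from its class counts (helper name is ours).
def gini_impurity(class_counts):
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

# A pure node has impurity 0; a 50/50 node has impurity 0.5.
print(gini_impurity([4, 0]))   # 0.0
print(gini_impurity([2, 2]))   # 0.5
print(gini_impurity([3, 1]))   # 0.375
```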
Random Forest

Find the GINI impurity from the given data.


Random Forest

 How to split the root node? Which splitting is better?
Steps in Random Forest Classification Method (a code sketch of these steps follows the list):
 1. Bootstrapping for random data subset generation
 2. Decision tree construction for each of the data subsets
 i) Determination of the GINI impurity of each of the features.
 ii) Determination of the GINI impurity of the prospective splitting sub-tree.
 iii) Construction of the decision tree based on the splitting GINI impurity (i.e., if the sum of the GINI impurities of the split sub-trees is lower than the GINI impurity of the parent node, then split the parent node).
 3. Bagging for ensemble classification
 4. Majority voting for classification decision making.
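A rough Python sketch of these four steps, assuming the features are already numerically encoded (the helper names, tree count, and column count below are our own choices, not from the lecture):

```python
# Bootstrap -> per-subset decision trees -> bagging -> majority vote.
import random
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def random_forest_fit(X, y, n_trees=3, n_features=2):
    forest = []
    for _ in range(n_trees):
        # 1. Bootstrapping: sample rows with replacement.
        idx = [random.randrange(len(X)) for _ in range(len(X))]
        # Random subset of feature columns for this tree.
        cols = random.sample(range(len(X[0])), n_features)
        Xb = [[X[i][c] for c in cols] for i in idx]
        yb = [y[i] for i in idx]
        # 2. Grow a decision tree with GINI-based splits on the bootstrap sample.
        tree = DecisionTreeClassifier(criterion="gini").fit(Xb, yb)
        forest.append((tree, cols))
    return forest

def random_forest_predict(forest, x):
    # 3./4. Bagging: collect every tree's vote and return the majority class.
    votes = [tree.predict([[x[c] for c in cols]])[0] for tree, cols in forest]
    return Counter(votes).most_common(1)[0][0]
```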
Implement Random forest on the given dataset
Day Outlook Temperature Humidity Wind Play Tennis
Day1 Sunny Hot High Weak No
Day2 Sunny Hot High Strong No
Day3 Overcast Hot High Weak Yes
Day4 Rain Mild High Weak Yes
Day5 Rain Cool Normal Weak Yes
Day6 Rain Cool Normal Strong No
Day7 Overcast Cool Normal Strong Yes
Day8 Sunny Mild High Weak No
Day9 Sunny Cool Normal Weak Yes
Day10 Rain Mild Normal Weak Yes
Day11 Sunny Mild Normal Strong Yes
Day12 Overcast Mild High Strong Yes
Day13 Overcast Hot Normal Weak Yes
Day14 Rain Mild High Strong No
Bootstrapped Dataset 1
Day Outlook Temperature Humidity Wind Play Tennis
Day10 Rain Mild Normal Weak Yes
Day11 Sunny Mild Normal Strong Yes
Day12 Overcast Mild High Strong Yes
Day13 Overcast Hot Normal Weak Yes
Day14 Rain Mild High Strong No
Day2 Sunny Hot High Strong No

Create decision trees using a random subset of variables or columns. [Here, we considered only 2 randomly selected columns.]
Day Temperature Humidity Play Tennis
Day10 Mild Normal Yes
Day11 Mild Normal Yes
Day12 Mild High Yes
Day13 Hot Normal Yes
Day14 Mild High No
Day2 Hot High No
Calculations

Temperature:
Mild [Yes: 3, No: 1], Hot [Yes: 1, No: 1]
GINI(Temperature=Mild) = 1 - (3/4)^2 - (1/4)^2 = 1 - 0.5625 - 0.0625 = 0.375
GINI(Temperature=Hot) = 1 - (1/2)^2 - (1/2)^2 = 0.5
GINI(Temperature) = (4/6)*0.375 + (2/6)*0.5 = 0.417

Humidity:
High [Yes: 1, No: 2], Normal [Yes: 3, No: 0]
GINI(Humidity=High) = 1 - (1/3)^2 - (2/3)^2 = 1 - 0.1111 - 0.4444 = 0.444
GINI(Humidity=Normal) = 1 - (3/3)^2 - (0/3)^2 = 1 - 1 - 0 = 0
GINI(Humidity) = (3/6)*0.444 + (3/6)*0 = 0.222

(Gini impurity of a split = weighted average of the Gini impurities of its leaf nodes.)
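The same weighted-GINI arithmetic can be reproduced with a short script (the helper functions below are our own, not part of the lecture):

```python
# Reproducing the weighted GINI numbers above.
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_gini(groups):
    # groups: {feature_value: [count_yes, count_no]}
    n = sum(sum(c) for c in groups.values())
    return sum(sum(c) / n * gini(c) for c in groups.values())

temperature = {"Mild": [3, 1], "Hot": [1, 1]}
humidity    = {"High": [1, 2], "Normal": [3, 0]}
print(round(weighted_gini(temperature), 3))  # 0.417
print(round(weighted_gini(humidity), 3))     # 0.222 -> Humidity gives the purer split
```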
Calculations

Now, we should consider the next-level nodes for better separation.

Day Outlook Temperature Humidity Wind Play Tennis
Day12 Overcast Mild High Strong Yes
Day14 Rain Mild High Strong No
Day2 Sunny Hot High Strong No

Day Outlook Temperature Play Tennis
Day12 Overcast Mild Yes
Day14 Rain Mild No
Day2 Sunny Hot No
Calculations
Temperature:
Mild [Yes: 1, No: 1], Hot [Yes: 0, No: 1]
GINI(Temperature=Mild) = 1 - (1/2)^2 - (1/2)^2 = 0.5
GINI(Temperature=Hot) = 1 - (0/1)^2 - (1/1)^2 = 1 - 0 - 1 = 0
GINI(Temperature) = (2/3)*0.5 + (1/3)*0 = 0.333

Outlook:
Sunny [Yes: 0, No: 1], Overcast [Yes: 1, No: 0], Rain [Yes: 0, No: 1]
GINI(Outlook=Sunny) = 0
GINI(Outlook=Overcast) = 0
GINI(Outlook=Rain) = 0
GINI(Outlook) = (1/3)*0 + (1/3)*0 + (1/3)*0 = 0
Calculations

Day Outlook Temperature Humidity Wind Play Tennis


Day10 Rain Mild Normal Weak Yes
Day11 Sunny Mild Normal Strong Yes
Day13 Overcast Hot Normal Weak Yes
Bootstrapped Dataset 2
Day Outlook Temperature Humidity Wind Play Tennis
Day1 Sunny Hot High Weak No
Day2 Sunny Hot High Strong No
Day3 Overcast Hot High Weak Yes
Day4 Rain Mild High Weak Yes
Day5 Rain Cool Normal Weak Yes
Day2 Sunny Hot High Strong No

2. Create decision trees using a random subset of variables or columns from the bootstrapped dataset. [Here, we considered only 2 randomly selected columns.]
Day Outlook Temperature Play Tennis
Day1 Sunny Hot No
Day2 Sunny Hot No
Day3 Overcast Hot Yes
Day4 Rain Mild Yes
Day5 Rain Cool Yes
Day2 Sunny Hot No
3. Calculations
Outlook
Sunny [Yes: 0, No: 3]
Overcast [Yes: 1, No: 0]
Rain [Yes: 2, No: 0]
GINI(Outlook=sunny) = 1 - (0/3)^2-(3/3)^2 = 1 - 0 - 1 = 0
GINI(Outlook= Overcast) = 1 - (1/1)^2-(0/1)^2 = 1 - 1 - 0 = 0
GINI(Outlook= Rain) = 1 - (2/2)^2-(0/2)^2 = 1 - 1 - 0 = 0
Now,
GINI impurity of parent node = weighted average of Gini
impurities of leaf nodes

GINI(Outlook) = (3/6)*0 + (1/6)*0 + (2/6)*0 = 0


3. Calculations (cont…)
Temperature
Hot [Yes: 1, No: 3]
Mild [Yes: 1, No: 0]
Cool [Yes: 1, No: 0]
GINI(Temperature=Hot)= 1-(1/4)^2-(3/4)^2= 1-0.0625-0.5625
= 0.375
GINI(Temperature=Mild) = 1 - (1/1)^2-(0/1)^2 = 1 - 1 - 0 = 0
GINI(Temperature=Cool) = 1 - (1/1)^2-(0/1)^2 = 1 - 1 - 0 = 0

GINI(Temperature) = (4/6)* 0.375 + (1/6)*0 + (1/6)*0 = 0.25


The lower the impurity, the better the feature separates the classes.
As GINI(Outlook) < GINI(Temperature), Outlook will be at the root of this decision tree.

Now, we should consider the next-level nodes for better separation.
Bootstrapped Dataset 3
Day Outlook Temperature Humidity Wind Play Tennis
Day6 Rain Cool Normal Strong No
Day7 Overcast Cool Normal Strong Yes
Day8 Sunny Mild High Weak No
Day9 Sunny Cool Normal Weak Yes
Day10 Rain Mild Normal Weak Yes
Day13 Overcast Hot Normal Weak Yes

Create decision trees using a random subset of variables or columns. [Here, we considered only 2 randomly selected columns.]
Day Humidity Wind Play Tennis
Day6 Normal Strong No
Day7 Normal Strong Yes
Day8 High Weak No
Day9 Normal Weak Yes
Day10 Normal Weak Yes
Day13 Normal Weak Yes
NOW, A Query:
Day Outlook Temperature Humidity Wind Play Tennis
Day13 Overcast Hot Normal Weak Yes

Tree 1 predicts Yes and Tree 2 predicts Yes.
Even if the Tree 3 result is No, bagging gives Yes: 2, No: 1.
So, the final result of the query is YES.
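A tiny sketch of the majority-vote (bagging) step for this query, using the tree outputs from the example above (Tree 3 is assumed to predict No, as in the slide's hypothetical case):

```python
# Majority voting over the individual tree predictions for the query sample.
from collections import Counter

tree_predictions = ["Yes", "Yes", "No"]         # Tree 1, Tree 2, Tree 3 (assumed)
final = Counter(tree_predictions).most_common(1)[0][0]
print(final)                                    # "Yes"
```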
Calculations

Humidity:
High [Yes: 0, No: 1], Normal [Yes: 4, No: 1]
GINI(Humidity=High) = 1 - (0/1)^2 - (1/1)^2 = 1 - 0 - 1 = 0
GINI(Humidity=Normal) = 1 - (4/5)^2 - (1/5)^2 = 1 - 0.64 - 0.04 = 0.32
GINI(Humidity) = (1/6)*0 + (5/6)*0.32 = 0.267

Wind:
Strong [Yes: 1, No: 1], Weak [Yes: 3, No: 1]
GINI(Wind=Strong) = 1 - (1/2)^2 - (1/2)^2 = 0.5
GINI(Wind=Weak) = 1 - (3/4)^2 - (1/4)^2 = 0.375
GINI(Wind) = (2/6)*0.5 + (4/6)*0.375 = 0.417

As GINI(Humidity) < GINI(Wind), Humidity separates the classes better and will be at the root of this decision tree.
Regression Tree
Regression analysis is a set of statistical methods used to estimate the relationships between a dependent variable and one or more independent variables.

Dr. Md. Golam Rabiul Alam


Regression Tree Example
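To illustrate what a regression tree does, here is a minimal sketch (the toy one-feature dataset and parameters are our own assumptions, not from the lecture): the fitted tree predicts a continuous value by averaging the training targets that fall into each leaf.

```python
# A minimal regression-tree sketch on made-up data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[5], [10], [15], [20], [25], [30]])   # single independent variable
y = np.array([1.0, 1.2, 8.5, 9.0, 3.1, 2.9])        # continuous dependent variable

reg = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(reg.predict([[12], [22]]))   # piecewise-constant (leaf-average) predictions
```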
References
 1. Boosting : foundations and algorithms by Robert E. Schapire and Yoav Freund.
 MIT press
 https://mitpress.mit.edu/sites/default/files/titles/content/boosting_foundations_algorithms/toc.html#indx-1

 2. Boosting and AdaBoost for Machine Learning


 https://machinelearningmastery.com/boosting-and-adaboost-for-machine-learning/

 3. Quick Introduction to Boosting Algorithms in Machine Learning


 https://www.analyticsvidhya.com/blog/2015/11/quick-introduction-boosting-algorithms-machine-learning/

 4. My github repo and kaggle kernel link for GBM from scratch:
https://www.kaggle.com/grroverpr/gradient-boosting-simplified/
https://nbviewer.jupyter.org/github/groverpr/Machine-Learning/blob/master/notebooks/01_Gradient_Boosting_Scratch.ipynb
 5. A detailed and intuitive explanation of gradient boosting: How to explain gradient boosting by Terence Parr
and Jeremy Howard
 6. Fast.ai github repo link for DecisionTree from scratch (Massive ML/DL related resources):
https://github.com/fastai/fastai
 7. Lecture by Alexander Ihler at UCI
https://www.youtube.com/watch?v=sRktKszFmSk&t=311s
8. Machine Learning With Boosting: A Beginner's Guide by Scott Hartshorn, Kindle Edition
9. Machine Learning With Random Forests And Decision Trees: A Visual Guide For Beginners by Scott Hartshorn,
Kindle Edition
10. MIT open courseware (Lecture 17: Boosting)
https://www.youtube.com/watch?v=UHBmv7qCey4
11. Hands-On Machine Learning with R by Bradley Boehmke & Brandon Greenwell
https://bradleyboehmke.github.io/HOML/
AdaBoost Classifier

Md. Golam Rabiul Alam


Boosting in ML
Boosting is an ensemble modeling technique which attempts to build a strong classifier from a number of weak classifiers.

The term ‘Boosting’ refers to a family of algorithms which convert weak learners into strong learners.

Types of Boosting Algorithms


AdaBoost (Adaptive Boosting)

Gradient Tree Boosting

XGBoost
Boosting Example
How would you classify an email as SPAM or not?

1) Email has promotional image file, It’s a SPAM


2) Email has link(s), It’s a SPAM
3) Email body consists of a sentence like “You won a prize money of $ ….”,
It’s a SPAM
4) Email from our official domain “bracu.com” , Not a SPAM
5) Email from known source, Not a SPAM

Do you think these rules individually are


strong enough to successfully classify an
email?
To convert weak learners into a strong learner, we can combine the predictions of the weak learners using methods like (see the sketch below):
• Using an average / weighted average
• Taking the prediction with the higher vote (majority voting)
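For example, a weighted average of scores or a majority vote over hard labels can be used to combine the weak rules above (the numbers below are illustrative only):

```python
# Two simple ways to combine weak learners' outputs (toy numbers).
from collections import Counter

weak_scores = [0.9, 0.4, 0.7]          # e.g. spam scores from three weak rules
weights     = [0.5, 0.2, 0.3]
weighted_avg = sum(w * s for w, s in zip(weights, weak_scores))
print("SPAM" if weighted_avg >= 0.5 else "NOT SPAM")    # weighted-average rule

weak_labels = ["SPAM", "NOT SPAM", "SPAM"]
print(Counter(weak_labels).most_common(1)[0][0])        # majority-vote rule
```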
AdaBoost (Adaptive Boosting)
AdaBoost was the first really
successful boosting algorithm
developed for the purpose of
binary classification.

AdaBoost is short for Adaptive


Boosting and is a very popular
boosting technique which
combines multiple “weak
classifiers” into a single “strong
classifier”.

It was formulated by Yoav Freund


and Robert Schapire.
AdaBoosting Procedure (a short scikit-learn sketch follows these steps)
 1. Initialize the dataset and assign an equal weight (attention) to each of the data points.
 2. Provide this as input to the model and identify the
wrongly classified data points.
 3. Increase the weight (attention) of the wrongly
classified data points.
 4. if (got required results)
Goto step 5
else
Goto step 2
 5. End
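This loop is also available off the shelf; a minimal scikit-learn sketch on toy data (the dataset is an illustrative assumption; by default the base learner is a depth-1 decision stump, matching the stumps discussed next):

```python
# AdaBoost with scikit-learn on made-up data.
from sklearn.ensemble import AdaBoostClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 1], [2, 0]]
y = [0, 0, 1, 1, 1, 0]

clf = AdaBoostClassifier(n_estimators=10, random_state=0).fit(X, y)
print(clf.predict([[1, 1], [0, 0]]))   # ensemble predictions
print(clf.estimator_weights_[:3])      # the "amount of say" of the first stumps
```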
AdaBoost with DT and RF
In AdaBoost, we have to construct a forest of stumps, i.e., trees with just one node and two leaves.

Here, the stumps are the weak learners.

Unlike RF, a weighted vote of the stumps is used in the aggregation (bagging) step.
The order in which the stumps are constructed also matters in AdaBoost.
The AdaBoost Algorithm
(Freund and Schapire, 1996)
Given data: D = {(x_1, y_1), ..., (x_N, y_N)}

1. Initialize weights w_i = 1/N, i = 1, ..., N
2. For m = 1, ..., M:
   a) Fit a classifier G_m(x) ∈ {−1, 1} to the data using the weights w_i
   b) Compute err_m = Σ_{i=1}^{N} w_i I(y_i ≠ G_m(x_i)) / Σ_{i=1}^{N} w_i
   c) Compute α_m = log((1 − err_m) / err_m)
   d) Set w_i ← w_i · exp(α_m · I(y_i ≠ G_m(x_i))), i = 1, ..., N
3. Output G(x) = sign(Σ_{m=1}^{M} α_m G_m(x))
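A compact from-scratch sketch of this loop using depth-1 decision stumps as G_m (our own code; labels are assumed to be +1/−1 as in the algorithm above):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=10):
    # X: array-like features, y: labels in {-1, +1}
    y = np.asarray(y)
    N = len(y)
    w = np.full(N, 1.0 / N)                 # 1. initialize weights w_i = 1/N
    stumps, alphas = [], []
    for m in range(M):                      # 2. for m = 1..M
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)    # a) fit G_m using the weights w_i
        miss = (stump.predict(X) != y)
        err = np.sum(w * miss) / np.sum(w)  # b) weighted error err_m
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = np.log((1 - err) / err)     # c) alpha_m
        w = w * np.exp(alpha * miss)        # d) up-weight misclassified points
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # 3. sign of the alpha-weighted vote of all stumps
    agg = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(agg)
```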
AdaBoost Example

Initially, each of the 14 training samples (Day1–Day14) is assigned the same sample weight, 1/14.
AdaBoost
GINI Impurity:

The GINI impurity of a node is the probability that a randomly chosen sample in the node would be incorrectly labeled if it were labeled according to the distribution of samples in the node.

The GINI impurity can be computed by summing the probability p_i of an item with label i being chosen times the probability Σ_{k≠i} p_k = 1 − p_i of a mistake in categorizing that item:

GINI = Σ_i p_i (1 − p_i) = 1 − Σ_i p_i^2

It reaches its minimum (zero) when all cases in the node fall into a single target category.
AdaBoost

[Outlook=Sunny(5): Yes-2 and No-3]


[Outlook=Overcast(4): Yes-4 and No-0]
[Outlook=Rain(5): Yes-3 and No-2]
GINI(Outlook=sunny) = 1 – (2/5)^2-(3/5)^2 = 1 – 0.16 – 0.36 = 0.48
GINI(Outlook= Overcast) = 1 - (4/4)^2-(0/4)^2 = 1 - 1 - 0 = 0
GINI(Outlook= Rain) = 1 - (3/5)^2-(2/5)^2 = 1 – 0.36 – 0.16 = 0.48
Now,
GINI impurity of parent node = weighted average of Gini impurities of leaf nodes

GINI(Outlook) = (5/14)*0.48 + (4/14)*0 + (5/14)*0.48 = 0.343


AdaBoost

[Temperature=Hot(4): Yes-2 and No-2]


[Temperature=Mild(6): Yes-4 and No-2]
[Temperature=Cool(4): Yes-3 and No-1]
GINI(Temp=Hot) = 1 – (2/4)^2-(2/4)^2 = 0.5
GINI(Temp= Mild) = 1 - (4/6)^2-(2/6)^2 = 0.445
GINI(Temp= Cool) = 1 - (3/4)^2-(1/4)^2 = 0.375
Now,
GINI impurity of parent node = weighted average of Gini impurities of leaf nodes

GINI(Temperature) = (4/14)*0.5 + (6/14)*0.445 + (4/14)*0.375 = 0.441


AdaBoost

[Humidity=High(7): Yes-3 and No-4]


[Humidity=Normal(7): Yes-6 and No-1]
GINI(Hum=High) = 1 – (3/7)^2-(4/7)^2 = 0.49
GINI(Hum= Normal) = 1 - (6/7)^2-(1/7)^2 = 0.25
Now,
GINI impurity of parent node = weighted average of Gini impurities of leaf nodes

GINI(Humidity) = (7/14)*0.49 + (7/14)*0.25 = 0.367


AdaBoost

[Wind=Strong(6): Yes-3 and No-3]


[Wind=Weak(8): Yes-6 and No-2]
GINI(Wind=Strong) = 1 – (3/6)^2-(3/6)^2 = 0.5
GINI(Wind= Weak) = 1 - (6/8)^2-(2/8)^2 = 0.375
Now,
GINI impurity of parent node = weighted average of Gini impurities of leaf nodes

GINI(Wind) = (6/14)*0.5 + (8/14)*0.375 = 0.429


GINI comparison for stump selection
GINI(Outlook) = (5/14)*0.48 + (4/14)*0 + (5/14)*0.48 = 0.343
GINI(Temperature) = (4/14)*0.5 + (6/14)*0.445 + (4/14)*0.375 = 0.441
GINI(Humidity) = (7/14)*0.49 + (7/14)*0.25 = 0.367
GINI(Wind) = (6/14)*0.5 + (8/14)*0.375 = 0.429

GINI(Outlook) = (5/14)*0.48 + (4/14)*0 + (5/14)*0.48 = 0.34


is the lowest. So, outlook is the first stump.

[Outlook=Sunny(5): Yes-2 and No-3]


[Outlook=Overcast(4): Yes-4 and No-0]
[Outlook=Rain(5): Yes-3 and No-2]
Amount of say determination
Amount of say, α = ½ ln((1 − Total error) / Total error)
Here, the amount of say measures how well a stump classified the samples, i.e., how much weight its vote gets in the final classification.
Total error is the sum of the weights associated with the incorrectly classified samples.
Total Error

[Outlook=Sunny(5): Yes-2 and No-3]
[Outlook=Overcast(4): Yes-4 and No-0]
[Outlook=Rain(5): Yes-3 and No-2]

Each of the 14 samples has weight 1/14.

Total Error = 2*(1/14) + 0 + 2*(1/14) = 2/7 = 0.29
Amount of say, α = ½ ln((1 − Total error) / Total error) = 0.45
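These numbers can be checked with a few lines of Python (our own sketch; the weight update below uses the slide's rounded α = 0.45):

```python
import math

w = 1 / 14
total_error = 4 * w                            # 2*(1/14) + 2*(1/14) = 2/7
alpha = 0.5 * math.log((1 - total_error) / total_error)
print(round(total_error, 2), round(alpha, 3))  # 0.29 0.458 (rounded to 0.45 above)

# Weight update with the rounded amount of say, alpha = 0.45:
print(round(w * math.exp(0.45), 2))    # 0.11  (misclassified sample)
print(round(w * math.exp(-0.45), 3))   # 0.046 (correctly classified sample)
```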
New Sample Weight
New Sample Weight for an incorrectly classified sample
= sample weight * e^α
= (1/14) * e^0.45
= 0.11

New Sample Weight for a correctly classified sample
= sample weight * e^(−α)
= (1/14) * e^(−0.45)
= 0.046
Updated sample weight
Sample Weight    Normalized Sample Weight    Cumulative Normalized Sample Weight

0.046 0.051 0.051


0.046 0.051 0.102
0.046 0.051 0.153
0.046 0.051 0.204
0.046 0.051 0.255
0.11 0.122 0.377
0.046 0.051 0.428
0.046 0.051 0.479
0.11 0.122 0.601
0.046 0.051 0.652
0.11 0.122 0.774
0.046 0.051 0.825
0.046 0.051 0.876
0.11 0.122 0.998

“Normalized sample weight” = “Sample Weight” / “Summation of all of the sample weights”
NSWi=SWi / ΣSWi
Updated sample weight
Cumulative Normalized Sample Weight    Generated Random Number
0.051 0.040
0.102 0.100
0.153 0.151
0.204 0.200
0.255 0.250
0.377 0.267
0.428 0.500
0.479 0.700
0.601 0.990
0.652 0.370
0.774 0.600
0.825 0.682
0.876 0.886
0.998 0.980
New Dataset Creation with Random Sampling

Cumulative Normalized Sample Weight    Generated Random Number
0.051 0.040
0.102 0.100
0.153 0.151
0.204 0.200
0.255 0.250
0.377 0.267
0.428 0.500
0.479 0.700
0.601 0.990
0.652 0.370
0.774 0.600
0.825 0.682
0.876 0.886
0.998 0.980
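A sketch (ours) of this resampling step: each generated random number picks the first row whose cumulative normalized weight reaches it, so the heavier, misclassified samples are drawn more often.

```python
# Weighted resampling using cumulative normalized weights and random numbers.
import bisect
import random

normalized = [0.051, 0.051, 0.051, 0.051, 0.051, 0.122, 0.051,
              0.051, 0.122, 0.051, 0.122, 0.051, 0.051, 0.122]

cumulative, running = [], 0.0
for nw in normalized:
    running += nw
    cumulative.append(running)        # matches the cumulative column above

new_dataset_indices = []
for _ in range(len(normalized)):
    r = random.random()               # the "generated random number"
    idx = min(bisect.bisect_left(cumulative, r), len(normalized) - 1)
    new_dataset_indices.append(idx)

print(new_dataset_indices)            # row indices drawn for the new dataset
```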
New Dataset Creation with Sample Weight

Each of the 14 samples in the new (resampled) dataset is again assigned an equal sample weight of 1/14.
Bagging

Total amount of say for the YES stumps = 0.45 + 0.23 = 0.68
Total amount of say for the NO stumps = 0.56

YES will be the classification.
Conclusion

We can use AdaBoost algorithms for both classification and regression problems.
References
 1. Boosting : foundations and algorithms by Robert E. Schapire and Yoav Freund.
 MIT press
 https://mitpress.mit.edu/sites/default/files/titles/content/boosting_foundations_algorithms/toc.html#indx-1

 2. Boosting and AdaBoost for Machine Learning


 https://machinelearningmastery.com/boosting-and-adaboost-for-machine-learning/

 3. Quick Introduction to Boosting Algorithms in Machine Learning


 https://www.analyticsvidhya.com/blog/2015/11/quick-introduction-boosting-algorithms-machine-learning/

 4. An Introduction to Statistical Learning: with Applications in R by Gareth James et al.


 http://www.amazon.com/dp/1461471370?tag=inspiredalgor-20

 5. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Trevor Hastie et al.
 http://www.amazon.com/dp/0387848576?tag=inspiredalgor-20
 6. Applied Predictive Modeling by Max Kuhn
 https://www.amazon.com/dp/1461468485?tag=inspiredalgor-20

 7. AdaBoost.SAMME (Stagewise Additive Modeling using a Multi-class Exponential loss function)


 https://web.stanford.edu/~hastie/Papers/SII-2-3-A8-Zhu.pdf
Linear regression is used to predict a continuous dependent variable using a given set of independent variables. The output of linear regression must be a continuous value, such as price or age.

Logistic regression is used to predict a categorical dependent variable using a given set of independent variables.
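A side-by-side sketch with scikit-learn (the toy data is assumed): linear regression returns a continuous value, while logistic regression returns a category.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6]])

y_cont = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0])        # e.g. price, age
print(LinearRegression().fit(X, y_cont).predict([[7]]))   # continuous output

y_cat = np.array([0, 0, 0, 1, 1, 1])                      # e.g. spam / not spam
print(LogisticRegression().fit(X, y_cat).predict([[7]]))  # categorical output
```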

Gradient Descent
Dr. Md. Golam Rabiul Alam
Derivative
Gradient Descent
Two or more derivatives of the same function (one with respect to each parameter) are collectively called a gradient.

Gradient descent is an algorithm which uses the gradient to descend to the lowest point of a loss function.
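A minimal gradient-descent sketch (the toy data and learning rate are our own) that fits a least-squares line by repeatedly stepping the intercept and slope against the gradient of the sum-of-squared-residuals loss:

```python
import numpy as np

x = np.array([0.5, 2.3, 2.9])
y = np.array([1.4, 1.9, 3.2])

intercept, slope = 0.0, 1.0
learning_rate = 0.01

for step in range(1000):
    pred = intercept + slope * x
    # Gradient of sum((y - pred)^2) with respect to intercept and slope.
    d_intercept = -2 * np.sum(y - pred)
    d_slope     = -2 * np.sum((y - pred) * x)
    # Step against the gradient (descend).
    intercept -= learning_rate * d_intercept
    slope     -= learning_rate * d_slope

print(round(intercept, 3), round(slope, 3))   # parameters near the loss minimum
```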
Least Square Method
Gradient Descent
Steps in Gradient Descent
Stochastic Gradient Descent (SGD)
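A stochastic variant of the sketch above (ours): each update uses the gradient computed from a single randomly chosen sample rather than from the whole dataset.

```python
import random
import numpy as np

x = np.array([0.5, 2.3, 2.9])
y = np.array([1.4, 1.9, 3.2])

intercept, slope = 0.0, 1.0
learning_rate = 0.01

for step in range(5000):
    i = random.randrange(len(x))             # pick one sample at random
    residual = y[i] - (intercept + slope * x[i])
    intercept -= learning_rate * (-2 * residual)
    slope     -= learning_rate * (-2 * residual * x[i])

print(round(intercept, 3), round(slope, 3))  # noisy, but near the full-batch result
```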
