CSE711
Symbolic Machine Learning I
Dr. Md. Golam Rabiul Alam
BRAC University
Lecture Outline
• Machine Learning
• Machine Learning Applications
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
• Symbolic Machine Learning
• Conclusion
Machine Learning
Machine learning is an application
of artificial intelligence (AI) that provides
systems the ability to automatically learn and
improve from experience without being
explicitly programmed.
What is Machine Learning?
Study of algorithms that
– improve their performance
– at some task
– with experience
What is Machine Learning?
From Data to Understanding …
Machine Learning in Action
• Decoding thoughts from brain scans
Machine Learning in Action
• Stock Market Prediction
Machine Learning in Action
• Document Classification
Machine Learning in Action
• Cars navigating on their own
The self-driving SUV that took 1st place in the
DARPA Urban Challenge
Machine Learning in Action
• Helicopters can learn aerial tricks by watching
other helicopters perform the stunts first
Machine Learning in Action
• Many, many more…
– Speech recognition, Natural language processing
– Computer vision
– Medical outcomes analysis
– Sensor networks
– Social networks
– …
ML is trending!
• Wide applicability
• Very large-scale complex systems
– Internet (billions of nodes), sensor network…
• Huge multi-dimensional data sets
– 30000 genes x 10000 drugs x 100 species x …
• Software too complex to write by hand
• Demand for self-customization to user,
environment
ML is not isolated!
Machine Learning Tasks
Broad categories:
• Supervised learning: classification, regression
• Unsupervised learning: Clustering, density estimation
• Semi-supervised learning
• Online learning
• Reinforcement learning
• Many more…
Supervised Learning
Supervised Learning - Classification
Supervised Learning - Regression
Supervised Learning Problems
Supervised Learning
• Data: A set of data records (also called
examples, instances or cases) described by
– k attributes: A1, A2, … Ak.
– a class: Each example is labelled with a pre-
defined class.
• Goal: To learn a classification/regression
model from the data that can be used to
predict the classes/values of new (future, or
test) cases/instances.
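A minimal sketch of this setup, assuming scikit-learn is available; the iris data and the decision-tree model are illustrative stand-ins for "records with k attributes and a class", not part of the slides:

```python
# Supervised learning sketch: learn a classifier from labelled records,
# then predict the classes of unseen (test) cases.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                 # records with k attributes + class labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)   # learn from labelled data
print("accuracy on unseen cases:", model.score(X_test, y_test))
```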
Unsupervised Learning
• Learning without a teacher
Unsupervised Learning - Density Estimation
Unsupervised Learning - Clustering
• Group similar things e.g. images
Unsupervised Learning - Clustering
• The data set has three natural groups of data
points, i.e., 3 natural clusters.
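A minimal clustering sketch, assuming scikit-learn and NumPy; the synthetic 2-D points are illustrative and simply mimic the three natural groups mentioned above:

```python
# Group unlabelled points into 3 clusters; no class labels are given.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
                    for c in ([0, 0], [3, 3], [0, 3])])   # 3 natural groups

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(points)
print(labels[:10])   # cluster index assigned to each point
```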
Symbolic ML
The ML methods that represent learned knowledge in a
declarative, symbolic form are called Symbolic ML.
Examples: decision trees, random forests, k-NN, and RL.
The ML methods that represent learned knowledge in a
more numerically oriented, statistical or neural-network
form are called Non-symbolic ML. Examples: NB, SVM, NN.
Approaches of Symbolic ML
Top-down induction based learning: decision trees.
Instance-based learning: categorize new examples based on
their similarity to existing instances. Example: k-NN.
Rule induction based learning: learning through
propositional and first-order logic.
Reinforcement learning.
What this course is about
• Covers a wide range of ML techniques
– from basic to state-of-the-art
• You will learn about the methods you heard about
– Decision trees, Random forests, Naive Bayes, logistic
regression, k-nearest-neighbor, boosting,
dimensionality reduction, PCA, SVMs, kernels, k-
means, EM, HMMs, semi-supervised learning,
graphical models, reinforcement learning.
• Covers algorithms, theory and applications
Reference Books
• Tom Mitchell, Machine Learning, McGraw Hill, 1997
• Sebastian Raschka and Vahid Mirjalili, Python Machine Learning:
Machine Learning and Deep Learning with Python, scikit-learn and
TensorFlow, Packt, 2017
• Stuart Russell and Peter Norvig, Artificial Intelligence:
A Modern Approach, Prentice Hall, 2010
Decision Making through Random Forest
Md. Golam Rabiul Alam
Associate Professor, BRAC University
Random Forest
Random forest is a decision-tree-based, non-linear machine
learning model for classification, regression and feature selection.
Random Forest
The word “Random” refers to the random selection of data
instances, known as the bootstrapping method in both
statistics and ML.
The word “Forest” refers to the use of several decision trees
to build the decision model through the bagging method.
Random Forest
GINI Impurity:
The GINI Impurity of a node is the probability that a randomly chosen
sample in a node would be incorrectly labeled if it was labeled by the
distribution of samples in the node.
The GINI impurity can be computed by summing, over all labels i, the probability p_i of an
item with label i being chosen times the probability Σ_{k≠i} p_k = 1 − p_i of a mistake
in categorizing that item:
Gini = Σ_i p_i (1 − p_i) = 1 − Σ_i p_i²
It reaches its minimum (zero) when all cases in the node fall into a single
target category.
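As a small illustration (not from the slides), a plain-Python helper that computes this impurity from the class counts of a node:

```python
# Gini = 1 - sum_i p_i^2, where p_i is the fraction of class i in the node.
def gini_impurity(counts):
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini_impurity([3, 1]))   # 0.375, e.g. a node with 3 "Yes" and 1 "No"
print(gini_impurity([4, 0]))   # 0.0, a pure node reaches the minimum
```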
Random Forest
Find the GINI impurity from the given data.
Random Forest
How to split the root node? Which splitting is better?
Random Forest
Steps in the Random Forest Classification Method:
1. Bootstrapping, to generate random data subsets
2. Decision tree construction for each data subset:
i) Determine the GINI impurity of each of the features.
ii) Determine the GINI impurity of each prospective splitting sub-tree.
iii) Construct the decision tree based on the splitting GINI impurity
(i.e., if the weighted sum of the GINI impurities of the split sub-trees
is lower than the GINI impurity of the parent node, then split the parent node).
3. Bagging for ensemble classification
4. Majority voting for the classification decision (a library-level code sketch follows).
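A hedged, library-level sketch of this pipeline, assuming scikit-learn and pandas; the slides work the bootstrapping and GINI splits out by hand below, while the library does the same steps internally. Only the first four PlayTennis rows are used here for brevity:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.DataFrame({
    "Outlook":     ["Sunny", "Sunny", "Overcast", "Rain"],
    "Temperature": ["Hot", "Hot", "Hot", "Mild"],
    "Humidity":    ["High", "High", "High", "High"],
    "Wind":        ["Weak", "Strong", "Weak", "Weak"],
    "PlayTennis":  ["No", "No", "Yes", "Yes"],
})

X = pd.get_dummies(df.drop(columns="PlayTennis"))   # one-hot encode categorical features
y = df["PlayTennis"]

# bootstrap=True resamples rows per tree; max_features limits the columns tried per split
forest = RandomForestClassifier(n_estimators=3, bootstrap=True, max_features=2,
                                criterion="gini", random_state=0).fit(X, y)
print(forest.predict(X[:1]))   # majority vote of the trees for the first record
```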
Implement Random forest on the given dataset
Day Outlook Temperature Humidity Wind Play Tennis
Day1 Sunny Hot High Weak No
Day2 Sunny Hot High Strong No
Day3 Overcast Hot High Weak Yes
Day4 Rain Mild High Weak Yes
Day5 Rain Cool Normal Weak Yes
Day6 Rain Cool Normal Strong No
Day7 Overcast Cool Normal Strong Yes
Day8 Sunny Mild High Weak No
Day9 Sunny Cool Normal Weak Yes
Day10 Rain Mild Normal Weak Yes
Day11 Sunny Mild Normal Strong Yes
Day12 Overcast Mild High Strong Yes
Day13 Overcast Hot Normal Weak Yes
Day14 Rain Mild High Strong No
Bootstrapped Dataset 1
Day Outlook Temperature Humidity Wind Play Tennis
Day10 Rain Mild Normal Weak Yes
Day11 Sunny Mild Normal Strong Yes
Day12 Overcast Mild High Strong Yes
Day13 Overcast Hot Normal Weak Yes
Day14 Rain Mild High Strong No
Day2 Sunny Hot High Strong No
Create decision trees using a random subset of variables
(columns) [here, we considered only 2 randomly chosen columns]
Day Temperature Humidity Play Tennis
Day10 Mild Normal Yes
Day11 Mild Normal Yes
Day12 Mild High Yes
Day13 Hot Normal Yes
Day14 Mild High No
Day2 Hot High No
Calculations
Temperature: Mild [Yes: 3, No: 1], Hot [Yes: 1, No: 1]
GINI(Temperature = Mild) = 1 - (3/4)^2 - (1/4)^2 = 1 - 0.5625 - 0.0625 = 0.375
GINI(Temperature = Hot) = 1 - (1/2)^2 - (1/2)^2 = 0.5
Humidity: High [Yes: 1, No: 2], Normal [Yes: 3, No: 0]
GINI(Humidity = High) = 1 - (1/3)^2 - (2/3)^2 = 1 - 0.1111 - 0.4444 = 0.444
GINI(Humidity = Normal) = 1 - (3/3)^2 - (0/3)^2 = 1 - 1 - 0 = 0
Now, the GINI impurity of the parent node = the weighted average of the GINI impurities of its leaf nodes:
GINI(Temperature) = (4/6)*0.375 + (2/6)*0.5 = 0.417
GINI(Humidity) = (3/6)*0.444 + (3/6)*0 = 0.222
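A quick plain-Python check of the arithmetic above; the gini() helper here is ad hoc, not from the slides:

```python
def gini(yes, no):
    total = yes + no
    return 1 - (yes / total) ** 2 - (no / total) ** 2

g_temp = (4/6) * gini(3, 1) + (2/6) * gini(1, 1)   # ~0.417
g_hum  = (3/6) * gini(1, 2) + (3/6) * gini(3, 0)   # ~0.222
print(round(g_temp, 3), round(g_hum, 3))           # Humidity splits better
```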
Calculations
Since GINI(Humidity) < GINI(Temperature), Humidity becomes the root of Tree 1.
Now, we should consider the next-level nodes for better separation.
The Humidity = High branch contains:
Day Outlook Temperature Humidity Wind Play Tennis
Day12 Overcast Mild High Strong Yes
Day14 Rain Mild High Strong No
Day2 Sunny Hot High Strong No
Temperature: Mild [Yes: 1, No: 1], Hot [Yes: 0, No: 1]
GINI(Temperature = Mild) = 1 - (1/2)^2 - (1/2)^2 = 0.5
GINI(Temperature = Hot) = 1 - (0/1)^2 - (1/1)^2 = 1 - 0 - 1 = 0
GINI(Temperature) = (2/3)*0.5 + (1/3)*0 = 0.333
Outlook: Sunny [Yes: 0, No: 1], Overcast [Yes: 1, No: 0], Rain [Yes: 0, No: 1]
GINI(Outlook = Sunny) = 0, GINI(Outlook = Overcast) = 0, GINI(Outlook = Rain) = 0
GINI(Outlook) = (1/3)*0 + (1/3)*0 + (1/3)*0 = 0
(As before, the GINI impurity of the parent node is the weighted average of the GINI impurities of its leaf nodes.)
As GINI(Outlook) < GINI(Temperature), Outlook is used to split the Humidity = High branch.
Calculations
The Humidity = Normal branch (Day10, Day11, Day13) contains only "Yes" samples,
so it is already pure and needs no further splitting:
Day Outlook Temperature Humidity Wind Play Tennis
Day10 Rain Mild Normal Weak Yes
Day11 Sunny Mild Normal Strong Yes
Day13 Overcast Hot Normal Weak Yes
Bootstrapped Dataset 2
Day Outlook Temperature Humidity Wind Play Tennis
Day1 Sunny Hot High Weak No
Day2 Sunny Hot High Strong No
Day3 Overcast Hot High Weak Yes
Day4 Rain Mild High Weak Yes
Day5 Rain Cool Normal Weak Yes
Day2 Sunny Hot High Strong No
2. Create decision trees using a random subset of variables
(columns) from the bootstrapped dataset [here, we considered
only 2 randomly chosen columns]
Day Outlook Temperature Play Tennis
Day1 Sunny Hot No
Day2 Sunny Hot No
Day3 Overcast Hot Yes
Day4 Rain Mild Yes
Day5 Rain Cool Yes
Day2 Sunny Hot No
3. Calculations
Outlook
Sunny [Yes: 0, No: 3]
Overcast [Yes: 1, No: 0]
Rain [Yes: 2, No: 0]
GINI(Outlook=sunny) = 1 - (0/3)^2-(3/3)^2 = 1 - 0 - 1 = 0
GINI(Outlook= Overcast) = 1 - (1/1)^2-(0/1)^2 = 1 - 1 - 0 = 0
GINI(Outlook= Rain) = 1 - (2/2)^2-(0/2)^2 = 1 - 1 - 0 = 0
Now,
GINI impurity of parent node = weighted average of Gini
impurities of leaf nodes
GINI(Outlook) = (3/6)*0 + (1/6)*0 + (2/6)*0 = 0
3. Calculations (cont…)
Temperature
Hot [Yes: 1, No: 3]
Mild [Yes: 1, No: 0]
Cool [Yes: 1, No: 0]
GINI(Temperature=Hot)= 1-(1/4)^2-(3/4)^2= 1-0.0625-0.5625
= 0.375
GINI(Temperature=Mild) = 1 - (1/1)^2-(0/1)^2 = 1 - 1 - 0 = 0
GINI(Temperature=Cool) = 1 - (1/1)^2-(0/1)^2 = 1 - 1 - 0 = 0
GINI(Temperature) = (4/6)* 0.375 + (1/6)*0 + (1/6)*0 = 0.25
The feature with the lowest impurity separates the classes best.
As GINI(Outlook) < GINI(Temperature), Outlook will be the root of
this decision tree.
Now, we should consider the next-level nodes for better separation.
Bootstrapped Dataset 3
Day Outlook Temperature Humidity Wind Play Tennis
Day6 Rain Cool Normal Strong No
Day7 Overcast Cool Normal Strong Yes
Day8 Sunny Mild High Weak No
Day9 Sunny Cool Normal Weak Yes
Day10 Rain Mild Normal Weak Yes
Day13 Overcast Hot Normal Weak Yes
Create decision trees using a random subset of variables
(columns) [here, we considered only 2 randomly chosen columns]
Day Humidity Wind Play Tennis
Day6 Normal Strong No
Day7 Normal Strong Yes
Day8 High Weak No
Day9 Normal Weak Yes
Day10 Normal Weak Yes
Day13 Normal Weak Yes
NOW, A Query:
Day Outlook Temperature Humidity Wind Play Tennis
Day13 Overcast Hot Normal Weak Yes
Tree 1 predicts Yes (bagging count Yes: 1); Tree 2 also predicts Yes (bagging count Yes: 2).
Even if the Tree 3 result is NO, the bagging count is Yes: 2, No: 1.
So, the final result of the query is YES.
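An illustrative majority-vote step in plain Python; the vote values simply mirror the scenario above:

```python
from collections import Counter

tree_votes = ["Yes", "Yes", "No"]                 # e.g. if Tree 3 had predicted No
final = Counter(tree_votes).most_common(1)[0][0]  # label with the most votes
print(final)                                      # -> "Yes"
```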
Calculations
From the Tree 3 subset (Humidity and Wind):
Humidity: High [Yes: 0, No: 1], Normal [Yes: 4, No: 1]
GINI(Humidity = High) = 1 - (0/1)^2 - (1/1)^2 = 0
GINI(Humidity = Normal) = 1 - (4/5)^2 - (1/5)^2 = 1 - 0.64 - 0.04 = 0.32
GINI(Humidity) = (1/6)*0 + (5/6)*0.32 = 0.267
Wind: Strong [Yes: 1, No: 1], Weak [Yes: 3, No: 1]
GINI(Wind = Strong) = 1 - (1/2)^2 - (1/2)^2 = 0.5
GINI(Wind = Weak) = 1 - (3/4)^2 - (1/4)^2 = 0.375
GINI(Wind) = (2/6)*0.5 + (4/6)*0.375 = 0.417
As GINI(Humidity) < GINI(Wind), Humidity will be the root of Tree 3.
Regression Tree
Regression analysis is a set of statistical methods used for the estimation of relationships between a dependent variable and
one or more independent variables
Dr. Md. Golam Rabiul Alam
Regression Tree Example
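The worked example on the original slides is shown as figures; as a hedged stand-in, here is a minimal sketch assuming scikit-learn, with made-up dose/effectiveness numbers. It shows the idea behind a regression tree: split the predictor into regions and predict the mean of the target inside each leaf.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

drug_dose = np.array([[2], [4], [8], [12], [16], [20], [24], [28]])   # illustrative data
effectiveness = np.array([0, 5, 40, 90, 100, 95, 40, 5])

tree = DecisionTreeRegressor(max_depth=2).fit(drug_dose, effectiveness)
print(tree.predict([[10], [26]]))   # leaf means for the regions containing 10 and 26
```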
References
1. Boosting : foundations and algorithms by Robert E. Schapire and Yoav Freund.
MIT press
https://mitpress.mit.edu/sites/default/files/titles/content/boosting_foundations_algorithms/toc.html#indx-1
2. Boosting and AdaBoost for Machine Learning
https://machinelearningmastery.com/boosting-and-adaboost-for-machine-learning/
3. Quick Introduction to Boosting Algorithms in Machine Learning
https://www.analyticsvidhya.com/blog/2015/11/quick-introduction-boosting-algorithms-machine-learning/
4. My github repo and kaggle kernel link for GBM from scratch:
https://www.kaggle.com/grroverpr/gradient-boosting-simplified/
https://nbviewer.jupyter.org/github/groverpr/Machine-Learning/blob/master/notebooks/01_Gradient_Boosting_Scratch.ipynb
5. A detailed and intuitive explanation of gradient boosting: How to explain gradient boosting by Terence Parr
and Jeremy Howard
6. Fast.ai github repo link for DecisionTree from scratch (Massive ML/DL related resources):
https://github.com/fastai/fastai
7. Lecture by Alexander Ihler at UC Irvine
https://www.youtube.com/watch?v=sRktKszFmSk&t=311s
8. Machine Learning With Boosting: A Beginner's Guide by Scott Hartshorn, Kindle Edition
9. Machine Learning With Random Forests And Decision Trees: A Visual Guide For Beginners by Scott Hartshorn,
Kindle Edition
10. MIT open courseware (Lecture 17: Boosting)
https://www.youtube.com/watch?v=UHBmv7qCey4
11. Hands-On Machine Learning with R by Bradley Boehmke & Brandon Greenwell
https://bradleyboehmke.github.io/HOML/
AdaBoost Classifier
Md. Golam Rabiul Alam
Boosting in ML
Boosting is an ensemble modeling technique
which attempts to build a strong classifier from
a number of weak classifiers.
The term ‘Boosting’ refers to a family of
algorithms which convert weak learners into
strong learners.
Types of Boosting Algorithms
AdaBoost (Adaptive Boosting)
Gradient Tree Boosting
XGBoost
Boosting Example
How would you classify an email as SPAM or not?
1) Email has a promotional image file: it’s SPAM
2) Email has link(s): it’s SPAM
3) Email body consists of a sentence like “You won a prize money of $ ….”: it’s SPAM
4) Email is from our official domain “bracu.com”: not SPAM
5) Email is from a known source: not SPAM
Do you think these rules individually are
strong enough to successfully classify an
email?
To convert weak learners into a strong learner, we can combine the
predictions of the weak learners using methods like:
• Using an average / weighted average
• Considering the prediction that has the higher vote (majority voting)
AdaBoost (Adaptive Boosting)
AdaBoost was the first really
successful boosting algorithm
developed for the purpose of
binary classification.
AdaBoost is short for Adaptive
Boosting and is a very popular
boosting technique which
combines multiple “weak
classifiers” into a single “strong
classifier”.
It was formulated by Yoav Freund
and Robert Schapire.
AdaBoosting Procedure
1. Initialize the dataset and assign an equal weight
(attention) to each of the data points.
2. Provide this as input to the model and identify the
wrongly classified data points.
3. Increase the weights (attention) of the wrongly
classified data points.
4. if (got required results)
Goto step 5
else
Goto step 2
5. End
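A hedged, library-level sketch of this loop, assuming scikit-learn; the synthetic data is purely illustrative. scikit-learn's AdaBoostClassifier uses a depth-1 decision tree (a stump) as its default weak learner and re-weights misclassified points at every round, as in the procedure above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Default weak learner is a one-split decision stump; 50 boosting rounds.
model = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print(model.score(X, y))
```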
AdaBoost with DT and RF
In AdaBoost, we construct a forest of trees
(“stumps”), each with just one node and two leaves.
Here, the stumps are the weak learners.
Unlike RF, a weighted vote of the stumps is used
in the combination (bagging) step.
The order in which the stumps are constructed
also matters in AdaBoost.
The AdaBoost Algorithm (Freund and Schapire, 1996)
Given data: D = {(x_1, y_1), ..., (x_N, y_N)}
1. Initialize weights w_i = 1/N, i = 1, ..., N
2. For m = 1 : M
a) Fit a classifier G_m(x) ∈ {−1, 1} to the data using the weights w_i
b) Compute err_m = [ Σ_{i=1}^{N} w_i I(y_i ≠ G_m(x_i)) ] / [ Σ_{i=1}^{N} w_i ]
c) Compute α_m = log((1 − err_m) / err_m)
d) Set w_i ← w_i · exp[ α_m · I(y_i ≠ G_m(x_i)) ], i = 1, ..., N
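A from-scratch sketch of the weight-update loop above, using only NumPy. The weak learner is passed in as a function, so this is an illustrative skeleton rather than a complete classifier; the final weighted vote is the standard way the fitted weak learners are combined.

```python
import numpy as np

def adaboost_fit(X, y, fit_weak_learner, M=10):
    """y must be in {-1, +1}; fit_weak_learner(X, y, w) must return a callable predict(X)."""
    N = len(y)
    w = np.full(N, 1.0 / N)                         # 1. equal initial weights
    learners, alphas = [], []
    for m in range(M):                              # 2. for m = 1..M
        G = fit_weak_learner(X, y, w)               #    a) fit G_m using weights w
        miss = (G(X) != y)                          #    indicator I(y_i != G_m(x_i))
        err = np.sum(w * miss) / np.sum(w)          #    b) weighted error err_m
        alpha = np.log((1 - err) / (err + 1e-12))   #    c) alpha_m
        w = w * np.exp(alpha * miss)                #    d) up-weight the misclassified samples
        learners.append(G)
        alphas.append(alpha)

    def predict(X_new):                             # weighted vote of the weak learners
        return np.sign(sum(a * G(X_new) for a, G in zip(alphas, learners)))
    return predict
```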
AdaBoost Example
Each of the 14 samples in the PlayTennis dataset starts with an equal sample weight of 1/14.
AdaBoost
GINI Impurity:
The GINI Impurity of a node is the probability that a randomly chosen
sample in a node would be incorrectly labeled if it was labeled by the
distribution of samples in the node.
The GINI impurity can be computed by summing, over all labels i, the probability p_i of an
item with label i being chosen times the probability Σ_{k≠i} p_k = 1 − p_i of a mistake
in categorizing that item:
Gini = Σ_i p_i (1 − p_i) = 1 − Σ_i p_i²
It reaches its minimum (zero) when all cases in the node fall into a single
target category.
AdaBoost
[Outlook=Sunny(5): Yes-2 and No-3]
[Outlook=Overcast(4): Yes-4 and No-0]
[Outlook=Rain(5): Yes-3 and No-2]
GINI(Outlook=sunny) = 1 – (2/5)^2-(3/5)^2 = 1 – 0.16 – 0.36 = 0.48
GINI(Outlook= Overcast) = 1 - (4/4)^2-(0/4)^2 = 1 - 1 - 0 = 0
GINI(Outlook= Rain) = 1 - (3/5)^2-(2/5)^2 = 1 – 0.36 – 0.16 = 0.48
Now,
GINI impurity of parent node = weighted average of Gini impurities of leaf nodes
GINI(Outlook) = (5/14)*0.48 + (4/14)*0 + (5/14)*0.48 = 0.343
AdaBoost
[Temperature=Hot(4): Yes-2 and No-2]
[Temperature=Mild(6): Yes-4 and No-2]
[Temperature=Cool(4): Yes-3 and No-1]
GINI(Temp=Hot) = 1 – (2/4)^2-(2/4)^2 = 0.5
GINI(Temp= Mild) = 1 - (4/6)^2-(2/6)^2 = 0.445
GINI(Temp= Cool) = 1 - (3/4)^2-(1/4)^2 = 0.375
Now,
GINI impurity of parent node = weighted average of Gini impurities of leaf nodes
GINI(Temperature) = (4/14)*0.5 + (6/14)*0.445 + (4/14)*0.375 = 0.44
AdaBoost
[Humidity=High(7): Yes-3 and No-4]
[Humidity=Normal(7): Yes-6 and No-1]
GINI(Hum=High) = 1 – (3/7)^2-(4/7)^2 = 0.49
GINI(Hum= Normal) = 1 - (6/7)^2-(1/7)^2 = 0.25
Now,
GINI impurity of parent node = weighted average of Gini impurities of leaf nodes
GINI(Humidity) = (7/14)*0.49 + (7/14)*0.25 = 0.367
AdaBoost
[Wind=Strong(6): Yes-3 and No-3]
[Wind=Weak(8): Yes-6 and No-2]
GINI(Wind=Strong) = 1 – (3/6)^2-(3/6)^2 = 0.5
GINI(Wind= Weak) = 1 - (6/8)^2-(2/8)^2 = 0.375
Now,
GINI impurity of parent node = weighted average of Gini impurities of leaf nodes
GINI(Wind) = (6/14)*0.5 + (8/14)*0.375 = 0.429
GINI comparison for stump selection
GINI(Outlook) = (5/14)*0.48 + (4/14)*0 + (5/14)*0.48 = 0.343
GINI(Temperature) = (4/14)*0.5 + (6/14)*0.445 + (4/14)*0.375 = 0.44
GINI(Humidity) = (7/14)*0.49 + (7/14)*0.25 = 0.367
GINI(Wind) = (6/14)*0.5 + (8/14)*0.375 = 0.429
GINI(Outlook) = 0.343 is the lowest, so Outlook is the first stump.
[Outlook=Sunny(5): Yes-2 and No-3]
[Outlook=Overcast(4): Yes-4 and No-0]
[Outlook=Rain(5): Yes-3 and No-2]
Amount of say determination
Amount of say or α = ½ ln ((1-Total error)/Total error)
Here, the amount of say measures how much weight the
stump’s vote carries in the final classification.
Total error is the sum of the weights associated with
the incorrectly classified samples.
Total Error
[Outlook=Sunny(5): Yes-2 and No-3]
[Outlook=Overcast(4): Yes-4 and No-0]
[Outlook=Rain(5): Yes-3 and No-2]
Each of the 14 samples carries a weight of 1/14. The Outlook stump misclassifies
2 samples in the Sunny node and 2 in the Rain node, so:
Total Error = 2*(1/14) + 0 + 2*(1/14) = 2/7 = 0.29
Amount of say, α = ½ ln((1 − Total Error)/Total Error) = 0.45
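A quick plain-Python check of these two numbers (note that 0.5 * ln(2.5) is about 0.458, which the slide rounds to 0.45):

```python
import math

total_error = 2 * (1 / 14) + 0 + 2 * (1 / 14)           # the 4 misclassified samples
amount_of_say = 0.5 * math.log((1 - total_error) / total_error)
print(round(total_error, 2), round(amount_of_say, 2))    # 0.29 and about 0.45-0.46
```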
New Sample Weight
New sample weight for an incorrectly classified sample
= sample weight * e^α = (1/14) * e^0.45 = 0.11
New sample weight for a correctly classified sample
= sample weight * e^(−α) = (1/14) * e^(−0.45) = 0.046
Updated sample weight
Sample Weight | Normalized Sample Weight | Cumulative Normalized Sample Weight
0.046 0.051 0.051
0.046 0.051 0.102
0.046 0.051 0.153
0.046 0.051 0.204
0.046 0.051 0.255
0.11 0.122 0.377
0.046 0.051 0.428
0.046 0.051 0.479
0.11 0.122 0.601
0.046 0.051 0.652
0.11 0.122 0.774
0.046 0.051 0.825
0.046 0.051 0.876
0.11 0.122 0.998
“Normalized sample weight” = “Sample Weight” / “Summation of all of the sample weights”
NSWi=SWi / ΣSWi
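A NumPy sketch reproducing the table above; the positions of the misclassified samples are taken from the 0.11 rows in the table, and the values are illustrative:

```python
import numpy as np

alpha = 0.45
weights = np.full(14, 1 / 14)
misclassified = np.zeros(14, dtype=bool)
misclassified[[5, 8, 10, 13]] = True           # rows with weight 0.11 in the table

weights = np.where(misclassified,
                   weights * np.exp(alpha),    # incorrectly classified -> ~0.11
                   weights * np.exp(-alpha))   # correctly classified   -> ~0.046
normalized = weights / weights.sum()           # -> ~0.12 and ~0.051
print(np.round(np.cumsum(normalized), 3))      # the cumulative column
```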
Updated sample weight
Cumulative Normalized Sample Weight | Generated Random Number
0.051 0.040
0.102 0.100
0.153 0.151
0.204 0.200
0.255 0.250
0.377 0.267
0.428 0.500
0.479 0.700
0.601 0.990
0.652 0.370
0.774 0.600
0.825 0.682
0.876 0.886
0.998 0.980
New Dataset Creation with Random Sampling
Cumulative Normalized Sample Weight | Generated Random Number
0.051 0.040
0.102 0.100
0.153 0.151
0.204 0.200
0.255 0.250
0.377 0.267
0.428 0.500
0.479 0.700
0.601 0.990
0.652 0.370
0.774 0.600
0.825 0.682
0.876 0.886
0.998 0.980
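A NumPy sketch of the resampling step, assuming the cumulative weights above; the random numbers here come from a seeded generator rather than the slide’s values:

```python
import numpy as np

cumulative = np.array([0.051, 0.102, 0.153, 0.204, 0.255, 0.377, 0.428,
                       0.479, 0.601, 0.652, 0.774, 0.825, 0.876, 0.998])
random_numbers = np.random.default_rng(0).random(14)

# Each random number picks the first row whose cumulative weight reaches it, so
# heavily weighted (previously misclassified) rows are drawn more often.
chosen_rows = np.minimum(np.searchsorted(cumulative, random_numbers), 13)
print(chosen_rows)
```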
New Dataset Creation with Sample Weight
Each sample in the newly created (resampled) dataset is again given an equal weight of 1/14.
Bagging
Total amount of say for the YES stumps = 0.45 + 0.23 = 0.68
Total amount of say for the NO stumps = 0.56
Since 0.68 > 0.56, YES will be the classification.
Conclusion
We can use AdaBoost algorithms for both
classification and regression problems.
References
1. Boosting : foundations and algorithms by Robert E. Schapire and Yoav Freund.
MIT press
https://mitpress.mit.edu/sites/default/files/titles/content/boosting_foundations_algorithms/toc.html#indx-1
2. Boosting and AdaBoost for Machine Learning
https://machinelearningmastery.com/boosting-and-adaboost-for-machine-learning/
3. Quick Introduction to Boosting Algorithms in Machine Learning
https://www.analyticsvidhya.com/blog/2015/11/quick-introduction-boosting-algorithms-machine-learning/
4. An Introduction to Statistical Learning: with Applications in R by Gareth James et al.
http://www.amazon.com/dp/1461471370?tag=inspiredalgor-20
5. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Trevor Hastie et al.
http://www.amazon.com/dp/0387848576?tag=inspiredalgor-20
6. Applied Predictive Modeling by Max Kuhn
https://www.amazon.com/dp/1461468485?tag=inspiredalgor-20
7. AdaBoost.SAMME (Stagewise Additive Modeling using a Multi-class Exponential loss function)
https://web.stanford.edu/~hastie/Papers/SII-2-3-A8-Zhu.pdf
Linear regression is used to predict the continuous dependent variable using a
given set of independent variables. The output for Linear Regression must be a
continuous value, such as price, age, etc.
Logistic Regression is used to predict the categorical dependent variable using a
given set of independent variables.
Gradient Descent
Dr. Md. Golam Rabiul Alam
Derivative
Gradient Descent
The collection of partial derivatives of the same function (one per parameter) is called the gradient.
Gradient descent is an algorithm that uses the gradient to descend to the lowest point of a
loss function.
Gradient Descent
Least Square Method
Gradient Descent
Steps in Gradient Descent
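The step-by-step walkthrough on the original slides is graphical; as a hedged stand-in, here is a minimal NumPy sketch of gradient descent fitting an intercept and slope by least squares. The data, learning rate and stopping rule are illustrative choices, not from the slides:

```python
import numpy as np

x = np.array([0.5, 2.3, 2.9])          # e.g. observed inputs
y = np.array([1.4, 1.9, 3.2])          # e.g. observed targets

intercept, slope, lr = 0.0, 1.0, 0.01
for step in range(1000):
    pred = intercept + slope * x
    # gradients of the sum of squared residuals w.r.t. intercept and slope
    d_intercept = -2 * np.sum(y - pred)
    d_slope = -2 * np.sum(x * (y - pred))
    intercept -= lr * d_intercept       # step in the opposite direction of the gradient
    slope -= lr * d_slope
    if max(abs(d_intercept), abs(d_slope)) < 1e-6:   # gradients tiny: stop
        break

print(round(intercept, 3), round(slope, 3))
```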
Stochastic Gradient Descent (SGD)
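Stochastic gradient descent replaces the full-dataset gradient with the gradient of one randomly chosen sample (or a small mini-batch) per update, which scales better to large datasets. A hedged sketch on the same illustrative data as above:

```python
import numpy as np

x = np.array([0.5, 2.3, 2.9])
y = np.array([1.4, 1.9, 3.2])

rng = np.random.default_rng(0)
intercept, slope, lr = 0.0, 1.0, 0.01
for step in range(5000):
    i = rng.integers(len(x))                      # one random sample per update
    resid = y[i] - (intercept + slope * x[i])
    intercept += lr * 2 * resid                   # negative gradient of resid^2 w.r.t. intercept
    slope += lr * 2 * resid * x[i]                # negative gradient of resid^2 w.r.t. slope

print(round(intercept, 3), round(slope, 3))
```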