Random Forest
Applied Multivariate Statistics Spring 2012
Overview
Intuition of Random Forest
The Random Forest Algorithm
De-correlation gives better accuracy
Out-of-bag error (OOB-error)
Variable importance
Intuition of Random Forest
[Figure: three decision trees grown from the same data, splitting on age (young/old), sex (male/female), work status (retired/working), and height (tall/short); each leaf predicts healthy or diseased.]
New sample: old, retired, male, short
Tree predictions: diseased, healthy, diseased
Majority rule: diseased
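To make the majority rule concrete, a two-line R sketch (the three predictions are the ones from the figure):

    # three tree predictions for the new sample
    preds <- c("diseased", "healthy", "diseased")
    # majority rule: the most frequent prediction wins
    names(which.max(table(preds)))   # "diseased"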
The Random Forest Algorithm
Differences to a standard tree:
Train each tree on a bootstrap resample of the data
(Bootstrap resample of a data set with N samples: make a new data set by drawing N samples with replacement; i.e., some samples will probably occur multiple times in the new data set)
For each split, consider only m randomly selected variables
Don't prune
Fit B trees in this way and aggregate the results by averaging (regression) or majority voting (classification); see the sketch below
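A minimal sketch of these settings with the randomForest package (iris is used here only as a stand-in data set): ntree plays the role of B and mtry the role of m.

    library(randomForest)

    set.seed(1)
    # B = 500 unpruned trees, each grown on a bootstrap resample,
    # considering m = 2 randomly selected variables at each split
    rf <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)
    print(rf)   # aggregated majority-vote results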
Why Random Forest works 1/2
Mean Squared Error = Variance + Bias²
If trees are sufficiently deep, they have very small bias
How could we improve the variance over that of a single
tree?
Why Random Forest works 2/2
The variance of the average of B identically distributed trees, each with variance \sigma^2 and pairwise correlation \rho, is

  \rho \sigma^2 + \frac{1-\rho}{B} \sigma^2

The first term decreases if \rho decreases, i.e., if m decreases:
de-correlation gives better accuracy
The second term decreases if the number of trees B increases (irrespective of \rho)
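A small simulation sketch of this decomposition (all numbers are made-up illustration values): B equi-correlated "tree predictions" are averaged, and the empirical variance is compared with the formula above.

    set.seed(1)
    B <- 50; rho <- 0.3; sigma <- 1; nsim <- 10000

    # equi-correlated predictions: a component shared by all trees gives
    # pairwise correlation rho, an independent component gives the rest
    z0    <- rnorm(nsim, sd = sigma)                       # shared
    zi    <- matrix(rnorm(nsim * B, sd = sigma), nsim, B)  # per tree
    trees <- sqrt(rho) * z0 + sqrt(1 - rho) * zi

    var(rowMeans(trees))                      # empirical ensemble variance
    rho * sigma^2 + (1 - rho) * sigma^2 / B   # theoretical value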
Estimating generalization error:
Out-of-bag (OOB) error
Similar to leave-one-out cross-validation, but almost without any additional computational burden
The OOB error is a random number, since it is based on random resamples of the data
Data, resampled data, and out-of-bag samples:
[Figure: a data set of (age, height, health status) samples is resampled with replacement; rows that are never drawn form the out-of-bag set. The tree grown on the resampled data is used to predict the out-of-bag samples.]
Out-of-bag (OOB) error rate: 0.25
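A sketch of reading off the OOB error from a fitted forest (iris again as a stand-in data set):

    library(randomForest)

    set.seed(1)
    rf <- randomForest(Species ~ ., data = iris, ntree = 500)
    # each tree is evaluated on the samples left out of its bootstrap
    # resample; the aggregated OOB error comes for free with the fit
    rf$err.rate[rf$ntree, "OOB"]   # OOB error rate after all 500 trees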
Variable Importance for variable i using Permutations

From the data, draw resampled data sets 1, ..., m and grow Tree 1, ..., Tree m. Tree j is evaluated on its out-of-bag data (OOB Data j), giving OOB error e_j. Then the values of variable i are permuted in OOB data set j and the tree is evaluated again, giving OOB error p_j. With the differences d_j = e_j - p_j:

  \bar{d} = \frac{1}{m} \sum_{j=1}^{m} d_j, \qquad
  s_d^2 = \frac{1}{m-1} \sum_{j=1}^{m} (d_j - \bar{d})^2, \qquad
  v_i = \frac{\bar{d}}{s_d}
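A sketch with the randomForest package: importance = TRUE stores the permutation-based measure, which by default is scaled by its standard error, similar in spirit to v_i above (iris as a stand-in data set).

    library(randomForest)

    set.seed(1)
    rf <- randomForest(Species ~ ., data = iris, importance = TRUE)
    importance(rf, type = 1)  # type 1: permutation (mean decrease in accuracy)
    varImpPlot(rf)            # plots the importance of each variable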
Trees vs. Random Forest

Trees:
+ Yield insight into decision rules
+ Rather fast
+ Easy to tune parameters
- Predictions tend to have a high variance

Random Forest:
+ Has a smaller prediction variance and therefore usually a better general performance
+ Easy to tune parameters
- Rather slow
- Black box: rather hard to get insight into the decision rules
Comparing runtime (just for illustration)
[Figure: runtime of Random Forest vs. a single tree. RF handles up to thousands of variables, but is problematic if there are categorical predictors with many levels (max: 32 levels); in this illustration, the first predictor was cut into 15 levels for RF.]
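A rough timing sketch in the same spirit (simulated data; using rpart for the single tree is my choice for illustration):

    library(randomForest)
    library(rpart)

    set.seed(1)
    n <- 5000; p <- 20
    d <- as.data.frame(matrix(rnorm(n * p), n, p))
    d$y <- factor(d$V1 + rnorm(n) > 0)

    system.time(rpart(y ~ ., data = d))         # one tree: fast
    system.time(randomForest(y ~ ., data = d))  # 500 trees: much slower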
RF vs. LDA

RF:
+ Can model nonlinear class boundaries
+ OOB error for free (no CV needed)
+ Works on continuous and categorical responses (regression / classification)
+ Gives variable importance
+ Very good performance
- Black box
- Slow

LDA:
+ Very fast
+ Discriminants for visualizing group separation
+ Can read off the decision rule
- Can model only linear class boundaries
- Mediocre performance
- No variable selection
- Only for categorical responses (classification)
- Needs CV for estimating the prediction error
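A side-by-side sketch on iris: the forest's OOB error against LDA's leave-one-out CV error (MASS::lda with CV = TRUE):

    library(randomForest)
    library(MASS)

    set.seed(1)
    rf <- randomForest(Species ~ ., data = iris)
    rf$err.rate[rf$ntree, "OOB"]                    # OOB error, no CV needed

    ld <- lda(Species ~ ., data = iris, CV = TRUE)  # leave-one-out CV
    mean(ld$class != iris$Species)                  # CV error estimate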
Concepts to know
Idea of Random Forest and how it reduces the prediction
variance of trees
OOB error
Variable Importance based on Permutation
R functions to know
Functions randomForest and varImpPlot from package randomForest