
Random Forest (RF): Decision Trees

Random Forest (RF) is a machine learning algorithm that uses ensemble learning to combine multiple decision trees. It can be used for both classification and regression tasks. RF works by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes or mean prediction of the individual trees. RF decreases variance and helps prevent overfitting compared to a single decision tree. Some key hyperparameters include the number of trees, maximum depth, and number of features considered at each split. RF provides relatively fast and powerful predictions but acts as a black-box model.


Random Forest (RF)

Random Forest (RF) is one of the many machine learning algorithms used for supervised learning, that is, learning from labelled data and making predictions based on the learned patterns. RF can be used for both classification and regression tasks, as in the minimal sketch below.
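
As a minimal sketch (assuming scikit-learn and its built-in iris dataset, both chosen here purely for illustration), an RF classifier could be trained and evaluated like this:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Labelled data: features X and known classes y.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train an ensemble of 100 decision trees on the training data.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Predict classes for unseen samples and report the accuracy.
print(rf.score(X_test, y_test))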

Decision trees
 RF is based on decision trees. In machine learning decision trees are a technique for creating
predictive models. They are called decision trees because the prediction follows several
branches of “if… then…” decision splits - similar to the branches of a tree.
 If we imagine that we start with a sample for which we want to predict a class, we would
start at the base of the tree and travel up the trunk until we come to the first split-off
branch. Each split corresponds to a feature, say “age”; we would now make a decision
about which branch to follow: “if our sample has an age greater than 30, continue along
the left branch, else continue along the right branch”. We repeat this at every branch
until there are no more branches before us. This endpoint is called a leaf and in decision
trees represents the final result: a predicted class or value.
 At each branch, the feature and threshold that best split the (remaining) samples locally
are found.
 Single decision trees are very easy to visualize and understand because they follow a
method of decision-making that is very similar to how we humans make decisions: with a
chain of simple rules. However, they are not very robust, i.e. they do not generalize well to
unseen samples. This is where Random Forests come into play. A small sketch of a single
tree and the rules it learns follows this list.
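
As a small illustration (again assuming scikit-learn; the dataset and depth limit are arbitrary), a single decision tree can be fitted and its chain of “if ... then ...” splits printed:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# A shallow tree so the printed rules stay readable.
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(X, y)

# Each indented line is one decision split; the leaves hold the predicted class.
print(export_text(tree, feature_names=load_iris().feature_names))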

Ensemble learning
RF makes predictions by combining the results from many individual decision trees - so we call them
a forest of decision trees. Because RF combines multiple models, it falls under the category of
ensemble learning. Other ensemble learning methods are gradient boosting and stacked ensembles.

Combining decision trees


There are two main ways of combining the outputs of multiple decision trees into a random forest:

1. Bagging, which is also called Bootstrap aggregation (used in Random Forests)


 Bagging is the default method used with Random Forests.
 Decision trees are trained on randomly sampled subsets of the data, where sampling is
done with replacement.
 A big advantage of bagging over individual trees is that it decreases the variance of the
model. Individual trees are very prone to overfitting and are very sensitive to noise in
the data. As long as our individual trees are not correlated, combining them with
bagging will make them more robust without increasing the bias.
 We remove (most of) the correlation by randomly sampling subsets of data and training
the different decision trees on these subsets instead of on the entire dataset.
 In addition to randomly sampling instances from our data, RF also uses feature bagging: at
each split, only a random subset of the features is considered.
2. Boosting (used in Gradient Boosting Machines)
 The samples are weighted for sampling so that samples that were predicted incorrectly
get a higher weight and are therefore sampled more often.
 The idea behind this is that difficult cases should be emphasized during learning compared
to easy cases.
 Because of this difference, bagging can easily be parallelized, while boosting has to be
performed sequentially; a short sketch contrasting the two approaches follows this list.
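
As a rough, hedged comparison of the two strategies (assuming scikit-learn, where RandomForestClassifier implements bagging of trees and GradientBoostingClassifier implements boosting; the synthetic dataset is arbitrary):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging: independent trees on bootstrap samples, trainable in parallel (n_jobs=-1).
bagged = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)

# Boosting: trees are fitted sequentially, each correcting the previous ones' errors.
boosted = GradientBoostingClassifier(n_estimators=100, random_state=0)

print("bagging :", cross_val_score(bagged, X, y, cv=5).mean())
print("boosting:", cross_val_score(boosted, X, y, cv=5).mean())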

Final Result
The final result of the model is calculated by averaging the predictions of the individual trees
(for regression) or by majority vote over their predicted classes (for classification).
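
The aggregation can be made explicit using the fitted forest's individual trees (the estimators_ attribute in scikit-learn); note that scikit-learn's own predict averages the per-tree class probabilities, so the hard majority vote below is only a sketch of the idea:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# One prediction per tree for a single sample; the sub-trees output encoded class indices.
sample = X[:1]
per_tree = np.array([tree.predict(sample)[0] for tree in rf.estimators_]).astype(int)

# Majority vote over the trees, mapped back to the original class labels.
majority = rf.classes_[np.bincount(per_tree).argmax()]
print(per_tree, "->", majority)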
Hyperparameters
 Hyperparameters are the arguments that can be set before training and which define how
the training is done.
 The main hyperparameters in Random Forests are:
o The number of decision trees to be combined
o The maximum depth of the trees
o The maximum number of features considered at each split
o Whether bagging/bootstrapping is performed with or without replacement
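
In scikit-learn these map onto constructor arguments of RandomForestClassifier (the parameter values below are arbitrary examples, not recommendations):

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=200,     # number of decision trees to be combined
    max_depth=10,         # maximum depth of each tree
    max_features="sqrt",  # number of features considered at each split
    bootstrap=True,       # whether bootstrap sampling (with replacement) is used
    random_state=0,
)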

Pros and Cons of Random Forests:


Pros
 They are a relatively fast and powerful algorithm for classification and regression learning.
 Calculations can be parallelized, the algorithm performs well on many problems (even with
small datasets), and the output includes prediction probabilities.
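
Both points can be seen directly in scikit-learn (assumed here as the implementation): n_jobs=-1 parallelizes tree construction across cores, and predict_proba returns class probabilities rather than only hard labels:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_jobs=-1 builds the trees in parallel on all available cores.
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0).fit(X, y)

# One probability per class for each of the first three samples.
print(rf.predict_proba(X[:3]))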

Cons
 They are black-boxes, meaning that we can’t interpret the decisions made by the model
because they are too complex.
 RFs are also somewhat prone to overfitting, and they tend to be bad at predicting
underrepresented classes in unbalanced datasets.

Boosting
 The idea of boosting grew out of the question of whether a weak learner can be modified to
become better.
 A weak hypothesis or weak learner is defined as one whose performance is at least slightly
better than random chance; the sketch below shows how boosting combines many such learners.
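
As a concrete illustration (AdaBoost is not named in the text above and is used here only as an example of boosting weak learners, assuming scikit-learn; its default base learner is a depth-1 decision stump, which is only slightly better than chance on most problems):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# A single weak learner: a decision stump with one split.
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)

# Boosting combines many such stumps, reweighting misclassified samples each round.
boosted = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

print("single stump accuracy    :", stump.score(X, y))
print("boosted ensemble accuracy:", boosted.score(X, y))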
