0 ratings0% found this document useful (0 votes) 65 views67 pagesML 3RD Unit
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, 
claim it here.
Available Formats
Download as PDF or read online on Scribd
apter
 
Supervised earning
 
3.3 1 INTRODUCTION
As the name suggests, supervised machine learning is built on monitoring the way that
machines learn. This means that in the supervised learning technique, we train the machines
using the labeled or trained data set, and based on that training, the machine predicts the
output. In this context, the flagged data indicates that some of the inputs that we feed to the
algorithm are already mapped to the output.
In supervised learning, the training data that is given to the machine serves as the
supervisor, instructing machine on how to correctly predict the output. It employs the same
idea that a Pupil would learn under a teacher's guidance.
Supervised learning is a: process that provides the machine learning model with both
input data and correct output data. The aim of a supervised learning algorithm is to find a
a
© scanned with OKEN Scanneros
ea / Seber eg
mapping function to map the input variable(x) with the output variable(y), 7, t
in this type of machine learning, we try to teach the machines using the traineg gy
then expect it to predict the outcomes on the test data. My
Supervised learning allows machines to classify things, problems, or situa,
data fed into them. Machines are repeatedly fed with data such as the traits, be
measurements, color, and height of objects, people, or situations until the mah
conduct accurate classifications. q
During supervised leaming, a machine is given data referred to as taining ay
mining terminology, on which t performs classification, For example, ifa syst? 4y
to classify fruit, it would be given training data such as color, shapes, dimension and ba
Would be able to classify fruit based on this data. it,
To accomplish accurate classification, a system often requires numerous itera
such a procedure. Since real-life classifications such as credit card fraud detecign ®
disease classification are complex tasks, the machines require suitable data ang i ui
Tounds of learning sessions to attain realistic skills.
3.2. EXAMPLES OF SUPERVISED LEARNING :
The working of Supervised learning can be easily understdod by the below example
diagram:
EXAMPLE 1: Let's say we have a dataset of various forms, such as Squares, rectan,
triangles, and polygons. Now the first step is that we need to train the model for each shay
 
 
Prediction
a [).| Swe
L+(*2\_. ste
 
 
  
 
 
 
Tiangle
 
 
 
Model Training
 
 
 
 
 
Test Data
 
 
 
Fig. 9.1 Example of Supervised Learning
© scanned with OKEN Scanneryr
wer es
ane giver shape has four sides, and all the sides are equal, then it will be labeled as
 
* square:
gpthe given shape has three sides, then it will be labeled as a triangle.
. iethe giver shape has six equal sides then it will be labeled as. hexagon.
.
as after training, the model will be tested using the test set, and the task of the model
stoidentfy teeters :
machine has already trained on many shapes, so when it encounters a new shape, it
sizes it based on 8 number of its sides and predict the result.
cates
PLE 2: As shown in the above example, we have initially taken some data and
xed them as ‘Spam’ or ‘Not Spam’. The training supervised model uses this labeled
atlas model is trained using this data, Once it has been trained, we can test our model by
th ae
or afew test emails to see if it can accurately predict the desired result.
sl
  
Fig, 3.2 Example of Supervised Learning
'3,3. TYPES OF SUPERVISED MACHINE LEARNING
On the basis of the problem that we have to encounter, we can classify Supervised
Learning into two categories:
(a) Classification
We employ Classification algorithms to handle the classification problems where the
output variable is categorical, There are many other kinds of these categories, including True
© scanned with OKEN Scannerxa epee
or False, Male or Female, White or Black,
classification algorithms forecast the categories tl
Some of the widely used Classification algorithins are:
A
@ scanned with OKEN Scanner
and others. On the basis of training gar,
hat are present in the dataset, thy
K-nearest neighbor (KNN)
Decision Tree Algorithm
Naive Bayes Classifier Algorithm
Random Forest Algorithm
Support Vector Machine Algorithm
(b) Regression
Regression algorithms are used to solve regression problems where there exists a lines
relationship between input and output variables. When the output variable has an actus
value, such "dollars" or "weight," the situation is known as a regression problem, These
variables can be used to forecast continuous output variables, such as market trends,
weather forecasts, and other things.
Some of the popular Regression algorithms are:
«3.4
Simple Linear Regression Algorithm
‘Multivariate Regression Algorithm
Decision Tree Algorithm
Lasso Regression
" CLASSIFICATION MODEL es
Classification is a process of categorizing data or objects into predefined classes «
categories based on their features or attributes. Classification is a form of supervised
learning technique in machine learning in which an algorithm is trained on a labeled datast
to predict the class or category of fresh, unseen data..
The primary goal of classification is to create a model capable of accurately assigning *
label or category to a new observation based on its properties. For example, a classification
medel might be trained on a dataset of images labeled as either dogs or cats and then used
to predict the class of new, unseen images of dogs or cats based on their features such #
color, texture, and shape.
When the output variable is a category, such as "red" or "blue" or "disease" and '™
disease," the problem is called a classification problem. A classification model makes aoa Learang | Page 3.5
ioe derive some conclusion from observable values. Given one or more inputs a
‘on model will try to, predict the value of one or more outcomes. For example,
ing emails “spam” or “not spam", when looking at transaction data,
, or “authorized”.
    
  
jassification algorithm's main purpose is to determine the category of a given
these algorithms are primarily used to predict the output for categorical data.
mm below can help you better understand classification methods. There are
asses in the diagram below: class A and class B. These classes have features that are
two s toeach other and dissimilar to other classes.
sii
@ Class
 
Fig.2.3 Example of Classification
A classifier is the algorithm that performs classification on a dataset. There are two types
of Classifications: :
+ Binary Classifier: If the classification problem has only two possible outcomes, then
itis called as Binary Classifier. Examples: YES or NO, MALE or FEMALE, SPAM or
NOTSPAM, CAT or DOG, etc.
Multi-class Classifier: If a classification problem has more than two outcomes, then
itis called as Multi-class Classifier. Example: Classifications ‘of types of crops,
Classification of types of music.
 
 
Learners in Classification Problems:
Inthe classification Problems, there are two types of learners:
1 \ | atlas a
Lary Leamers: Lazy Learner firstly stores the training dataset and wait witil it
Teceives the test dataset. . In the case of the lazy learner, classification is performed
a
@ scanned with OKEN Scannerusing the most related data contained in the training dataset, ~
time, but predictions take longer.
Example: K-NN algorithm, Case-based reasoning
2. Eager Leamers: Eager Learners develop a classification mode, —
dataset before receiving a test dataset. In contrast to Lazy Learners tat
@ scanned with OKEN Scanner
devote more effort to learning and less time to prediction, Example, Eager uy
Naive Bayes, ANN. inn
3.5 CLASSIFICATION LEARNING STEPS
The basic principle behind classification is to train a model on a labeled “<
input data coupled with their matching output labels in order to discove, ie oe
relationships between the input data and output labels. Once trained, the mod, Patina
to predict output labels for previously unknown data. Sean
The classification process typically involves the following steps:
1. Define the problem: The first step in applying classification is to Precis,
the problem and the desired outcome. What class labels are you eet
predict? What is the relationship between the input data and the cla a
Suppose we have to predict whether a patient has a certain disease or se =
basis of 7 independent variables, called features. This Means, there cntesne
Possible outcomes: -
° The patient has the disease, which means “True”.
* The patient has no disease. which means “False”,
This is a binary classification Problem.
. Feature Extraction: io
Pn n: The relevant features or attributes are extracted frome 4
Pan ee pe ‘0 differentiate between the different classes, Suppose our
‘Pendent features, having only 5 features influencing the Inbal of[EERA
pearning
an es remaining 2 are negligibly or not correlated, then we will use only these 5
: oa ~ sonly fr the model training,
fea!
4, choose an algorithm: There are numerous algorithms and strategies for machine
* eaening each with its own set of advantages and disadvantages. Algorithms that
sre commonly used include logistic regression, decision trees, support vector
machines GVM), and neural networks. The algorithm you choose will be
etermined by the nature of the problem you are attempting to answer as well as
the qualities of the data you are working with,
sain the model: Following the algorithm selection, the model i trained using the
raining data. This involves feeding the data into the algorithm and allowing it to
team the relationships and pattems in the data. During the training process, the
model's internal parameters will be adjusted to minimize the difference between
the predicted and actual output.
Evaluate the model: After the model has been trained, it is critical to assess its
performance in order to determine how well it can solve the problem. This can be
accomplished by the use of several evaluation criteria’such as accuracy, precision,
or recall. If necessary, the model can be fine-tuned or updated to increase its
performance.
Fine-tuning the model: If the model's performance is not satisfactory, you can
fine-tune it by adjusting the parameters, or trying a different model.
Deploying the model: Finally, after we are satisfied with the model's performance,
we may use it to generate predictions on new data. It can be used to real-world
ey
2
problems.
 
.3.6. K-NEAREST NEIGHBOR (KNN) Ee
+ KNearest Neighbor is one of the simplest Machine Learning algorithms based on
Supervised Learning technique.
* ‘The K-NN method assumes similarity between the new case/data and existing cases
and places the new case in the category that is most similar to the existing categories.
* The KN algorithm maintains all existing data and classifies a new data point
‘sed on the similarity. This means that when fresh data is generated, it may be
‘wickly classified into a well-suited category using the K- NN algorithm.
@ scanned with OKEN Scanner* The K-NN algorithm can be used for both regression and classification 4, eo
.d for classification tasks. t we
'
parametric algorithm, which means it makes no assumpyig, 7
a
nth,
 
   
underlying data.
* It is also called a lazy learner algorithm because it does not learn from y,
set immediately instead it stores the dataset and at the time of classi, trai
ca
@ scanned with OKEN Scanner
“iy
ti, &
performs an action on the dataset. tion i |
it
+ KNN algorithm at the training phase just stores the dataset and when ig
data, then it classifies that data into a category that is much similar to the ngs sm
fa
Example: Assume we have an image ofa creature that resembles a cat or a dog "
want to know whether it is a cat or a dog. So, because it works on a similarity ieee Me
May utilise the KNN method for this identification. Our KNN model will find the 4."
features of the new data set to the cats and dogs images and based on the most
features it will put it in either cat or dog category. mle,
EXAMPLE
The following is an example to understand the concept of K and working of jy
algorithm —
‘Suppose we have a dataset which can be plotted’as follows -
Fed Col Dow
  
Blue Color Dots
Now, we need to classify new data point with black dt (at point 60,60) into blues
class. We are assuming K = 3 ie. it would find three nearest data points.
are EEE
yearning
wn nin te next diagram:
.
01
40
 
 
fo 20 9 40 80 OOS
ee closest neighbors of the data point are marked with a black dot in the diagram
The the :
spove. Because two of the three are in the Red class, the black dot will also be assigned to
the Red class.
3.6.1 K-Nearest Neighbor Algorithm Pseudocode
programming languages like Python and R are used to implement the KNN algorithm.
the following is the pseudocode for KNN:
1. Load the data
2. Choose K value
3, For each data point in the data:
«Find the Euclidean distance to all training data samples
* Store the distances on an ordered list and sort it
* Choose the top K entries from the sorted list
© Label the test point based on the majority of classes present in the selected
points
4, End
3.6.2 Few ideas on picking a value for ‘K’
* There is no organized approach for determining the best "K" value,
experiment with different values, assuming that the training data is unknown.
We must
ae
@ scanned with OKEN Scanner=
EEO ii
Choosing smaller numbers for K can be noisier and have a grate
outcome. inpag
© Higher K values result in smoother decision boundaries, Fesulting in bs %
but increased bias. Additionally, computationally expensive, lower i,
‘Scanned with OKEN Scanner
@
+ The value of K can be chosen though cross-validation. Take the smay
the training dataset and call ita validation dataset, and then use the sane ton h
; ry
different possible values of K. This way we are going to predict the ap ea
instance in the validation set using with K equals to 1, K equals to 2 K org, *
and then we lok at what value of K gives us the best performance on gb
set and then we can take that value and use that as the final setting of o  Valida
50 we are minimizing the validation error . Mt gor
+ In general, practice, choosing the value of kisk = sqrt(N) where Ng
the number of samples in your training dataset. ands jg,
* Try and keep the value of k odd in order to avoid confusion between two
data. S065
3.6.3 Distance Metrics Used in KNN Algorithm
AAs we know, the KNN algorithm can assist us find the closest points or groups
query point. However, we need some metrics to find the closest groups or points to a Fs
point.
We use the distance metrics listed below for this purpose:
* Euclidean Distance
© Manhattan Distance
¢ Minkowski Distance
1. Euclidean Distance
This is simply the cartesian distance between two locations in the plane/hyperplane.
Euclidean distance can also be represented as the length of the straight line connecting th
two points under examination. This metric allows us to compute the net displacemert
between two states of an object.ing Cae
wn
Euclidean
©
 
‘©
 
 
 
p.shanhattam Distance
3 sis distance measure is‘ commonly utilized when the total distance travelled by the
is more important than the displacement, This metric is computed by adding the
ee differences between the coordinates of points inn dimensions,
abso!
*
ax,y)= B \xi—yil
fe
 
Manhattan
O
 
 
 
We can say that the Euclidean, as well as the Manhattan distance, are special cases of the
Minkowski distance.
 
From the formula above we can say that when p=2 then it is the same as the formula for
oy ldean distance and when p = 1 then we obtain the formula for the Manhattan
istance,
>
@ scanned with OKEN ScannerMinkowek!
f
a
@ scanned with OKEN Scanner
 
 
oN
3.6.4 Example on KNN Algorithm
The table below represents our data set. We have two columns»
and Saturation. Each row in the table has a class of either Red or Blue. Before 5°"!
new data entry, let's assume the value of K is 5.
 
We ittrodya
 
 
 
 
 
 
   
 
 
 
 
 
 
1
2 40
3 50
4 0
5 10 — [Tred 17
[BU siusmo tos ees aes
7 60 10 Red_| |
8 25) set BO “Blue _|
rs i
 
 
Here's the new data
 
 
 
 
 
 
 
We have a new entry but it doesn't have a class yet. To know its class, we have!
calculate the distance from the new entry to other entries in the data set using the Euclides*
distance formula.yeorntnd ETERER
en oemala: WXEXI + FY)
e
's brightness (20).
 
x2 2 New entry’
xe Existing entry's brightness.
Y2 =New entry's saturation (35).
ys = Existing entry's saturation.
ets 40 the calculation together. I'll calculate the first three.
tance #2
ase es" ol
BRIGHTNESS SATURATION CLASS.
 
dl =¥(20- 40)" + (35 - 20)°
= ¥400 +225
= 1625
=25
Wenow know the distance from the new data entry to the first entry in the table. Let's
update the table.
  
 
 
 
 
 
 
 
 
@ scanned with OKEN ScannerEX
Distance #2
For the second row, d2:
     
  
50 [25250
jets
d2 =1(20 - 50)* + (85 - 50)"
= 900 +225
=V1125
= 33.54
Here's the table with the updated distance:
BRIGHTNESS SATURATION CLASS DISTANCE
 
 
 
 
 
 
 
 
 
 
 
 
At this point, you should understand how the calculation works. Attempt
the distance for the last five rows. Here's what th
shave been calculated:
 
to calculate
e table will look like after all the distances
 
 
scr sui ass
 
 
Saturation
 
 
 
 
 
 
 
 
 
 
 
@ scanned with OKEN Scannerchose 5as the value of K, we'll only consider the first five rows. That is:
we
since
 
ee above, the majority class within the 5 nearest neighbors to the new entry
syou can see above, the
gel Terfore, well classify thesnew entry as Red,
isked.
ser’ the updated table:
  
 
3.6.5 Advantages of K-NN Algorithm
Here are some of the advantages of using the k-nearest neighbors algorithm:
* Its easy to understand and simple to implement.
Itcan be used for both classification and regression problems.
Because there are no assumptions about the underlying data, it is appropriate for
Non-linear data.
  
naturally capable of handling multi-class instances.
'tcan perform well with enough representative data.
@ scanned with OKEN ScannerIN Algorithm 8y
pos tages FN ‘using the k-nearest no} ~
ava ptages of USB Neighbor,
disadva” Ss
he 7 : a
“sig high as it stores all the train Ig6,
cost is hig! ining da Bor;
   
« Sensiti
cations of eed
3.6.7 Appl sno jearning, classification is an importa
science and ma
d mos!
ve one of the oldest and MOS”
is one ome applicati
modeling. Here are st a ee :
Credit rating: The KNN algorithm assists in determining on
i ig them to others who share similar traits,
rating by comparin :
Loan approval: ‘The k-nearest neighbour method, like credit Satis
detecting individuals who are more likely to default on loans by — ig
qualities with similar persons. mPa
1g: Datasets can have many missing values. The
which estimates missing values, NN Mey,
+ accurate algorithms for pattern reco ale
ions for the k-nearest neighbor alg otithn ang My |
ek
‘%
%
eld
A
Indata
© Data preprocessin,
used for missing data imputation,
Pattern recognition: The KNN algorithm's capacity to recognize Pate
wide range of applications. It can, for example, recognize pattems MTS ley,
usage and identify anomalous patterns. Pattern recognition can ae Cet
pattems in client purchasing behavior. Used ing
* Stock price prediction: Because the KNN algorithm is good at estimatin, theng
of unknown entities, it can be used to forecast the future value of Aad a
historical data. “
+ Recommendation systems: KNN can be used in recommendation systems =|
can help locate people with comparable traits. It can, for example, be used al
online video streaming platform to recommend content that a user is more le},
view based on what similar users watch. |
© Computer vision: ‘ 3
i S, 2 = — ae KNN algorithm is used for image classification. Si’
pay ae ‘ouping similar data points, for example, grouping cats togete’
liffer i s :
ent class, it’s useful in several computer vision applications.
 
@ scanned with OKEN ScannerOr tical implementation of KNN Algorithm mig see
Aes “Learn
prac! iris dataset, which is one Of the Widely Used latasets
36 vg use i dataset is included in R base and Pyt
Oe ei ct so that users can access it without hay;
iho ear,
io eit a can be imported from Sklearn dat,
i eas ick
w is a we can use for practicing,
1s!
 
for learning ML
the Machine
ing to find a Source for it
Asets,
learning
     
 
 
which Contains Numerous
      
    
 
    
      
 
 
   
    
 
 
ae sklearn.datasets import load_iris
:| fee = load iris()
iri
the data and target value into two separate Variables,
tore
wes
+ pandas as pd i
3 | amr DataFrame(iris.data, columss oris feature nanes)
ae er putabraoa(ieda targets
vere
print (x)
output as:
weget the utp! cise PRET Teng wa
‘Sepe 5.1 :
6 .
1 K
2 .
3 H “ at
4 8 5.2 23
Es 6.7 ae 6 Ls
us 6.3 3.0 5.2 2.0
145 6.5 3.4 5.4 2.3
uy 6.2 te 5.4 1.8
148 ats 3.
49
[350 rows x 4 columns]
e
ee
1 @
20
3 0
48
us 2
M6 2
17 2
M8 2
14g 2
a
© scanned with OKEN ScannerSupervise
L Peze 3.18 | ‘nny
.d to split the iris dataset into t.:
In order to train and test the model, we nee
datasets,
    
split
 
: [from sklearn.nodel_selection inport train_test_s
 
 
i i i
xctrain,x.test,y train y_test = train test_split(nystest.s
  
   
 
To check the shape of training and testing data, write the code below:
    
 
 
 
[8]: | print(x_train. shape)
print{x_test.shape)]
print(y_train. shape)
print(y_test. shape)
 
  
We get the output as:
(128, 4)
(30, 4)
(128, 1)
(38, 1)
 
Next, we have to build the model. Here is the code:
[16]: | from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 3)
 
 
knn.fit(x_train,y_ train)
 
 
In our example we are creating an instance (‘kn’) of the das
‘KNeighborsClassifer’, in other words we have created an object called ‘knn’ whid
knows how to do KNN classification once we provide the data. The Paramete,
‘n_neighbors’ is the tuning parameter/hyper parameter (k).
Hyperparameters in machine learning are preset settings or configurations that gover
the learning process of a model. Unlike parameters, they're not learned from data bi
selected beforehand, influencing the model's behavior, performance, and learning rt
Hyperparameters are chosen through techniques like grid search, random search, &
© scanned with OKEN Scanneryearning
re pimization. We have set «
_
R_Neighbora: to
ing the optimal value of K is critica),
sist set. To do this, we will use the “,
e ful see how well our model pre,
Now let’s Se
Core’ functi
: io ;
dictions m, n and pass j
curate our model ig
a
*ch Up to the actual
N Our test input and
results,
  
  
th
of to
staat
onto
 
model has an accuracy of approximat
ely 1009
jgnbore’ 1031s correct. ¥ 100%. It means that the setting
‘nse!
3 model is trained, we can use the preai, :
nee the a Ct () function on
fi t data. The " ‘ our model to mak:
gions on our test Predict’ method is used to test the model on testing sas
w.test)
(23): y_pred = knn.predicti(x test]
‘A dassification report is a performance evaluation metric in machine learning. It is used
tp show the precision, recall, F1 Score, and support of your trained classification model,
A Classification report is used to measure the quality of Predictions from a classification
algorithm. How many predictions are True and how many are False. More specifically, True
Positives, False Positives, True negatives and False Negatives are used to Predict the metrics
ofa classification report as shown below.
  
 
Precision is the ability of a classifier not to label an instance positive that is actually
negative.
Recall is the ability of a classifier to find all positive instances. For each class it is
defined as the ratio of true positives to the sum of true positives and false negatives.
The F1 score is a weighted harmonic mean of precision and recall such that the best
score is 1.0 and the worst is 0.0.
:| from sklearn.metrics import classification_report
print{(classification_report(y_test,y_pred)))
 
 
© scanned with OKEN ScannerEE Sag
We get the output as: ‘y,
precision recall f1-score
e 1.06 1.00 1.00 1.
1 1.00 1.00 1.00 9
2 1.00 1.00 1.00 4
1.00 3e
accuracy
macro avg 1.00 1.00 1.06 5
weighted avg 1.08 1.00 1.00 =
 
Now we will create the Confusion Matrix for our K-NN model to see the a,
classifier. Plot the confusion matrix of the true test labels y test ang the “Yofg,
labels y_pred. Below is the code for it: Pre,
    
  
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y pred)
print{cm)]
 
   
 
[29]:
 
 
 
In above code, we have imported the confusion_matrix function and called;
variable cm. itusing
  
 
  
[[1e @ @]
[fe 9 @]
[@ 6 11}]
  
In the above image, we can see there are 1019111 30 correct Predictions and 0
redicti ill visualis i i ua
Be a none. Now, we will visualize the confusion matrix for K-NN model. Write the cae
pP
[33]: | %matplotlib inline
inport matplotlib.pyplot as plt
import seaborn as sn
plt, Figure(figsize=( 7,5))
sn.heatmap(cm, annot=True)
plt.xlabel/( "Predicted! )
plt.ylabel( ‘Truth’ )
  
 
 
    
  
  
  
  
 
 
 
 
© scanned with OKEN Scannerfou 7
output as:
ge
 
© We 29222222, 0.5, Truth’)
 
Predicted
[Ei THE DECISION TREE MODEL OF LEARNING
edsion Tree is a supervised learning technique that may be used to solve
ctsscation and regression problems; however itis most commonly employed to
solve classification problems. Itis a tree-structured classifier in which internal nodes
contain dataset attributes, branches represent decision rules, and each leaf node
represents the result.
‘A Decision tree has two nodes: the Decision Node and the Leaf Node. Decision nodes
are used to make decisions and have numerous branches, whereas Leaf nodes
represent the results of those decisions and do not have any additional branches.
‘The decisions or the test are performed on the basis of features of the given dataset.
\tis a graphical representation of all possible solutions to a problem/decision given
certain conditions,
© scanned with OKEN ScannerEa SuPer,
1. ¢, like a tree, it begins with ed
*  Itis called a decision tree because, the TOOL no, mn,
branches out to form a tree-like structure. leorth ‘ Stang
In order to build a tree, we use the CART algorithm, which stangs . he,
and Regression Tree algorithm, “Sit
* A decision tree simply asks a question, and based on the answer (Veg Hs,
0),
Split the tree into sub-trees. . » it fay,
Below diagram explains the general structure of a decision tree:
  
EYE Wa
 
 
Root Node
 
 
‘Decision Node
 
 
1 Sub-Tree |Decision Node
 
   
 
 
 
 
Leat Noas
Fig. 3.4 Structure of Decision Tree
Example: The below binary tree can be used to explain an example of a decision
Assume you want to forecast whether a person is fit based on variables Such as age, ei
habits, physical activity, and so on. Questions like 'What is his age’, Does he exercise?
‘Does he consume a lot of pizzas?’ are decision nodes here. The Teaves, which are eth.
or ‘unfit’ are outcomes. In this case this was a binary classification problem (@ yes notype
problem).
 
Yes 2. No?
Eatsalot Exercises in
-Ofpizzas? the morning
mt \ sore /\ 0s
Unit Fit Fit Unfit
 
 
 
© scanned with OKEN Scannerning
7 Decision Trees?
ust
re Dr
yor 5 are a popular machine learn,
aon tree
8 algorithm,
of ‘on tasks. Here are some reasons why decision t
jon
ior d easy to interpret: Decision
r an ive an
uit applications because they give
relationships and generate pred,
" :
ally represented, making them
ip!
 
 
for both
lassificat
Aeoften used, “ton
Tees are an oy
Cellent choi 2
@ straightforward O'Ce for decision.
5 @Pproach 4
ctions based on current f 0 describe
; ‘acts. They can Tf
easi =
cn ter to grasp and Convey to, others,
be tile: Decision trees are useful in machine learnin,
versa
: 8 because th
th classification and regression tasks. Thy
for bot
°Y can be used to si
including those in healthcare, finance, and marketing,
issues
er ey, : roach, which ‘Means they
‘i "= Gees them applicable to a wide
make of data kinds and distributions. In contrast, Parameic appanage
gna specific data distribution,
maki
Y may be used
olve a variety of
t: Decision trees are robust to outliers and noi
st: ; -
ee Gaia Wine a neat imputation. This
4 5
ene predictions even when data is not perfectly
accuré
ise in the data, and can handle
S means they can still produce
clean.
ble: To handle enormous datasets and difficult challen,
a rae up. There are various decision tree Variants, s1
as ue that can improve their performance and scalability.
boosting,
'Bes, decision trees can be
uch as random forests and
|
4.7.2 Decision Tree Terminologies
it Node: The decision tree begins at the root node. It Tepresents the full dataset,
Root Node:
‘hich is partitioned further into two or more homogeneous sets,
wi
if Node: Leaf nodes are the tree's final output node, and the tree cannot be further
Leaf Node:
separated after obtaining a leaf node.
; ry fe
Splitting: Splitting is the process of separating the decision node/root node into su
nodes based on the conditions specified.
Branch/Sub Tree: A tree formed by’ splitting the tree.
from the tree.
Pruning: Pruning is the process of removing the unwanted branches fro
de, and other
ParenlChild node: The root node of the tree is called the parent node,
nodes are called the child nodes.
 
© scanned with OKEN ScannerDy |
Supervieeg a
f
"4
aa
ision Tree Algorithm :
node and works its way up ¢,
3.7.3 Working of Dec
Ina decision tree, the set ace te values of the root attribute wine
vehses thos eo aa aa attribute and then follows the branch and jumps bb i
next node based on the comparison.
The algorithm compares the attribute vé
the next node. It repeats the process until
algorithm can help you better ‘understand the entire process:
5, which contains
*  Step-1: Begin the tree with the root node, says the Compigy
lue with the other sub-rodes and move,
it reaches the tree's leaf node. The fos
. ope “me the best attribute in the dataset using Attribute Selection Measure (As,
© Step-3: Divide the $ into subsets that contains possible values for the best attebuteg
© Step-4: Generate the decision tree node, which contains the best attribute,
© Step-5: Recursively make new decision trees using the subsets of the dataset Creag
in step-3. Continue this process until a stage is reached where you cannot fury,
classify the nodes and called the final node as a leaf node.
 
 
3.7.4 Attribute Selection Measures
The biggest challenge that arises while developing a Decision tree is how to select thy
best attribute for the root node and sub-nodes. To tackle such challenges, a technique known
as Attribute Selection Measure, or ASM, is used. We can easily determine the bat
characteristic for the tree's nodes using this measurement. The popular technique for ASM,
is Information gain:
¢ Information gain is the measurement of changes in entropy after the segmentation
a dataset based on an attribute.
¢ It calculates how much information a feature provides us about a class.
* According to the value of information gain, we split the node and build the decison
tree.
A decision tree algorithm will always try to maximize the value of information gai
and the node/attribute with the most information gain will be split first.
Information gain is denoted by IGS, A) for a set $ is the effective change in ent!
ifter deciding on a particular attribute A, It measures the relative change in entropy wit
2spect to the independent variables,
© scanned with OKEN Scanner. calculated using the below formuta:
100 G(s, A) =HS)- HS, A) ny
ae natively’
x
IG, A) = H(S) ~ & P(x)* A(x)
ie ility of event x.
ott . the probability 0}
gutropy also called Svanmen Entropy is denoted by HG) for a fins
os of the amount of uncertainty or randomness in dats a finite set Sis the
3% 1
HG) = P(x)logy pe)
ittells us about the predictability of a specific event
3 ds and a 0.5 probability of tai ;
pility of hea PI ty of tails. Because there is :
Me wil happen, the entropy is as high asi canbe. Cons no means of knowing
- Consider a coin with hi
ves the entropy of such an event can be fully anticipated because we ine oe
e
gatit wil always be heads. In other words, because this event has no unpredcabity ve
entropy is 0. Lower numbers indicate less uncertainty, whereas larger values eae
greater uncertainty.
3.7.5 Pruning in Decision Tree
Pruning isa strategy for reducing the complexity of decision trees by deleting branches
that are unlikely to increase the tree's accuracy on unseen data. Pruning can assist prevent
overfitting, which occurs when the tree gets too complicated and closely matches the
training data, resulting in poor generalization to new data.
Consider a coin toss with a 05
There are two main types of pruning:
* Pre-pruning: This involves setting a limit on the depth of the tree, the minimum
number of instances required in a leaf, or the minimum information gain required
for a split. Pre-pruning is often used when the dataset is large or noisy, and when
building a large tree would be computationally expensive or lead to overfitting.
Post-pruning: This involves building the full decision tree and a removing,
branches that do not improve the accuracy of the tree on a validation set. ses
Pruning is often used when the dataset is small or clean, and when building 2515
tee is not computationally expensive.
© scanned with OKEN Scanner“om
    
gree ider a 14-day period of
cist this. Cone a
0 0 ty i ea i
ete, Huu create 2 prediction model that
ist ‘ill De played that day. To accom 4
seat at dO oe whether 8° tree to do that Mey
   
 
BRE
=
 
|
|
|
|
We will perform following tasks recursively:
1. Create oot node fr the tree-
4 tall examples are postive, retum leaf node ‘positive’.
4. Els ifall examples are negative, return leaf node ‘negative’.
4. Calculate the entropy of current state H(S).
5. For each attribute calculate the entropy with respect to the attribute ‘x’ denoted ly
HS,x).
6 Select the attribute which has maximum value of IG(S, x).
F Pt oF attribute that offers highest IG from the set of attributes
ee es Se ie ete eee
let's go and gow th decision tre, The inital step i to calculate 8.
Enttopy ofthe curen
Ye, "state. Inthe above example, we can see in total there are 5 Nos
4
© scanned with OKEN Scannert parr”
Ye
f” se ne Esa my
:
. logy ——
gpiop) * 2 POMIOB2 TS
Gon(d) Sal
nto) “~ (348? (ag 3) og (3)
=0.940
thatthe Entropy is 0 if all members belon
Pr uel to one class and other half belong to other class indicat
0 OO tvs 0.9 indicating that the distribution ¢ * indicating complete
é , Here : : 18 Teasonably rando
on choose the attribute that gives us highest possible Informe NOW: the
ae? : ae root node. Let’s start with ‘Wind’, formation Gain which
ef
8 to the same class, and 1 When half
10s. wind) = H(S) oe P(x)xH(x)
ig are the possible values for an attribute. Here, attribute
Wind’ takes tw
seuss the sample data, hence x= (Weak, Strong). We'll have to calculate:
1, Hees)
1, He)
3 FG)
4, PGane)
5. H§)=0%4 which we had already calculated in the previous example
Anonsall the 14 examples we have 8 places where the wind is weak and 6 where the
wwadisStrong.
 
Wind = Weak | Wind = Strong. Total
8 6 4
~ Number of Weak
Total
 
 
 
 
P(Sweas)
28
14
Number of Strong
PSereng) = Total
Ba
© scanned with OKEN Scannerty
*¥es' for Play Golf and 2 o¢ then
cee
“iw
Weak examples,
we have, 2
6 (Z)io 3)
Entropy(Sees) * -{§}ose (3 Eke
=0811
we have 3 examples where the outcome ag e
amples,
Strong No’ for Pay Cole
6 of them were .
"
Now, out of the 8
*No' for ‘Play Golf’. So,
Similarly, out of 6 z
for Play Golf and 3 where we had No!
moa --po( m6)
=1.000
ile other half belongs to other
items bel to one dass while : ; :
Remember, here half items ens ‘we have all the pieces required to calculate i:
we have perfect randomness. No
Information Gain,
*
ind) = HUS) - 3: P(2)* H(@)
IG(S, Wind =H(8)- %, PO)
= H(6) —P(Seas) * H(Sons) — P(Surong) * H(Ssrmgt)
: 3 é] 1.00)
-os10- [Jost (é (1.00)
= 0.048
IG (5, Wind)
It tells us the Information Gain by considering ‘Wind’ as the feature and give ys
information gain of .048. Now we must similarly calculate the Information Gain forall te
features.
IG(S, Outlook) = 0.246
IG(S, Temperature) = 0.029
IG(S, Humidity) = 0.151
IG(S, Wind) = 0.048 (Previous example)
We can clearly see that IG(S, Outlook) has the highest information gain of
0.246, hence we chose Outlook attribute as the root node. At this point, the decision tee
looks like.
© scanned with OKEN Scannerje that whenever the outlook 2 Overcast, Play Golf is always ‘Yes’, its no
re ob se ae the ue tree resulted because of the highest information gain
oy bute Outlocs
eae Outlook, we've got three of them remaining Humidity
Opal ee ‘And, we had three possible values of Outlook: Sunny, Overcast,
wt orovercast node already ended up having leaf node ‘Yes’, so we're left with
Oh pee ne te; Sunny and Rain. |
310 i
sabe computing H(Seusrs). /
— tlook is Sunny looks like:
ot ne value of OW
   
 
 
   
t6suny)=(2)!982 (2)-(3) loge (3)- 096
thesimilar fashion, we compute the following values
hi
[0S.ny Humidity) = 0.96
{GlSsey Temperature) =
16Gexy Wind) = 0.019
As we can see the highest Information Gain is given by Humidity. Proceeding in the
sme way with Sua will give us Wind as the one with highest information gain. The final
Iiion Tree looks something like this.
1.57,
 
© scanned with OKEN Scanner3.
whether golf will be played that day.
d the dataset with pandas:
First, open the Jupyter notebook and rea
 
[2]: | import pandas
stro Weak
No
 
al Implementation of Decision Tree using Scikit is
7 Practic:
in which the features are Outloox.
d of data in wl
Consider a 14-day periov
fs the outcome variable is whether golf was played that a
Humidity, Windy, eh model that takes the above four characteristicy and ge
now is to create a pl
df = pandas.read_csv("d:/golf-dataset.dsv")
print(df)
Yes
Tr
rhea,
 
 
|
The above code read golf-dataset.csv file stored in D: drive and print the data set,
Outlook Temp Humidity
je Rainy Hot High
7 Rainy Hot High
2 Overcast Hot High
3 Sunny Mild High
4 Sunny Cool Normal
5 Sunny Cool Normal
6
7
8
Ig
 
Overcast Cool Normal
Rainy Mild High
Rainy Cool Normal
Sunny Hild — Nomad
@ Rainy Mild Normal
21 Overcast midg High
Overcast Hot Nonath
Sunny mig High
=
 
E
8
  
o
Windy Play Golf
Weak
Strong
Weak
Weak
Weak
Strong
Strong
Weak
Weak
Weak
Strong
Strong
Weak
Strong
No
    
        
      
       
  
 
   
          
 
 
   
     
 
 
 
  
     
 
 
 
 
 
  
© scanned with OKEN Scannerno
yo
yr ce ithe table:
tar Wm,
mn is the Days where 0 is the
de® di
yu yal
cA | isda
fre irs column is Outlook. We have three typ a Sl 90 on,
s, es
"pe He esasl and Sunny. Of Values in this feat,
te i,
‘i gotun Temp: We have thre :
rd ©
 
types of y; in this feature ,
OF Values in this re i.e, Hi
fot, Mild
a umn is Humidity. We have two
rth col ‘yPes of values in this feature
ie. High
at column is Play Golf. We have two values ie ‘Yes or No,
nels 4
ecsion tee all data has to be numerical,
a
fo merical coh 4 ie
convert the non nut columns ‘outlook’, Temp’, "Humi ity’, Windy
t
we id into numerical values.
ny
pps
a0 method that takes a dictionary with information on how to convert
      
   
convert the values ‘Rainy’ to 0, ‘Overcast’ to 1, and ‘Sunny’ to 2.
seas
wis run the above concept:
   
  
   
 
 
d= {‘Rainy’: @, ‘Overcast’: 1, ‘Sunny': 2}
df['Outlook'] = df['Outlook" ].map{d)
print (df)
 
 
Mee the output as follows where the content of the Outlook column is replaced with
“ped numerical values:
 
© scanned with OKEN ScannerHumidity Windy pia
K Temp igh = Weak Y S01¢
           
     
         
     
       
    
      
 
Hot
High Strong
: . ee high Weak ye
2 2 Mild High Weak yes
3 ; cool Normal Weak ven
4 2 cool Normal Strong Rs
7 cool Normal Strong a8
& 3 mild = High Weak he
if @ cool Normal Weak y 0
: 2 Mild Normal — Weak va
e @ Mild Normal Strong Yes
10 1 Mild High Strong Wie
11 1 Hot Normal Weak Yes
es 2 Mild High Strong No
 
 
wwe do the same thing for other columns also. We run the following cog
le;
 
 
No!
[31]: | d = {'Hot': @, ‘Mild’: 1, ‘Cool': 2}
df['Temp'] = df['Temp'].map(d)
"Normal': 13
d = f'High': @,
df['Humidity'] = df['Humidity'].map(q)
d = {'Strong': @, 'Weak': 1}
df[‘Windy'] = df[ 'Windy'].map(d)
d= {'Yes': 1, 'No': @}
df[ ‘Play Golf'] = df['Play Golf'].map(d)
print (df)
© scanned with OKEN Scanneryon’
it aS foll
tp llows. Now all the Nonny,
e Numer
s lcal
4 on lve Values hag rE
fl
the Tem| > a
Va ter : Humidity Windy "paced
f e 4 Play Golf
    
         
 
@ e
e ° 8 e
4 1 a
1 1 8
1 2 as : °
5 2 2 : :
A 2 2 : ‘
8
5 a 2 : ° :
§ e 1 . |
1
7 @ 2 : :
1
8 a. 4 : : :
: @ 1 i Fi
10 1 e
ih 1 8 8 :
p 1 8 ; : 2
8 Bye t @ 5 ,
sen we have to separate the. - feature columns from the target column.
rate columns are the columns that we try to predict from, and the target column
be column with the values we try to predict.
i
sr}:|features = ["Outlook', ‘Temp’, ‘Hunddity', ‘wihdy*]
 
X = df[features]
y = df['Play Golf']
print(xX)
print(y)
 
 
 
 
© scanned with OKEN Scannerciuvduvucceced
SPRoPHKHOerRHESOSS
SPESHHHSSHHEHON
veeos
NER ONeSeHENN °
wnunre
Beara
wenon 3
BReeVaue Sse
BS
8
SRR EHH oH OHHH OS
B
 
Now we can create the actual decisi
ion tree, fit it with our details, Start Dy importing!
modules we need:
    
[33]: | from skleam import thee
from sklearn.tree
Anport natplotiib.,
 
 
Amport DecisionTreeClassifier
‘Pyplot as pit \
 
   
  
dtree =
Decisiontree
dtree =
tree. Fit(y,
 
 
Classifien()
y)
  
 
 
 
 
tr
Se Plot tree(dtree, festure nanescFeatures)
   
© scanned with OKEN ScannerHumidity <= 0.6
gini = 0.459
samplos = 14
valuo = [5, 9}
 
 
Windy <= 0.5
gini = 0.245
 
 
 
 
 
 
 
 
 
 
 
ples = 1
value = [1, 0} |
 
 
 
 
‘Humidity <=0.5,
“gini= 0.459
it
samples = 14
value =[5, 9]
  
 
ch
ee cen.simeans that the days with humidity of 05 or lower will follow
terearrow (to the left), and the rest will follow the False arrow (to the right).
gus « 0.488: refers to the quality of the split, and is always a number between 0.0
sli wiew 00 would mean all of the samples got the same result, and 0.5 would mean
| faesptis done exactly in the middle.
aples = 14: means that there are 14 days at this point in the decision, which is all of
Sorts the first step.
~ J
© scanned with OKEN Scanner> |
. that of these 14 days, 5 will get a "NO", and 9 iy te
value = (5, 9]: mean: me
   
 
 
 
to play golf.
Gin ee thod i .
There are many ways to split the samples; we use the Gini method in this ty,
  
torial,
   
   
   
The Gini method uses this formula:
Gini = 1 - (x/n)? - (y/n)
iti YES"), m is the number of
Where x is the number of positive answers ("YES"), sample, a
the number of negative answers ("NO"), which gives us this calculation;
Sy
     
[2 - (9 / 13)? - (5 / 13)? = 0.459
Ee
Humidity < = 0.5
gini = 0.459
samples = 14
value = [5, 9]
     
    
 
The next step contains two boxes, one box for the days with "Humidity’ of 05 oy lov,
and one box with the rest.
True Block with samples= 7: Days Continues
Outlook <= 0.5: means that the days with a outlook value of less than 05 wa
follow the arrow to the left (which means days with rainy outlook with high
and the rest will follow the arrow to the right.
humidity),
* gini = 0.49: means that about 50% of the samples would go in one direction.
* samples = 7: means that there are 7 days left in this branch (7 days witha
Humidity less than 0.5).
* value = [4, 3]: means that
"YES" to play golf. (If humidity is less than or
humidity)
se Block with sample= 7; Days Continues
‘ i y be
“éady —=
© scanned with OKEN Scanner‘Vite
ur of which are traits of employees ;
  
    
 
 cctumns, fou ;
sre dataset contains four co salt * 100K’ column is our targot ya, © Gop
+ description and degree: dataset so that abl *
name, ob des sion assifier with ith this datas the mode - 1
Now t's ty eS eo woud be greater than 1 lakh or not depending” Py
whether the salary & f employee. “ny
ary © nd degree of emP| Pe,
, job description an %
ey pent shore! notebook and read the dataset with pandas:
First, 0
Degree Salary_GT_10ek
Yes
          
      
     
          
    
      
    
     
       
 
    
      
n
5 mE sales tanger Master
1 Google HR Manager — Master No
2 Facebook Project Hanger Bachelor Yes
3 Microsoft HR Manager ‘Master No
2 steogle Project Maneger Bachelor Yes
+ Google Project Manager Bachelor No
Facebook Sales Manager Master No
3 Facebook Project Manager Bachelor Yes
& Google Sales Manager’ Master Yes
9 Facebook Sales Manager Master No
10 Microsoft Project Manager Bachelor Yes
11 Facebook HR Manager Master No
12. Microsoft HR Manager Master No
Yes
   
Sales Manager Bachelor
    
13. Microsoft
‘As we know that the machine is unable to understand text features. So, we form a dat,
encoder to convert text-based attributes to numbers. Label Encoding is a technique that
5 . re
used to convert categorical columns into numerical ones so that they can be fitted ty
machine learning models which only take numerical data.
 
   
   
   
   
 
[2]: | from sklearn inport preprocessing
Jabel_encoder = preprocessing.LabelEncoder()
df [‘Conpany'J= label_encoder. fit_transforfa(df[' Company'
ae '20b']e ‘Label_encoder.fit_transform(df[‘Job"]) ne
incomes J label_encoder. Fit_transfora(df{ ‘Degree’ })
U'Salary_GT_1@0K"]= label_encoder.fit_transform(df[ 'Salary_GT_100K'])
 
 
 
print (df)
a eee eee ee |
 
 
© scanned with OKEN Scanneras will change the data as shown,
ch is an integer value. We got i
5 he output
een replaced with numericat re a
a
   
   
below
8
Salary_GT_100K
ye columns are
fr io the values
   
   
     
a
i a [‘Company' , Job", "Degree’]
are
oF ee getfeatures)
x cr_100k ‘fl
ssalary_6T
ye ofl
 
print (¥)
By
ba
Boy
     
 
fee: say
Slay 6T_100K, dtype: int32
‘ype of
follows, Nowe bi
the feature columns from the target column,
the columns that we try to predict from,
we try to predict.
and the target column
 
eee
© scanned with OKEN ScannerESE
We ereate a decision tree classifier and fit it against the training datas , i
criterion parameter is set to Gini. OY deg,
  
from sklearn.tree import DecisionTreeclassificy
dtree = DecisionTreeClassifier()
dtree = dtree.fittx, yl]
  
 
 
 
 
 
Now we can create the actual decision tree, fit it with our details. Start
=
modules we need: if Pt,
 
[5]: | from sklearn import tree
tree.plot_treedtree, feature_names=featuresp]
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
We get the output as follows:
po es0s
eicos
sarploa = 14
yale [7,71
‘samples
value = [4, 0]
4
Tpoasis
‘gin =0.5'
sales = 4
value = [2, 2)
v
‘ginl= 0.0,
sammlos=2
value = [0, 2]
3.8 BAYES THEOREM : 2
Bayes theorem helps to determine the probability of an event with random knowledg
It is used to calculate the probability of occurring one event while other one ali
Occurred. It is a best method to relate the condition probability and marginal probabil.
© scanned with OKEN Scannerfinancial, Bayes theorem is also extensively
ef
°Y industry, aeronautical sector, etc,
Bayes theorem is also known as the Bayes Rule
probability ofA divided by the probability of event Bie
~ PBIA)P(A)
P(AIB) = 7)
P®)
where,
(A) and P(B) are the probabilities of events A and B.
P(AIB) is the probability of event A when event B happens,
P(BIA) is the probability of event B when A happens. |
‘The different terms associated with the Bayes theorem are as follows: |
+ Conditional Probability: When the happening of an event A depends on the |
occurrence of another event B, itis known as conditional probability.
Posterior Probability: The conditional probability of an event happening based on
new information or prior probability is known as posterior probability.
Prior Probability: It is the probability of an event's occurrence based on previous
information.
Joint Probability: The chances of two or more events taking place simultaneously is
their joint probability.
* Random Variables: The continuous range of values denoting the outcome of
random experiments are the random variables.
© scanned with OKEN Scannervariable an‘
mula of conditional probability given belo, hy
Ws
ed from the
Likelihood Class Prior
7
PIBIA)PL
P(AIB)= s a) a)
pane)
_ Ae
Pal ®)= — pe) | \
Predictor Prior
‘This equation js deriv
Joint Probablity
Posterior Probability
in, Now, assume that event A is the
Teg
Pong
equation 82
So according to the equation,
tribute.
r probability of the response variable,
evidence or the probability of training data
ty is the conditional probability ofthe re
sPOnse var,
lable
Consider the previous
.d event Bis the input at!
# P(A) or Class Prior is the prior
P(B) or Predictor Prior is the
Probabili
ue given the input attributes
e of the posterior és
Probabilit
IY OF thy
+ P(AIB) or Posterior
being of a particular va
+ P(BIA) or Likelihood is basically the never
likelihood of training data
Bayes’ Theorem Example
Let us look at how the Bayes theorem probability calculator works. Assume th;
le that there
are two investment options, A and B. Then, the probability of i
1 . y generatin; iti
from A is 74% and the probability of generating positive returns from ae a
possibility of investment B providing a positive return, when investm As te
positive retum is 13%. ent A also provides
Based on the given data, determin
K ; the probability of investm ‘di
return when investment B also provides a positive return. srg tene Pie
SOLUTIO!
P(A) =0.74
P(B) =0.45
P@IA) =0.13
P (ai) = PBIAIP(A)
PB)
P(AIB) =[(0.13 « 0.74) / 045) =0.21
© scanned with OKEN Scannersxprnt earg
yes Theorem cam be written as; TERT
sa
: Prsterior = Likelitood* Prior) Eo
nee
tly i, the result P(A 1B) j
firstly, in genera ) is rete, :
a toas the prior probability, Tre t08 the posterior probability and P(A i
P(AIBk: Posterior probability,
refer
4 P(A): Prior probability.
imes P(BIA) is referred to likeli
Sometimes as the likelihood and P(B) i
PIBIA): Likelihood. (B) is referted to as the evidence.
P(B): Evidence.
.
‘This allows Bayes Theorem to be restated as:
posterior = Likelihood * Prior / Evidence
EXAMPLE: Let's take a simple example ~ Sa
ly the likelihood of aera
ifbey are over 65 years of age is 49%, O84 Person having diabetes
Now, let's assume the following:
+ Class Prior: The probability of a person stepping in the clinic being >65-year-old is
20%
+ Predictor Prior: The probability of a person stepping into the clinic having diabetes
is 35% :
What is the probability that a person is >65 years given that he has? This is Let's
calculate this with the help of Bayes’ theorem!
Likelihood Class Prior
P(patientis > 65 | Diabetes) = —“2%20_ _ pg,
Posterior Probability Predictor Prior
EXAMPLE : We can solve one example to get more understanding, Lets us consider that
an Indian soccer team acquired a new player for a new season. The team played 20 games
With the results:
Total Games Wins Losses
20 15 2
© scanned with OKEN Scannerrl
‘The new player scored in the 8
ames both when his team won or lost With sy,
the fo,
cases: 7
 
Won Games
Lost Game
10 7
 
1 of games in which the new
player scored
 
 
 
 
Question:
Find out the probability of such event that the team wins given that the —
scores. Play,
MSOLUTIO!
Let’s create table from above details:
Player scored
 
 
 
 
Not scored Tour
Won 10 5 e
2 3 ;
 
 
Lost
Total 12 8 2»
Weneed to calculate below given probability before getting ito conditional prabayy
P(Won) ty.
P(Total matches played)
5/20
75,
 
 
 
 
 
 
P(Team Wins) =
 
  
Post
EcTeamn Jos) PTO ne played)
= 5/20
=0.25
P(Ptayer cto ‘Team Wan) (os cenyeneoores)
P(Total matches won)
= 10/15
= 0.667
 
P(Player doesn't score scores)
P(Player score! Team losses) = o toal manos ay
=2/5
=04
at
© scanned with OKEN Scannerearning
on must put values in given formula where MEIER
w ea
now mn wins! Player scores) = Wins and player scored,
prem
P(Player|Team wi
s|Team Wins)+P(Team wins)
i+
ins)+P(Team wins)
}+P Player scoresi Team Tos
‘0,
ins | Player scores) = —_{275) (0.667)
wins | Play’ (0.75)(0.66) +(0.25)(0.4)
isn wins |Player scores) = 0.833402748854644
probability of Team won, and player scored in match is
= 0.8334
aan probity 83.34%.
ses)+p(Team lost)
waren
rence
 
49 BAYES THEOREM AND CONCEPT LEARNING
ae shi section, we will discuss how the concept of Bayes theorem can be used in various
oiher learning algorithms.
9.1 Naive Bayes Classifier Algorithm
3.9.
4 Naive Bayes algorithm is a supervised leaning algorithm, which is based on Bayes
sheorem and used for solving classification problems.
It is mainly used in fext classification that includes a high-dimensional training
dataset.
4 Naive Bayes Classifier is one of the simple and most effective Classification
algorithms which help in building the fast machine learning models that can make
quick predictions.
Itis a probabilistic classifier, which means it predicts on the basis of the probability
ofan object.
+ Some popular examples of Naive Bayes Algorithm are spam filtration, Sentimental
analysis, and classifying articles.
The Naive Bayes algorithm is comprised of two words Naive and Bayes, Which can be
described as: S
* Naive: tis called Naive because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. Such as if the fruit is identified on
the bases of color, shape, and taste, then red, spherical, and sweet fruit is recognized
© scanned with OKEN ScannerTh
berg
ly contributes to identi fy th
at
  
an apple.
aa depending ome
+ Bayest Its called Baye® be =
of Naive Bayes Class!
3.9.2 Working orking of Naive Bayes through an example. Tag
= poring ot oe calculate the Probatiy hy
if to classify whether players will play or not, based o, rs
We
cause it depends on the principle of Bayes "i
or
weather conditions
sports. Now, you nee
condition. S
ive Bayes classifier calculates the probability of an event in the folowing .
Pa S Catcae the prior probability for given class labels. teps.
© Step 1: bility with each attribute for each class
«Step 2: Find Likelihood probal
Step 3: Put these value in Bayes Formula and calculate posterior Probably
«Step 4: See which class has a higher probability, given the input jy,
higher probability class. TB lg
For simplifying prior and posterior probability calculation, you can use the
frequency and likelihood tables. Both of these tables will help you to calculate no ,
posterior probability. The Frequency table contains the occurrence of labels for aoe
There are two likelihood tables. Likelihood Table 1 is showing prior probabil fea
and Likelihood Table 2 is showing the posterior probability. 5 of ky
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
[Weber [tay] Frequency Table
ae Whether [No = =
‘Sunny ___|No ae a
‘Sunny {No
[Overcast [Yes ===> Sunny |2 =
any [roe] hans
Rainy. [Yes Total e
 
 
 
 
 
 
 
 
No.
forcast
Likelihood Table 1
= k : Ukelinood Table 2
 
   
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
[Sunny
Siney, Baer ll
[ay — ee Whe
tl cher
= 4a
Se Yes See + a ea] a)
mY Rain = Be
ce 3 Toa S_f2" [e514 [ose | [Overcast] [a [oso [use
peveaet [Yes Is lo g ss
Overcast |¥eg] =S4|-9n4 | ny 2 [3 ost Fs
[pean (8 Joss] pany
 
 
 
 
 
 
© scanned with OKEN Scannerearntna -
te [
fra you want to calculate the probability of playing when the weather is
er ty of pIBTIDE?
pont) = Ove | Yo) Pe) /P(Ovec .
4, cate!
provers)” 4/14 =0.29
pote) 914-088
alate Posterior Probabilities: |
7 overcast IYes) =A/9 = 0.44
put Prior and Posterior probabilities in equation (1
p ves | Overcast) = 044 * 0.64 /0.29 = 098(Higher)
culate the probability of not playing:
 
(1)
te Prior Probabilities:
similarly, you can cal
pabitty of not playing:
= P(Overcast | No) P(No) / P (Overcast) .
 
pono | overcast)
1, Calculate Prior Probabilities:
(Overcast) = 4/14= 0.29
P(No)= 5/14 =0.36
2, Calculate Posterior Probabilities:
P (Overcast 1No) = 0/9=0
4, put Prior and Posterior probabilities in equation (2)
P(No | Overcast) = 0 * 0.36 / 0.29 =0
‘he probability ofa 'Yes' class is higher. So you can determine here if the weather is
overcast than players will play the sport.
3.9.3 Advantages of Naive Bayes Classifier
* Simple and Fast: Naive Bayes is quick to train and efficient in prediction due to its simplistic assumption of independence among features.
* Effective with Small Data: Performs well with small datasets and is less prone to overfitting compared to complex models.
* Handles Irrelevant Features: Can handle irrelevant features well by assuming independence; it still performs reasonably even if some assumptions are violated.
* Works with Categorical and Continuous Data: Accommodates both categorical and continuous data, making it versatile for various types of datasets.
* Good with Text Classification: Particularly effective in text classification tasks such as spam filtering, sentiment analysis, and document categorization.
* Requires Less Training Data: Needs a relatively smaller amount of training data to estimate parameters efficiently.
* Interpretable Results: Offers transparent and interpretable outputs, making it easy to understand the reasoning behind predictions.
3.9.4 Disadvantages of Naive Bayes Classifier
* Assumption of Feature Independence: The classifier assumes independence among features, which might not hold true in real-world scenarios, leading to inaccuracies in predictions.
* Handling Continuous Data: Naive Bayes performs less effectively with continuous data, as it assumes a normal distribution, impacting accuracy in such cases.
* Sensitive to Data Quality: It can be sensitive to noisy or irrelevant features, affecting classification accuracy.
* Zero Probability Issue: The occurrence of a feature value not present in the training data leads to a zero probability estimate, causing the entire probability to be zero.
* Limited Expressiveness: Due to its simplicity, Naive Bayes may not capture complex relationships among variables, limiting its ability to model intricate decision boundaries in data.
3.9.5 Applications of Naive Bayes Classifier
* Spam Email Filtering: Naive Bayes is widely used in email services to classify emails as spam or non-spam based on the presence of certain words or features.
* Text Classification: Applied in sentiment analysis, news categorization, and document classification tasks due to its effectiveness in handling text data.
* Medical Diagnosis: Utilized in medical fields to assist in diagnosing diseases based on symptoms or patient data.
* Recommendation Systems: Used in collaborative filtering to recommend products or services by analyzing user preferences and item features.
* Credit Scoring: Employed in finance for credit scoring and risk assessment by evaluating various factors associated with loan applications.
3.10 INTRODUCTION TO REGRESSION
Regression is a statistical approach used to model the relationship between independent variables or features and a dependent variable (outcome). It is used in machine learning as a method for predictive modelling, in which an algorithm is used to predict continuous outcomes.
Regression is an essential component of predictive modelling and is used in a wide range of machine learning applications. Whether it is used for forecasting financial trends or predicting healthcare outcomes, regression analysis brings key insights for decision-making.
One of the key characteristics of supervised learning is the ability to estimate the value for new data by modeling the dependencies and interactions between the target output and the input variables. Regression algorithms predict the output values based on input features from the data fed into the system. The standard approach is for the algorithm to create a model based on the attributes of the training data and then use the model to forecast the value for new data.
Regression analysis, in particular, enables us to see how a dependent variable changes in relation to an independent variable while the other independent variables are held constant. It predicts continuous/real values such as temperature, age, salary, and price, among others.
EXAMPLE: Assume there is a marketing firm X that runs various advertisements throughout the year and gets sales accordingly. The list below shows the advertisements made by the company in the last 5 years and the corresponding sales:

Advertisement (In Lakh)      Sales (In Lakh)
Rs. 90                       Rs. 1000
Rs. 120                      Rs. 1300
Rs. 150                      Rs. 1800
Rs. 100                      Rs. 1200
Rs. 130                      Rs. 1380
Rs. 200                      ???
Now, the company wants to do an advertisement of Rs. 200 Lakh in this year and wants to know the prediction about the sales for this year. To solve such prediction problems in machine learning, we need regression analysis.
In regression, we plot a graph between the variables which best fits the given datapoints. Using this plot, the machine learning model can make predictions about the data.
In simple words, "Regression shows a line or curve that passes through all the datapoints on the target-predictor graph in such a way that the vertical distance between the datapoints and the regression line is minimum."
The distance between the datapoints and the line tells whether a model has captured a strong relationship or not.
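As a quick illustration of how such a prediction could be produced, the short sketch below fits an ordinary least-squares line to the five known advertisement/sales pairs from the table and then queries it for an advertisement of Rs. 200 lakh. Using scikit-learn here is only one possible choice; any least-squares fit would serve.

import numpy as np
from sklearn.linear_model import LinearRegression

# Advertisement spend vs. sales (both in lakh Rs.) from the table above
ads = np.array([[90], [120], [150], [100], [130]])
sales = np.array([1000, 1300, 1800, 1200, 1380])

model = LinearRegression().fit(ads, sales)          # ordinary least-squares fit
predicted = model.predict(np.array([[200]]))        # sales for a Rs. 200 lakh ad spend
print(f"Predicted sales: about Rs. {predicted[0]:.0f} lakh")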
Terminologies Related to Regression Analysis:
* Dependent Variable: The dependent variable is the main factor in regression analysis that we wish to predict or understand. It is also known as the target variable.
* Independent Variable: The factors that influence the dependent variable, or that are used to predict the values of the dependent variable, are referred to as independent variables, sometimes known as predictors.
* Outliers: An outlier is a value that is either exceptionally low or very high in comparison to other observed values. An outlier may hamper the result, so it should be avoided.
* Multicollinearity: Multicollinearity occurs when the independent variables are substantially associated with each other but not with other factors. It should not be present in the dataset because it causes issues when ranking the most influential variable.
* Underfitting and Overfitting: If our algorithm performs well on the training dataset but not on the test dataset, then such a problem is called overfitting. And if the algorithm does not perform well even on the training dataset, then such a problem is called underfitting.
Why do we use Regression Analysis?
Regression analysis, as previously said, helps in the prediction of a continuous variable. There are many instances in the real world where we need to anticipate future outcomes, such as weather conditions, sales figures, or market trends, and for such cases we need a technique that can make predictions accurately. Regression analysis is a statistical method used in machine learning and data science for exactly this purpose. Some other reasons for using regression analysis include:
* Regression estimates the relationship between the target and the independent variables.
* It is used to find the trends in data.
* It helps to predict real/continuous values.
* By performing the regression, we can determine the most important factor, the least important factor, and how each factor is affecting the other factors.

Regression techniques used in machine learning may include different algorithms and different types of data. Distinct types of machine learning regression algorithms assume a different relationship between the independent and dependent variables. Various types of regression algorithms are:
* Linear Regression
* Simple Linear Regression
* Polynomial Regression
* Logistic Regression
* Maximum likelihood estimation (least squares)
3.11 LINEAR MODELS - INTRODUCTION
The linear model is a type of machine learning algorithm that is commonly used for supervised learning tasks, such as regression. It is based on the idea of fitting a linear equation to a set of data points, which can then be used to make predictions about new data.
In the case of regression, the goal of the linear model is to find the line of best fit that describes the relationship between the independent variable(s) and the dependent variable. The equation for a simple linear regression model can be expressed as:
y = mx + b
where y is the dependent variable, x is the independent variable, m is the slope of the line, and b is the intercept.
In the case of classification, the linear model is used to separate data points into different classes based on their features. This is done by fitting a hyperplane (a higher-dimensional version of a line) to the data that maximizes the margin between the different classes. This is known as a linear support vector machine (SVM).
The linear model is popular in machine learning because it is simple to interpret and it can work well on large datasets. However, it may not be suitable for complex data that cannot be accurately described by a linear equation, in which case more complex models such as decision trees or neural networks may be needed.
In this section, we will cover two crucial linear models in machine learning:
(a) linear regression
(b) logistic regression
Linear regression is used for regression tasks, whereas logistic regression is a classification algorithm.
3.11.1 Simple Linear Regression Model
A simple linear regression algorithm models the relationship between a dependent variable and a single independent variable. The relationship shown by a Simple Linear Regression model is linear, or a sloped straight line, hence it is called Simple Linear Regression.
The most important aspect of Simple Linear Regression is that the dependent variable must be continuous/real. The independent variable, on the other hand, can be measured using either continuous or categorical values.
The simple linear regression algorithm has mainly two objectives:
* Model the relationship between the two variables, such as the relationship between income and expenditure, or experience and salary.
* Forecast new observations, such as weather forecasting according to temperature, or the revenue of a company according to the investments in a year.
Simple linear regression has only one x variable and one y variable in its basic form. The x variable is known as the independent variable because it is used to predict the dependent variable. The y variable, the one you try to predict, is the dependent variable because its value depends on the independent variable.
The formula used for simple linear regression is:
y = a + bx
where:
* y is the predicted value of the dependent variable for any given value of the independent variable (x).
* a is the intercept, the predicted value of y when x is 0.
* b is the regression slope coefficient: how much we expect y to change as x increases.
* x is the independent variable (the variable we expect is influencing y).

[Figure: best-fit straight line through the data points, with the dependent variable on the y-axis and the independent variable on the x-axis]

The above graph presents the linear relationship between the output (y) variable and the predictor (x) variable. The straight line is referred to as the best fit straight line. We attempt to plot the best line possible based on the data points provided.
To calculate the best-fit line, linear regression uses a traditional slope-intercept form, which is given below:
yi = a + b*xi
where yi = dependent variable,
a = constant/intercept,
b = slope coefficient,
xi = independent variable.
This algorithm explains the linear relationship between the dependent (output) variable y and the independent (predictor) variable x using a straight line y = a + bx.
The linear regression algorithm seeks the optimum values for a and b to find the best fit line. The best fit line should have the least error, which means the difference between the projected and actual values should be as small as possible.

Find a simple linear regression model for the following data:

x      y
1.0    1.00
2.0    2.00
3.0    1.30
4.0    3.75
5.0    2.25
Let the simple linear regression model be
y = a + bx
Steps to find a and b:
First, find the mean and covariance. The means of x and y are given by
x̄ = (1/n) Σ xi,    ȳ = (1/n) Σ yi
The covariance of x and y, denoted by Cov(x, y), is defined as
Cov(x, y) = (1/(n − 1)) Σ (xi − x̄)(yi − ȳ)
The estimates of a and b can be computed using the following formulas:
b = Cov(x, y) / Var(x)
a = ȳ − b x̄
(Here the sample covariance and variance, with n − 1 in the denominator, are used so that the values below follow; since b is their ratio, either convention gives the same slope.)
First, find the means of x and y:
x̄ = (1/5)(1.0 + 2.0 + 3.0 + 4.0 + 5.0) = 3.0
ȳ = (1/5)(1.00 + 2.00 + 1.30 + 3.75 + 2.25) = 2.06
Next, find the covariance between x and y:
Cov(x, y) = (1/(n − 1)) Σ (xi − x̄)(yi − ȳ)
Cov(x, y) = (1/4)[(1.0 − 3.0)(1.00 − 2.06) + ... + (5.0 − 3.0)(2.25 − 2.06)]
          = 1.0625
Now find the variance of x:
Var(x) = (1/(n − 1)) Σ (xi − x̄)²
Var(x) = (1/4)[(1.0 − 3.0)² + ... + (5.0 − 3.0)²]
       = 2.5
Now, find the slope coefficient and the intercept:
b = Cov(x, y) / Var(x)
a = ȳ − b x̄
b = 1.0625 / 2.5 = 0.425
a = 2.06 − 0.425 × 3.0 = 0.785
Therefore, on the basis of the estimated values of a and b, the estimated regression equation for the above example is
y = 0.785 + 0.425x
The value of the intercept from the above equation is 0.785. The slope 0.425 is the estimated change in the average value of Y as a result of a unit change in X; that is, the average value of Y increases by 0.425 for each unit increase in the value of X.
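The same estimates can be reproduced numerically; the short sketch below recomputes the sample covariance, the variance of x, the slope and the intercept for the five data points used above (nothing here goes beyond the formulas already given).

# Reproducing the worked example: b = Cov(x, y) / Var(x), a = y_bar - b * x_bar
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.00, 2.00, 1.30, 3.75, 2.25]
n = len(x)

x_bar = sum(x) / n                                # 3.0
y_bar = sum(y) / n                                # 2.06

# Sample covariance and variance (divided by n - 1, matching 1.0625 and 2.5)
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

b = cov_xy / var_x                                # 0.425
a = y_bar - b * x_bar                             # 0.785
print(f"y = {a:.3f} + {b:.3f}x")                  # y = 0.785 + 0.425x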
3.11.2 Logistic Regression
* Logistic regression is a common ML algorithm that is part of the supervised learning technique. It is used for predicting the categorical dependent variable using a given set of independent variables.
* Logistic regression predicts the output of a categorical dependent variable. As a result, the outcome must be categorical or discrete. It can be Yes or No, 0 or 1, True or False, and so on, but instead of presenting the exact values like 0 and 1, it presents the probability values that fall between 0 and 1.
* Unlike linear regression, which maps input data to continuous output values, the logistic regression algorithm maps input data to a probability. Logistic regression models are used to predict the probability of an event occurring, such as whether or not a customer will purchase a product. The output of the logistic regression model is a value between 0 and 1; it represents the probability that the event will occur.
* In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function, which predicts two maximum values (0 or 1).
* The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, a mouse is obese or not, and so on.
* Logistic regression is a significant machine learning algorithm because it has the ability to provide probabilities and classify new data using continuous and discrete datasets.
* Logistic regression can be used to classify the observations using different types of data and can easily determine the most effective variables used for the classification. The image below shows the logistic function:

[Figure: the S-shaped logistic (sigmoid) function, with output values between 0 and 1]
The output of the logistic regression model (the sigmoid function output) is always between 0 and 1. If the output is near to zero, the event is unlikely to occur; if the output is near to one, the event is more likely to occur. For example, if the result of the logistic regression model (obtained by the sigmoid function) is 0.8, it means that the probability that the event will occur is 0.8, given a particular set of parameters learned using cost function optimization. Based on the threshold function, the class label can then be said to be 1. For any new data point, the output of the above function will be used for making the prediction.
How is logistic regression different from linear regression?
The output of linear regression is continuous and can take any value. In the case of logistic regression, the predicted outcome is discrete and restricted to a limited number of values.
For example, assume we're attempting to apply machine learning to the sale of a house and want to forecast the sale price based on the size, year built, and number of stories. We use linear regression, which can predict any potential value. If we want to predict whether or not the house will sell, the outcome is limited to Yes or No. Hence, linear regression is an example of a regression model and logistic regression is an example of a classification model.
Logistic Function
* The sigmoid function is a mathematical function that is used to convert the predicted values into probabilities.
* It maps any real value into another value within a range of 0 and 1.
* The logistic regression value must be between 0 and 1, and it cannot go beyond this limit, so it forms a curve similar to the "S" form. The sigmoid function is another name for the S-form curve.
* The concept of the threshold value is used in logistic regression, which defines the probability of either 0 or 1. Values above the threshold value tend to 1, and a value below the threshold value tends to 0 (a small numeric sketch of this follows below).
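A minimal numeric sketch of the sigmoid and a 0.5 threshold; the input values and the threshold choice are illustrative assumptions rather than anything fixed by the text.

import math

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real value into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(z, threshold=0.5):
    # Convert the sigmoid probability into a 0/1 class label
    return 1 if sigmoid(z) >= threshold else 0

print(sigmoid(1.4))          # ~0.80, the probability that the event occurs
print(predict_class(1.4))    # 1, since 0.80 is above the 0.5 threshold
print(predict_class(-2.0))   # 0, since sigmoid(-2.0) ~ 0.12 is below the threshold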
Logistic Regression Equation
The logistic regression equation can be obtained from the linear regression equation. The mathematical steps to get the logistic regression equation are given below:
* We know the equation of the straight line can be written as:
  y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
* In logistic regression, y can be between 0 and 1 only, so we divide the above equation by (1 - y):
  y / (1 - y);  which is 0 for y = 0, and infinity for y = 1
* But we need a range between -[infinity] and +[infinity]; taking the logarithm of the equation, it becomes:
  log[ y / (1 - y) ] = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
The above equation is the final equation for logistic regression.
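As a sketch of how this is used in practice, the example below fits a logistic regression model on a tiny made-up dataset (hours studied vs. pass/fail, purely an assumption for illustration) and shows that the model returns a probability rather than an exact value; scikit-learn's LogisticRegression is just one possible implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: hours studied (feature) vs. pass/fail (binary label) - illustrative only
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(np.array([[4.5]]))[0, 1]   # probability of class 1 for 4.5 hours
label = clf.predict(np.array([[4.5]]))[0]            # class label using the 0.5 threshold
print(f"P(pass) = {proba:.2f}, predicted class = {label}")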
Assumptions for Logistic Regression
The assumptions for logistic regression are as follows:
* Independent observations: Each observation is independent of the others, which means there is no association between any of the observations.
* Binary dependent variable: It assumes that the dependent variable is binary or dichotomous, which means that it can take only two values.
* Linear relationship between independent variables and log odds: The relationship between the independent variables and the log odds of the dependent variable should be linear.
* No outliers: The dataset should have no outliers.
* Large sample size: The sample size should be sufficiently large.
Short Answer Questions

Q1. What is supervised learning?
Ans: Supervised learning is the type of machine learning in which machines are trained using well-labelled training data, and on the basis of that data, machines predict the output. Labelled data means some input data is already tagged with the correct output. In supervised learning, the training data provided to the machines works as the supervisor that teaches the machines to predict the output correctly. It applies the same concept as a student learning under the supervision of a teacher.
Q2. Explain, in brief, the concept of classification.
Ans: Classification is a process of categorizing data or objects into predefined classes or categories based on their features or attributes. Classification is a form of supervised learning technique in machine learning in which an algorithm is trained on a labeled dataset to predict the class or category of fresh, unseen data. The primary goal of classification is to create a model capable of accurately assigning a label or category to a new observation based on its properties. For example, a classification model might be trained on a dataset of images labeled as either dogs or cats and then used to predict the class of new, unseen images of dogs or cats based on their features such as color, texture, and shape.
Q3. Differentiate between classification and regression in supervised learning.
OR
How do classification and regression differ?
Ans: Classification is the task of predicting a discrete class label, whereas regression is the task of predicting a continuous quantity.
* A classification problem requires the data to be classified into one of two or more classes, whereas a regression problem needs the prediction of a quantity.
* A classification problem with two classes is called binary classification, and one with more than two classes is called multi-class classification, whereas a regression problem with multiple input variables is called a multivariate regression problem.
* Classifying an email as spam or non-spam is an example of a classification problem, whereas predicting the price of a stock over a period of time is an example of a regression problem.
Q4. List the most popular classification algorithms.
Ans: The most important classification algorithms to know are:
* K-nearest neighbor (KNN)
* Decision tree
* Random Forest
* Support Vector Machine (SVM)
Q5. What is the k-nearest neighbor algorithm?
Ans: K-Nearest Neighbor is one of the simplest machine learning algorithms based on the supervised learning technique. The K-NN method assumes the similarity between the new case/data and existing cases and places the new case in the category that is most similar to the existing categories. The K-NN algorithm maintains all the available data and classifies a new data point based on this similarity. This means that when new data is generated, it may be quickly classified into a well-suited category using the K-NN algorithm.
Q6. How is the value of 'k' calculated in the KNN algorithm?
Ans: The value of K can be chosen through cross-validation. Take a small portion from the training dataset and call it a validation dataset, and then use the same to evaluate different possible values of K. This way we predict the label for every instance in the validation set using K equal to 1, K equal to 2, K equal to 3, and so on. We then look at what value of K gives us the best performance on the validation set, take that value, and use it as the final setting of our algorithm, so that we are minimizing the validation error.
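A hedged sketch of this idea using scikit-learn's cross-validation utilities (the dataset and the range of candidate k values are assumptions chosen purely for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate K = 1, 2, 3, ... on held-out folds and keep the best-performing value
scores = {}
for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"Best k = {best_k} with cross-validated accuracy {scores[best_k]:.3f}")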
Q7. What is Euclidean distance?
Ans: Euclidean distance is the cartesian distance between two points located in a plane/hyperplane. It can also be represented as the length of the straight line connecting the two points. This metric allows us to calculate the net displacement between two states of an object.
d(x, y) = sqrt( Σ (xi − yi)² )
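For instance, the formula can be computed directly; the tiny helper below is only illustrative:

import math

def euclidean(p, q):
    # Straight-line distance between two points of equal dimension
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean((1, 2), (4, 6)))   # 5.0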
Q8. What are the advantages of the KNN algorithm?
Ans: Some of the advantages of using the k-nearest neighbors algorithm:
* It is easy to understand and simple to implement.
* It can be used for both classification and regression problems.
* Because there are no assumptions about the underlying data, it is appropriate for non-linear data.
* It is naturally capable of handling multi-class cases.
* It can perform well with enough representative data.
Q9. Give any three real-life applications of the KNN algorithm.
Ans: Here are some applications of the KNN algorithm:
* Credit rating: The KNN algorithm assists in determining an individual's credit rating by comparing them to others who share similar traits.
* Loan approval: The KNN method, like credit ratings, is useful in detecting persons who are more likely to default on loans by comparing their qualities to those of similar persons.
* Data preprocessing: Datasets can have many missing values. The KNN algorithm is used for a process known as "missing data imputation", which estimates the missing values.
Q10. Explain the decision tree algorithm in brief.
Ans: A decision tree is a supervised learning algorithm that can be used for both classification and regression problems, although it is most commonly used for classification. It is a tree-structured classifier in which internal nodes represent the features of a dataset, branches represent decision rules, and each leaf node represents the result. A decision tree has two kinds of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make decisions and have numerous branches, whereas leaf nodes represent the results of those decisions and do not have additional branches.
Q11. Explain Entropy.
Ans: Entropy, also called Shannon entropy, is the measure of the amount of uncertainty or randomness in data. Entropy is denoted by H(S) for a finite set S:
H(S) = Σ p(x) log2 (1 / p(x))
It tells us about the predictability of a specific event. Consider a coin toss with a 0.5 probability of heads and a 0.5 probability of tails. Because there is no way of knowing what will happen, the entropy is as high as it can be. Now consider a coin that has heads on both sides; the outcome of such an event can be predicted ahead of time since we know it will always be heads. In other words, because the event has no unpredictability, its entropy is 0. Lower values indicate less uncertainty, whereas larger values indicate greater uncertainty.
Q12. Define Information gain.
Ans: Information gain, denoted by IG(S, A) for a set S, is the effective change in entropy after deciding on a particular attribute A. It measures the relative change in entropy with respect to the independent variables. It can be calculated using the formula:
IG(S, A) = H(S) − H(S, A)
Alternatively,
IG(S, A) = H(S) − Σ P(x) × H(x)
where IG(S, A) is the information gain obtained by applying feature A, H(S) is the entropy of the entire set, and the second term calculates the entropy after applying the feature A, with P(x) being the probability of event x.
Q13. What is pruning in a decision tree? Explain pre-pruning and post-pruning.
Ans: Pruning is a strategy for reducing the complexity of a decision tree when extra depth is unlikely to increase the tree's accuracy, and for preventing overfitting, which occurs when the tree fits the training data too closely, resulting in poor generalization to new data.
* Pre-pruning: This involves setting a limit on the depth of the tree, the minimum number of instances required in a leaf, or the minimum information gain for a split. Pre-pruning is often used when the dataset is large and building a large tree would be computationally expensive.
* Post-pruning: This involves building the full decision tree and then removing branches that do not improve the accuracy of the tree on a validation set. Post-pruning is often used when the dataset is small or clean and when building a large tree is not computationally expensive. (A short sketch of both ideas follows below.)
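A hedged sketch of both ideas with scikit-learn, where the dataset and the parameter values are illustrative assumptions: pre-pruning is expressed through constructor limits, post-pruning through cost-complexity pruning (ccp_alpha).

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Pre-pruning: limit depth and leaf size while the tree is being grown
pre_pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=0)
pre_pruned.fit(X_train, y_train)

# Post-pruning: the tree is grown and then pruned back with a cost-complexity penalty
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
post_pruned.fit(X_train, y_train)

print("pre-pruned accuracy :", pre_pruned.score(X_val, y_val))
print("post-pruned accuracy:", post_pruned.score(X_val, y_val))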
Q14. How is KNN different from k-means clustering?
Ans: K-NN is a supervised algorithm, and k-means clustering is an unsupervised algorithm. For K-nearest neighbors to work, we require labeled data in order to classify an unlabeled point. K-means clustering needs only a threshold and a set of unlabeled points: the algorithm takes the unlabeled points and slowly learns how to divide them into groups by calculating the mean of the distance between the points.
Q15. What is Bayes' Theorem?
Ans: Bayes' theorem helps to determine the probability of an event with random knowledge. It is used to calculate the probability of one event occurring when another event has already occurred. It is the best method to relate the conditional probability and the marginal probability. In simple words, we can say that Bayes' theorem helps to contribute more accurate results. Bayes' Theorem is used to estimate the precision of values and provides a method for calculating the conditional probability.
Q16. What is the Naive Bayes Classifier?
Ans: The Naive Bayes algorithm is a supervised learning algorithm which is based on Bayes' theorem and used for solving classification problems. It is mainly used in text classification that involves a high-dimensional training dataset. The Naive Bayes Classifier is one of the simplest and most effective classification algorithms, and it helps in building fast machine learning models that can make quick predictions. It is a probabilistic classifier, which means it predicts on the basis of the probability of an object. Some popular examples of the Naive Bayes algorithm are spam filtration, sentiment analysis, and classifying articles.

Q17. What do you mean by a linear regression model?
Ans: A linear regression model is a statistical method used in machine learning to model the relationship between a dependent variable and one or more independent variables. It predicts continuous numeric outcomes, like predicting sales based on advertising spend, using a linear equation. The model assumes a linear relationship between the independent and dependent variables and aims to minimize the difference between predicted and actual values using techniques like ordinary least squares.
Q18. What do you mean by a logistic regression model?
Ans: Unlike linear regression, logistic regression predicts categorical outcomes, typically for binary classification tasks, determining the probability of an instance belonging to a particular class. The model estimates the probability using the logistic function and is suitable for tasks like spam detection or medical diagnosis, where the outcome is a 'yes' or 'no' scenario. Instead of predicting exact values, it predicts the probability of a binary outcome.
Long Answer Questions

Q1. What is supervised learning? Give any two examples of supervised learning.
Ans: Refer sections 3.1 and 3.2
Q2. Explain the classification model. Discuss the steps involved in classification learning.
Ans: Refer sections 3.4 and 3.5
Q3. What is the KNN algorithm? Explain its working by taking a suitable example.
Ans: Refer section 3.6.1
Q4. What is the KNN algorithm? Discuss its advantages and applications.
Ans: Refer section 3.6.1
Q5. Write a Python program to implement a k-Nearest Neighbors classifier using sklearn. Train the classifier on the dataset and evaluate it.
Ans: Refer section 3.6.8
Q6. Write a Python program to implement a decision tree classifier using sklearn. Visualize the decision tree and understand its splits.
Ans: Refer section 3.7.7
Q7. Write a Python program to use sklearn's decision tree utilities to build a decision tree on a sklearn dataset. Implement utilities to find the importance of a split (entropy, information gain, Gini measure).
Ans: Refer section 3.7.7
Q8. Explain Bayes' Theorem by taking a suitable example.
Ans: Refer section 3.8
Q9. Explain the Naive Bayes Classifier with an example of its use in practical life.
Ans: Refer section 3.9
Q10. Discuss the various steps involved in the optimization framework for linear models.
Ans: Refer section 3.11
EXERCISE
1. Describe the motivation behind random forests and mention two reasons why they are better than individual decision trees.
2. Explain what information gain and entropy are in the context of decision trees.
3. What are the different methods to split a tree in a decision tree algorithm?
4. Can Bayes belief networks solve all types of problems?
5. What is the Naive Bayes Classifier? Discuss its advantages and disadvantages.
6. What is a linear model? Discuss the optimization framework for linear models in machine learning.
7. Discuss the difference between logistic regression and linear regression by taking a suitable example.
© scanned with OKEN Scanner