0 ratings0% found this document useful (0 votes) 65 views67 pagesML 3RD Unit
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, 
claim it here.
Available Formats
Download as PDF or read online on Scribd
apter
 
Supervised earning
 
3.3 1 INTRODUCTION
As the name suggests, supervised machine learning is built on monitoring the way that
machines learn. This means that in the supervised learning technique, we train the machines
using the labeled or trained data set, and based on that training, the machine predicts the
output. In this context, the flagged data indicates that some of the inputs that we feed to the
algorithm are already mapped to the output.
In supervised learning, the training data that is given to the machine serves as the
supervisor, instructing machine on how to correctly predict the output. It employs the same
idea that a Pupil would learn under a teacher's guidance.
Supervised learning is a: process that provides the machine learning model with both
input data and correct output data. The aim of a supervised learning algorithm is to find a
a
© scanned with OKEN Scanneros
ea / Seber eg
mapping function to map the input variable(x) with the output variable(y), 7, t
in this type of machine learning, we try to teach the machines using the traineg gy
then expect it to predict the outcomes on the test data. My
Supervised learning allows machines to classify things, problems, or situa,
data fed into them. Machines are repeatedly fed with data such as the traits, be
measurements, color, and height of objects, people, or situations until the mah
conduct accurate classifications. q
During supervised leaming, a machine is given data referred to as taining ay
mining terminology, on which t performs classification, For example, ifa syst? 4y
to classify fruit, it would be given training data such as color, shapes, dimension and ba
Would be able to classify fruit based on this data. it,
To accomplish accurate classification, a system often requires numerous itera
such a procedure. Since real-life classifications such as credit card fraud detecign ®
disease classification are complex tasks, the machines require suitable data ang i ui
Tounds of learning sessions to attain realistic skills.
3.2. EXAMPLES OF SUPERVISED LEARNING :
The working of Supervised learning can be easily understdod by the below example
diagram:
EXAMPLE 1: Let's say we have a dataset of various forms, such as Squares, rectan,
triangles, and polygons. Now the first step is that we need to train the model for each shay
 
 
Prediction
a [).| Swe
L+(*2\_. ste
 
 
  
 
 
 
Tiangle
 
 
 
Model Training
 
 
 
 
 
Test Data
 
 
 
Fig. 9.1 Example of Supervised Learning
© scanned with OKEN Scanneryr
wer es
ane giver shape has four sides, and all the sides are equal, then it will be labeled as
 
* square:
gpthe given shape has three sides, then it will be labeled as a triangle.
. iethe giver shape has six equal sides then it will be labeled as. hexagon.
.
as after training, the model will be tested using the test set, and the task of the model
stoidentfy teeters :
machine has already trained on many shapes, so when it encounters a new shape, it
sizes it based on 8 number of its sides and predict the result.
cates
PLE 2: As shown in the above example, we have initially taken some data and
xed them as ‘Spam’ or ‘Not Spam’. The training supervised model uses this labeled
atlas model is trained using this data, Once it has been trained, we can test our model by
th ae
or afew test emails to see if it can accurately predict the desired result.
sl
  
Fig, 3.2 Example of Supervised Learning
'3,3. TYPES OF SUPERVISED MACHINE LEARNING
On the basis of the problem that we have to encounter, we can classify Supervised
Learning into two categories:
(a) Classification
We employ Classification algorithms to handle the classification problems where the
output variable is categorical, There are many other kinds of these categories, including True
© scanned with OKEN Scannerxa epee
or False, Male or Female, White or Black,
classification algorithms forecast the categories tl
Some of the widely used Classification algorithins are:
A
@ scanned with OKEN Scanner
and others. On the basis of training gar,
hat are present in the dataset, thy
K-nearest neighbor (KNN)
Decision Tree Algorithm
Naive Bayes Classifier Algorithm
Random Forest Algorithm
Support Vector Machine Algorithm
(b) Regression
Regression algorithms are used to solve regression problems where there exists a lines
relationship between input and output variables. When the output variable has an actus
value, such "dollars" or "weight," the situation is known as a regression problem, These
variables can be used to forecast continuous output variables, such as market trends,
weather forecasts, and other things.
Some of the popular Regression algorithms are:
«3.4
Simple Linear Regression Algorithm
‘Multivariate Regression Algorithm
Decision Tree Algorithm
Lasso Regression
" CLASSIFICATION MODEL es
Classification is a process of categorizing data or objects into predefined classes «
categories based on their features or attributes. Classification is a form of supervised
learning technique in machine learning in which an algorithm is trained on a labeled datast
to predict the class or category of fresh, unseen data..
The primary goal of classification is to create a model capable of accurately assigning *
label or category to a new observation based on its properties. For example, a classification
medel might be trained on a dataset of images labeled as either dogs or cats and then used
to predict the class of new, unseen images of dogs or cats based on their features such #
color, texture, and shape.
When the output variable is a category, such as "red" or "blue" or "disease" and '™
disease," the problem is called a classification problem. A classification model makes aoa Learang | Page 3.5
ioe derive some conclusion from observable values. Given one or more inputs a
‘on model will try to, predict the value of one or more outcomes. For example,
ing emails “spam” or “not spam", when looking at transaction data,
, or “authorized”.
    
  
jassification algorithm's main purpose is to determine the category of a given
these algorithms are primarily used to predict the output for categorical data.
mm below can help you better understand classification methods. There are
asses in the diagram below: class A and class B. These classes have features that are
two s toeach other and dissimilar to other classes.
sii
@ Class
 
Fig.2.3 Example of Classification
A classifier is the algorithm that performs classification on a dataset. There are two types
of Classifications: :
+ Binary Classifier: If the classification problem has only two possible outcomes, then
itis called as Binary Classifier. Examples: YES or NO, MALE or FEMALE, SPAM or
NOTSPAM, CAT or DOG, etc.
Multi-class Classifier: If a classification problem has more than two outcomes, then
itis called as Multi-class Classifier. Example: Classifications ‘of types of crops,
Classification of types of music.
 
 
Learners in Classification Problems:
Inthe classification Problems, there are two types of learners:
1 \ | atlas a
Lary Leamers: Lazy Learner firstly stores the training dataset and wait witil it
Teceives the test dataset. . In the case of the lazy learner, classification is performed
a
@ scanned with OKEN Scannerusing the most related data contained in the training dataset, ~
time, but predictions take longer.
Example: K-NN algorithm, Case-based reasoning
2. Eager Leamers: Eager Learners develop a classification mode, —
dataset before receiving a test dataset. In contrast to Lazy Learners tat
@ scanned with OKEN Scanner
devote more effort to learning and less time to prediction, Example, Eager uy
Naive Bayes, ANN. inn
3.5 CLASSIFICATION LEARNING STEPS
The basic principle behind classification is to train a model on a labeled “<
input data coupled with their matching output labels in order to discove, ie oe
relationships between the input data and output labels. Once trained, the mod, Patina
to predict output labels for previously unknown data. Sean
The classification process typically involves the following steps:
1. Define the problem: The first step in applying classification is to Precis,
the problem and the desired outcome. What class labels are you eet
predict? What is the relationship between the input data and the cla a
Suppose we have to predict whether a patient has a certain disease or se =
basis of 7 independent variables, called features. This Means, there cntesne
Possible outcomes: -
° The patient has the disease, which means “True”.
* The patient has no disease. which means “False”,
This is a binary classification Problem.
. Feature Extraction: io
Pn n: The relevant features or attributes are extracted frome 4
Pan ee pe ‘0 differentiate between the different classes, Suppose our
‘Pendent features, having only 5 features influencing the Inbal of[EERA
pearning
an es remaining 2 are negligibly or not correlated, then we will use only these 5
: oa ~ sonly fr the model training,
fea!
4, choose an algorithm: There are numerous algorithms and strategies for machine
* eaening each with its own set of advantages and disadvantages. Algorithms that
sre commonly used include logistic regression, decision trees, support vector
machines GVM), and neural networks. The algorithm you choose will be
etermined by the nature of the problem you are attempting to answer as well as
the qualities of the data you are working with,
sain the model: Following the algorithm selection, the model i trained using the
raining data. This involves feeding the data into the algorithm and allowing it to
team the relationships and pattems in the data. During the training process, the
model's internal parameters will be adjusted to minimize the difference between
the predicted and actual output.
Evaluate the model: After the model has been trained, it is critical to assess its
performance in order to determine how well it can solve the problem. This can be
accomplished by the use of several evaluation criteria’such as accuracy, precision,
or recall. If necessary, the model can be fine-tuned or updated to increase its
performance.
Fine-tuning the model: If the model's performance is not satisfactory, you can
fine-tune it by adjusting the parameters, or trying a different model.
Deploying the model: Finally, after we are satisfied with the model's performance,
we may use it to generate predictions on new data. It can be used to real-world
ey
2
problems.
 
.3.6. K-NEAREST NEIGHBOR (KNN) Ee
+ KNearest Neighbor is one of the simplest Machine Learning algorithms based on
Supervised Learning technique.
* ‘The K-NN method assumes similarity between the new case/data and existing cases
and places the new case in the category that is most similar to the existing categories.
* The KN algorithm maintains all existing data and classifies a new data point
‘sed on the similarity. This means that when fresh data is generated, it may be
‘wickly classified into a well-suited category using the K- NN algorithm.
@ scanned with OKEN Scanner* The K-NN algorithm can be used for both regression and classification 4, eo
.d for classification tasks. t we
'
parametric algorithm, which means it makes no assumpyig, 7
a
nth,
 
   
underlying data.
* It is also called a lazy learner algorithm because it does not learn from y,
set immediately instead it stores the dataset and at the time of classi, trai
ca
@ scanned with OKEN Scanner
“iy
ti, &
performs an action on the dataset. tion i |
it
+ KNN algorithm at the training phase just stores the dataset and when ig
data, then it classifies that data into a category that is much similar to the ngs sm
fa
Example: Assume we have an image ofa creature that resembles a cat or a dog "
want to know whether it is a cat or a dog. So, because it works on a similarity ieee Me
May utilise the KNN method for this identification. Our KNN model will find the 4."
features of the new data set to the cats and dogs images and based on the most
features it will put it in either cat or dog category. mle,
EXAMPLE
The following is an example to understand the concept of K and working of jy
algorithm —
‘Suppose we have a dataset which can be plotted’as follows -
Fed Col Dow
  
Blue Color Dots
Now, we need to classify new data point with black dt (at point 60,60) into blues
class. We are assuming K = 3 ie. it would find three nearest data points.
are EEE
yearning
wn nin te next diagram:
.
01
40
 
 
fo 20 9 40 80 OOS
ee closest neighbors of the data point are marked with a black dot in the diagram
The the :
spove. Because two of the three are in the Red class, the black dot will also be assigned to
the Red class.
3.6.1 K-Nearest Neighbor Algorithm Pseudocode
programming languages like Python and R are used to implement the KNN algorithm.
the following is the pseudocode for KNN:
1. Load the data
2. Choose K value
3, For each data point in the data:
«Find the Euclidean distance to all training data samples
* Store the distances on an ordered list and sort it
* Choose the top K entries from the sorted list
© Label the test point based on the majority of classes present in the selected
points
4, End
3.6.2 Few ideas on picking a value for ‘K’
* There is no organized approach for determining the best "K" value,
experiment with different values, assuming that the training data is unknown.
We must
ae
@ scanned with OKEN Scanner=
EEO ii
Choosing smaller numbers for K can be noisier and have a grate
outcome. inpag
© Higher K values result in smoother decision boundaries, Fesulting in bs %
but increased bias. Additionally, computationally expensive, lower i,
‘Scanned with OKEN Scanner
@
+ The value of K can be chosen though cross-validation. Take the smay
the training dataset and call ita validation dataset, and then use the sane ton h
; ry
different possible values of K. This way we are going to predict the ap ea
instance in the validation set using with K equals to 1, K equals to 2 K org, *
and then we lok at what value of K gives us the best performance on gb
set and then we can take that value and use that as the final setting of o  Valida
50 we are minimizing the validation error . Mt gor
+ In general, practice, choosing the value of kisk = sqrt(N) where Ng
the number of samples in your training dataset. ands jg,
* Try and keep the value of k odd in order to avoid confusion between two
data. S065
3.6.3 Distance Metrics Used in KNN Algorithm
AAs we know, the KNN algorithm can assist us find the closest points or groups
query point. However, we need some metrics to find the closest groups or points to a Fs
point.
We use the distance metrics listed below for this purpose:
* Euclidean Distance
© Manhattan Distance
¢ Minkowski Distance
1. Euclidean Distance
This is simply the cartesian distance between two locations in the plane/hyperplane.
Euclidean distance can also be represented as the length of the straight line connecting th
two points under examination. This metric allows us to compute the net displacemert
between two states of an object.ing Cae
wn
Euclidean
©
 
‘©
 
 
 
p.shanhattam Distance
3 sis distance measure is‘ commonly utilized when the total distance travelled by the
is more important than the displacement, This metric is computed by adding the
ee differences between the coordinates of points inn dimensions,
abso!
*
ax,y)= B \xi—yil
fe
 
Manhattan
O
 
 
 
We can say that the Euclidean, as well as the Manhattan distance, are special cases of the
Minkowski distance.
 
From the formula above we can say that when p=2 then it is the same as the formula for
oy ldean distance and when p = 1 then we obtain the formula for the Manhattan
istance,
>
@ scanned with OKEN ScannerMinkowek!
f
a
@ scanned with OKEN Scanner
 
 
oN
3.6.4 Example on KNN Algorithm
The table below represents our data set. We have two columns»
and Saturation. Each row in the table has a class of either Red or Blue. Before 5°"!
new data entry, let's assume the value of K is 5.
 
We ittrodya
 
 
 
 
 
 
   
 
 
 
 
 
 
1
2 40
3 50
4 0
5 10 — [Tred 17
[BU siusmo tos ees aes
7 60 10 Red_| |
8 25) set BO “Blue _|
rs i
 
 
Here's the new data
 
 
 
 
 
 
 
We have a new entry but it doesn't have a class yet. To know its class, we have!
calculate the distance from the new entry to other entries in the data set using the Euclides*
distance formula.yeorntnd ETERER
en oemala: WXEXI + FY)
e
's brightness (20).
 
x2 2 New entry’
xe Existing entry's brightness.
Y2 =New entry's saturation (35).
ys = Existing entry's saturation.
ets 40 the calculation together. I'll calculate the first three.
tance #2
ase es" ol
BRIGHTNESS SATURATION CLASS.
 
dl =¥(20- 40)" + (35 - 20)°
= ¥400 +225
= 1625
=25
Wenow know the distance from the new data entry to the first entry in the table. Let's
update the table.
  
 
 
 
 
 
 
 
 
@ scanned with OKEN ScannerEX
Distance #2
For the second row, d2:
     
  
50 [25250
jets
d2 =1(20 - 50)* + (85 - 50)"
= 900 +225
=V1125
= 33.54
Here's the table with the updated distance:
BRIGHTNESS SATURATION CLASS DISTANCE
 
 
 
 
 
 
 
 
 
 
 
 
At this point, you should understand how the calculation works. Attempt
the distance for the last five rows. Here's what th
shave been calculated:
 
to calculate
e table will look like after all the distances
 
 
scr sui ass
 
 
Saturation
 
 
 
 
 
 
 
 
 
 
 
@ scanned with OKEN Scannerchose 5as the value of K, we'll only consider the first five rows. That is:
we
since
 
ee above, the majority class within the 5 nearest neighbors to the new entry
syou can see above, the
gel Terfore, well classify thesnew entry as Red,
isked.
ser’ the updated table:
  
 
3.6.5 Advantages of K-NN Algorithm
Here are some of the advantages of using the k-nearest neighbors algorithm:
* Its easy to understand and simple to implement.
Itcan be used for both classification and regression problems.
Because there are no assumptions about the underlying data, it is appropriate for
Non-linear data.
  
naturally capable of handling multi-class instances.
'tcan perform well with enough representative data.
@ scanned with OKEN ScannerIN Algorithm 8y
pos tages FN ‘using the k-nearest no} ~
ava ptages of USB Neighbor,
disadva” Ss
he 7 : a
“sig high as it stores all the train Ig6,
cost is hig! ining da Bor;
   
« Sensiti
cations of eed
3.6.7 Appl sno jearning, classification is an importa
science and ma
d mos!
ve one of the oldest and MOS”
is one ome applicati
modeling. Here are st a ee :
Credit rating: The KNN algorithm assists in determining on
i ig them to others who share similar traits,
rating by comparin :
Loan approval: ‘The k-nearest neighbour method, like credit Satis
detecting individuals who are more likely to default on loans by — ig
qualities with similar persons. mPa
1g: Datasets can have many missing values. The
which estimates missing values, NN Mey,
+ accurate algorithms for pattern reco ale
ions for the k-nearest neighbor alg otithn ang My |
ek
‘%
%
eld
A
Indata
© Data preprocessin,
used for missing data imputation,
Pattern recognition: The KNN algorithm's capacity to recognize Pate
wide range of applications. It can, for example, recognize pattems MTS ley,
usage and identify anomalous patterns. Pattern recognition can ae Cet
pattems in client purchasing behavior. Used ing
* Stock price prediction: Because the KNN algorithm is good at estimatin, theng
of unknown entities, it can be used to forecast the future value of Aad a
historical data. “
+ Recommendation systems: KNN can be used in recommendation systems =|
can help locate people with comparable traits. It can, for example, be used al
online video streaming platform to recommend content that a user is more le},
view based on what similar users watch. |
© Computer vision: ‘ 3
i S, 2 = — ae KNN algorithm is used for image classification. Si’
pay ae ‘ouping similar data points, for example, grouping cats togete’
liffer i s :
ent class, it’s useful in several computer vision applications.
 
@ scanned with OKEN ScannerOr tical implementation of KNN Algorithm mig see
Aes “Learn
prac! iris dataset, which is one Of the Widely Used latasets
36 vg use i dataset is included in R base and Pyt
Oe ei ct so that users can access it without hay;
iho ear,
io eit a can be imported from Sklearn dat,
i eas ick
w is a we can use for practicing,
1s!
 
for learning ML
the Machine
ing to find a Source for it
Asets,
learning
     
 
 
which Contains Numerous
      
    
 
    
      
 
 
   
    
 
 
ae sklearn.datasets import load_iris
:| fee = load iris()
iri
the data and target value into two separate Variables,
tore
wes
+ pandas as pd i
3 | amr DataFrame(iris.data, columss oris feature nanes)
ae er putabraoa(ieda targets
vere
print (x)
output as:
weget the utp! cise PRET Teng wa
‘Sepe 5.1 :
6 .
1 K
2 .
3 H “ at
4 8 5.2 23
Es 6.7 ae 6 Ls
us 6.3 3.0 5.2 2.0
145 6.5 3.4 5.4 2.3
uy 6.2 te 5.4 1.8
148 ats 3.
49
[350 rows x 4 columns]
e
ee
1 @
20
3 0
48
us 2
M6 2
17 2
M8 2
14g 2
a
© scanned with OKEN ScannerSupervise
L Peze 3.18 | ‘nny
.d to split the iris dataset into t.:
In order to train and test the model, we nee
datasets,
    
split
 
: [from sklearn.nodel_selection inport train_test_s
 
 
i i i
xctrain,x.test,y train y_test = train test_split(nystest.s
  
   
 
To check the shape of training and testing data, write the code below:
    
 
 
 
[8]: | print(x_train. shape)
print{x_test.shape)]
print(y_train. shape)
print(y_test. shape)
 
  
We get the output as:
(128, 4)
(30, 4)
(128, 1)
(38, 1)
 
Next, we have to build the model. Here is the code:
[16]: | from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 3)
 
 
knn.fit(x_train,y_ train)
 
 
In our example we are creating an instance (‘kn’) of the das
‘KNeighborsClassifer’, in other words we have created an object called ‘knn’ whid
knows how to do KNN classification once we provide the data. The Paramete,
‘n_neighbors’ is the tuning parameter/hyper parameter (k).
Hyperparameters in machine learning are preset settings or configurations that gover
the learning process of a model. Unlike parameters, they're not learned from data bi
selected beforehand, influencing the model's behavior, performance, and learning rt
Hyperparameters are chosen through techniques like grid search, random search, &
© scanned with OKEN Scanneryearning
re pimization. We have set «
_
R_Neighbora: to
ing the optimal value of K is critica),
sist set. To do this, we will use the “,
e ful see how well our model pre,
Now let’s Se
Core’ functi
: io ;
dictions m, n and pass j
curate our model ig
a
*ch Up to the actual
N Our test input and
results,
  
  
th
of to
staat
onto
 
model has an accuracy of approximat
ely 1009
jgnbore’ 1031s correct. ¥ 100%. It means that the setting
‘nse!
3 model is trained, we can use the preai, :
nee the a Ct () function on
fi t data. The " ‘ our model to mak:
gions on our test Predict’ method is used to test the model on testing sas
w.test)
(23): y_pred = knn.predicti(x test]
‘A dassification report is a performance evaluation metric in machine learning. It is used
tp show the precision, recall, F1 Score, and support of your trained classification model,
A Classification report is used to measure the quality of Predictions from a classification
algorithm. How many predictions are True and how many are False. More specifically, True
Positives, False Positives, True negatives and False Negatives are used to Predict the metrics
ofa classification report as shown below.
  
 
Precision is the ability of a classifier not to label an instance positive that is actually
negative.
Recall is the ability of a classifier to find all positive instances. For each class it is
defined as the ratio of true positives to the sum of true positives and false negatives.
The F1 score is a weighted harmonic mean of precision and recall such that the best
score is 1.0 and the worst is 0.0.
:| from sklearn.metrics import classification_report
print{(classification_report(y_test,y_pred)))
 
 
© scanned with OKEN ScannerEE Sag
We get the output as: ‘y,
precision recall f1-score
e 1.06 1.00 1.00 1.
1 1.00 1.00 1.00 9
2 1.00 1.00 1.00 4
1.00 3e
accuracy
macro avg 1.00 1.00 1.06 5
weighted avg 1.08 1.00 1.00 =
 
Now we will create the Confusion Matrix for our K-NN model to see the a,
classifier. Plot the confusion matrix of the true test labels y test ang the “Yofg,
labels y_pred. Below is the code for it: Pre,
    
  
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y pred)
print{cm)]
 
   
 
[29]:
 
 
 
In above code, we have imported the confusion_matrix function and called;
variable cm. itusing
  
 
  
[[1e @ @]
[fe 9 @]
[@ 6 11}]
  
In the above image, we can see there are 1019111 30 correct Predictions and 0
redicti ill visualis i i ua
Be a none. Now, we will visualize the confusion matrix for K-NN model. Write the cae
pP
[33]: | %matplotlib inline
inport matplotlib.pyplot as plt
import seaborn as sn
plt, Figure(figsize=( 7,5))
sn.heatmap(cm, annot=True)
plt.xlabel/( "Predicted! )
plt.ylabel( ‘Truth’ )
  
 
 
    
  
  
  
  
 
 
 
 
© scanned with OKEN Scannerfou 7
output as:
ge
 
© We 29222222, 0.5, Truth’)
 
Predicted
[Ei THE DECISION TREE MODEL OF LEARNING
edsion Tree is a supervised learning technique that may be used to solve
ctsscation and regression problems; however itis most commonly employed to
solve classification problems. Itis a tree-structured classifier in which internal nodes
contain dataset attributes, branches represent decision rules, and each leaf node
represents the result.
‘A Decision tree has two nodes: the Decision Node and the Leaf Node. Decision nodes
are used to make decisions and have numerous branches, whereas Leaf nodes
represent the results of those decisions and do not have any additional branches.
‘The decisions or the test are performed on the basis of features of the given dataset.
\tis a graphical representation of all possible solutions to a problem/decision given
certain conditions,
© scanned with OKEN ScannerEa SuPer,
1. ¢, like a tree, it begins with ed
*  Itis called a decision tree because, the TOOL no, mn,
branches out to form a tree-like structure. leorth ‘ Stang
In order to build a tree, we use the CART algorithm, which stangs . he,
and Regression Tree algorithm, “Sit
* A decision tree simply asks a question, and based on the answer (Veg Hs,
0),
Split the tree into sub-trees. . » it fay,
Below diagram explains the general structure of a decision tree:
  
EYE Wa
 
 
Root Node
 
 
‘Decision Node
 
 
1 Sub-Tree |Decision Node
 
   
 
 
 
 
Leat Noas
Fig. 3.4 Structure of Decision Tree
Example: The below binary tree can be used to explain an example of a decision
Assume you want to forecast whether a person is fit based on variables Such as age, ei
habits, physical activity, and so on. Questions like 'What is his age’, Does he exercise?
‘Does he consume a lot of pizzas?’ are decision nodes here. The Teaves, which are eth.
or ‘unfit’ are outcomes. In this case this was a binary classification problem (@ yes notype
problem).
 
Yes 2. No?
Eatsalot Exercises in
-Ofpizzas? the morning
mt \ sore /\ 0s
Unit Fit Fit Unfit
 
 
 
© scanned with OKEN Scannerning
7 Decision Trees?
ust
re Dr
yor 5 are a popular machine learn,
aon tree
8 algorithm,
of ‘on tasks. Here are some reasons why decision t
jon
ior d easy to interpret: Decision
r an ive an
uit applications because they give
relationships and generate pred,
" :
ally represented, making them
ip!
 
 
for both
lassificat
Aeoften used, “ton
Tees are an oy
Cellent choi 2
@ straightforward O'Ce for decision.
5 @Pproach 4
ctions based on current f 0 describe
; ‘acts. They can Tf
easi =
cn ter to grasp and Convey to, others,
be tile: Decision trees are useful in machine learnin,
versa
: 8 because th
th classification and regression tasks. Thy
for bot
°Y can be used to si
including those in healthcare, finance, and marketing,
issues
er ey, : roach, which ‘Means they
‘i "= Gees them applicable to a wide
make of data kinds and distributions. In contrast, Parameic appanage
gna specific data distribution,
maki
Y may be used
olve a variety of
t: Decision trees are robust to outliers and noi
st: ; -
ee Gaia Wine a neat imputation. This
4 5
ene predictions even when data is not perfectly
accuré
ise in the data, and can handle
S means they can still produce
clean.
ble: To handle enormous datasets and difficult challen,
a rae up. There are various decision tree Variants, s1
as ue that can improve their performance and scalability.
boosting,
'Bes, decision trees can be
uch as random forests and
|
4.7.2 Decision Tree Terminologies
it Node: The decision tree begins at the root node. It Tepresents the full dataset,
Root Node:
‘hich is partitioned further into two or more homogeneous sets,
wi
if Node: Leaf nodes are the tree's final output node, and the tree cannot be further
Leaf Node:
separated after obtaining a leaf node.
; ry fe
Splitting: Splitting is the process of separating the decision node/root node into su
nodes based on the conditions specified.
Branch/Sub Tree: A tree formed by’ splitting the tree.
from the tree.
Pruning: Pruning is the process of removing the unwanted branches fro
de, and other
ParenlChild node: The root node of the tree is called the parent node,
nodes are called the child nodes.
 
© scanned with OKEN ScannerDy |
Supervieeg a
f
"4
aa
ision Tree Algorithm :
node and works its way up ¢,
3.7.3 Working of Dec
Ina decision tree, the set ace te values of the root attribute wine
vehses thos eo aa aa attribute and then follows the branch and jumps bb i
next node based on the comparison.
The algorithm compares the attribute vé
the next node. It repeats the process until
algorithm can help you better ‘understand the entire process:
5, which contains
*  Step-1: Begin the tree with the root node, says the Compigy
lue with the other sub-rodes and move,
it reaches the tree's leaf node. The fos
. ope “me the best attribute in the dataset using Attribute Selection Measure (As,
© Step-3: Divide the $ into subsets that contains possible values for the best attebuteg
© Step-4: Generate the decision tree node, which contains the best attribute,
© Step-5: Recursively make new decision trees using the subsets of the dataset Creag
in step-3. Continue this process until a stage is reached where you cannot fury,
classify the nodes and called the final node as a leaf node.
 
 
3.7.4 Attribute Selection Measures
The biggest challenge that arises while developing a Decision tree is how to select thy
best attribute for the root node and sub-nodes. To tackle such challenges, a technique known
as Attribute Selection Measure, or ASM, is used. We can easily determine the bat
characteristic for the tree's nodes using this measurement. The popular technique for ASM,
is Information gain:
¢ Information gain is the measurement of changes in entropy after the segmentation
a dataset based on an attribute.
¢ It calculates how much information a feature provides us about a class.
* According to the value of information gain, we split the node and build the decison
tree.
A decision tree algorithm will always try to maximize the value of information gai
and the node/attribute with the most information gain will be split first.
Information gain is denoted by IGS, A) for a set $ is the effective change in ent!
ifter deciding on a particular attribute A, It measures the relative change in entropy wit
2spect to the independent variables,
© scanned with OKEN Scanner. calculated using the below formuta:
100 G(s, A) =HS)- HS, A) ny
ae natively’
x
IG, A) = H(S) ~ & P(x)* A(x)
ie ility of event x.
ott . the probability 0}
gutropy also called Svanmen Entropy is denoted by HG) for a fins
os of the amount of uncertainty or randomness in dats a finite set Sis the
3% 1
HG) = P(x)logy pe)
ittells us about the predictability of a specific event
3 ds and a 0.5 probability of tai ;
pility of hea PI ty of tails. Because there is :
Me wil happen, the entropy is as high asi canbe. Cons no means of knowing
- Consider a coin with hi
ves the entropy of such an event can be fully anticipated because we ine oe
e
gatit wil always be heads. In other words, because this event has no unpredcabity ve
entropy is 0. Lower numbers indicate less uncertainty, whereas larger values eae
greater uncertainty.
3.7.5 Pruning in Decision Tree
Pruning isa strategy for reducing the complexity of decision trees by deleting branches
that are unlikely to increase the tree's accuracy on unseen data. Pruning can assist prevent
overfitting, which occurs when the tree gets too complicated and closely matches the
training data, resulting in poor generalization to new data.
Consider a coin toss with a 05
There are two main types of pruning:
* Pre-pruning: This involves setting a limit on the depth of the tree, the minimum
number of instances required in a leaf, or the minimum information gain required
for a split. Pre-pruning is often used when the dataset is large or noisy, and when
building a large tree would be computationally expensive or lead to overfitting.
Post-pruning: This involves building the full decision tree and a removing,
branches that do not improve the accuracy of the tree on a validation set. ses
Pruning is often used when the dataset is small or clean, and when building 2515
tee is not computationally expensive.
© scanned with OKEN Scanner“om
    
gree ider a 14-day period of
cist this. Cone a
0 0 ty i ea i
ete, Huu create 2 prediction model that
ist ‘ill De played that day. To accom 4
seat at dO oe whether 8° tree to do that Mey
   
 
BRE
=
 
|
|
|
|
We will perform following tasks recursively:
1. Create oot node fr the tree-
4 tall examples are postive, retum leaf node ‘positive’.
4. Els ifall examples are negative, return leaf node ‘negative’.
4. Calculate the entropy of current state H(S).
5. For each attribute calculate the entropy with respect to the attribute ‘x’ denoted ly
HS,x).
6 Select the attribute which has maximum value of IG(S, x).
F Pt oF attribute that offers highest IG from the set of attributes
ee es Se ie ete eee
let's go and gow th decision tre, The inital step i to calculate 8.
Enttopy ofthe curen
Ye, "state. Inthe above example, we can see in total there are 5 Nos
4
© scanned with OKEN Scannert parr”
Ye
f” se ne Esa my
:
. logy ——
gpiop) * 2 POMIOB2 TS
Gon(d) Sal
nto) “~ (348? (ag 3) og (3)
=0.940
thatthe Entropy is 0 if all members belon
Pr uel to one class and other half belong to other class indicat
0 OO tvs 0.9 indicating that the distribution ¢ * indicating complete
é , Here : : 18 Teasonably rando
on choose the attribute that gives us highest possible Informe NOW: the
ae? : ae root node. Let’s start with ‘Wind’, formation Gain which
ef
8 to the same class, and 1 When half
10s. wind) = H(S) oe P(x)xH(x)
ig are the possible values for an attribute. Here, attribute
Wind’ takes tw
seuss the sample data, hence x= (Weak, Strong). We'll have to calculate:
1, Hees)
1, He)
3 FG)
4, PGane)
5. H§)=0%4 which we had already calculated in the previous example
Anonsall the 14 examples we have 8 places where the wind is weak and 6 where the
wwadisStrong.
 
Wind = Weak | Wind = Strong. Total
8 6 4
~ Number of Weak
Total
 
 
 
 
P(Sweas)
28
14
Number of Strong
PSereng) = Total
Ba
© scanned with OKEN Scannerty
*¥es' for Play Golf and 2 o¢ then
cee
“iw
Weak examples,
we have, 2
6 (Z)io 3)
Entropy(Sees) * -{§}ose (3 Eke
=0811
we have 3 examples where the outcome ag e
amples,
Strong No’ for Pay Cole
6 of them were .
"
Now, out of the 8
*No' for ‘Play Golf’. So,
Similarly, out of 6 z
for Play Golf and 3 where we had No!
moa --po( m6)
=1.000
ile other half belongs to other
items bel to one dass while : ; :
Remember, here half items ens ‘we have all the pieces required to calculate i:
we have perfect randomness. No
Information Gain,
*
ind) = HUS) - 3: P(2)* H(@)
IG(S, Wind =H(8)- %, PO)
= H(6) —P(Seas) * H(Sons) — P(Surong) * H(Ssrmgt)
: 3 é] 1.00)
-os10- [Jost (é (1.00)
= 0.048
IG (5, Wind)
It tells us the Information Gain by considering ‘Wind’ as the feature and give ys
information gain of .048. Now we must similarly calculate the Information Gain forall te
features.
IG(S, Outlook) = 0.246
IG(S, Temperature) = 0.029
IG(S, Humidity) = 0.151
IG(S, Wind) = 0.048 (Previous example)
We can clearly see that IG(S, Outlook) has the highest information gain of
0.246, hence we chose Outlook attribute as the root node. At this point, the decision tee
looks like.
© scanned with OKEN Scannerje that whenever the outlook 2 Overcast, Play Golf is always ‘Yes’, its no
re ob se ae the ue tree resulted because of the highest information gain
oy bute Outlocs
eae Outlook, we've got three of them remaining Humidity
Opal ee ‘And, we had three possible values of Outlook: Sunny, Overcast,
wt orovercast node already ended up having leaf node ‘Yes’, so we're left with
Oh pee ne te; Sunny and Rain. |
310 i
sabe computing H(Seusrs). /
— tlook is Sunny looks like:
ot ne value of OW
   
 
 
   
t6suny)=(2)!982 (2)-(3) loge (3)- 096
thesimilar fashion, we compute the following values
hi
[0S.ny Humidity) = 0.96
{GlSsey Temperature) =
16Gexy Wind) = 0.019
As we can see the highest Information Gain is given by Humidity. Proceeding in the
sme way with Sua will give us Wind as the one with highest information gain. The final
Iiion Tree looks something like this.
1.57,
 
© scanned with OKEN Scanner3.
whether golf will be played that day.
d the dataset with pandas:
First, open the Jupyter notebook and rea
 
[2]: | import pandas
stro Weak
No
 
al Implementation of Decision Tree using Scikit is
7 Practic:
in which the features are Outloox.
d of data in wl
Consider a 14-day periov
fs the outcome variable is whether golf was played that a
Humidity, Windy, eh model that takes the above four characteristicy and ge
now is to create a pl
df = pandas.read_csv("d:/golf-dataset.dsv")
print(df)
Yes
Tr
rhea,
 
 
|
The above code read golf-dataset.csv file stored in D: drive and print the data set,
Outlook Temp Humidity
je Rainy Hot High
7 Rainy Hot High
2 Overcast Hot High
3 Sunny Mild High
4 Sunny Cool Normal
5 Sunny Cool Normal
6
7
8
Ig
 
Overcast Cool Normal
Rainy Mild High
Rainy Cool Normal
Sunny Hild — Nomad
@ Rainy Mild Normal
21 Overcast midg High
Overcast Hot Nonath
Sunny mig High
=
 
E
8
  
o
Windy Play Golf
Weak
Strong
Weak
Weak
Weak
Strong
Strong
Weak
Weak
Weak
Strong
Strong
Weak
Strong
No
    
        
      
       
  
 
   
          
 
 
   
     
 
 
 
  
     
 
 
 
 
 
  
© scanned with OKEN Scannerno
yo
yr ce ithe table:
tar Wm,
mn is the Days where 0 is the
de® di
yu yal
cA | isda
fre irs column is Outlook. We have three typ a Sl 90 on,
s, es
"pe He esasl and Sunny. Of Values in this feat,
te i,
‘i gotun Temp: We have thre :
rd ©
 
types of y; in this feature ,
OF Values in this re i.e, Hi
fot, Mild
a umn is Humidity. We have two
rth col ‘yPes of values in this feature
ie. High
at column is Play Golf. We have two values ie ‘Yes or No,
nels 4
ecsion tee all data has to be numerical,
a
fo merical coh 4 ie
convert the non nut columns ‘outlook’, Temp’, "Humi ity’, Windy
t
we id into numerical values.
ny
pps
a0 method that takes a dictionary with information on how to convert
      
   
convert the values ‘Rainy’ to 0, ‘Overcast’ to 1, and ‘Sunny’ to 2.
seas
wis run the above concept:
   
  
   
 
 
d= {‘Rainy’: @, ‘Overcast’: 1, ‘Sunny': 2}
df['Outlook'] = df['Outlook" ].map{d)
print (df)
 
 
Mee the output as follows where the content of the Outlook column is replaced with
“ped numerical values:
 
© scanned with OKEN ScannerHumidity Windy pia
K Temp igh = Weak Y S01¢
           
     
         
     
       
    
      
 
Hot
High Strong
: . ee high Weak ye
2 2 Mild High Weak yes
3 ; cool Normal Weak ven
4 2 cool Normal Strong Rs
7 cool Normal Strong a8
& 3 mild = High Weak he
if @ cool Normal Weak y 0
: 2 Mild Normal — Weak va
e @ Mild Normal Strong Yes
10 1 Mild High Strong Wie
11 1 Hot Normal Weak Yes
es 2 Mild High Strong No
 
 
wwe do the same thing for other columns also. We run the following cog
le;
 
 
No!
[31]: | d = {'Hot': @, ‘Mild’: 1, ‘Cool': 2}
df['Temp'] = df['Temp'].map(d)
"Normal': 13
d = f'High': @,
df['Humidity'] = df['Humidity'].map(q)
d = {'Strong': @, 'Weak': 1}
df[‘Windy'] = df[ 'Windy'].map(d)
d= {'Yes': 1, 'No': @}
df[ ‘Play Golf'] = df['Play Golf'].map(d)
print (df)
© scanned with OKEN Scanneryon’
it aS foll
tp llows. Now all the Nonny,
e Numer
s lcal
4 on lve Values hag rE
fl
the Tem| > a
Va ter : Humidity Windy "paced
f e 4 Play Golf
    
         
 
@ e
e ° 8 e
4 1 a
1 1 8
1 2 as : °
5 2 2 : :
A 2 2 : ‘
8
5 a 2 : ° :
§ e 1 . |
1
7 @ 2 : :
1
8 a. 4 : : :
: @ 1 i Fi
10 1 e
ih 1 8 8 :
p 1 8 ; : 2
8 Bye t @ 5 ,
sen we have to separate the. - feature columns from the target column.
rate columns are the columns that we try to predict from, and the target column
be column with the values we try to predict.
i
sr}:|features = ["Outlook', ‘Temp’, ‘Hunddity', ‘wihdy*]
 
X = df[features]
y = df['Play Golf']
print(xX)
print(y)
 
 
 
 
© scanned with OKEN Scannerciuvduvucceced
SPRoPHKHOerRHESOSS
SPESHHHSSHHEHON
veeos
NER ONeSeHENN °
wnunre
Beara
wenon 3
BReeVaue Sse
BS
8
SRR EHH oH OHHH OS
B
 
Now we can create the actual decisi
ion tree, fit it with our details, Start Dy importing!
modules we need:
    
[33]: | from skleam import thee
from sklearn.tree
Anport natplotiib.,
 
 
Amport DecisionTreeClassifier
‘Pyplot as pit \
 
   
  
dtree =
Decisiontree
dtree =
tree. Fit(y,
 
 
Classifien()
y)
  
 
 
 
 
tr
Se Plot tree(dtree, festure nanescFeatures)
   
© scanned with OKEN ScannerHumidity <= 0.6
gini = 0.459
samplos = 14
valuo = [5, 9}
 
 
Windy <= 0.5
gini = 0.245
 
 
 
 
 
 
 
 
 
 
 
ples = 1
value = [1, 0} |
 
 
 
 
‘Humidity <=0.5,
“gini= 0.459
it
samples = 14
value =[5, 9]
  
 
ch
ee cen.simeans that the days with humidity of 05 or lower will follow
terearrow (to the left), and the rest will follow the False arrow (to the right).
gus « 0.488: refers to the quality of the split, and is always a number between 0.0
sli wiew 00 would mean all of the samples got the same result, and 0.5 would mean
| faesptis done exactly in the middle.
aples = 14: means that there are 14 days at this point in the decision, which is all of
Sorts the first step.
~ J
© scanned with OKEN Scanner> |
. that of these 14 days, 5 will get a "NO", and 9 iy te
value = (5, 9]: mean: me
   
 
 
 
to play golf.
Gin ee thod i .
There are many ways to split the samples; we use the Gini method in this ty,
  
torial,
   
   
   
The Gini method uses this formula:
Gini = 1 - (x/n)? - (y/n)
iti YES"), m is the number of
Where x is the number of positive answers ("YES"), sample, a
the number of negative answers ("NO"), which gives us this calculation;
Sy
     
[2 - (9 / 13)? - (5 / 13)? = 0.459
Ee
Humidity < = 0.5
gini = 0.459
samples = 14
value = [5, 9]
     
    
 
The next step contains two boxes, one box for the days with "Humidity’ of 05 oy lov,
and one box with the rest.
True Block with samples= 7: Days Continues
Outlook <= 0.5: means that the days with a outlook value of less than 05 wa
follow the arrow to the left (which means days with rainy outlook with high
and the rest will follow the arrow to the right.
humidity),
* gini = 0.49: means that about 50% of the samples would go in one direction.
* samples = 7: means that there are 7 days left in this branch (7 days witha
Humidity less than 0.5).
* value = [4, 3]: means that
"YES" to play golf. (If humidity is less than or
humidity)
se Block with sample= 7; Days Continues
‘ i y be
“éady —=
© scanned with OKEN Scanner‘Vite
ur of which are traits of employees ;
  
    
 
 cctumns, fou ;
sre dataset contains four co salt * 100K’ column is our targot ya, © Gop
+ description and degree: dataset so that abl *
name, ob des sion assifier with ith this datas the mode - 1
Now t's ty eS eo woud be greater than 1 lakh or not depending” Py
whether the salary & f employee. “ny
ary © nd degree of emP| Pe,
, job description an %
ey pent shore! notebook and read the dataset with pandas:
First, 0
Degree Salary_GT_10ek
Yes
          
      
     
          
    
      
    
     
       
 
    
      
n
5 mE sales tanger Master
1 Google HR Manager — Master No
2 Facebook Project Hanger Bachelor Yes
3 Microsoft HR Manager ‘Master No
2 steogle Project Maneger Bachelor Yes
+ Google Project Manager Bachelor No
Facebook Sales Manager Master No
3 Facebook Project Manager Bachelor Yes
& Google Sales Manager’ Master Yes
9 Facebook Sales Manager Master No
10 Microsoft Project Manager Bachelor Yes
11 Facebook HR Manager Master No
12. Microsoft HR Manager Master No
Yes
   
Sales Manager Bachelor
    
13. Microsoft
‘As we know that the machine is unable to understand text features. So, we form a dat,
encoder to convert text-based attributes to numbers. Label Encoding is a technique that
5 . re
used to convert categorical columns into numerical ones so that they can be fitted ty
machine learning models which only take numerical data.
 
   
   
   
   
 
[2]: | from sklearn inport preprocessing
Jabel_encoder = preprocessing.LabelEncoder()
df [‘Conpany'J= label_encoder. fit_transforfa(df[' Company'
ae '20b']e ‘Label_encoder.fit_transform(df[‘Job"]) ne
incomes J label_encoder. Fit_transfora(df{ ‘Degree’ })
U'Salary_GT_1@0K"]= label_encoder.fit_transform(df[ 'Salary_GT_100K'])
 
 
 
print (df)
a eee eee ee |
 
 
© scanned with OKEN Scanneras will change the data as shown,
ch is an integer value. We got i
5 he output
een replaced with numericat re a
a
   
   
below
8
Salary_GT_100K
ye columns are
fr io the values
   
   
     
a
i a [‘Company' , Job", "Degree’]
are
oF ee getfeatures)
x cr_100k ‘fl
ssalary_6T
ye ofl
 
print (¥)
By
ba
Boy
     
 
fee: say
Slay 6T_100K, dtype: int32
‘ype of
follows, Nowe bi
the feature columns from the target column,
the columns that we try to predict from,
we try to predict.
and the target column
 
eee
© scanned with OKEN ScannerESE
We ereate a decision tree classifier and fit it against the training datas , i
criterion parameter is set to Gini. OY deg,
  
from sklearn.tree import DecisionTreeclassificy
dtree = DecisionTreeClassifier()
dtree = dtree.fittx, yl]
  
 
 
 
 
 
Now we can create the actual decision tree, fit it with our details. Start
=
modules we need: if Pt,
 
[5]: | from sklearn import tree
tree.plot_treedtree, feature_names=featuresp]
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
We get the output as follows:
po es0s
eicos
sarploa = 14
yale [7,71
‘samples
value = [4, 0]
4
Tpoasis
‘gin =0.5'
sales = 4
value = [2, 2)
v
‘ginl= 0.0,
sammlos=2
value = [0, 2]
3.8 BAYES THEOREM : 2
Bayes theorem helps to determine the probability of an event with random knowledg
It is used to calculate the probability of occurring one event while other one ali
Occurred. It is a best method to relate the condition probability and marginal probabil.
© scanned with OKEN Scannerfinancial, Bayes theorem is also extensively
ef
°Y industry, aeronautical sector, etc,
Bayes theorem is also known as the Bayes Rule
probability ofA divided by the probability of event Bie
~ PBIA)P(A)
P(AIB) = 7)
P®)
where,
(A) and P(B) are the probabilities of events A and B.
P(AIB) is the probability of event A when event B happens,
P(BIA) is the probability of event B when A happens. |
‘The different terms associated with the Bayes theorem are as follows: |
+ Conditional Probability: When the happening of an event A depends on the |
occurrence of another event B, itis known as conditional probability.
Posterior Probability: The conditional probability of an event happening based on
new information or prior probability is known as posterior probability.
Prior Probability: It is the probability of an event's occurrence based on previous
information.
Joint Probability: The chances of two or more events taking place simultaneously is
their joint probability.
* Random Variables: The continuous range of values denoting the outcome of
random experiments are the random variables.
© scanned with OKEN Scannervariable an‘
mula of conditional probability given belo, hy
Ws
ed from the
Likelihood Class Prior
7
PIBIA)PL
P(AIB)= s a) a)
pane)
_ Ae
Pal ®)= — pe) | \
Predictor Prior
‘This equation js deriv
Joint Probablity
Posterior Probability
in, Now, assume that event A is the
Teg
Pong
equation 82
So according to the equation,
tribute.
r probability of the response variable,
evidence or the probability of training data
ty is the conditional probability ofthe re
sPOnse var,
lable
Consider the previous
.d event Bis the input at!
# P(A) or Class Prior is the prior
P(B) or Predictor Prior is the
Probabili
ue given the input attributes
e of the posterior és
Probabilit
IY OF thy
+ P(AIB) or Posterior
being of a particular va
+ P(BIA) or Likelihood is basically the never
likelihood of training data
Bayes’ Theorem Example
Let us look at how the Bayes theorem probability calculator works. Assume th;
le that there
are two investment options, A and B. Then, the probability of i
1 . y generatin; iti
from A is 74% and the probability of generating positive returns from ae a
possibility of investment B providing a positive return, when investm As te
positive retum is 13%. ent A also provides
Based on the given data, determin
K ; the probability of investm ‘di
return when investment B also provides a positive return. srg tene Pie
SOLUTIO!
P(A) =0.74
P(B) =0.45
P@IA) =0.13
P (ai) = PBIAIP(A)
PB)
P(AIB) =[(0.13 « 0.74) / 045) =0.21
© scanned with OKEN Scannersxprnt earg
yes Theorem cam be written as; TERT
sa
: Prsterior = Likelitood* Prior) Eo
nee
tly i, the result P(A 1B) j
firstly, in genera ) is rete, :
a toas the prior probability, Tre t08 the posterior probability and P(A i
P(AIBk: Posterior probability,
refer
4 P(A): Prior probability.
imes P(BIA) is referred to likeli
Sometimes as the likelihood and P(B) i
PIBIA): Likelihood. (B) is referted to as the evidence.
P(B): Evidence.
.
‘This allows Bayes Theorem to be restated as:
posterior = Likelihood * Prior / Evidence
EXAMPLE: Let's take a simple example ~ Sa
ly the likelihood of aera
ifbey are over 65 years of age is 49%, O84 Person having diabetes
Now, let's assume the following:
+ Class Prior: The probability of a person stepping in the clinic being >65-year-old is
20%
+ Predictor Prior: The probability of a person stepping into the clinic having diabetes
is 35% :
What is the probability that a person is >65 years given that he has? This is Let's
calculate this with the help of Bayes’ theorem!
Likelihood Class Prior
P(patientis > 65 | Diabetes) = —“2%20_ _ pg,
Posterior Probability Predictor Prior
EXAMPLE : We can solve one example to get more understanding, Lets us consider that
an Indian soccer team acquired a new player for a new season. The team played 20 games
With the results:
Total Games Wins Losses
20 15 2
© scanned with OKEN Scannerrl
‘The new player scored in the 8
ames both when his team won or lost With sy,
the fo,
cases: 7
 
Won Games
Lost Game
10 7
 
1 of games in which the new
player scored
 
 
 
 
Question:
Find out the probability of such event that the team wins given that the —
scores. Play,
MSOLUTIO!
Let’s create table from above details:
Player scored
 
 
 
 
Not scored Tour
Won 10 5 e
2 3 ;
 
 
Lost
Total 12 8 2»
Weneed to calculate below given probability before getting ito conditional prabayy
P(Won) ty.
P(Total matches played)
5/20
75,
 
 
 
 
 
 
P(Team Wins) =
 
  
Post
EcTeamn Jos) PTO ne played)
= 5/20
=0.25
P(Ptayer cto ‘Team Wan) (os cenyeneoores)
P(Total matches won)
= 10/15
= 0.667
 
P(Player doesn't score scores)
P(Player score! Team losses) = o toal manos ay
=2/5
=04
at
© scanned with OKEN Scannerearning
on must put values in given formula where MEIER
w ea
now mn wins! Player scores) = Wins and player scored,
prem
P(Player|Team wi
s|Team Wins)+P(Team wins)
i+
ins)+P(Team wins)
}+P Player scoresi Team Tos
‘0,
ins | Player scores) = —_{275) (0.667)
wins | Play’ (0.75)(0.66) +(0.25)(0.4)
isn wins |Player scores) = 0.833402748854644
probability of Team won, and player scored in match is
= 0.8334
aan probity 83.34%.
ses)+p(Team lost)
waren
rence
 
49 BAYES THEOREM AND CONCEPT LEARNING
ae shi section, we will discuss how the concept of Bayes theorem can be used in various
oiher learning algorithms.
9.1 Naive Bayes Classifier Algorithm
3.9.
4 Naive Bayes algorithm is a supervised leaning algorithm, which is based on Bayes
sheorem and used for solving classification problems.
It is mainly used in fext classification that includes a high-dimensional training
dataset.
4 Naive Bayes Classifier is one of the simple and most effective Classification
algorithms which help in building the fast machine learning models that can make
quick predictions.
Itis a probabilistic classifier, which means it predicts on the basis of the probability
ofan object.
+ Some popular examples of Naive Bayes Algorithm are spam filtration, Sentimental
analysis, and classifying articles.
The Naive Bayes algorithm is comprised of two words Naive and Bayes, Which can be
described as: S
* Naive: tis called Naive because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. Such as if the fruit is identified on
the bases of color, shape, and taste, then red, spherical, and sweet fruit is recognized
© scanned with OKEN ScannerTh
berg
ly contributes to identi fy th
at
  
an apple.
aa depending ome
+ Bayest Its called Baye® be =
of Naive Bayes Class!
3.9.2 Working orking of Naive Bayes through an example. Tag
= poring ot oe calculate the Probatiy hy
if to classify whether players will play or not, based o, rs
We
cause it depends on the principle of Bayes "i
or
weather conditions
sports. Now, you nee
condition. S
ive Bayes classifier calculates the probability of an event in the folowing .
Pa S Catcae the prior probability for given class labels. teps.
© Step 1: bility with each attribute for each class
«Step 2: Find Likelihood probal
Step 3: Put these value in Bayes Formula and calculate posterior Probably
«Step 4: See which class has a higher probability, given the input jy,
higher probability class. TB lg
For simplifying prior and posterior probability calculation, you can use the
frequency and likelihood tables. Both of these tables will help you to calculate no ,
posterior probability. The Frequency table contains the occurrence of labels for aoe
There are two likelihood tables. Likelihood Table 1 is showing prior probabil fea
and Likelihood Table 2 is showing the posterior probability. 5 of ky
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
[Weber [tay] Frequency Table
ae Whether [No = =
‘Sunny ___|No ae a
‘Sunny {No
[Overcast [Yes ===> Sunny |2 =
any [roe] hans
Rainy. [Yes Total e
 
 
 
 
 
 
 
 
No.
forcast
Likelihood Table 1
= k : Ukelinood Table 2
 
   
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
[Sunny
Siney, Baer ll
[ay — ee Whe
tl cher
= 4a
Se Yes See + a ea] a)
mY Rain = Be
ce 3 Toa S_f2" [e514 [ose | [Overcast] [a [oso [use
peveaet [Yes Is lo g ss
Overcast |¥eg] =S4|-9n4 | ny 2 [3 ost Fs
[pean (8 Joss] pany
 
 
 
 
 
 
© scanned with OKEN Scannerearntna -
te [
fra you want to calculate the probability of playing when the weather is
er ty of pIBTIDE?
pont) = Ove | Yo) Pe) /P(Ovec .
4, cate!
provers)” 4/14 =0.29
pote) 914-088
alate Posterior Probabilities: |
7 overcast IYes) =A/9 = 0.44
put Prior and Posterior probabilities in equation (1
p ves | Overcast) = 044 * 0.64 /0.29 = 098(Higher)
culate the probability of not playing:
 
(1)
te Prior Probabilities:
similarly, you can cal
pabitty of not playing:
= P(Overcast | No) P(No) / P (Overcast) .
 
pono | overcast)
1, Calculate Prior Probabilities:
(Overcast) = 4/14= 0.29
P(No)= 5/14 =0.36
2, Calculate Posterior Probabilities:
P (Overcast 1No) = 0/9=0
4, put Prior and Posterior probabilities in equation (2)
P(No | Overcast) = 0 * 0.36 / 0.29 =0
‘he probability ofa 'Yes' class is higher. So you can determine here if the weather is
overcast than players will play the sport.
3.9.3 Advantages of Naive Bayes Classifier
* Simple and Fast: Naive Bayes is quick to train and efficient in prediction due to its simplistic assumption of independence among features.
* Effective with Small Data: Performs well with small datasets and is less prone to overfitting compared to complex models.
* Handles Irrelevant Features: Can handle irrelevant features well by assuming independence; it still performs reasonably even if some assumptions are violated.
* Works with Categorical and Continuous Data: Accommodates both categorical and continuous data, making it versatile for various types of datasets.
* Good with Text Classification: Particularly effective in text classification tasks such as spam filtering, sentiment analysis, and document categorization.
* Requires Less Training Data: Needs a relatively smaller amount of training data to estimate parameters efficiently.
* Interpretable Results: Offers transparent and interpretable outputs, making it easy to understand the reasoning behind predictions.
3.9.4 Disadvantages of Naive Bayes Classifier
* Assumption of Feature Independence: The classifier assumes independence among features, which might not hold true in real-world scenarios, leading to inaccuracies in predictions.
* Handling Continuous Data: Naive Bayes performs less effectively with continuous data, as it assumes a normal distribution, impacting accuracy in such cases.
* Sensitive to Data Quality: It can be sensitive to noisy or irrelevant features, affecting classification accuracy.
* Zero Probability Issue: The occurrence of a feature value not present in the training data leads to a zero probability estimate, causing the entire probability to be zero.
* Limited Expressiveness: Due to its simplicity, Naive Bayes may not capture complex relationships among variables, limiting its ability to model intricate decision boundaries in data.
3.9.5 Applications of Naive Bayes Classifier
* Spam Email Filtering: Naive Bayes is widely used in email services to classify emails as spam or non-spam based on the presence of certain words or features.
* Text Classification: Applied in sentiment analysis, news categorization, and document classification tasks due to its effectiveness in handling text data.
* Medical Diagnosis: Utilized in medical fields to assist in diagnosing diseases based on symptoms or patient data.
* Recommendation Systems: Used in collaborative filtering to recommend products or services by analyzing user preferences and item features.
* Credit Scoring: Employed in finance for credit scoring and risk assessment by evaluating various factors associated with loan applications.
3.10 INTRODUCTION TO REGRESSION
Regression is a statistical approach used to model the relationship between independent variables or features and a dependent variable (outcome). It is used in machine learning as a method for predictive modelling, in which an algorithm is used to predict continuous outcomes.
Regression is an essential component of predictive modelling and is used in a wide range of machine learning applications. Whether it is used for forecasting financial trends or predicting healthcare outcomes, regression analysis brings key insights for decision-making.
One of the key characteristics of supervised learning is the ability to estimate the value for new data by modeling the dependencies and interactions between the target output and the input variables. Regression algorithms predict the output values based on input features from the data fed into the system. The standard approach is for the algorithm to create a model based on the attributes of the training data and then use the model to forecast the value for new data.
Regression analysis, in particular, enables us to see how a dependent variable changes in relation to an independent variable while the other independent variables are held constant. It predicts continuous/real values such as temperature, age, salary, and price, among others.
EXAMPLE: Assume there is a marketing firm X that runs various advertisements throughout the year and gets sales accordingly. The list below shows the advertisements made by the company in the last 5 years and the corresponding sales:

Advertisement (In Lakh)      Sales (In Lakh)
Rs. 90                       Rs. 1000
Rs. 120                      Rs. 1300
Rs. 150                      Rs. 1800
Rs. 100                      Rs. 1200
Rs. 130                      Rs. 1380
Rs. 200                      ???
Now, the company wants to do an advertisement of Rs. 200 Lakh in this year and wants to know the prediction about the sales for this year. To solve such prediction problems in machine learning, we need regression analysis.
In regression, we plot a graph between the variables which best fits the given datapoints. Using this plot, the machine learning model can make predictions about the data.
In simple words, "Regression shows a line or curve that passes through all the datapoints on the target-predictor graph in such a way that the vertical distance between the datapoints and the regression line is minimum."
The distance between the datapoints and the line tells whether a model has captured a strong relationship or not.
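As a quick illustration of how such a prediction could be produced, the short sketch below fits an ordinary least-squares line to the five known advertisement/sales pairs from the table and then queries it for an advertisement of Rs. 200 lakh. Using scikit-learn here is only one possible choice; any least-squares fit would serve.

import numpy as np
from sklearn.linear_model import LinearRegression

# Advertisement spend vs. sales (both in lakh Rs.) from the table above
ads = np.array([[90], [120], [150], [100], [130]])
sales = np.array([1000, 1300, 1800, 1200, 1380])

model = LinearRegression().fit(ads, sales)          # ordinary least-squares fit
predicted = model.predict(np.array([[200]]))        # sales for a Rs. 200 lakh ad spend
print(f"Predicted sales: about Rs. {predicted[0]:.0f} lakh")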
Terminologies Related to Regression Analysis:
* Dependent Variable: The dependent variable is the main factor in regression analysis that we wish to predict or understand. It is also known as the target variable.
* Independent Variable: The factors that influence the dependent variable, or that are used to predict the values of the dependent variable, are referred to as independent variables, sometimes known as predictors.
* Outliers: An outlier is a value that is either exceptionally low or very high in comparison to other observed values. An outlier may hamper the result, so it should be avoided.
* Multicollinearity: Multicollinearity occurs when the independent variables are substantially associated with each other but not with other factors. It should not be present in the dataset because it causes issues when ranking the most influential variable.
* Underfitting and Overfitting: If our algorithm performs well on the training dataset but not on the test dataset, then such a problem is called overfitting. And if the algorithm does not perform well even on the training dataset, then such a problem is called underfitting.
Why do we use Regression Analysis?
Regression analysis, as previously said, helps in the prediction of a continuous variable. There are many instances in the real world where we need to anticipate future outcomes, such as weather conditions, sales figures, or market trends, and for such cases we need a technique that can make predictions accurately. Regression analysis is a statistical method used in machine learning and data science for exactly this purpose. Some other reasons for using regression analysis include:
* Regression estimates the relationship between the target and the independent variables.
* It is used to find the trends in data.
* It helps to predict real/continuous values.
* By performing the regression, we can determine the most important factor, the least important factor, and how each factor is affecting the other factors.

Regression techniques used in machine learning may include different algorithms and different types of data. Distinct types of machine learning regression algorithms assume a different relationship between the independent and dependent variables. Various types of regression algorithms are:
* Linear Regression
* Simple Linear Regression
* Polynomial Regression
* Logistic Regression
* Maximum likelihood estimation (least squares)
3.11 LINEAR MODELS - INTRODUCTION
The linear model is a type of machine learning algorithm that is commonly used for supervised learning tasks, such as regression. It is based on the idea of fitting a linear equation to a set of data points, which can then be used to make predictions about new data.
In the case of regression, the goal of the linear model is to find the line of best fit that describes the relationship between the independent variable(s) and the dependent variable. The equation for a simple linear regression model can be expressed as:
y = mx + b
where y is the dependent variable, x is the independent variable, m is the slope of the line, and b is the intercept.
In the case of classification, the linear model is used to separate data points into different classes based on their features. This is done by fitting a hyperplane (a higher-dimensional version of a line) to the data that maximizes the margin between the different classes. This is known as a linear support vector machine (SVM).
The linear model is popular in machine learning because it is simple to interpret and it can work well on large datasets. However, it may not be suitable for complex data that cannot be accurately described by a linear equation, in which case more complex models such as decision trees or neural networks may be needed.
In this section, we will cover two crucial linear models in machine learning:
(a) linear regression
(b) logistic regression
Linear regression is used for regression tasks, whereas logistic regression is a classification algorithm.
3.11.1 Simple Linear Regression Model
A simple linear regression algorithm models the relationship between a dependent variable and a single independent variable. The relationship shown by a Simple Linear Regression model is linear, or a sloped straight line, hence it is called Simple Linear Regression.
The most important aspect of Simple Linear Regression is that the dependent variable must be continuous/real. The independent variable, on the other hand, can be measured using either continuous or categorical values.
The simple linear regression algorithm has mainly two objectives:
* Model the relationship between the two variables, such as the relationship between income and expenditure, or experience and salary.
* Forecast new observations, such as weather forecasting according to temperature, or the revenue of a company according to the investments in a year.
Simple linear regression has only one x variable and one y variable in its basic form. The x variable is known as the independent variable because it is used to predict the dependent variable. The y variable, the one you try to predict, is the dependent variable because its value depends on the independent variable.
The formula used for simple linear regression is:
y = a + bx
where:
* y is the predicted value of the dependent variable for any given value of the independent variable (x).
* a is the intercept, the predicted value of y when x is 0.
* b is the regression slope coefficient: how much we expect y to change as x increases.
* x is the independent variable (the variable we expect is influencing y).

[Figure: best-fit straight line through the data points, with the dependent variable on the y-axis and the independent variable on the x-axis]

The above graph presents the linear relationship between the output (y) variable and the predictor (x) variable. The straight line is referred to as the best fit straight line. We attempt to plot the best line possible based on the data points provided.
To calculate the best-fit line, linear regression uses a traditional slope-intercept form, which is given below:
yi = a + b*xi
where yi = dependent variable,
a = constant/intercept,
b = slope coefficient,
xi = independent variable.
This algorithm explains the linear relationship between the dependent (output) variable y and the independent (predictor) variable x using a straight line y = a + bx.
The linear regression algorithm seeks the optimum values for a and b to find the best fit line. The best fit line should have the least error, which means the difference between the projected and actual values should be as small as possible.

Find a simple linear regression model for the following data:

x      y
1.0    1.00
2.0    2.00
3.0    1.30
4.0    3.75
5.0    2.25
Let the simple linear regression model be
y = a + bx
Steps to find a and b:
First, find the mean and covariance. The means of x and y are given by
x̄ = (1/n) Σ xi,    ȳ = (1/n) Σ yi
The covariance of x and y, denoted by Cov(x, y), is defined as
Cov(x, y) = (1/(n − 1)) Σ (xi − x̄)(yi − ȳ)
The estimates of a and b can be computed using the following formulas:
b = Cov(x, y) / Var(x)
a = ȳ − b x̄
(Here the sample covariance and variance, with n − 1 in the denominator, are used so that the values below follow; since b is their ratio, either convention gives the same slope.)
First, find the means of x and y:
x̄ = (1/5)(1.0 + 2.0 + 3.0 + 4.0 + 5.0) = 3.0
ȳ = (1/5)(1.00 + 2.00 + 1.30 + 3.75 + 2.25) = 2.06
Next, find the covariance between x and y:
Cov(x, y) = (1/(n − 1)) Σ (xi − x̄)(yi − ȳ)
Cov(x, y) = (1/4)[(1.0 − 3.0)(1.00 − 2.06) + ... + (5.0 − 3.0)(2.25 − 2.06)]
          = 1.0625
Now find the variance of x:
Var(x) = (1/(n − 1)) Σ (xi − x̄)²
Var(x) = (1/4)[(1.0 − 3.0)² + ... + (5.0 − 3.0)²]
       = 2.5
Now, find the slope coefficient and the intercept:
b = Cov(x, y) / Var(x)
a = ȳ − b x̄
b = 1.0625 / 2.5 = 0.425
a = 2.06 − 0.425 × 3.0 = 0.785
Therefore, on the basis of the estimated values of a and b, the estimated regression equation for the above example is
y = 0.785 + 0.425x
The value of the intercept from the above equation is 0.785. The slope 0.425 is the estimated change in the average value of Y as a result of a unit change in X; that is, the average value of Y increases by 0.425 for each unit increase in the value of X.
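The same estimates can be reproduced numerically; the short sketch below recomputes the sample covariance, the variance of x, the slope and the intercept for the five data points used above (nothing here goes beyond the formulas already given).

# Reproducing the worked example: b = Cov(x, y) / Var(x), a = y_bar - b * x_bar
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.00, 2.00, 1.30, 3.75, 2.25]
n = len(x)

x_bar = sum(x) / n                                # 3.0
y_bar = sum(y) / n                                # 2.06

# Sample covariance and variance (divided by n - 1, matching 1.0625 and 2.5)
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

b = cov_xy / var_x                                # 0.425
a = y_bar - b * x_bar                             # 0.785
print(f"y = {a:.3f} + {b:.3f}x")                  # y = 0.785 + 0.425x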
3.11.2 Logistic Regression
* Logistic regression is a common ML algorithm that is part of the supervised learning technique. It is used for predicting the categorical dependent variable using a given set of independent variables.
* Logistic regression predicts the output of a categorical dependent variable. As a result, the outcome must be categorical or discrete. It can be Yes or No, 0 or 1, True or False, and so on, but instead of presenting the exact values like 0 and 1, it presents the probability values that fall between 0 and 1.
* Unlike linear regression, which maps input data to continuous output values, the logistic regression algorithm maps input data to a probability. Logistic regression models are used to predict the probability of an event occurring, such as whether or not a customer will purchase a product. The output of the logistic regression model is a value between 0 and 1; it represents the probability that the event will occur.
* In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function, which predicts two maximum values (0 or 1).
* The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, a mouse is obese or not, and so on.
* Logistic regression is a significant machine learning algorithm because it has the ability to provide probabilities and classify new data using continuous and discrete datasets.
* Logistic regression can be used to classify the observations using different types of data and can easily determine the most effective variables used for the classification. The image below shows the logistic function:

[Figure: the S-shaped logistic (sigmoid) function, with output values between 0 and 1]
The output of the logistic regression model (the sigmoid function output) is always between 0 and 1. If the output is near to zero, the event is unlikely to occur; if the output is near to one, the event is more likely to occur. For example, if the result of the logistic regression model (obtained by the sigmoid function) is 0.8, it means that the probability that the event will occur is 0.8, given a particular set of parameters learned using cost function optimization. Based on the threshold function, the class label can then be said to be 1. For any new data point, the output of the above function will be used for making the prediction.
How is logistic regression different from linear regression?
The output of linear regression is continuous and can take any value. In the case of logistic regression, the predicted outcome is discrete and restricted to a limited number of values.
For example, assume we're attempting to apply machine learning to the sale of a house and want to forecast the sale price based on the size, year built, and number of stories. We use linear regression, which can predict any potential value. If we want to predict whether or not the house will sell, the outcome is limited to Yes or No. Hence, linear regression is an example of a regression model and logistic regression is an example of a classification model.
Logistic Function
* The sigmoid function is a mathematical function that is used to convert the predicted values into probabilities.
* It maps any real value into another value within a range of 0 and 1.
* The logistic regression value must be between 0 and 1, and it cannot go beyond this limit, so it forms a curve similar to the "S" form. The sigmoid function is another name for the S-form curve.
* The concept of the threshold value is used in logistic regression, which defines the probability of either 0 or 1. Values above the threshold value tend to 1, and a value below the threshold value tends to 0 (a small numeric sketch of this follows below).
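A minimal numeric sketch of the sigmoid and a 0.5 threshold; the input values and the threshold choice are illustrative assumptions rather than anything fixed by the text.

import math

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real value into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(z, threshold=0.5):
    # Convert the sigmoid probability into a 0/1 class label
    return 1 if sigmoid(z) >= threshold else 0

print(sigmoid(1.4))          # ~0.80, the probability that the event occurs
print(predict_class(1.4))    # 1, since 0.80 is above the 0.5 threshold
print(predict_class(-2.0))   # 0, since sigmoid(-2.0) ~ 0.12 is below the threshold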
Logistic Regression Equation
The logistic regression equation can be obtained from the linear regression equation. The mathematical steps to get the logistic regression equation are given below:
* We know the equation of the straight line can be written as:
  y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
* In logistic regression, y can be between 0 and 1 only, so we divide the above equation by (1 - y):
  y / (1 - y);  which is 0 for y = 0, and infinity for y = 1
* But we need a range between -[infinity] and +[infinity]; taking the logarithm of the equation, it becomes:
  log[ y / (1 - y) ] = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
The above equation is the final equation for logistic regression.
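As a sketch of how this is used in practice, the example below fits a logistic regression model on a tiny made-up dataset (hours studied vs. pass/fail, purely an assumption for illustration) and shows that the model returns a probability rather than an exact value; scikit-learn's LogisticRegression is just one possible implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: hours studied (feature) vs. pass/fail (binary label) - illustrative only
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(np.array([[4.5]]))[0, 1]   # probability of class 1 for 4.5 hours
label = clf.predict(np.array([[4.5]]))[0]            # class label using the 0.5 threshold
print(f"P(pass) = {proba:.2f}, predicted class = {label}")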
Assumptions for Logistic Regression
The assumptions for logistic regression are as follows:
* Independent observations: Each observation is independent of the others, which means there is no association between any of the observations.
* Binary dependent variable: It assumes that the dependent variable is binary or dichotomous, which means that it can take only two values.
* Linear relationship between independent variables and log odds: The relationship between the independent variables and the log odds of the dependent variable should be linear.
* No outliers: The dataset should have no outliers.
* Large sample size: The sample size should be sufficiently large.
Short Answer Questions

Q1. What is supervised learning?
Ans: Supervised learning is the type of machine learning in which machines are trained using well-labelled training data, and on the basis of that data, machines predict the output. Labelled data means some input data is already tagged with the correct output. In supervised learning, the training data provided to the machines works as the supervisor that teaches the machines to predict the output correctly. It applies the same concept as a student learning under the supervision of a teacher.
Q2. Explain, in brief, the concept of classification.
Ans: Classification is a process of categorizing data or objects into predefined classes or categories based on their features or attributes. Classification is a form of supervised learning technique in machine learning in which an algorithm is trained on a labeled dataset to predict the class or category of fresh, unseen data. The primary goal of classification is to create a model capable of accurately assigning a label or category to a new observation based on its properties. For example, a classification model might be trained on a dataset of images labeled as either dogs or cats and then used to predict the class of new, unseen images of dogs or cats based on their features such as color, texture, and shape.
Q3. Differentiate between classification and regression in supervised learning.
OR
How do classification and regression differ?
Ans: Classification is the task of predicting a discrete class label, whereas regression is the task of predicting a continuous quantity.
* A classification problem requires the data to be classified into one of two or more classes, whereas a regression problem needs the prediction of a quantity.
* A classification problem with two classes is called binary classification, and one with more than two classes is called multi-class classification, whereas a regression problem with multiple input variables is called a multivariate regression problem.
* Classifying an email as spam or non-spam is an example of a classification problem, whereas predicting the price of a stock over a period of time is an example of a regression problem.
Q4. List the most popular classification algorithms.
Ans: The most important classification algorithms to know are:
* K-nearest neighbor (KNN)
* Decision tree
* Random Forest
* Support Vector Machine (SVM)
Q5. What is the k-nearest neighbor algorithm?
Ans: K-Nearest Neighbor is one of the simplest machine learning algorithms based on the supervised learning technique. The K-NN method assumes the similarity between the new case/data and existing cases and places the new case in the category that is most similar to the existing categories. The K-NN algorithm maintains all the available data and classifies a new data point based on this similarity. This means that when new data is generated, it may be quickly classified into a well-suited category using the K-NN algorithm.
Q6. How is the value of 'k' calculated in the KNN algorithm?
Ans: The value of K can be chosen through cross-validation. Take a small portion from the training dataset and call it a validation dataset, and then use the same to evaluate different possible values of K. This way we predict the label for every instance in the validation set using K equal to 1, K equal to 2, K equal to 3, and so on. We then look at what value of K gives us the best performance on the validation set, take that value, and use it as the final setting of our algorithm, so that we are minimizing the validation error.
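A hedged sketch of this idea using scikit-learn's cross-validation utilities (the dataset and the range of candidate k values are assumptions chosen purely for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate K = 1, 2, 3, ... on held-out folds and keep the best-performing value
scores = {}
for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"Best k = {best_k} with cross-validated accuracy {scores[best_k]:.3f}")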
Q7. What is Euclidean distance?
Ans: Euclidean distance is the cartesian distance between two points located in a plane/hyperplane. It can also be represented as the length of the straight line connecting the two points. This metric allows us to calculate the net displacement between two states of an object.
d(x, y) = sqrt( Σ (xi − yi)² )
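For instance, the formula can be computed directly; the tiny helper below is only illustrative:

import math

def euclidean(p, q):
    # Straight-line distance between two points of equal dimension
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean((1, 2), (4, 6)))   # 5.0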
Q8. What are the advantages of the KNN algorithm?
Ans: Some of the advantages of using the k-nearest neighbors algorithm:
* It is easy to understand and simple to implement.
* It can be used for both classification and regression problems.
* Because there are no assumptions about the underlying data, it is appropriate for non-linear data.
* It is naturally capable of handling multi-class cases.
* It can perform well with enough representative data.
Q9. Give any three real-life applications of the KNN algorithm.
Ans: Here are some applications of the KNN algorithm:
* Credit rating: The KNN algorithm assists in determining an individual's credit rating by comparing them to others who share similar traits.
* Loan approval: The KNN method, like credit ratings, is useful in detecting persons who are more likely to default on loans by comparing their qualities to those of similar persons.
* Data preprocessing: Datasets can have many missing values. The KNN algorithm is used for a process known as "missing data imputation", which estimates the missing values.
Q10. Explain the decision tree algorithm in brief.
Ans: A decision tree is a supervised learning algorithm that can be used for both classification and regression problems, although it is most commonly used for classification. It is a tree-structured classifier in which internal nodes represent the features of a dataset, branches represent decision rules, and each leaf node represents the result. A decision tree has two kinds of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make decisions and have numerous branches, whereas leaf nodes represent the results of those decisions and do not have additional branches.
Q11. Explain Entropy.
Ans: Entropy, also called Shannon entropy, is the measure of the amount of uncertainty or randomness in data. Entropy is denoted by H(S) for a finite set S:
H(S) = Σ p(x) log2 (1 / p(x))
It tells us about the predictability of a specific event. Consider a coin toss with a 0.5 probability of heads and a 0.5 probability of tails. Because there is no way of knowing what will happen, the entropy is as high as it can be. Now consider a coin that has heads on both sides; the outcome of such an event can be predicted ahead of time since we know it will always be heads. In other words, because the event has no unpredictability, its entropy is 0. Lower values indicate less uncertainty, whereas larger values indicate greater uncertainty.
Q12. Define Information gain.
Ans: Information gain, denoted by IG(S, A) for a set S, is the effective change in entropy after deciding on a particular attribute A. It measures the relative change in entropy with respect to the independent variables. It can be calculated using the formula:
IG(S, A) = H(S) − H(S, A)
Alternatively,
IG(S, A) = H(S) − Σ P(x) × H(x)
where IG(S, A) is the information gain obtained by applying feature A, H(S) is the entropy of the entire set, and the second term calculates the entropy after applying the feature A, with P(x) being the probability of event x.
Q13. What is pruning in a decision tree? Explain pre-pruning and post-pruning.
Ans: Pruning is a strategy for reducing the complexity of a decision tree when extra depth is unlikely to increase the tree's accuracy, and for preventing overfitting, which occurs when the tree fits the training data too closely, resulting in poor generalization to new data.
* Pre-pruning: This involves setting a limit on the depth of the tree, the minimum number of instances required in a leaf, or the minimum information gain for a split. Pre-pruning is often used when the dataset is large and building a large tree would be computationally expensive.
* Post-pruning: This involves building the full decision tree and then removing branches that do not improve the accuracy of the tree on a validation set. Post-pruning is often used when the dataset is small or clean and when building a large tree is not computationally expensive. (A short sketch of both ideas follows below.)
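A hedged sketch of both ideas with scikit-learn, where the dataset and the parameter values are illustrative assumptions: pre-pruning is expressed through constructor limits, post-pruning through cost-complexity pruning (ccp_alpha).

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Pre-pruning: limit depth and leaf size while the tree is being grown
pre_pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=0)
pre_pruned.fit(X_train, y_train)

# Post-pruning: the tree is grown and then pruned back with a cost-complexity penalty
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
post_pruned.fit(X_train, y_train)

print("pre-pruned accuracy :", pre_pruned.score(X_val, y_val))
print("post-pruned accuracy:", post_pruned.score(X_val, y_val))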
Q14. How is KNN different from k-means clustering?
Ans: K-NN is a supervised algorithm, and k-means clustering is an unsupervised algorithm. For K-nearest neighbors to work, we require labeled data in order to classify an unlabeled point. K-means clustering needs only a threshold and a set of unlabeled points: the algorithm takes the unlabeled points and slowly learns how to divide them into groups by calculating the mean of the distance between the points.
Q15. What is Bayes' Theorem?
Ans: Bayes' theorem helps to determine the probability of an event with random knowledge. It is used to calculate the probability of one event occurring when another event has already occurred. It is the best method to relate the conditional probability and the marginal probability. In simple words, we can say that Bayes' theorem helps to contribute more accurate results. Bayes' Theorem is used to estimate the precision of values and provides a method for calculating the conditional probability.
Q16. What is the Naive Bayes Classifier?
Ans: The Naive Bayes algorithm is a supervised learning algorithm which is based on Bayes' theorem and used for solving classification problems. It is mainly used in text classification that involves a high-dimensional training dataset. The Naive Bayes Classifier is one of the simplest and most effective classification algorithms, and it helps in building fast machine learning models that can make quick predictions. It is a probabilistic classifier, which means it predicts on the basis of the probability of an object. Some popular examples of the Naive Bayes algorithm are spam filtration, sentiment analysis, and classifying articles.

Q17. What do you mean by a linear regression model?
Ans: A linear regression model is a statistical method used in machine learning to model the relationship between a dependent variable and one or more independent variables. It predicts continuous numeric outcomes, like predicting sales based on advertising spend, using a linear equation. The model assumes a linear relationship between the independent and dependent variables and aims to minimize the difference between predicted and actual values using techniques like ordinary least squares.
Q18. What do you mean by a logistic regression model?
Ans: Unlike linear regression, logistic regression predicts categorical outcomes, typically for binary classification tasks, determining the probability of an instance belonging to a particular class. The model estimates the probability using the logistic function and is suitable for tasks like spam detection or medical diagnosis, where the outcome is a 'yes' or 'no' scenario. Instead of predicting exact values, it predicts the probability of a binary outcome.
Long Answer Questions

Q1. What is supervised learning? Give any two examples of supervised learning.
Ans: Refer sections 3.1 and 3.2
Q2. Explain the classification model. Discuss the steps involved in classification learning.
Ans: Refer sections 3.4 and 3.5
Q3. What is the KNN algorithm? Explain its working by taking a suitable example.
Ans: Refer section 3.6.1
Q4. What is the KNN algorithm? Discuss its advantages and applications.
Ans: Refer section 3.6.1
Q5. Write a Python program to implement a k-Nearest Neighbors classifier using sklearn. Train the classifier on the dataset and evaluate it.
Ans: Refer section 3.6.8
Q6. Write a Python program to implement a decision tree classifier using sklearn. Visualize the decision tree and understand its splits.
Ans: Refer section 3.7.7
Q7. Write a Python program to use sklearn's decision tree utilities to build a decision tree on a sklearn dataset. Implement utilities to find the importance of a split (entropy, information gain, Gini measure).
Ans: Refer section 3.7.7
Q8. Explain Bayes' Theorem by taking a suitable example.
Ans: Refer section 3.8
Q9. Explain the Naive Bayes Classifier with an example of its use in practical life.
Ans: Refer section 3.9
Q10. Discuss the various steps involved in the optimization framework for linear models.
Ans: Refer section 3.11
EXERCISE
1. Describe the motivation behind random forests and mention two reasons why they are better than individual decision trees.
2. Explain what information gain and entropy are in the context of decision trees.
3. What are the different methods to split a tree in a decision tree algorithm?
4. Can Bayes belief networks solve all types of problems?
5. What is the Naive Bayes Classifier? Discuss its advantages and disadvantages.
6. What is a linear model? Discuss the optimization framework for linear models in machine learning.
7. Discuss the difference between logistic regression and linear regression by taking a suitable example.
© scanned with OKEN Scanner