PART A
(PART A: TO BE REFERRED BY STUDENTS)
Experiment No. 3
A.1 Aim:
To implement a Support Vector Machine.
A.2 Prerequisite:
Python Basic Concepts
A.3 Outcome:
Students will be able to implement a Support Vector Machine.
A.4 Theory:
Machine Learning, a subset of Artificial Intelligence (AI), plays a dominant role in our
daily lives. Data science engineers and developers working in various domains widely use
machine learning algorithms to make their tasks simpler and life easier.
The objective of the support vector machine algorithm is to find a hyperplane in an
N-dimensional space (where N is the number of features) that distinctly classifies the data
points. To separate the two classes of data points, there are many possible hyperplanes that
could be chosen. Our objective is to find the plane that has the maximum margin, i.e., the
maximum distance between data points of both classes. Maximizing the margin distance provides
some reinforcement so that future data points can be classified with more confidence.
Hyperplanes are decision boundaries that help classify the data points. Data points falling
on either side of the hyperplane can be attributed to different classes. Also, the dimension
of the hyperplane depends upon the number of features. If the number of input features is
2, then the hyperplane is just a line. If the number of input features is 3, then the
hyperplane becomes a two-dimensional plane. It becomes difficult to imagine when the
number of features exceeds 3.
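As an illustration, the following minimal sketch (not part of the lab code; it assumes
scikit-learn is available and uses a synthetic two-feature dataset) shows how the learned
hyperplane of a linear SVM can be inspected through its weight vector w and bias b:
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
# Two well-separated clusters with 2 features, so the hyperplane is a line
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=0)
clf = SVC(kernel='linear')
clf.fit(X, y)
# Decision boundary: w1*x1 + w2*x2 + b = 0
print("w =", clf.coef_[0])
print("b =", clf.intercept_[0])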
Types of SVM
SVM can be of two types:
o Linear SVM: Linear SVM is used for linearly separable data, which means that if a
dataset can be classified into two classes using a single straight line, then such
data is termed linearly separable data, and the classifier used is called a Linear
SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which
means that if a dataset cannot be classified using a straight line, then such data is
termed non-linear data, and the classifier used is called a Non-linear SVM
classifier.
The SVM algorithm is implemented with a kernel that transforms the input data space into
the required form. SVM uses a technique called the kernel trick, in which the kernel takes a
low-dimensional input space and transforms it into a higher-dimensional space. In simple
words, the kernel converts a non-separable problem into a separable problem by adding more
dimensions to it. This makes SVM more powerful, flexible and accurate. The following are
some of the types of kernels used by SVM, contrasted in the short sketch below.
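The sketch below (assuming scikit-learn is installed; make_moons is used only as a stand-in
non-linearly separable dataset) contrasts a plain linear kernel with the RBF kernel trick on
data that a straight line cannot separate well:
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
# A dataset that no single straight line can separate well
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
for kernel in ('linear', 'rbf'):
    clf = SVC(kernel=kernel)
    clf.fit(X_train, y_train)
    # The RBF kernel is expected to score noticeably higher here
    print(kernel, "accuracy:", clf.score(X_test, y_test))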
Linear Kernel
It is simply the dot product between any two observations. The formula of the linear
kernel is as below:
K(x, xi) = sum(x * xi)
From the above formula, we can see that the kernel value for two vectors x and xi is
the sum of the products of each pair of input values.
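As a quick check, a minimal sketch in plain NumPy (the vector values are arbitrary) evaluates
the linear kernel and confirms it equals the ordinary dot product:
import numpy as np
x = np.array([1.0, 2.0, 3.0])
xi = np.array([4.0, 5.0, 6.0])
k_linear = np.sum(x * xi)   # element-wise product, then sum
print(k_linear)             # 32.0
print(np.dot(x, xi))        # the same value via the dot product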
Polynomial Kernel
It is a more generalized form of the linear kernel and can distinguish curved or non-linear
input space. Following is the formula for the polynomial kernel:
K(X, Xi) = (1 + sum(X * Xi))^d
Here d is the degree of the polynomial, which we need to specify manually in the learning
algorithm.
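The following minimal sketch (plain NumPy; the vectors and d = 3 are arbitrary choices for
illustration) evaluates the polynomial kernel directly:
import numpy as np
X = np.array([1.0, 2.0])
Xi = np.array([0.5, -1.0])
d = 3  # degree of the polynomial
k_poly = (1 + np.sum(X * Xi)) ** d
print(k_poly)
In scikit-learn the corresponding kernel family is selected with SVC(kernel='poly', degree=3);
note that scikit-learn's version also exposes gamma and coef0 parameters.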
Pros and Cons of SVM Classifiers
Pros of SVM classifiers
SVM classifiers offer great accuracy and work well in high-dimensional spaces. SVM
classifiers use only a subset of the training points (the support vectors) in the decision
function, so they are also memory efficient.
Cons of SVM classifiers
They have a high training time and are therefore not suitable in practice for large datasets.
Another disadvantage is that SVM classifiers do not work well with overlapping classes.
PART B
(PART B : TO BE COMPLETED BY STUDENTS)
(Students must submit the soft copy as per the following segments within two hours of the practical. The
soft copy must be uploaded on Blackboard or emailed to the concerned lab in-charge faculty at
the end of the practical in case there is no Blackboard access available.)
Roll. No. B24 Name:Sakshi Bhaskar Tupsundar
Class: BE-Comps Batch:B2
Date of Experiment:10-10-2023 Date of Submission:12-10-2023
Grade:
B.1 Software Code written by student:
import numpy as np
from google.colab import drive
import csv
import pandas as pd
import seaborn as sns
df = pd.read_csv('/content/survey lung cancer.csv')
df.shape
df.isnull().sum()
df.head()
from sklearn import preprocessing
# label_encoder object knows
# how to understand word labels.
label_encoder = preprocessing.LabelEncoder()
# Encode labels in the 'GENDER' and 'LUNG_CANCER' columns.
df['GENDER']= label_encoder.fit_transform(df['GENDER'])
df['GENDER'].unique()
df['LUNG_CANCER']= label_encoder.fit_transform(df['LUNG_CANCER'])
df['LUNG_CANCER'].unique()
df.head()
import matplotlib.pyplot as plt
plt.figure(figsize=(14, 8))
plt.suptitle("Lung Disease Prediction")
ax = plt.gca()
df.boxplot()
# Removing outliers using the IQR rule
import pandas as pd
# Use the encoded dataframe for the remaining steps
data = df.copy()
columns_to_check = ['LUNG_CANCER']
# Step 1: Calculate the first quartile (Q1), third quartile (Q3),
# and IQR for each column
Q1 = data[columns_to_check].quantile(0.25)
Q3 = data[columns_to_check].quantile(0.75)
IQR = Q3 - Q1
# Step 2: Define the outlier boundaries
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
# Step 3: Identify outliers for each column
outliers = {}
for column_name in columns_to_check:
    outliers[column_name] = data[(data[column_name] < lower_bound[column_name]) |
                                 (data[column_name] > upper_bound[column_name])]
# Step 4: Remove the outliers
data_cleaned = data.copy()
for column_name in columns_to_check:
    data_cleaned = data_cleaned[
        (data_cleaned[column_name] >= lower_bound[column_name]) &
        (data_cleaned[column_name] <= upper_bound[column_name])]
Applying SVM model before outlier removal
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
X = data.drop('LUNG_CANCER', axis=1)
y = data['LUNG_CANCER']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the features before fitting the SVM
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
X_train = st_x.fit_transform(X_train)
X_test = st_x.transform(X_test)
# Fit an SVM classifier on the data before outlier removal
svm_clf = SVC()
svm_clf.fit(X_train, y_train)
y_pred = svm_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Applying SVM model after outlier removal
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Split the cleaned data (after outlier removal) into features and labels
X = data_cleaned.drop('LUNG_CANCER', axis=1)  # Adjust as needed
y = data_cleaned['LUNG_CANCER']
# Initialize an empty list to store selected features
selected_features = []
best_accuracy = 0.0
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Forward feature selection: repeat until all features are selected
while len(selected_features) < X.shape[1]:
    # Find the feature that improves the model the most
    best_feature = None
    best_feature_accuracy = 0.0
    for feature in X.columns:
        if feature not in selected_features:
            # Create a new feature set by adding the current feature
            current_features = selected_features + [feature]
            # Train an SVM classifier on the current feature set
            svm = SVC()
            svm.fit(X_train[current_features], y_train)
            # Make predictions on the test set
            y_pred = svm.predict(X_test[current_features])
            # Calculate accuracy
            accuracy = accuracy_score(y_test, y_pred)
            # Check if this feature improves accuracy
            if accuracy > best_feature_accuracy:
                best_feature_accuracy = accuracy
                best_feature = feature
    # Add the best feature to the selected features list
    selected_features.append(best_feature)
    best_accuracy = best_feature_accuracy
    # Print the selected feature and its accuracy
    print(f"Selected Feature: {best_feature}, Accuracy: {best_accuracy:.4f}")
print("Forward selection complete.")
print("Selected Features:", selected_features)
Applying SVM model after feature selection process
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
# Restrict the earlier train/test split to the selected features
X_train_after = X_train[selected_features]
X_test_after = X_test[selected_features]
y_train_after, y_test_after = y_train, y_test
# Create an SVM model with the 'rbf' kernel
clf = SVC(kernel='rbf')
# Fit the SVM model to the training data
clf.fit(X_train_after, y_train_after)
# Make predictions on the test data
y_pred = clf.predict(X_test_after)
# Calculate accuracy on the test set
accuracy = accuracy_score(y_test_after, y_pred)
print("Testing Accuracy:", accuracy)
# Perform cross-validation and print the cross-validation scores
cv_scores = cross_val_score(clf, X_train_after, y_train_after, cv=5)  # Change the number of folds (cv) as needed
print("Cross-Validation Scores:", cv_scores)
print("Mean CV Score:", cv_scores.mean())
print(classification_report(y_test_after, y_pred))
Hyperparameter tuning for SVM
import pandas as pd
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from scipy.stats import loguniform
# Features and target from the cleaned data; the target column is 'LUNG_CANCER'
X = data_cleaned.drop('LUNG_CANCER', axis=1)
y = data_cleaned['LUNG_CANCER']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define the model
svm_model = SVC()
# Define the hyperparameter distributions to sample from
param_dist = {
    'C': loguniform(1e-2, 1e2),
    'gamma': loguniform(1e-4, 1e0),
    'kernel': ['linear', 'rbf', 'poly']
}
# Perform randomized search with cross-validation
random_search = RandomizedSearchCV(estimator=svm_model,
    param_distributions=param_dist, n_iter=10, cv=5, scoring='accuracy',
    random_state=42)
random_search.fit(X_train, y_train)
# Get the best hyperparameters
best_params = random_search.best_params_
print("Best Hyperparameters:", best_params)
# Evaluate the model on the test set using the best hyperparameters
best_model = random_search.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy on Test Set:", accuracy)
B.2 Input and Output:
SVM Model Scores
Training Accuracy score 0.89475
Testing Accuracy score 0.86
ROC_AUC score 0.951477
CV score 0.756
SVM Model (Feature Selection) Scores
Training Accuracy score 0.84963
Testing Accuracy score 0.9575
ROC_AUC score 0.9153
CV score 0.9425
Hyperparameter Tuning for SVM Model Scores
Accuracy score 0.91935483870
ROC_AUC score 0.55
CV score 0.95967741
B.3 Observations and learning:
The SVM classifier with an RBF kernel demonstrated strong predictive capabilities, achieving a
high accuracy rate and effectively classifying data points into their respective classes.
Support Vector Machines are powerful classifiers that can be applied to a wide range of
classification problems.
Evaluating the performance of an SVM model through metrics like accuracy, precision, recall,
and the confusion matrix helps in understanding its strengths and weaknesses.
SVMs with RBF kernels are suitable for complex datasets with non-linear relationships, but
hyperparameter tuning and feature selection are crucial for optimizing their performance.
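As a minimal sketch of these metrics (assuming the held-out labels y_test_after and
predictions y_pred from B.1 are still in scope):
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, confusion_matrix)
print("Accuracy :", accuracy_score(y_test_after, y_pred))
print("Precision:", precision_score(y_test_after, y_pred))
print("Recall   :", recall_score(y_test_after, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test_after, y_pred))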
B.4 Conclusion:
In this experiment, we successfully implemented a Support Vector Machine (SVM) classifier with an
RBF kernel on a given dataset.
B.5 Question of Curiosity
Q1. What is a support vector machine (SVM)?
Ans: A support vector machine (SVM) is a type of supervised learning algorithm used in
machine learning to solve classification and regression tasks; SVMs are particularly good at
solving binary classification problems, which require classifying the elements of a data set into
two groups.
The aim of a support vector machine algorithm is to find the best possible line, or decision
boundary, that separates the data points of different data classes. This boundary is called a
hyperplane when working in high-dimensional feature spaces. The idea is to maximize the
margin, which is the distance between the hyperplane and the closest data points of each
category, thus making it easy to distinguish data classes.
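A minimal sketch (synthetic data; scikit-learn assumed available) makes this concrete: for a
linear SVM the margin width equals 2 / ||w||, and the closest points are exposed as support
vectors:
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel='linear', C=1000)   # a large C approximates a hard margin
clf.fit(X, y)
w = clf.coef_[0]
print("Margin width:", 2 / np.linalg.norm(w))
print("Support vectors:\n", clf.support_vectors_)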