
RAJAMAHENDRI
INSTITUTE OF ENGINEERING & TECHNOLOGY
(Approved by AICTE, New Delhi, Affiliated to JNTUK, Kakinada, Accredited BY NAAC)
BHOOPALAPATNAM, RAJAMAHENDRAVARAM, E.G. Dist., AP, 533107.
eMail: office@rietrjy.co.in Website: www.rietrjy.co.in Ph: +91 91212 14413

VISION AND MISSION OF THE COLLEGE

VISION
The vision of the college is "To develop RIET into an Institution of Excellence in
Engineering Education at the graduate and postgraduate levels and to carry out
quality research in Engineering and Technology".

MISSION
➢ To educate students with a practical approach to dovetail them
to industry needs.
➢ To govern the institution with a proactive and professional
management with passionate teaching faculty.
➢ To provide holistic and integrated education and achieve overall
development of the student by imparting scientific and
technical, social and cognitive, managerial and organizational
skills.
➢ To compete with the best and be the most preferred institution of
the studious and the scholarly.
RAJAMAHENDRI
INSTITUTE OF ENGINEERING & TECHNOLOGY
(Approved by AICTE, New Delhi, Affiliated to JNTUK, Kakinada, Accredited BY NAAC)
BHOOPALAPATNAM, RAJAMAHENDRAVARAM, E.G. Dist., AP, 533107.
eMail: office@rietrjy.co.in Website: www.rietrjy.co.in Ph: +91 91212 14413

Department
of
Computer Science and Engineering

Vision
The vision of the Computer Science and Engineering Department is to
become a nationally and internationally leading institution of higher learning, building
upon the culture and the values of universal science and contemporary education, and a
centre of research and education generating the knowledge and technologies that
lay the groundwork for shaping the future in the field of Computer Science and
Engineering and contribute to the needs of society.

Mission
To provide the technical knowledge and soft skills required to succeed in life and
career, and to help society achieve self-sufficiency.

The CSE Department is committed

➢ To achieve excellence in higher education and research through dissemination
of quality technical education with a strong foundation.
➢ To continuously scout for and build opportunities in the field of IT and sustain
long-term interaction with the institute and industry.
➢ To build and uphold high professional and ethical standards to make the nation
noted for its progressive contribution to global society.
Department of Computer Science and Engineering
Machine Learning using Python Lab
(Common for III–I CSE(DS), CSE(AI&ML), III–II CSE & IV–I EEE)
Course Outcomes (COs): At the end of the course, students will be able to
➢ Implement procedures for the machine learning algorithms
➢ Design and develop Python programs for various learning algorithms
➢ Apply appropriate data sets to the machine learning algorithms
➢ Develop machine learning algorithms to solve real-world problems

List of Experiments:

Experiment-1: Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

Experiment-2: For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

Experiment-3: Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

Experiment-4: Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression b) Logistic Regression c) Binary Classifier.

Experiment-5: Develop a program for Bias, Variance, Remove duplicates, Cross Validation.

Experiment-6: Write a program to implement Categorical Encoding, One-hot Encoding.

Experiment-7: Build an Artificial Neural Network by implementing the Back propagation algorithm and test the same using appropriate data sets.

Experiment-8: Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.

Experiment-9: Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

Experiment-10: Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.

Experiment-11: Apply the EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.

Experiment-12: Exploratory Data Analysis for Classification using Pandas or Matplotlib.

Experiment-13: Write a Python program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.

Experiment-14: Write a program to implement Support Vector Machines and Principal Component Analysis.

Experiment-15: Write a program to implement Principal Component Analysis.

Add-on Programs

Experiment-16: Implement the Candidate Elimination algorithm using Python.

Experiment-17: Implement K-Means Clustering using Python.

Experiment-1:

Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a
.CSV file.

Dataset:

sky airtemp humidity wind water forcast enjoysport


sunny warm normal strong warm same yes
sunny warm high strong warm same yes
rainy cold high strong warm change no
sunny warm high strong cool change yes

import csv

a = []
with open('program.csv', 'r') as csvfile:
    for row in csv.reader(csvfile):
        a.append(row)
        print(a)
        print("\n The total number of training instances are : ", len(a))
        num_attribute = len(a[0]) - 1
        print("\n The initial hypothesis is : ")
        hypothesis = ['0'] * num_attribute
        print(hypothesis)
        for i in range(0, len(a)):
            if a[i][num_attribute] == 'yes':
                for j in range(0, num_attribute):
                    if hypothesis[j] == '0' or hypothesis[j] == a[i][j]:
                        hypothesis[j] = a[i][j]
                    else:
                        hypothesis[j] = '?'
                print("\n The hypothesis for the training instance {} is : \n".format(i + 1), hypothesis)
                print("\n The Maximally specific hypothesis for the training instance is ")
                print(hypothesis)

OUTPUT :

[['sky', 'airtemp', 'humidity', 'wind', 'water', 'forcast', 'enjoysport']]

The total number of training instances are : 1

The initial hypothesis is :
['0', '0', '0', '0', '0', '0']

[['sky', 'airtemp', 'humidity', 'wind', 'water', 'forcast', 'enjoysport'], ['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes']]

The total number of training instances are : 2

The initial hypothesis is :
['0', '0', '0', '0', '0', '0']

The hypothesis for the training instance 2 is :
['sunny', 'warm', 'normal', 'strong', 'warm', 'same']

The Maximally specific hypothesis for the training instance is
['sunny', 'warm', 'normal', 'strong', 'warm', 'same']

[['sky', 'airtemp', 'humidity', 'wind', 'water', 'forcast', 'enjoysport'], ['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes'], ['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes']]

The total number of training instances are : 3

The initial hypothesis is :
['0', '0', '0', '0', '0', '0']

The hypothesis for the training instance 2 is :
['sunny', 'warm', 'normal', 'strong', 'warm', 'same']

The Maximally specific hypothesis for the training instance is
['sunny', 'warm', 'normal', 'strong', 'warm', 'same']

The hypothesis for the training instance 3 is :
['sunny', 'warm', '?', 'strong', 'warm', 'same']

The Maximally specific hypothesis for the training instance is
['sunny', 'warm', '?', 'strong', 'warm', 'same']

[['sky', 'airtemp', 'humidity', 'wind', 'water', 'forcast', 'enjoysport'], ['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes'], ['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes'], ['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no']]

The total number of training instances are : 4

The initial hypothesis is :
['0', '0', '0', '0', '0', '0']

The hypothesis for the training instance 2 is :
['sunny', 'warm', 'normal', 'strong', 'warm', 'same']

The Maximally specific hypothesis for the training instance is
['sunny', 'warm', 'normal', 'strong', 'warm', 'same']

The hypothesis for the training instance 3 is :
['sunny', 'warm', '?', 'strong', 'warm', 'same']

The Maximally specific hypothesis for the training instance is
['sunny', 'warm', '?', 'strong', 'warm', 'same']

[['sky', 'airtemp', 'humidity', 'wind', 'water', 'forcast', 'enjoysport'], ['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes'], ['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes'], ['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no'], ['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']]

The total number of training instances are : 5

The initial hypothesis is :
['0', '0', '0', '0', '0', '0']

The hypothesis for the training instance 2 is :
['sunny', 'warm', 'normal', 'strong', 'warm', 'same']

The Maximally specific hypothesis for the training instance is
['sunny', 'warm', 'normal', 'strong', 'warm', 'same']

The hypothesis for the training instance 3 is :
['sunny', 'warm', '?', 'strong', 'warm', 'same']

The Maximally specific hypothesis for the training instance is
['sunny', 'warm', '?', 'strong', 'warm', 'same']

The hypothesis for the training instance 5 is :
['sunny', 'warm', '?', 'strong', '?', '?']

The Maximally specific hypothesis for the training instance is
['sunny', 'warm', '?', 'strong', '?', '?']

Experiment-2:

For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Candidate Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.

DataSet:

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
import numpy as np
import pandas as pd

data = pd.DataFrame(data=pd.read_csv('program.csv'))
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "yes":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
            print(specific_h)
            print(specific_h)
        if target[i] == "no":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
            print(" steps of Candidate Elimination Algorithm", i + 1)
            print(specific_h)
            print(general_h)
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")

output:

[['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
 ['sunny' 'warm' 'high' 'strong' 'warm' 'same']
 ['rainy' 'cold' 'high' 'strong' 'warm' 'change']
 ['sunny' 'warm' 'high' 'strong' 'cool' 'change']]
['yes' 'yes' 'no' 'yes']

Experiment-3:

Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a
new sample.

Dataset:

Outlook   Temperature  Humidity  Wind    Answer
sunny     hot          high      weak    no
sunny     hot          high      strong  no
overcast  hot          high      weak    yes
rain      mild         high      weak    yes
rain      cool         normal    weak    yes
rain      cool         normal    strong  no
overcast  cool         normal    strong  yes
sunny     mild         high      weak    no
sunny     cool         normal    weak    yes
rain      mild         normal    weak    yes
sunny     mild         normal    strong  yes
overcast  mild         high      strong  yes
overcast  hot          normal    weak    yes
rain      mild         high      strong  no

import math
import csv

def load_csv(id3):
    lines = csv.reader(open(id3, "r"))
    dataset = list(lines)
    headers = dataset.pop(0)
    return dataset, headers

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

def subtables(data, col, delete):
    dic = {}
    coldata = [row[col] for row in data]
    attr = list(set(coldata))
    counts = [0] * len(attr)
    r = len(data)
    c = len(data[0])
    for x in range(len(attr)):
        for y in range(r):
            if data[y][col] == attr[x]:
                counts[x] += 1
    for x in range(len(attr)):
        dic[attr[x]] = [[0 for i in range(c)] for j in range(counts[x])]
        pos = 0
        for y in range(r):
            if data[y][col] == attr[x]:
                if delete:
                    del data[y][col]
                dic[attr[x]][pos] = data[y]
                pos += 1
    return attr, dic

def entropy(S):
    attr = list(set(S))
    if len(attr) == 1:
        return 0
    counts = [0, 0]
    for i in range(2):
        counts[i] = sum([1 for x in S if attr[i] == x]) / (len(S) * 1.0)
    sums = 0
    for cnt in counts:
        sums += -1 * cnt * math.log(cnt, 2)
    return sums

def compute_gain(data, col):
    attr, dic = subtables(data, col, delete=False)
    total_size = len(data)
    entropies = [0] * len(attr)
    ratio = [0] * len(attr)
    total_entropy = entropy([row[-1] for row in data])
    for x in range(len(attr)):
        ratio[x] = len(dic[attr[x]]) / (total_size * 1.0)
        entropies[x] = entropy([row[-1] for row in dic[attr[x]]])
        total_entropy -= ratio[x] * entropies[x]
    return total_entropy

def build_tree(data, features):
    lastcol = [row[-1] for row in data]
    if (len(set(lastcol))) == 1:
        node = Node("")
        node.answer = lastcol[0]
        return node
    n = len(data[0]) - 1
    gains = [0] * n
    for col in range(n):
        gains[col] = compute_gain(data, col)
    split = gains.index(max(gains))
    node = Node(features[split])
    fea = features[:split] + features[split + 1:]
    attr, dic = subtables(data, split, delete=True)
    for x in range(len(attr)):
        child = build_tree(dic[attr[x]], fea)
        node.children.append((attr[x], child))
    return node

def print_tree(node, level):
    if node.answer != "":
        print(" " * level, node.answer)
        return
    print(" " * level, node.attribute)
    for value, n in node.children:
        print(" " * (level + 1), value)
        print_tree(n, level + 2)

def classify(node, x_test, features):
    if node.answer != "":
        print(node.answer)
        return
    pos = features.index(node.attribute)
    for value, n in node.children:
        if x_test[pos] == value:
            classify(n, x_test, features)

'''Main Program'''
dataset, features = load_csv("id3.csv")
node1 = build_tree(dataset, features)
print("The decision tree for the dataset using ID3 algorithm is:")
print_tree(node1, 0)
testdata, features = load_csv("id3_test_1.csv")
for x_test in testdata:
    print("The test instance:", x_test)
    print("The label for test instance:", end="")
    classify(node1, x_test, features)

output:

The decision tree for the dataset using ID3 algorithm is:
Outlook
rain
Wind
weak
yes
strong
no
overcast
yes
sunny
Humidity
normal
yes
high
no
The test instance: ['rain', 'cool', 'normal', 'strong']
The label for test instance:no
The test instance: ['sunny', 'mild', 'normal', 'strong']
The label for test instance:yes
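
For reference, the entropy() and compute_gain() functions above compute the standard ID3 quantities; written out (these are the usual textbook definitions, not copied from the listing):

\[
\mathrm{Entropy}(S) = -\sum_{i} p_i \log_2 p_i,
\qquad
\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{\lvert S_v \rvert}{\lvert S \rvert}\,\mathrm{Entropy}(S_v)
\]

where \(p_i\) is the proportion of class \(i\) in \(S\) and \(S_v\) is the subset of \(S\) for which attribute \(A\) takes value \(v\). The attribute with the highest gain becomes the split node in build_tree().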

Experiment-4:

Exercises to solve real-world problems using the following machine learning methods:

a) Linear Regression
import matplotlib.pyplot as plt
from scipy import stats

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()

output:
[Scatter plot of the data points with the fitted regression line]

b) Logistic Regression
import numpy
from sklearn import linear_model

# Reshaped for Logistic function.
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37,
                 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
print(X)
print(y)

logr = linear_model.LogisticRegression()
print(logr.fit(X, y))

# predict if tumor is cancerous where the size is 3.46mm:
predicted = logr.predict(numpy.array([3.46]).reshape(-1, 1))
print(predicted)

output:

[[3.78]
[2.44]
[2.09]
[0.14]
[1.72]
[1.65]
[4.92]
[4.37]
[4.96]
[4.52]
[3.69]
[5.88]]
[0 0 0 0 0 0 1 1 1 1 1 1]
LogisticRegression()
[0]
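
As a small extension (not part of the original listing), the same fitted model can report a class probability instead of a hard label through scikit-learn's predict_proba; the 3.46 mm value is the same example size used above:

# Probability of each class (index 0 = not cancerous, index 1 = cancerous) for a 3.46 mm tumour.
proba = logr.predict_proba(numpy.array([3.46]).reshape(-1, 1))
print(proba)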

c) Binary Classifier
import pandas as pd

# create a binary classification data frame
df = pd.DataFrame({
    'actual':    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # actual values
    'predicted': [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # predicted values
})

# calculate accuracy
accuracy = sum(df['actual'] == df['predicted']) / len(df)

print(df)
print("Accuracy:", accuracy)

output:

actual predicted
0 1 1
1 0 0
2 1 1
3 0 0
4 1 0
5 0 1
6 1 1
7 0 0
8 1 1
9 0 0
Accuracy: 0.8
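
Accuracy alone can hide class-wise behaviour. A short sketch (an addition, reusing the df built above) counts true and false positives directly to obtain precision and recall for the same predictions:

# true positives, false positives and false negatives from the same columns
tp = sum((df['actual'] == 1) & (df['predicted'] == 1))
fp = sum((df['actual'] == 0) & (df['predicted'] == 1))
fn = sum((df['actual'] == 1) & (df['predicted'] == 0))

precision = tp / (tp + fp)   # of the predicted positives, how many were correct
recall = tp / (tp + fn)      # of the actual positives, how many were found
print("Precision:", precision)
print("Recall:", recall)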

Experiment-5:

Develop a program for Bias, Variance, Remove duplicates , Cross Validation

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error

# create sample data
np.random.seed(0)
n_samples = 100
X = np.sort(5 * np.random.rand(n_samples, 1), axis=0)
y = np.sin(X).ravel() + np.random.randn(n_samples) * 0.1
df = pd.DataFrame({'X': X.ravel(), 'y': y})

# split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(df['X'], df['y'], test_size=0.2, random_state=0)

# fit linear regression model
model = LinearRegression()
model.fit(X_train.values.reshape(-1, 1), y_train)

# calculate bias and variance
y_train_pred = model.predict(X_train.values.reshape(-1, 1))
y_test_pred = model.predict(X_test.values.reshape(-1, 1))
bias = np.mean((y_train - np.mean(y_train_pred)) ** 2)
variance = np.mean(np.var(y_train_pred))

# remove duplicates from data
df = df.drop_duplicates()

# perform k-fold cross validation
k = 5
scores = cross_val_score(model, df['X'].values.reshape(-1, 1), df['y'], cv=k, scoring='neg_mean_squared_error')
rmse_scores = np.sqrt(-scores)
mean_rmse = np.mean(rmse_scores)
std_rmse = np.std(rmse_scores)

print("Bias:", bias)
print("Variance:", variance)
print("Mean RMSE ({}-fold cross validation):".format(k), mean_rmse)
print("Standard deviation of RMSE ({}-fold cross validation):".format(k), std_rmse)

output:

Bias: 0.4612063387307444
Variance: 0.27606796940662237
Mean RMSE (5-fold cross validation): 0.6833215401416302
Standard deviation of RMSE (5-fold cross validation): 0.3387472618905365
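
For clarity, the quantities printed above correspond to the expressions evaluated in the program (these are the estimates the code uses on the training predictions, not the full bias-variance decomposition over repeated resamples):

\[
\widehat{\mathrm{Bias}} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \bar{\hat{y}}\bigr)^2,
\qquad
\widehat{\mathrm{Var}} = \mathrm{Var}(\hat{y}),
\qquad
\mathrm{RMSE}_{\mathrm{fold}} = \sqrt{-\,\mathrm{score}_{\mathrm{fold}}}
\]

where \(\hat{y}\) are the training-set predictions, \(\bar{\hat{y}}\) is their mean, and each fold score is the negative mean squared error returned by cross_val_score.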

Experiment-6:

Write a program to implement Categorical Encoding, One-hot Encoding

import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Create a sample dataset with categorical variables
data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male'],
        'Color': ['Red', 'Blue', 'Green', 'Red', 'Blue', 'Red', 'Green'],
        'Size': ['S', 'M', 'M', 'L', 'S', 'S', 'L']}
df = pd.DataFrame(data)
print("Original dataset:\n", df)

# Categorical encoding using LabelEncoder
le = LabelEncoder()
df['Gender'] = le.fit_transform(df['Gender'])
df['Color'] = le.fit_transform(df['Color'])
df['Size'] = le.fit_transform(df['Size'])
print("\nCategorical encoding using LabelEncoder:\n", df)

# One-hot encoding using pandas
df = pd.get_dummies(df, columns=['Color', 'Size'])
print("\nOne-hot encoding using pandas:\n", df)

# One-hot encoding using scikit-learn
ohe = OneHotEncoder()
X = ohe.fit_transform(df).toarray()
df = pd.DataFrame(X, columns=ohe.get_feature_names_out())
print("\nOne-hot encoding using scikit-learn:\n", df)



output:

Original dataset:
    Gender  Color Size
0     Male    Red    S
1   Female   Blue    M
2     Male  Green    M
3   Female    Red    L
4     Male   Blue    S
5   Female    Red    S
6     Male  Green    L

Categorical encoding using LabelEncoder:
    Gender  Color  Size
0        1      2     2
1        0      0     1
2        1      1     1
3        0      2     0
4        1      0     2
5        0      2     2
6        1      1     0

One-hot encoding using pandas:


Gender Color_0 Color_1 Color_2 Size_0 Size_1 Size_2
0 1 0 0 1 0 0 1
1 0 1 0 0 0 1 0
2 1 0 1 0 0 1 0
3 0 0 0 1 1 0 0
4 1 1 0 0 0 0 1
5 0 0 0 1 0 0 1
6 1 0 1 0 1 0 0

One-hot encoding using scikit-learn:

Gender_0 Gender_1 Color_0_0 Color_0_1 Color_1_0 Color_1_1 Color_2_0


\
0 0.0 1.0 1.0 0.0 1.0 0.0 0.0

1 1.0 0.0 0.0 1.0 1.0 0.0 1.0

2 0.0 1.0 1.0 0.0 0.0 1.0 1.0

3 1.0 0.0 1.0 0.0 1.0 0.0 0.0

4 0.0 1.0 0.0 1.0 1.0 0.0 1.0

5 1.0 0.0 1.0 0.0 1.0 0.0 0.0

6 0.0 1.0 1.0 0.0 0.0 1.0 1.0

Color_2_1 Size_0_0 Size_0_1 Size_1_0 Size_1_1 Size_2_0 Size_2_1


0 1.0 1.0 0.0 1.0 0.0 0.0 1.0
1 0.0 1.0 0.0 0.0 1.0 1.0 0.0
2 0.0 1.0 0.0 0.0 1.0 1.0 0.0
3 1.0 0.0 1.0 1.0 0.0 1.0 0.0
4 0.0 1.0 0.0 1.0 0.0 0.0 1.0
5 1.0 1.0 0.0 1.0 0.0 0.0 1.0
6 0.0 0.0 1.0 1.0 0.0 1.0 0.0

Experiment-7:

Build an Artificial Neural Network by implementing the Back propagation algorithm and
test the same using appropriate data sets.

import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)
y = y / 100

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):
    return x * (1 - x)

epoch = 5000
lr = 0.1
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1

wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output:\n", output)

output:

Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output: [[0.92]
[0.86]
[0.89]]
Predicted Output: [[0.89503163]
[0.88037369]
[0.89433984]]
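
The training loop above is the usual delta rule for one hidden layer with sigmoid activations; written out (matching sigmoid() and derivatives_sigmoid() in the listing):

\[
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(a) = a\,(1 - a)
\]
\[
\delta_{\mathrm{out}} = (y - \hat{y})\,\sigma'(\hat{y}), \qquad
\delta_{\mathrm{hid}} = \bigl(\delta_{\mathrm{out}} W_{\mathrm{out}}^{\mathsf{T}}\bigr)\,\sigma'(h), \qquad
W_{\mathrm{out}} \mathrel{+}= \eta\, h^{\mathsf{T}}\delta_{\mathrm{out}}, \qquad
W_{h} \mathrel{+}= \eta\, X^{\mathsf{T}}\delta_{\mathrm{hid}}
\]

where \(a\) denotes a layer's activation, \(h\) the hidden-layer activations and \(\eta\) the learning rate (lr = 0.1). Note that the listing updates only the weights; the bias terms bh and bout are left unchanged.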

Experiment-8:

Write a program to implement k-Nearest Neighbor algorithm to classify the iris data set.
Print both correct and wrong predictions.

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn import datasets

""" Iris Plants Dataset: the dataset contains 150 instances (50 in each of
three classes). Number of Attributes: 4 numeric, predictive attributes and the class.
"""
iris = datasets.load_iris()

""" The x variable contains the first four columns of the dataset
(i.e. attributes) while y contains the labels.
"""
x = iris.data
y = iris.target

print('sepal-length', 'sepal-width', 'petal-length', 'petal-width')
print(x)
print('class: 0-Iris-Setosa, 1-Iris-Versicolour, 2-Iris-Virginica')
print(y)

""" Splits the dataset into 70% train data and 30% test data. This means
that out of the total 150 records, the training set will contain 105 records
and the test set contains 45 of those records.
"""
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

# Train the model with k = 5 nearest neighbours
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)

# make predictions on the test data
y_pred = classifier.predict(x_test)

""" For evaluating an algorithm, confusion matrix, precision, recall and
f1 score are the most commonly used metrics.
"""
print('Confusion Matrix')
print(confusion_matrix(y_test, y_pred))
print('Accuracy Metrics')
print(classification_report(y_test, y_pred))

output:

sepal-length sepal-width petal-length petal-width


[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]
[4.6 3.4 1.4 0.3]
[5. 3.4 1.5 0.2]
[4.4 2.9 1.4 0.2]
[4.9 3.1 1.5 0.1]
[5.4 3.7 1.5 0.2]
[4.8 3.4 1.6 0.2]
[4.8 3. 1.4 0.1]
[4.3 3. 1.1 0.1]
[5.8 4. 1.2 0.2]
[5.7 4.4 1.5 0.4]
[5.4 3.9 1.3 0.4]
[5.1 3.5 1.4 0.3]
[5.7 3.8 1.7 0.3]
[5.1 3.8 1.5 0.3]
[5.4 3.4 1.7 0.2]
[5.1 3.7 1.5 0.4]
[4.6 3.6 1.  0.2]
[5.1 3.3 1.7 0.5]
[4.8 3.4 1.9 0.2]
[5. 3. 1.6 0.2]
[5. 3.4 1.6 0.4]
[5.2 3.5 1.5 0.2]
[5.2 3.4 1.4 0.2]
[4.7 3.2 1.6 0.2]
[4.8 3.1 1.6 0.2]
[5.4 3.4 1.5 0.4]
[5.2 4.1 1.5 0.1]
[5.5 4.2 1.4 0.2]
[4.9 3.1 1.5 0.2]
[5. 3.2 1.2 0.2]
[5.5 3.5 1.3 0.2]
[4.9 3.6 1.4 0.1]
[4.4 3. 1.3 0.2]
[5.1 3.4 1.5 0.2]
[5. 3.5 1.3 0.3]
[4.5 2.3 1.3 0.3]
[4.4 3.2 1.3 0.2]
[5. 3.5 1.6 0.6]
[5.1 3.8 1.9 0.4]
[4.8 3. 1.4 0.3]
[5.1 3.8 1.6 0.2]
[4.6 3.2 1.4 0.2]
[5.3 3.7 1.5 0.2]
[5. 3.3 1.4 0.2]
[7. 3.2 4.7 1.4]
[6.4 3.2 4.5 1.5]
[6.9 3.1 4.9 1.5]
[5.5 2.3 4. 1.3]
[6.5 2.8 4.6 1.5]
[5.7 2.8 4.5 1.3]
[6.3 3.3 4.7 1.6]
[4.9 2.4 3.3 1. ]
[6.6 2.9 4.6 1.3]
[5.2 2.7 3.9 1.4]
[5. 2. 3.5 1. ]
[5.9 3. 4.2 1.5]
[6. 2.2 4. 1. ]
[6.1 2.9 4.7 1.4]
[5.6 2.9 3.6 1.3]
[6.7 3.1 4.4 1.4]
[5.6 3. 4.5 1.5]
[5.8 2.7 4.1 1. ]
[6.2 2.2 4.5 1.5]
[5.6 2.5 3.9 1.1]
[5.9 3.2 4.8 1.8]
[6.1 2.8 4. 1.3]
[6.3 2.5 4.9 1.5]
[6.1 2.8 4.7 1.2]
[6.4 2.9 4.3 1.3]
[6.6 3. 4.4 1.4]
[6.8 2.8 4.8 1.4]

[6.7 3. 5. 1.7]
[6. 2.9 4.5 1.5]
[5.7 2.6 3.5 1. ]
[5.5 2.4 3.8 1.1]
[5.5 2.4 3.7 1. ]
[5.8 2.7 3.9 1.2]
[6. 2.7 5.1 1.6]
[5.4 3. 4.5 1.5]
[6. 3.4 4.5 1.6]
[6.7 3.1 4.7 1.5]
[6.3 2.3 4.4 1.3]
[5.6 3. 4.1 1.3]
[5.5 2.5 4. 1.3]
[5.5 2.6 4.4 1.2]
[6.1 3. 4.6 1.4]
[5.8 2.6 4. 1.2]
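
The experiment statement asks to print both correct and wrong predictions, which the listing above does not do explicitly; a small sketch (an addition, assuming x_test, y_test and y_pred from the program) would be:

for sample, actual, predicted in zip(x_test, y_test, y_pred):
    result = "Correct" if actual == predicted else "Wrong"
    print(result, "- sample:", sample, "actual:", actual, "predicted:", predicted)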

Experiment-9:

Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs

import numpy as np
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from bokeh.io import push_notebook

def local_regression(x0, X, Y, tau):
    # add bias term
    x0 = np.r_[1, x0]                          # Add one to avoid the loss in information
    X = np.c_[np.ones(len(X)), X]
    # fit model: normal equations with kernel
    xw = X.T * radial_kernel(x0, X, tau)       # X transpose * W
    beta = np.linalg.pinv(xw @ X) @ xw @ Y     # @ is matrix multiplication (dot product)
    # predict value
    return x0 @ beta                           # prediction at x0

def radial_kernel(x0, X, tau):
    # Weight or Radial Kernel Bias Function
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))

n = 1000
# generate dataset
X = np.linspace(-3, 3, num=n)
print("The Data Set ( 10 Samples) X :\n", X[1:10])
Y = np.log(np.abs(X ** 2 - 1) + .5)
print("The Fitting Curve Data Set (10 Samples) Y:\n", Y[1:10])

# jitter X
X += np.random.normal(scale=.1, size=n)
print("Normalised (10 Samples) X :\n", X[1:10])
domain = np.linspace(-3, 3, num=300)
print(" Xo Domain Space(10 Samples) :\n", domain[1:10])

def plot_lwr(tau):
    # prediction through regression
    prediction = [local_regression(x0, X, Y, tau) for x0 in domain]
    plot = figure(width=400, height=400)
    plot.title.text = 'tau=%g' % tau
    plot.scatter(X, Y, alpha=.3)
    plot.line(domain, prediction, line_width=2, color='red')
    return plot

show(gridplot([[plot_lwr(10.), plot_lwr(1.)],
               [plot_lwr(0.1), plot_lwr(0.01)]]))

Output:

The Data Set ( 10 Samples) X :


[-2.99399399 -2.98798799 -2.98198198 -2.97597598 -2.96996997 -2.96396396
-2.95795796 -2.95195195 -2.94594595]
The Fitting Curve Data Set (10 Samples) Y:
[2.13582188 2.13156806 2.12730467 2.12303166 2.11874898 2.11445659
2.11015444 2.10584249 2.10152068]
Normalised (10 Samples) X :
[-3.00701911 -3.10827747 -2.96202547 -2.93665936 -3.06366153 -3.16189882
-2.91341958 -2.97148528 -2.96621759]
Xo Domain Space(10 Samples) :
[-2.97993311 -2.95986622 -2.93979933 -2.91973244 -2.89966555 -2.87959866
-2.85953177 -2.83946488 -2.81939799]
[Four Bokeh plots of the fit for tau = 10, 1, 0.1 and 0.01, each showing the scattered data and the locally weighted regression line in red]
Experiment-10:

Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier
model to perform this task. Built-in Java classes/API can be used to write the program.
Calculate the accuracy, precision, and recall for your data set.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Step 1: Preprocessing
# Assuming you have a list of documents and their corresponding labels
documents = ["This is a positive document",
             "This document is negative",
             "Another positive document",
             "Another negative document"]
labels = ["positive", "negative", "positive", "negative"]

# Step 2: Feature extraction
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(documents)

# Step 3: Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=42)

# Step 4: Training
classifier = MultinomialNB()
classifier.fit(X_train, y_train)

# Step 5: Classification
predictions = classifier.predict(X_test)

# Step 6: Evaluation
accuracy = accuracy_score(y_test, predictions)
precision = precision_score(y_test, predictions, average='weighted')
recall = recall_score(y_test, predictions, average='weighted')

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)

output:

Accuracy: 0.0
Precision: 0.0
Recall: 0.0

Experiment-11:

Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library classes/API in
the program.

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
import sklearn.metrics as sm
import pandas as pd
import numpy as np

iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
X.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
y = pd.DataFrame(iris.target)
y.columns = ['Targets']

model = KMeans(n_clusters=3)
model.fit(X)

plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])

# Plot the Original Classifications
plt.subplot(1, 2, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')

# Plot the Models Classifications
plt.subplot(1, 2, 2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_], s=40)
plt.title('K Mean Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')

print('The accuracy score of K-Mean: ', sm.accuracy_score(y, model.labels_))
print('The Confusion matrix of K-Mean: ', sm.confusion_matrix(y, model.labels_))

from sklearn import preprocessing
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns=X.columns)
# xs.sample(5)

from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)
y_gmm = gmm.predict(xs)
# y_cluster_gmm

plt.subplot(2, 2, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_gmm], s=40)
plt.title('GMM Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')

print('The accuracy score of EM: ', sm.accuracy_score(y, y_gmm))
print('The Confusion matrix of EM: ', sm.confusion_matrix(y, y_gmm))

output:

The accuracy score of K-Mean:  0.8933333333333333
The Confusion matrix of K-Mean:  [[50  0  0]
 [ 0 48  2]
 [ 0 14 36]]
The accuracy score of EM:  0.3333333333333333
The Confusion matrix of EM:  [[ 0 50  0]
 [45  0  5]
 [ 0  0 50]]
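
Because k-Means and GMM cluster labels are arbitrary permutations, accuracy_score against the true classes depends on how the labels happen to line up. As an optional addition (not part of the original listing), a label-invariant score such as the adjusted Rand index gives a fairer comparison of the two clusterings:

from sklearn.metrics import adjusted_rand_score

# compare each clustering with the true iris classes, independent of label numbering
print('Adjusted Rand Index of K-Means:', adjusted_rand_score(y.values.ravel(), model.labels_))
print('Adjusted Rand Index of EM (GMM):', adjusted_rand_score(y.values.ravel(), y_gmm))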


Experiment-12:

Exploratory Data Analysis for Classification using Pandas or Matplotlib.

import numpy as np
import pandas as pd

data = pd.DataFrame(data=pd.read_csv('enjoysport.csv'))
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "yes":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
            print(specific_h)
            print(specific_h)
        if target[i] == "no":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
            print(" steps of Candidate Elimination Algorithm", i + 1)
            print(specific_h)
            print(general_h)
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")

output:

[['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
 ['sunny' 'warm' 'high' 'strong' 'warm' 'same']
 ['rainy' 'cold' 'high' 'strong' 'warm' 'change']
 ['sunny' 'warm' 'high' 'strong' 'cool' 'change']]
['yes' 'yes' 'no' 'yes']
initialization of specific_h and general_h
['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
['sunny' 'warm' '?' 'strong' 'warm' 'same']
['sunny' 'warm' '?' 'strong' 'warm' 'same']
 steps of Candidate Elimination Algorithm 3
['sunny' 'warm' '?' 'strong' 'warm' 'same']
[['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Final Specific_h:
['sunny' 'warm' '?' 'strong' 'warm' 'same']
Final General_h:
[['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]

Experiment-13:

Write a Python program to construct a Bayesian network considering medical data. Use
this model to demonstrate the diagnosis of heart patients using standard Heart Disease
Data Set
Program
import bayespy as bp
import numpy as np
import csv
from colorama import init
from colorama import Fore, Back, Style
init()
# Define Parameter Enum values
#Age
ageEnum = {'SuperSeniorCitizen':0, 'SeniorCitizen':1, 'MiddleAged':2, 'Youth':3, 'Teen':4}
# Gender
genderEnum = {'Male':0, 'Female':1}
# FamilyHistory
familyHistoryEnum = {'Yes':0, 'No':1}
# Diet(Calorie Intake)
dietEnum = {'High':0, 'Medium':1, 'Low':2}
# LifeStyle
lifeStyleEnum = {'Athlete':0, 'Active':1, 'Moderate':2, 'Sedetary':3}
# Cholesterol
cholesterolEnum = {'High':0, 'BorderLine':1, 'Normal':2}
# HeartDisease
heartDiseaseEnum = {'Yes':0, 'No':1}
#heart_disease_data.csv
with open('heart_disease_data.csv') as csvfile:
    lines = csv.reader(csvfile)
    dataset = list(lines)
data = []
for x in dataset:
    data.append([ageEnum[x[0]], genderEnum[x[1]], familyHistoryEnum[x[2]], dietEnum[x[3]], lifeStyleEnum[x[4]], cholesterolEnum[x[5]], heartDiseaseEnum[x[6]]])
# Training data for machine learning todo: should import from csv
data = np.array(data)
N = len(data)
# Input data column assignment
p_age = bp.nodes.Dirichlet(1.0*np.ones(5))
age = bp.nodes.Categorical(p_age, plates=(N,))
age.observe(data[:,0])
p_gender = bp.nodes.Dirichlet(1.0*np.ones(2))
gender = bp.nodes.Categorical(p_gender, plates=(N,))
gender.observe(data[:,1])
p_familyhistory = bp.nodes.Dirichlet(1.0*np.ones(2))
familyhistory = bp.nodes.Categorical(p_familyhistory, plates=(N,))
familyhistory.observe(data[:,2])
p_diet = bp.nodes.Dirichlet(1.0*np.ones(3))
diet = bp.nodes.Categorical(p_diet, plates=(N,))
diet.observe(data[:,3])
p_lifestyle = bp.nodes.Dirichlet(1.0*np.ones(4))
lifestyle = bp.nodes.Categorical(p_lifestyle, plates=(N,))
lifestyle.observe(data[:,4])
p_cholesterol = bp.nodes.Dirichlet(1.0*np.ones(3))
cholesterol = bp.nodes.Categorical(p_cholesterol, plates=(N,))
cholesterol.observe(data[:,5])
# Prepare nodes and establish edges
# np.ones(2) -> HeartDisease has 2 options Yes/No
# plates(5, 2, 2, 3, 4, 3) -> corresponds to options present for domain values
p_heartdisease = bp.nodes.Dirichlet(np.ones(2), plates=(5, 2, 2, 3, 4, 3))
heartdisease = bp.nodes.MultiMixture([age, gender, familyhistory, diet, lifestyle, cholesterol],
bp.nodes.Categorical, p_heartdisease)
heartdisease.observe(data[:,6])
p_heartdisease.update()
# Sample Test with hardcoded values
#print("Sample Probability")
#print("Probability(HeartDisease|Age=SuperSeniorCitizen, Gender=Female, FamilyHistory=Yes,
DietIntake=Medium, LifeStyle=Sedetary, Cholesterol=High)")
#print(bp.nodes.MultiMixture([ageEnum['SuperSeniorCitizen'], genderEnum['Female'],
familyHistoryEnum['Yes'], dietEnum['Medium'], lifeStyleEnum['Sedetary'], cholesterolEnum['High']],
bp.nodes.Categorical, p_heartdisease).get_moments()[0][heartDiseaseEnum['Yes']])
# Interactive Test
m = 0
while m == 0:
    print("\n")
    res = bp.nodes.MultiMixture([int(input('Enter Age: ' + str(ageEnum))), int(input('Enter Gender: ' + str(genderEnum))), int(input('Enter FamilyHistory: ' + str(familyHistoryEnum))), int(input('Enter dietEnum: ' + str(dietEnum))), int(input('Enter LifeStyle: ' + str(lifeStyleEnum))), int(input('Enter Cholesterol: ' + str(cholesterolEnum)))], bp.nodes.Categorical, p_heartdisease).get_moments()[0][heartDiseaseEnum['Yes']]
    print("Probability(HeartDisease) = " + str(res))
    # print(Style.RESET_ALL)
    m = int(input("Enter for Continue:0, Exit :1 "))
OUTPUT
Enter Age: {'SuperSeniorCitizen': 0, 'SeniorCitizen': 1, 'MiddleAged': 2, 'Youth': 3, 'Teen': 4}1
Enter Gender: {'Male': 0, 'Female': 1}0
Enter FamilyHistory: {'Yes': 0, 'No': 1}0
Enter dietEnum: {'High': 0, 'Medium': 1, 'Low': 2}2
Enter LifeStyle: {'Athlete': 0, 'Active': 1, 'Moderate': 2, 'Sedetary': 3}2
Enter Cholesterol: {'High': 0, 'BorderLine': 1, 'Normal': 2}1
C:\Anaconda3\lib\site-packages\bayespy\inference\vmp\nodes\categorical.py:43: FutureWarning:
Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead
of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will
result either in an error or a different result.
u0[[np.arange(np.size(x)), np.ravel(x)]] = 1
Probability(HeartDisease) = 0.5
Enter for Continue:0, Exit :1 1
Experiment-14:
Write a program to implement Support Vector Machines
Aim:
To implement Support Vector Machines
Dataset: haberman.csv- The dataset contains cases from a study that was conducted between 1958
and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had
undergone surgery for breast cancer. The goal is to predict the Survival status (class attribute) of the
patient(1 = the patient survived 5 years or longer,2 = the patient died within 5 years). The data set is
collected from https://archive.ics.uci.edu/ml/datasets/Haberman's+Survival.
Program code:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
data = pd.read_csv(r"E:\sudhakar\haberman.csv", header=None)
#age=age of the patient
#year=Patient's year of operation (year - 1900)
#pos_axil_nodes=Number of positive axillary nodes detected
#survival_status:1 -the patient survived 5 years or longer
# :2 -the patient died within 5 year
col_names=['age','year','pos_axil_nodes','survival_status']
data.columns=col_names
#we removed the attribute year of operation
data=data.drop(['year'], axis=1)
print('The first 5 rows of the data set are:')
print(data.head())
dim=data.shape
print('Dimensions of the data set are',dim)
print('Statistics of the data are:')
print(data.describe())
print('Correlation matrix of the data set is:')
print(data.corr())
class_lbls=data['survival_status'].unique()
class_labels=[]
for x in class_lbls:
    class_labels.append(str(x))
print('Class labels are:')
print(class_labels)
sns.countplot(data['survival_status'])
col_names=data.columns
feature_names=col_names[:-1]
feature_names=list(feature_names)
print('Feature names are:')
print(feature_names)
x_set = data.drop(['survival_status'], axis=1)
print('First 5 rows of features set are:')
print(x_set.head())
y_set=data['survival_status']
print('First 5 rows of target variable are:')
print(y_set.head())
print('Distribution of Target variable is:')
print(y_set.value_counts())
scaler=StandardScaler()
x_train,x_test, y_train, y_test = train_test_split(x_set,y_set, test_size = 0.3)
scaler.fit(x_train)
x_train=scaler.transform(x_train)
model =SVC()
print("Traning the model with train data set")model.fit(x_train, y_train)
x_test=scaler.transform(x_test)
y_pred=model.predict(x_test)
print('Predicted class labels for test data are:')
print(y_pred)
print("Accuracy:",accuracy_score(y_test, y_pred))
print("Precision:",precision_score(y_test, y_pred))
print("Recall:",recall_score(y_test, y_pred))
print(classification_report(y_test,y_pred,target_names=class_labels))
cm=confusion_matrix(y_test,y_pred)
df_cm = pd.DataFrame(cm, columns=class_labels, index = class_labels)
df_cm.index.name = 'Actual'
df_cm.columns.name = 'Predicted'
sns.set(font_scale=1.5)
sns.heatmap(df_cm, annot=True,cmap="Blues",fmt='d')
plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train, s=30, cmap=plt.cm.Paired)
plt.xlabel('age')
plt.ylabel('pos_axil_nodes')
plt.title('Data points in traning data set')
plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train, s=30, cmap=plt.cm.Paired)
plt.xlabel('age')
plt.ylabel('pos_axil_nodes')
plt.title('support vectors and decision boundary')
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = model.decision_function(xy).reshape(XX.shape)
ax.contour(XX, YY, Z, colors='red', levels=[-1, 0, 1], alpha=0.5,
linestyles=['--', '-', '--'])
# plot support vectors
ax.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1], s=30, facecolors='green')
plt.show()
Output screen shots:
[Count plot of survival_status, classification report, confusion-matrix heatmap, and scatter plots of the training points showing the SVM decision boundary and support vectors]

Experiment -15:
Write a program to implement Principal Component Analysis
import numpy as nmp
import matplotlib.pyplot as mpltl
import pandas as pnd
DS = pnd.read_csv('Wine.csv')
# Now, we will distribute the dataset into two components "X" and "Y"
X = DS.iloc[: , 0:13].values
Y = DS.iloc[: , 13].values
from sklearn.model_selection import train_test_split as tts
X_train, X_test, Y_train, Y_test = tts(X, Y, test_size = 0.2, random_state = 0)
from sklearn.preprocessing import StandardScaler as SS
SC = SS()
X_train = SC.fit_transform(X_train)
X_test = SC.transform(X_test)
from sklearn.decomposition import PCA
PCa = PCA (n_components = 1)
X_train = PCa.fit_transform(X_train)
X_test = PCa.transform(X_test)
explained_variance = PCa.explained_variance_ratio_
from sklearn.linear_model import LogisticRegression as LR
classifier_1 = LR (random_state = 0)
classifier_1.fit(X_train, Y_train)
Output:
LogisticRegression(random_state=0)
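
The listing stops after fitting the classifier. A short follow-on sketch (an addition, reusing the variables defined above) checks how much variance the single principal component retains and how the classifier performs on the held-out split:

from sklearn.metrics import accuracy_score

Y_pred = classifier_1.predict(X_test)          # predictions on the PCA-transformed test set
print("Explained variance ratio:", explained_variance)
print("Test accuracy:", accuracy_score(Y_test, Y_pred))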

Add on Program:

Experiment-16:
Implementing Candidate Elimination algorithm using python
Program:
import numpy as np

# Define a function to check if one hypothesis is more general than another
def is_more_general(h1, h2):
    more_general_parts = []
    for x, y in zip(h1, h2):
        mg = x == '?' or (x != '0' and (x == y or y == '0'))
        more_general_parts.append(mg)
    return all(more_general_parts)

# Define the Candidate Elimination algorithm
def candidate_elimination(training_data):
    # Get the number of attributes
    n_attributes = len(training_data[0]) - 1

    # Initialize the specific boundary (S) with the most specific hypothesis
    specific_hypothesis = ['0'] * n_attributes

    # Initialize the general boundary (G) with the most general hypothesis
    general_hypothesis = [['?' for _ in range(n_attributes)]]

    # Iterate over the training examples
    for instance in training_data:
        attributes, outcome = instance[:-1], instance[-1]

        # If the example is positive
        if outcome == 'Yes':
            # Generalize S if needed
            for i, val in enumerate(attributes):
                if specific_hypothesis[i] == '0':
                    specific_hypothesis[i] = val
                elif specific_hypothesis[i] != val:
                    specific_hypothesis[i] = '?'

            # Remove hypotheses from G that are inconsistent with S
            general_hypothesis = [g for g in general_hypothesis if is_more_general(g, specific_hypothesis)]

        # If the example is negative
        else:
            # Specialize G if needed
            new_general_hypothesis = []
            for g in general_hypothesis:
                for i in range(n_attributes):
                    if g[i] == '?':
                        if specific_hypothesis[i] != attributes[i]:
                            new_hypothesis = g.copy()
                            new_hypothesis[i] = specific_hypothesis[i]
                            new_general_hypothesis.append(new_hypothesis)
            general_hypothesis = new_general_hypothesis

    return specific_hypothesis, general_hypothesis

# Example training data [attributes..., outcome]
# The last value is 'Yes' for positive examples, 'No' for negative examples
training_data = np.array([
    ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'Yes'],
    ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'No'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes']
])

# Run the Candidate Elimination algorithm


specific_hypothesis, general_hypothesis = candidate_elimination(training_data)

# Display the final specific and general hypotheses


print("Final Specific Hypothesis:", specific_hypothesis)
print("Final General Hypotheses:", general_hypothesis)

OUTPUT:
Final Specific Hypothesis: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
Final General Hypotheses: [['Sunny', 'Warm', '?', 'Strong', '?', '?']]

Experiment-17:
Implement K-Means Clustering using python.
Program:

import numpy as np
import matplotlib.pyplot as plt

class KMeans:
    def __init__(self, n_clusters=3, max_iter=100):
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.centroids = None

    def fit(self, X):
        # Randomly initialize centroids
        random_indices = np.random.choice(X.shape[0], self.n_clusters, replace=False)
        self.centroids = X[random_indices]

        for _ in range(self.max_iter):
            # Assign clusters
            distances = self._calculate_distances(X)
            clusters = np.argmin(distances, axis=1)

            # Calculate new centroids
            new_centroids = np.array([X[clusters == i].mean(axis=0) for i in range(self.n_clusters)])

            # If centroids do not change, break
            if np.all(new_centroids == self.centroids):
                break

            self.centroids = new_centroids

        return clusters

    def _calculate_distances(self, X):
        return np.linalg.norm(X[:, np.newaxis] - self.centroids, axis=2)

    def predict(self, X):
        distances = self._calculate_distances(X)
        return np.argmin(distances, axis=1)

# Generate synthetic data
np.random.seed(42)
X = np.random.rand(100, 2)

# Create and fit K-Means model
kmeans = KMeans(n_clusters=3)
clusters = kmeans.fit(X)

# Plotting the results
plt.scatter(X[:, 0], X[:, 1], c=clusters, cmap='viridis')
plt.scatter(kmeans.centroids[:, 0], kmeans.centroids[:, 1], color='red', marker='X', s=200, label='Centroids')
plt.title('K-Means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()

OUTPUT:
[Scatter plot of the 100 synthetic points coloured by cluster, with the three centroids marked with red X markers]
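
The KMeans class above also defines a predict method that the listing never calls; a short usage sketch (with hypothetical new points, not from the original data) shows how the fitted centroids assign clusters to unseen samples:

# assign two new (hypothetical) points to the nearest learned centroid
new_points = np.array([[0.2, 0.8],
                       [0.9, 0.1]])
print("Cluster assignments:", kmeans.predict(new_points))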
