ML Lab
ML Lab
INSTITUTEOFENGINEERING&TECHNOLOGY
(ApprovedbyAICTE,New Delhi,Affiliatedto JNTUK,Kakinada,AccreditedBYNAAC)
BHOOPALAPATNAM,RAJAMAHENDRAVARAM,E.G.Dist.,AP,533107.
eMail: office@rietrjy.co.in Website:www.rietrjy.co.in Ph:+919121214413
VISION
The vision of the college “To develop RIET into an Institution
Excellence in Engineering Education at Graduate level, post graduate
level to carry out quality research in Engineering and Technology”.
MISSION
➢ To educate student with a practical approach to dovetail them
to industry needs.
➢ To govern the institution with a proactive and professional
management with passionate teaching faculty.
➢ To provide holistic and integrated education and achieve overall
development of the student by imparting scientific and
technical, social and cognitive, managerial and organization
skills.
➢ To complete with best and be the most preferred institution of
the studious and the scholarly.
RAJAMAHENDRI
INSTITUTE OF ENGINEERING & TECHNOLOGY
(Approved by AICTE, New Delhi, Affiliated to JNTUK, Kakinada, Accredited BY NAAC)
BHOOPALAPATNAM, RAJAMAHENDRAVARAM, E.G. Dist., AP, 533107.
eMail: office@rietrjy.co.in Website: www.rietrjy.co.in Ph: +91 91212 14413
Department
of
Computer Science and Engineering
Vision
The vision of the Computer Science and Engineering Department is to
become a nationally and internationally leading institution of higher learning, building
upon the culture and the values of universal science and contemporary education, and a
centre of research and education generating the knowledge and the technologies which
lay the groundwork in shaping the future in the fields of Computer Science Engineering
and contribute to the needs of the society.
Mission
To provide the technical knowledge and soft skills required to succeed in life,
career and help society to achieve self sufficiency
List of Experiments:
Page
S.NO List of Experiments: No
Experiment-1:
Implement and demonstrate the FIND-S algorithm for finding the most specific
1
hypothesis based on a given set of training data samples. Read the training data from
a .CSV file.
Experiment-2:
For a given set of training data examples stored in a .CSV file, implement and
2
demonstrate the Candidate Elimination algorithm to output a description of the set of
all hypotheses consistent with the training examples.
Experiment-3:
Write a program to demonstrate the working of the decision tree based ID3
3
algorithm. Use an appropriate data set for building the decision tree and apply this
knowledge to classify a new sample.
Experiment-4:
Exercises to solve the real-world problems using the following machine learning
4
methods:
a) Linear Regression b) Logistic Regression c) Binary Classifier
Experiment-5:
5
Develop a program for Bias, Variance, Remove duplicates, Cross Validation
Experiment-6:
6
Write a program to implement Categorical Encoding, One-hot Encoding
Experiment-7:
7 Build an Artificial Neural Network by implementing the Back propagation algorithm
and test the same using appropriate data sets.
Experiment-8:
8 Write a program to implement k-Nearest Neighbor algorithm to classify the iris data
set. Print both correct and wrong predictions.
Experiment-9:
9 Implement the non-parametric Locally Weighted Regression algorithm in order to fit
data points. Select appropriate data set for your experiment and draw graphs.
Experiment-10:
Assuming a set of documents that need to be classified, use the naïve Bayesian
10
Classifier model to perform this task. Built-in Java classes/API can be used to write
the program. Calculate the accuracy, precision, and recall for your data set.
Experiment-11:
Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for
11 clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library
classes/API in the program.
Experiment-12:
12
Exploratory Data Analysis for Classification using Pandas or Matplotlib.
Experiment-13:
Write a Python program to construct a Bayesian network considering medical data.
13
Use this model to demonstrate the diagnosis of heart patients using standard Heart
Disease Data Set
Experiment-14:
14 Write a program to Implement Support Vector Machines and Principal Component
Analysis
Experiment-15:
15
Write a program to Implement Principal Component Analysis
Add on Programs
Experiment-16:
16
Implementing Candidate Elimination algorithm using python
Experiment-17:
17
Implement K-Means Clustering using python.
ML LAB MANUAL
Experiment-1:
Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a
.CSV file.
Dataset:
import csv
a = []
a.append(row
) print(a)
num_attribute = len(a[0])-1
: ") hypothesis =
['0']*num_attribute
print(hypothesis)
if a[i][num_attribute] == 'yes':
hypothesis[j] =
a[i][j] else:
hypothesis[j] = '?'
OUTPUT :
[['sky', 'airtemp', 'humidity', 'wind', 'water', 'forcast', 'enjoysport']] The total number of training
instances are : 1
The initial hypothesis is : ['0', '0', '0', '0', '0', '0'] [['sky',
'airtemp', 'humidity', 'wind', 'water', 'forcast', 'enjoysport'],
['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes']] The total number of
The hypothesis for the training instance 2 is : ['sunny', 'warm', 'normal', 'strong', 'warm',
'same']
The Maximally specific hypothesis for the training instance is ['sunny', 'warm', 'normal', 'strong',
'warm', 'same'] [['sky', 'airtemp',
'humidity', 'wind', 'water', 'forcast', 'enjoysport'], ['sunny', 'warm',
'normal', 'strong', 'warm', 'same', 'yes'], ['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes']]
The hypothesis for the training instance 2 is : ['sunny', 'warm', 'normal', 'strong', 'warm',
'same']
The Maximally specific hypothesis for the training instance is ['sunny', 'warm', 'normal', 'strong',
'warm', 'same']
The hypothesis for the training instance 3 is : ['sunny', 'warm', '?', 'strong', 'warm', 'same']
The Maximally specific hypothesis for the training instance is ['sunny', 'warm', '?', 'strong',
'warm', 'same'] [['sky', 'airtemp', 'humidity',
'wind', 'water', 'forcast', 'enjoysport'], ['sunny', 'warm', 'normal',
ML LAB MANUAL
The hypothesis for the training instance 2 is : ['sunny', 'warm', 'normal', 'strong', 'warm',
'same']
The Maximally specific hypothesis for the training instance is ['sunny', 'warm', 'normal', 'strong',
'warm', 'same']
The hypothesis for the training instance 3 is : ['sunny', 'warm', '?', 'strong', 'warm', 'same']
The Maximally specific hypothesis for the training instance is ['sunny', 'warm', '?', 'strong',
'warm', 'same'] [['sky', 'airtemp', 'humidity',
'wind', 'water', 'forcast', 'enjoysport'], ['sunny', 'warm', 'normal',
'strong', 'warm', 'same', 'yes'], ['sunny', 'warm', 'high', 'strong',
'warm', 'same', 'yes'], ['rainy', 'cold', 'high', 'strong', 'warm',
'change', 'no'], ['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']]
The hypothesis for the training instance 2 is : ['sunny', 'warm', 'normal', 'strong', 'warm',
'same']
The Maximally specific hypothesis for the training instance is ['sunny', 'warm', 'normal', 'strong',
'warm', 'same']
The hypothesis for the training instance 3 is : ['sunny', 'warm', '?', 'strong', 'warm', 'same']
The Maximally specific hypothesis for the training instance is ['sunny', 'warm', '?', 'strong',
'warm', 'same']
The hypothesis for the training instance 5 is : ['sunny', 'warm', '?', 'strong', '?', '?']
The Maximally specific hypothesis for the training instance is ['sunny', 'warm', '?', 'strong', '?', '?']
ML LAB MANUAL
Experiment-2:
For a given set of training data examples stored in a .CSV file, implement and demonstrate
the CandidateElimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.
DataSet:
import pandas as pd
data = pd.DataFrame(data=pd.read_csv('program.csv'))
concepts = np.array(data.iloc[:,0:-1])
print(concepts)
target = np.array(data.iloc[:,-1])
print(target)
specific_h = concepts[0].copy()
general_h") print(specific_h)
print(general_h)
for i, h in
enumerate(concepts): if
target[i] == "yes":
ML LAB MANUAL
for x in
range(len(specific_h)): if
h[x]!= specific_h[x]:
specific_h[x] ='?'
general_h[x][x] ='?'
print(specific_h)
print(specific_h)
if target[i] == "no":
for x in range(len(specific_h)):
if h[x]!= specific_h[x]:
general_h[x][x] =
specific_h[x]
else:
general_h[x][x] = '?'
Algorithm",i+1) print(specific_h)
print(general_h)
indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?',
g_final, sep="\n")
ML LAB MANUAL
output:
Experiment-3:
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a
new sample.
Dataset:
import
math
import csv
def load_csv(id3):
lines=csv.reader(open(id3,"r
")) dataset=list(lines)
headers=dataset.pop(0)
return dataset,headers
class Node:
ML LAB MANUAL
self.attribute=attribute
self.children=[]
self.answer=""
def subtables(data,col,delete):
dic={}
attr=list(set(coldata))
counts=[0]*len(attr)
r=len(data)
c=len(data[0])
for x in range(len(attr)):
for y in range(r):
if
data[y][col]==attr[
x]: counts[x]+=1
for x in range(len(attr)):
range(counts[x])] pos=0
for y in range(r):
if
data[y][col]==attr[
x]: if delete:
del data[y][col]
dic[attr[x]][pos]=data[y]
pos+=1
return attr,dic
ML LAB MANUAL
def entropy(S):
attr=list(set(S)
) if
len(attr)==1:
return 0
counts=[0,0]
for i in range(2):
counts[i]=sum([1 for x in S if
attr[i]==x])/(len(S)*1.0) sums=0
sums+=-1*cnt*math.log(cnt,2)
return sums
def compute_gain(data,col):
attr,dic=subtables(data,col,delete=Fal
se) total_size=len(data)
entropies=[0]*len(attr)
ratio=[0]*len(attr)
ratio[x]=len(dic[attr[x]])/(total_size*1.0)
dic[attr[x]]]) total_entropy-=ratio[x]*entropies[x]
return total_entropy
def build_tree(data,features):
data] if(len(set(lastcol)))==1:
ML LAB MANUAL
node=Node("")
node.answer=lastcol[
0] return node
n=len(data[0])-1
gains=[0]*n
gains[col]=compute_gain(data,c
ol)
split=gains.index(max(gains))
node=Node(features[split])
fea=features[:split]+features[split+1:]
attr,dic=subtables(data,split,delete=True
) for x in range(len(attr)):
child=build_tree(dic[attr[x]],fea)
node.children.append((attr[x],child))
return node
def
print_tree(node,level):
if node.answer!="":
print(" "*level,node.answer)
return
print(" "*level,node.attribute)
print(" "*(level+1),value)
print_tree(n,level+2)
def
classify(node,x_test,features)
ML LAB MANUAL
: if node.answer!="":
ML LAB MANUAL
print(node.answer)
return
pos=features.index(node.attribute)
if x_test[pos]==value:
classify(n,x_test,features)
'''Main Program'''
dataset,features=load_csv("id3.csv")
node1=build_tree(dataset,features)
is:") print_tree(node1,0)
testdata,features=load_csv("id3_test_1.csv")
instance:",end="")
classify(node1,x_test,features)
ML LAB MANUAL
output:
The decision tree for the dataset using ID3 algorithm is:
Outlook
rain
Wind
weak
yes
strong
no
overcast
yes
sunny
Humidity
normal
yes
high
no
The test instance: ['rain', 'cool', 'normal', 'strong'] The label for test
instance:no
The test instance: ['sunny', 'mild', 'normal', 'strong'] The label for test
instance:yes
ML LAB MANUAL
Experiment-4:
Exercises to solve the real-world problems using the following machine learning methods:
a) Linear Regression
import matplotlib.pyplot as plt
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
mymodel = list(map(myfunc,
x)) plt.scatter(x,y)
plt.plot(x,mymodel)
plt.show()
output:
ML LAB MANUAL
ML LAB MANUAL
b) Logistic Regression
import numpy
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
print(X)
print(y)
logr = linear_model.LogisticRegression()
print(logr.fit(X,y))
predicted = logr.predict(numpy.array([3.46]).reshape(-
1,1)) print(predicted)
output:
[[3.78]
[2.44]
[2.09]
[0.14]
[1.72]
[1.65]
[4.92]
[4.37]
[4.96]
[4.52]
[3.69]
[5.88]]
[0 0 0 0 0 0 1 1 1 1 1 1]
LogisticRegression() [0]
ML LAB MANUAL
c) Binary Classifier
import pandas as pd
frame df = pd.DataFrame({
})
# calculate accuracy
print(df)
print("Accuracy:", accuracy)
output:
actual predicted
0 1 1
1 0 0
2 1 1
3 0 0
4 1 0
5 0 1
6 1 1
7 0 0
8 1 1
9 0 0
Accuracy: 0.8
In [ ]:
ML LAB MANUAL
Experiment-5:
import numpy as np
import pandas as pd
np.random.seed(0)
n_samples = 100
y = np.sin(X).ravel() + np.random.randn(n_samples)
model = LinearRegression()
model.fit(X_train.values.reshape(-1, 1),
y_train_pred = model.predict(X_train.values.reshape(-1,
** 2) variance = np.mean(np.var(y_train_pred))
data df =
df.drop_duplicates()
ML LAB MANUAL
validation k = 5
rmse_scores = np.sqrt(-scores)
mean_rmse =
np.mean(rmse_scores) std_rmse =
np.std(rmse_scores) print("Bias:",
bias)
print("Variance:", variance)
validation):".format(k),std_rmse) output:
Bias: 0.4612063387307444
Variance: 0.27606796940662237
Mean RMSE (5-fold cross validation): 0.6833215401416302
Standard deviation of RMSE (5-fold cross validation): 0.3387472618905365
ML LAB MANUAL
Experiment-6:
import pandas as pd
variables
df = pd.DataFrame(data)
LabelEncoder le = LabelEncoder()
df['Gender'] =
le.fit_transform(df['Gender']) df['Color']
= le.fit_transform(df['Color']) df['Size'] =
le.fit_transform(df['Size'])
pandas
df = pd.get_dummies(df, columns=['Color',
scikit-learn
ohe = OneHotEncoder()
X = ohe.fit_transform(df).toarray()
ML LAB MANUAL
df = pd.DataFrame(X,
columns=ohe.get_feature_names_out()) print("\nOne-hot
output:
Experiment-7:
Build an Artificial Neural Network by implementing the Back propagation algorithm and
test the same using appropriate data sets.
import numpy as np
dtype=float) X = X/np.amax(X,axis=0)
y = y/100
x)) def
derivatives_sigmoid(x):
return x*(1 -
x) epoch=5000
lr=0.1
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1
wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))
bh=np.random.uniform(size=(1,hiddenlayer_neurons))
wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))
bout=np.random.uniform(size=(1,output_neurons))
for i in range(epoch):
hinp1=np.dot(X,wh)
hinp=hinp1 + bh
ML LAB MANUAL
hlayer_act = sigmoid(hinp)
outinp1=np.dot(hlayer_act,wout)
output = sigmoid(outinp)
EO = y-output
outgrad = derivatives_sigmoid(output)
EH = d_output.dot(wout.T)
hiddengrad = derivatives_sigmoid(hlayer_act)
d_hiddenlayer = EH * hiddengrad
wout +=
hlayer_act.T.dot(d_output)*lr wh +=
X.T.dot(d_hiddenlayer)*lr
print("Predicted Output:\n",output)
ML LAB MANUAL
output:
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output: [[0.92]
[0.86]
[0.89]]
Predicted Output: [[0.89503163]
[0.88037369]
[0.89433984]]
ML LAB MANUAL
Experiment-8:
Write a program to implement k-Nearest Neighbor algorithm to classify the iris data set.
Print both correct and wrong predictions.
"""
iris=datasets.load_iris()
"""
x = iris.data
y=
iris.target
print(x)
Virginica') print(y)
""" Splits the dataset into 70% train data and 30% test data.
This means that out of total 150 records, the training set
those records
"""
ML LAB MANUAL
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)
data
y_pred=classifier.predict(x_test)
"""
print('Confusion Matrix')
print(confusion_matrix(y_test,y_pred))
print('Accuracy Metrics')
print(classification_report(y_test,y_pred))
output:
[6.7 3. 5. 1.7]
[6. 2.9 4.5 1.5]
[5.7 2.6 3.5 1. ]
[5.5 2.4 3.8 1.1]
[5.5 2.4 3.7 1. ]
[5.8 2.7 3.9 1.2]
[6. 2.7 5.1 1.6]
[5.4 3. 4.5 1.5]
[6. 3.4 4.5 1.6]
[6.7 3.1 4.7 1.5]
[6.3 2.3 4.4 1.3]
[5.6 3. 4.1 1.3]
[5.5 2.5 4. 1.3]
[5.5 2.6 4.4 1.2]
[6.1 3. 4.6 1.4]
[5.8 2.6 4. 1.2]
ML LAB MANUAL
Experiment-9:
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs
import numpy as np
push_notebook def
local_regression(x0, X, Y, tau):
information X = np.c_[np.ones(len(X)), X]
n = 1000
# generate dataset
X = np.linspace(-3, 3, num=n)
:\n",X[1:10]) Y = np.log(np.abs(X ** 2 - 1)
+ .5)
# jitter X
X += np.random.normal(scale=.1, size=n)
num=300)
plot.title.text='tau=%g' % tau
plot.scatter(X, Y, alpha=.3)
return plot
show(gridplot([
[plot_lwr(10.),
plot_lwr(1.)],
[plot_lwr(0.1), plot_lwr(0.01)]]))
Output:
Experiment-10:
Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier
model to perform this task. Built-in Java classes/API can be used to write the program.
Calculate the accuracy, precision, and recall for your data set.
# Step 1: Preprocessing
# Assuming you have a list of documents and their corresponding labels
documents = [ "This is a positive document" ,
"This document is negative" ,
"Another positive document" ,
"Another negative document" ]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(documents)
# Step 4: Training
# Step 5: Classification
predictions = classifier.predict(X_test)
# Step 6: Evaluation
output: Accuracy:
0.0
Precision: 0.0
ML LAB MANUAL
Recall: 0.0
ML LAB MANUAL
Experiment-11:
Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library classes/API in
the program.
as sm import pandas as pd
import numpy as np
iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
X.columns =
['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width'] y =
pd.DataFrame(iris.target)
y.columns = ['Targets']
model = KMeans(n_clusters=3)
model.fit(X)
plt.figure(figsize=(14,7))
Classifications plt.subplot(1, 2, 1)
plt.xlabel('Petal Length')
ML LAB MANUAL
plt.ylabel('Petal Width')
plt.subplot(1, 2, 2)
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
#xs.sample(5)
GaussianMixture gmm =
GaussianMixture(n_components=3)
gmm.fit(xs)
y_gmm =
gmm.predict(xs)
#y_cluster_gmm
plt.subplot(2, 2, 3)
plt.xlabel('Petal
Length')
ML LAB MANUAL
plt.ylabel('Petal
Width')
ML LAB MANUAL
y_gmm))
output:
In [ ]:
ML LAB MANUAL
Experiment-12:
import numpy as np
import pandas as pd
data = pd.DataFrame(data=pd.read_csv('enjoysport
print(concepts)
target = np.array(data.iloc[:,-1])
print(target)
specific_h = concepts[0].copy()
general_h") print(specific_h)
range(len(specific_h))]
print(general_h)
for i, h in
enumerate(concepts): if
target[i] == "yes":
for x in
range(len(specific_h)): if
h[x]!= specific_h[x]:
specific_h[x] ='?'
general_h[x][x] ='?'
print(specific_h)
print(specific_h)
if target[i] == "no":
ML LAB MANUAL
for x in
range(len(specific_h)): if
h[x]!= specific_h[x]:
general_h[x][x] = specific_h[x]
else:
general_h[x][x] = '?'
print(specific_h)
print(general_h)
indices = [i for i, val in enumerate(general_h) if val ==['?', '?', '?', '?', '?',
g_final, sep="\n")
ML LAB MANUAL
output:
Experiment-13:
Write a Python program to construct a Bayesian network considering medical data. Use
this model to demonstrate the diagnosis of heart patients using standard Heart Disease
Data Set
Program
import bayespy as bp
import numpy as np
import csv
from colorama import init
from colorama import Fore, Back, Style
init()
# Define Parameter Enum values
#Age
ageEnum = {'SuperSeniorCitizen':0, 'SeniorCitizen':1, 'MiddleAged':2, 'Youth':3, 'Teen':4}
# Gender
genderEnum = {'Male':0, 'Female':1}
# FamilyHistory
familyHistoryEnum = {'Yes':0, 'No':1}
# Diet(Calorie Intake)
dietEnum = {'High':0, 'Medium':1, 'Low':2}
# LifeStyle
lifeStyleEnum = {'Athlete':0, 'Active':1, 'Moderate':2, 'Sedetary':3}
# Cholesterol
cholesterolEnum = {'High':0, 'BorderLine':1, 'Normal':2}
# HeartDisease
heartDiseaseEnum = {'Yes':0, 'No':1}
#heart_disease_data.csv
with open('heart_disease_data.csv') as csvfile:
lines = csv.reader(csvfile)
dataset = list(lines)
data = []
for x in dataset:
data.append([ageEnum[x[0]],genderEnum[x[1]],familyHistoryEnum[x[2]],dietEnum[x[3]],lifeStyleEn
um[x[4]],cholesterolEnum[x[5]],heartDiseaseEnum[x[6]]])
# Training data for machine learning todo: should import from csv
data = np.array(data)
N = len(data)
# Input data column assignment
p_age = bp.nodes.Dirichlet(1.0*np.ones(5))
age = bp.nodes.Categorical(p_age, plates=(N,))
age.observe(data[:,0])
p_gender = bp.nodes.Dirichlet(1.0*np.ones(2))
gender = bp.nodes.Categorical(p_gender, plates=(N,))
gender.observe(data[:,1])
p_familyhistory = bp.nodes.Dirichlet(1.0*np.ones(2))
familyhistory = bp.nodes.Categorical(p_familyhistory, plates=(N,))
familyhistory.observe(data[:,2])
ML LAB MANUAL
p_diet = bp.nodes.Dirichlet(1.0*np.ones(3))
diet = bp.nodes.Categorical(p_diet, plates=(N,))
diet.observe(data[:,3])
p_lifestyle = bp.nodes.Dirichlet(1.0*np.ones(4))
lifestyle = bp.nodes.Categorical(p_lifestyle, plates=(N,))
lifestyle.observe(data[:,4])
p_cholesterol = bp.nodes.Dirichlet(1.0*np.ones(3))
cholesterol = bp.nodes.Categorical(p_cholesterol, plates=(N,))
cholesterol.observe(data[:,5])
# Prepare nodes and establish edges
# np.ones(2) -> HeartDisease has 2 options Yes/No
# plates(5, 2, 2, 3, 4, 3) -> corresponds to options present for domain values
p_heartdisease = bp.nodes.Dirichlet(np.ones(2), plates=(5, 2, 2, 3, 4, 3))
heartdisease = bp.nodes.MultiMixture([age, gender, familyhistory, diet, lifestyle, cholesterol],
bp.nodes.Categorical, p_heartdisease)
heartdisease.observe(data[:,6])
C:\Anaconda3\lib\site-packages\bayespy\inference\vmp\nodes\categorical.py:107: FutureWarning:
Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead
of `arr[seq]`. In the future this will be interpreted as an array
p_heartdisease.update()
# Sample Test with hardcoded values
#print("Sample Probability")
#print("Probability(HeartDisease|Age=SuperSeniorCitizen, Gender=Female, FamilyHistory=Yes,
DietIntake=Medium, LifeStyle=Sedetary, Cholesterol=High)")
#print(bp.nodes.MultiMixture([ageEnum['SuperSeniorCitizen'], genderEnum['Female'],
familyHistoryEnum['Yes'], dietEnum['Medium'], lifeStyleEnum['Sedetary'], cholesterolEnum['High']],
bp.nodes.Categorical, p_heartdisease).get_moments()[0][heartDiseaseEnum['Yes']])
# Interactive Test
m=0
while m == 0:
print("\n")
res = bp.nodes.MultiMixture([int(input('Enter Age: ' + str(ageEnum))), int(input('Enter Gender: ' +
str(genderEnum))), int(input('Enter FamilyHistory: ' + str(familyHistoryEnum))), int(input('Enter
dietEnum: ' + str(dietEnum))), int(input('Enter LifeStyle: ' + str(lifeStyleEnum))), int(input('Enter
Cholesterol: ' + str(cholesterolEnum)))], bp.nodes.Categorical,
p_heartdisease).get_moments()[0][heartDiseaseEnum['Yes']]
print("Probability(HeartDisease) = " + str(res))
#print(Style.RESET_ALL)
m = int(input("Enter for Continue:0, Exit :1 "))
OUTPUT
Enter Age: {'SuperSeniorCitizen': 0, 'SeniorCitizen': 1, 'MiddleAged': 2, 'Youth': 3, 'Teen': 4}1
Enter Gender: {'Male': 0, 'Female': 1}0
Enter FamilyHistory: {'Yes': 0, 'No': 1}0
Enter dietEnum: {'High': 0, 'Medium': 1, 'Low': 2}2
Enter LifeStyle: {'Athlete': 0, 'Active': 1, 'Moderate': 2, 'Sedetary': 3}2
Enter Cholesterol: {'High': 0, 'BorderLine': 1, 'Normal': 2}1
C:\Anaconda3\lib\site-packages\bayespy\inference\vmp\nodes\categorical.py:43: FutureWarning:
Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead
of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will
result either in an error or a different result.
u0[[np.arange(np.size(x)), np.ravel(x)]] = 1
ML LAB MANUAL
Probability(HeartDisease) = 0.5
Enter for Continue:0, Exit :1 1
Experiment -14:
Write a program to implement Support Vector Machines
Aim:
To implement Support Vector Machines
Dataset: haberman.csv- The dataset contains cases from a study that was conducted between 1958
and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had
undergone surgery for breast cancer. The goal is to predict the Survival status (class attribute) of the
patient(1 = the patient survived 5 years or longer,2 = the patient died within 5 years). The data set is
collected from https://archive.ics.uci.edu/ml/datasets/Haberman's+Survival.
Program code:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
data = pd.read_csv(r"E:\sudhakar\haberman.csv", header=None)
#age=age of the patient
#year=Patient's year of operation (year - 1900)
#pos_axil_nodes=Number of positive axillary nodes detected
#survival_status:1 -the patient survived 5 years or longer
# :2 -the patient died within 5 year
col_names=['age','year','pos_axil_nodes','survival_status']
data.columns=col_names
#we removed the attribute year of operation
data=data.drop(['year'], axis=1)
print('The first 5 rows of the data set are:')
print(data.head())
dim=data.shape
print('Dimensions of the data set are',dim)
print('Statistics of the data are:')
print(data.describe())
print('Correlation matrix of the data set is:')
print(data.corr())
class_lbls=data['survival_status'].unique()
class_labels=[]
for x in class_lbls:
class_labels.append(str(x))
print('Class labels are:')
print(class_labels)
sns.countplot(data['survival_status'])
col_names=data.columns
feature_names=col_names[:-1]
ML LAB MANUAL
feature_names=list(feature_names)
print('Feature names are:')
print(feature_names)
x_set = data.drop(['survival_status'], axis=1)
print('First 5 rows of features set are:')
print(x_set.head())
y_set=data['survival_status']
print('First 5 rows of target variable are:')
print(y_set.head())
print('Distribution of Target variable is:')
print(y_set.value_counts())
scaler=StandardScaler()
x_train,x_test, y_train, y_test = train_test_split(x_set,y_set, test_size = 0.3)
scaler.fit(x_train)
x_train=scaler.transform(x_train)
model =SVC()
print("Traning the model with train data set")model.fit(x_train, y_train)
x_test=scaler.transform(x_test)
y_pred=model.predict(x_test)
print('Predicted class labels for test data are:')
print(y_pred)
print("Accuracy:",accuracy_score(y_test, y_pred))
print("Precision:",precision_score(y_test, y_pred))
print("Recall:",recall_score(y_test, y_pred))
print(classification_report(y_test,y_pred,target_names=class_labels))
cm=confusion_matrix(y_test,y_pred)
df_cm = pd.DataFrame(cm, columns=class_labels, index = class_labels)
df_cm.index.name = 'Actual'
df_cm.columns.name = 'Predicted'
sns.set(font_scale=1.5)
sns.heatmap(df_cm, annot=True,cmap="Blues",fmt='d')
plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train, s=30, cmap=plt.cm.Paired)
plt.xlabel('age')
plt.ylabel('pos_axil_nodes')
plt.title('Data points in traning data set')
plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train, s=30, cmap=plt.cm.Paired)
plt.xlabel('age')
plt.ylabel('pos_axil_nodes')
plt.title('support vectors and decision boundary')
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = model.decision_function(xy).reshape(XX.shape)
ax.contour(XX, YY, Z, colors='red', levels=[-1, 0, 1], alpha=0.5,
linestyles=['--', '-', '--'])
# plot support vectors
ML LAB MANUAL
Experiment -15:
Write a program to implement principle component analysis
import numpy as nmp
import matplotlib.pyplot as mpltl
import pandas as pnd
DS = pnd.read_csv('Wine.csv')
# Now, we will distribute the dataset into two components "X" and "Y"
X = DS.iloc[: , 0:13].values
Y = DS.iloc[: , 13].values
from sklearn.model_selection import train_test_split as tts
X_train, X_test, Y_train, Y_test = tts(X, Y, test_size = 0.2, random_state = 0)
from sklearn.preprocessing import StandardScaler as SS
SC = SS()
X_train = SC.fit_transform(X_train)
X_test = SC.transform(X_test)
from sklearn.decomposition import PCA
PCa = PCA (n_components = 1)
X_train = PCa.fit_transform(X_train)
X_test = PCa.transform(X_test)
explained_variance = PCa.explained_variance_ratio_
from sklearn.linear_model import LogisticRegression as LR
classifier_1 = LR (random_state = 0)
classifier_1.fit(X_train, Y_train)
Output:
LogisticRegression(random_state=0)
ML LAB MANUAL
Add on Program:
Experiment-16:
Implementing Candidate Elimination algorithm using python
Program:
import numpy as np
# Define a function to check if one hypothesis is more general than another
def is_more_general(h1, h2):
more_general_parts = []
for x, y in zip(h1, h2):
mg = x == '?' or (x != '0' and (x == y or y == '0'))
more_general_parts.append(mg)
return all(more_general_parts)
# Initialize the specific boundary (S) with the most specific hypothesis
specific_hypothesis = ['0'] * n_attributes
# Initialize the general boundary (G) with the most general hypothesis
general_hypothesis = [['?' for _ in range(n_attributes)]]
new_general_hypothesis.append(new_hypothesis)
general_hypothesis = new_general_hypothesis
OUTPUT:
Final Specific Hypothesis: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
Final General Hypotheses: [['Sunny', 'Warm', '?', 'Strong', '?', '?']]
Experiment-17:
Implement K-Means Clustering using python.
Program:
import numpy as np
import matplotlib.pyplot as plt
class KMeans:
def __init__(self, n_clusters=3, max_iter=100):
self.n_clusters = n_clusters
self.max_iter = max_iter
self.centroids = None
for _ in range(self.max_iter):
# Assign clusters
distances = self._calculate_distances(X)
clusters = np.argmin(distances, axis=1)
self.centroids = new_centroids
return clusters
OUTPUT:
ML LAB MANUAL