AI&ML LAB
6. Given a set of documents that need to be classified, use the naïve Bayes
classifier module to perform this task. Calculate the accuracy, precision and recall for
your dataset.
# DATA SET:
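Save the dataset in .csv format as naivetext.csv, the file name the program reads.
The program supplies the column names itself, so the file needs no header row:
each line holds one text document and its label (pos or neg), separated by a comma.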
No.  Text Document                            Label
1    I love this sandwich                     pos
2    This is an amazing place                 pos
3    I feel very good about these deers       pos
4    This is my best work                     pos
5    What an awesome view                     pos
6    I do not like this restaurant            neg
7    I am tired of this stuff                 neg
8    I can’t deal with this                   neg
9    He is my sworn enemy                     neg
10   My boss is horrible                      neg
11   This is an awesome place                 pos
12   I do not like the taste of this juice    neg
13   I love to dance                          pos
14   I am sick and tired of this place        neg
15   What a great holiday                     pos
16   That is a bad locality to stay           neg
17   We will have good fun tomorrow           pos
18   I went to my enemy’s house today         neg
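One way to produce the .csv file is to write it from a short script rather than by hand. The following is a minimal sketch, assuming a two-column layout (text, label) with no header row and no serial-number column; the helper name write_dataset is hypothetical and not part of the lab program.

```python
import csv

# The 18 labelled documents from the table above, as (text, label) pairs.
ROWS = [
    ("I love this sandwich", "pos"),
    ("This is an amazing place", "pos"),
    ("I feel very good about these deers", "pos"),
    ("This is my best work", "pos"),
    ("What an awesome view", "pos"),
    ("I do not like this restaurant", "neg"),
    ("I am tired of this stuff", "neg"),
    ("I can’t deal with this", "neg"),
    ("He is my sworn enemy", "neg"),
    ("My boss is horrible", "neg"),
    ("This is an awesome place", "pos"),
    ("I do not like the taste of this juice", "neg"),
    ("I love to dance", "pos"),
    ("I am sick and tired of this place", "neg"),
    ("What a great holiday", "pos"),
    ("That is a bad locality to stay", "neg"),
    ("We will have good fun tomorrow", "pos"),
    ("I went to my enemy’s house today", "neg"),
]


def write_dataset(path="naivetext.csv"):
    """Write the rows to `path` without a header row; return the row count."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(ROWS)
    return len(ROWS)
```

Calling write_dataset() once in the working directory creates naivetext.csv, which the program can then load.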
Dept of CSE,SUK
# PROGRAM:
import pandas as pd
from sklearn import metrics
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Load the dataset; the CSV has no header row, so column names are supplied here
msg = pd.read_csv('naivetext.csv', names=['message', 'label'])
print('Total instances in the dataset:', msg.shape[0])

# Encode the labels numerically: pos -> 1, neg -> 0
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
Y = msg.labelnum

# Random split (default 75% train / 25% test), so results vary from run to run
xtrain, xtest, ytrain, ytest = train_test_split(X, Y)
print('\nDataset is split into Actual Training and Testing samples')
print('Total training instances :', xtrain.shape[0])
print('Total testing instances :', xtest.shape[0])

# Build the vocabulary from the training text only, then convert both sets
# to sparse document-term matrices of word counts
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)

# Train the multinomial naive Bayes classifier and predict the test labels
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

print('\nPredicted instances are:')
for doc, p in zip(xtest, predicted):
    pred = 'pos' if p == 1 else 'neg'
    print("%s -> %s" % (doc, pred))

# Evaluate the predictions against the true test labels
print('\n-------Accuracy metrics---------')
print('Confusion matrix \n', metrics.confusion_matrix(ytest, predicted))
print('\nAccuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Precision :', metrics.precision_score(ytest, predicted))
print('Recall :', metrics.recall_score(ytest, predicted))
----------------------------------------------------------------------------------------------------------------
OUTPUT:
Total instances in the dataset: 18
Dataset is split into Actual Training and Testing samples
Total training instances : 13
Total testing instances : 5
Predicted instances are:
This is an awesome place -> pos
What a great holiday -> pos
I do not like this restaurant -> neg
He is my sworn enemy -> pos
I went to my enemy’s house today -> pos
-------Accuracy metrics---------
Confusion matrix
[[1 2]
[0 2]]
Accuracy of the classifier is 0.6
Precision : 0.5
Recall : 1.0
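
The three scores follow directly from the confusion matrix printed above. The sketch below recomputes them by hand, assuming scikit-learn's layout for binary labels 0 and 1 (rows are true labels, columns are predicted labels, so the matrix reads [[TN, FP], [FN, TP]]); the function name binary_metrics is hypothetical.

```python
def binary_metrics(cm):
    """Return (accuracy, precision, recall) from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Confusion matrix from the sample run: TN=1, FP=2, FN=0, TP=2
print(binary_metrics([[1, 2], [0, 2]]))  # (0.6, 0.5, 1.0)
```

Here accuracy is (2 + 1) / 5 = 0.6, precision is 2 / (2 + 2) = 0.5 because two negative documents were predicted as pos, and recall is 2 / (2 + 0) = 1.0 because both positive documents were found.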
----------------------------------------------------------------------------------------------------------------
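
To see roughly what the classifier module computes, the following is a minimal pure-Python sketch of multinomial naive Bayes with add-one (Laplace) smoothing. It is an illustration under stated assumptions, not scikit-learn's implementation: the helper names tokenize, train_nb and predict_nb are hypothetical, the tokenizer only approximates CountVectorizer's default behaviour (lowercasing, tokens of two or more word characters), and alpha=1.0 matches the default smoothing of MultinomialNB.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep tokens of two or more word characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log word likelihoods per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = sorted({w for counts in word_counts.values() for w in counts})
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_like = {}
    for c in class_counts:
        # Add alpha to every word count so unseen words never get probability 0.
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_like


def predict_nb(model, doc):
    """Return the class with the highest posterior log score for `doc`."""
    log_prior, log_like = model
    scores = {}
    for c in log_prior:
        # Words outside the training vocabulary are ignored.
        scores[c] = log_prior[c] + sum(
            log_like[c][w] for w in tokenize(doc) if w in log_like[c]
        )
    return max(scores, key=scores.get)


model = train_nb(
    ["I love this sandwich", "This is an amazing place",
     "I do not like this restaurant", "My boss is horrible"],
    ["pos", "pos", "neg", "neg"],
)
print(predict_nb(model, "I love this place"))  # pos
```

With only a handful of training sentences, a word such as "enemy" that never appears in the training split contributes nothing to the score, which is one reason a small test set like the one above can show several misclassified documents.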