
Practical_3

AIM: Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.

What is Naïve Bayes?

Naïve Bayes is a probabilistic machine learning algorithm based on Bayes' Theorem. It is called
"naïve" because it assumes that the features (or predictors) are conditionally independent of one
another given the class, which is often not true in real-world data. Despite this simplification,
Naïve Bayes performs well for certain tasks like text classification, spam filtering, sentiment
analysis, and recommendation systems.

Bayes' Theorem

The foundation of Naïve Bayes is Bayes' Theorem, which calculates the probability of a class C
given a set of features X:

P(C ∣ X) = [P(X ∣ C) · P(C)] / P(X)

where P(C ∣ X) is the posterior probability of the class given the features, P(X ∣ C) is the
likelihood, P(C) is the prior probability of the class, and P(X) is the evidence (the overall
probability of the features).

How Naïve Bayes Works

1. Calculate Prior Probabilities P(C)


The prior probability is simply the fraction of instances of class C in the dataset.

2. Calculate Likelihood P(X ∣ C)

The likelihood of a feature X given the class C is calculated differently for continuous
and categorical features:
○ For Categorical Features: Count the number of times the feature value X appears in class C
and divide it by the total number of instances in C.
○ For Continuous Features: Assume that the feature follows a Gaussian (normal)
distribution and calculate the likelihood using the following formula:

P(x ∣ C) = (1 / √(2π σ_C²)) · exp( −(x − μ_C)² / (2σ_C²) )

where μ_C and σ_C² are the mean and variance of the feature over the instances of class C.

3. Apply Bayes' Theorem

Use Bayes' theorem to compute the posterior probability for each class C.
Since P(X) is common to all classes, it can be ignored in classification.

4. Class Prediction
Predict the class for a given feature vector X = (x₁, …, xₙ) by selecting the class with the highest
posterior probability:

Ĉ = argmax over C of  P(C) · P(x₁ ∣ C) · … · P(xₙ ∣ C)

Types of Naïve Bayes Classifiers

1. Gaussian Naïve Bayes


○ Used when the features are continuous (like height, weight, or age).
○ Assumes the features follow a Gaussian (normal) distribution.
2. Multinomial Naïve Bayes
○ Used for discrete counts (like text data, bag-of-words representations).
○ Commonly used for text classification and spam detection.
3. Bernoulli Naïve Bayes
○ Used when features are binary (0 or 1, like True/False).
○ Also useful in text classification, where the presence or absence of a word in a
document is considered.
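
As a quick reference (assuming scikit-learn is installed), the three variants above map directly onto classes in sklearn.naive_bayes; this is only a sketch of how each is instantiated:

from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

gnb = GaussianNB()     # continuous features (height, weight, age, ...)
mnb = MultinomialNB()  # discrete counts (e.g. bag-of-words word counts)
bnb = BernoulliNB()    # binary features (e.g. word present / absent)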

Example Walkthrough

Problem: Classify whether an email is "Spam" or "Not Spam" based on features like "Has
Offer?", "Has Clickable Link?", "Has Known Sender?", etc.

Has Offer   Clickable Link   Known Sender   Spam?
Yes         Yes              No             Yes
No          Yes              Yes            No
Yes         No               No             Yes
No          No               Yes            No

Step 1: Calculate Prior Probabilities P(C)

From the table: P(Spam) = 2/4 = 0.5 and P(Not Spam) = 2/4 = 0.5

Step 2: Calculate Likelihood P(X∣C)


We calculate probabilities for each feature under each class.

Step 3: Apply Bayes' Theorem


Suppose we have a new email:
Has Offer = Yes, Clickable Link = No, Known Sender = No

1. Compute P(Spam ∣ X):
P(Spam ∣ X) ∝ P(Spam) · P(Offer = Yes ∣ Spam) · P(Link = No ∣ Spam) · P(Sender = No ∣ Spam)
= 0.5 · (2/2) · (1/2) · (2/2) = 0.25
2. Compute P(Not Spam ∣ X):
P(Not Spam ∣ X) ∝ 0.5 · (0/2) · (1/2) · (0/2) = 0

Result:
Since P(Spam ∣ X) > P(Not Spam ∣ X), we classify this email as Spam.
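
The same calculation can be reproduced in a few lines of Python. This is only an illustrative sketch of the toy table above (not part of the practical's required code), using plain counting with no smoothing:

rows = [
    # (Has Offer, Clickable Link, Known Sender, Spam?)
    ("Yes", "Yes", "No", "Yes"),
    ("No", "Yes", "Yes", "No"),
    ("Yes", "No", "No", "Yes"),
    ("No", "No", "Yes", "No"),
]
new_email = ("Yes", "No", "No")  # Has Offer, Clickable Link, Known Sender

scores = {}
for label in ("Yes", "No"):  # Spam? = Yes / No
    subset = [r for r in rows if r[3] == label]
    prior = len(subset) / len(rows)
    likelihood = 1.0
    for i, value in enumerate(new_email):
        likelihood *= sum(1 for r in subset if r[i] == value) / len(subset)
    scores[label] = prior * likelihood

print(scores)  # {'Yes': 0.25, 'No': 0.0}
print("Spam" if scores["Yes"] > scores["No"] else "Not Spam")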

Advantages of Naïve Bayes

● Fast and simple: Works well for large datasets.


● Robust in practice: Often performs reasonably well even when the independence assumption is violated.
● Effective for text data: Used for spam detection, sentiment analysis, and text
classification.

Limitations of Naïve Bayes

● Feature independence assumption: Assumes that features are independent, which is rarely
true.
● Zero probability problem: If a feature value doesn’t appear in the training set, it has zero
likelihood, making P(X∣C)= 0. To handle this, Laplace smoothing is used.
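
As a note on the fix: with Laplace (add-one) smoothing, a categorical likelihood is estimated as

P(x ∣ C) = (count(x, C) + 1) / (N_C + k)

where N_C is the number of training instances of class C and k is the number of distinct values the feature can take, so no probability is ever exactly zero. In scikit-learn, the discrete variants (MultinomialNB, BernoulliNB, CategoricalNB) expose this through the alpha parameter, with alpha=1.0 corresponding to standard Laplace smoothing.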

Code:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Step 1: Load the dataset
file_path = 'data.csv' # Replace with the actual path of your CSV file

data = pd.read_csv(file_path)
print("First 5 rows of the dataset:\n", data.head())

# Step 2: Preprocess the data (if needed)


# Handle missing values, convert categorical data to numerical, etc.
# Here, we assume the last column is the target (class) and the others are features
X = data.iloc[:, :-1] # Features (all columns except the last)
y = data.iloc[:, -1] # Target (the last column)

# If any categorical data exists, convert it using encoding


# Example: X = pd.get_dummies(X) # Uncomment this line if categorical variables exist

# Step 3: Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: Train the Naive Bayes classifier


classifier = GaussianNB() # Assuming continuous features
classifier.fit(X_train, y_train)

# Step 5: Make predictions on the test set


y_pred = classifier.predict(X_test)

# Step 6: Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
confusion = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)
print("\nAccuracy of the Naive Bayes classifier: {:.2f}%".format(accuracy * 100))
print("\nConfusion Matrix:\n", confusion)
print("\nClassification Report:\n", report)

# Optional: Test on a new data point


new_data = np.array([[value1, value2, value3, ...]]) # Replace with actual feature values
new_prediction = classifier.predict(new_data)
print("\nPrediction for new data point:", new_prediction)

1️⃣ Importing Libraries

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

● pandas: Used to load and manipulate the dataset.


● numpy: Used for numerical computations, especially for dealing with arrays.
● train_test_split: Splits the dataset into training and testing subsets.
● GaussianNB: Implements the Naïve Bayes classifier for continuous features using a
Gaussian distribution.
● accuracy_score, classification_report, confusion_matrix: Used to evaluate the
classifier's performance.

2️⃣ Loading the Dataset


file_path = 'data.csv'  # Replace with the actual path of your CSV file
data = pd.read_csv(file_path)
print("First 5 rows of the dataset:\n", data.head())

● The dataset is loaded from a CSV file using pd.read_csv(file_path).


● The first 5 rows are printed to give you a preview of the dataset's structure.

3️⃣ Data Preprocessing

X = data.iloc[:, :-1] # Features (all columns except the last)


y = data.iloc[:, -1] # Target (the last column)

● X: All columns except the last one are used as features.


● y: The last column is considered the target variable (class label) we want to predict.

If the dataset has categorical columns (like 'Male', 'Female', 'Yes', 'No'), you may need to convert
them to numerical values. You can do this using one-hot encoding or label encoding as follows:

X = pd.get_dummies(X)  # Converts categorical features into dummy/indicator variables
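
A tiny illustration of what this does (the column name here is hypothetical, not from the dataset):

import pandas as pd

toy = pd.DataFrame({"Gender": ["Male", "Female", "Male"]})
print(pd.get_dummies(toy))
# Produces indicator columns Gender_Female and Gender_Male, with one
# 0/1 (or True/False, depending on the pandas version) value per row.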

4️⃣ Splitting the Data


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

● The dataset is split into training (80%) and testing (20%) subsets.
● The random_state=42 ensures reproducibility, meaning every time you run the code,
you get the same split.
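
An optional variant worth knowing (not required by the practical): for classification data, passing stratify=y keeps the class proportions roughly the same in the training and testing subsets.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)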

5️⃣ Training the Naïve Bayes Classifier

classifier = GaussianNB()
classifier.fit(X_train, y_train)

● We create a Gaussian Naïve Bayes classifier using GaussianNB().


● The fit() method trains the model using the training data (X_train, y_train).

6️⃣ Making Predictions

y_pred = classifier.predict(X_test)

● The classifier predicts the labels for the test set (X_test).
● The predicted labels are stored in y_pred.

7️⃣ Evaluating the Model


accuracy = accuracy_score(y_test, y_pred)
confusion = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)

print("\nAccuracy of the Naive Bayes classifier:


{:.2f}%".format(accuracy * 100))
print("\nConfusion Matrix:\n", confusion)
print("\nClassification Report:\n", report)

● Accuracy: Measures the percentage of correct predictions.


● Confusion Matrix: Shows how many true positives, true negatives, false positives, and
false negatives the classifier has.
● Classification Report: Provides precision, recall, and F1-score for each class.

Example of output:

Accuracy of the Naive Bayes classifier: 88.00%

Confusion Matrix:
[[50 5]
[ 7 38]]

Classification Report:
              precision    recall  f1-score   support

           0       0.88      0.91      0.89        55
           1       0.88      0.84      0.86        45

    accuracy                           0.88       100
   macro avg       0.88      0.88      0.88       100
weighted avg       0.88      0.88      0.88       100

Explanation of Metrics:

● Precision: Out of all predicted positive results, how many were actually positive.
● Recall: Out of all actual positive results, how many were predicted correctly.
● F1-Score: Harmonic mean of precision and recall.
● Support: Number of actual occurrences for each class.
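
As a quick check against the example output above, these values follow directly from the confusion matrix [[50 5], [7 38]]:

Precision (class 0) = 50 / (50 + 7) ≈ 0.88
Recall (class 0)    = 50 / (50 + 5) ≈ 0.91
F1-score (class 0)  = 2 × (0.88 × 0.91) / (0.88 + 0.91) ≈ 0.89
Accuracy            = (50 + 38) / 100 = 0.88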

8️⃣ Predicting for New Data

new_data = np.array([[value1, value2, value3, ...]])  # Replace with actual feature values
new_prediction = classifier.predict(new_data)
print("\nPrediction for new data point:", new_prediction)

● Here, you can enter a new data point in the form of an array (example:
np.array([[2.1, 3.4, 1.2, 0.5]])).
● The classifier will predict the class label for this new data point.
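
A small optional refinement (the numbers below are placeholders, not from the dataset): if X was loaded as a DataFrame, passing the new point as a one-row DataFrame with the same column names avoids scikit-learn's feature-name warning.

new_point = pd.DataFrame([[2.1, 3.4, 1.2, 0.5]], columns=X.columns)  # one value per feature column, in the same order as X
print("Prediction for new data point:", classifier.predict(new_point))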
