
ML Manual

Manual

Uploaded by

PARKAVI.D
Copyright
© © All Rights Reserved

Ex.No:01 Implement A Linear Regression With A Real Dataset. Experiment
With Different Features In Building A Model. Tune The Model's
Hyperparameters.

AIM:
To implement linear regression with a real dataset, experiment with different features in
building a model, and tune the model's hyperparameters.

Procedure:
1. Define the business objective.
2. Make sense of the data from a high level:
   - data types (number, text, object, etc.)
   - continuous/discrete
   - basic stats (min, max, std, median, etc.) using a boxplot
   - frequency via a histogram
   - scales and distributions of different features
3. Create the training and test sets using proper sampling methods, e.g., random vs. stratified.
4. Correlation analysis (pair-wise and attribute combinations).
5. Data cleaning (missing data, outliers, data errors).
6. Data transformation via pipelines (categorical text to number using one-hot encoding, feature
scaling via normalization/standardization, feature combinations).
7. Train and cross-validate different models and select the most promising one (Linear Regression,
Decision Tree, and Random Forest were tried in this tutorial).
8. Fine-tune the model by trying different combinations of hyperparameters.
9. Evaluate the model with best estimators in the test set.
10. Launch, monitor, and refresh the model and system.

Program
# This Python 3 environment comes with many helpful analytic libraries installed.
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python.
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory.
import os
print(os.listdir("../input"))
# Any results you write to the current directory are saved as output.
['anscombe.csv', 'housing.csv']
# loading data
data_path = "../input/housing.csv"
housing = pd.read_csv(data_path)
# see the basic info
housing.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 10 columns):
longitude 20640 non-null float64
latitude 20640 non-null float64
housing_median_age 20640 non-null float64
total_rooms 20640 non-null float64
total_bedrooms 20433 non-null float64
population 20640 non-null float64
households 20640 non-null float64
median_income 20640 non-null float64
median_house_value 20640 non-null float64
ocean_proximity 20640 non-null object
dtypes: float64(9), object(1)
memory usage: 1.6+ MB
Input(3): housing.head(10)
Input(4): housing.describe()
Input(5): housing.boxplot(['median_house_value'], figsize=(10, 10))
Input(6): housing.hist(bins=50, figsize=(15, 15))
Output(6):

Input(7): housing['ocean_proximity'].value_counts()
Output(7):
<1H OCEAN 9136
INLAND 6551
NEAR OCEAN 2658
NEAR BAY 2290
ISLAND 5
Name: ocean_proximity, dtype: int64
Input(8):
op_count = housing['ocean_proximity'].value_counts()
plt.figure(figsize=(10,5))
# Recent seaborn releases require x and y to be passed as keyword arguments.
sns.barplot(x=op_count.index, y=op_count.values, alpha=0.7)
plt.title('Ocean Proximity Summary')
plt.ylabel('Number of Occurrences', fontsize=12)
plt.xlabel('Ocean Proximity', fontsize=12)
plt.show()
# housing['ocean_proximity'].value_counts().hist()
Output(8):

Input(9): housing['median_income'].hist()
Output(9): <matplotlib.axes._subplots.AxesSubplot at 0x7f264523cb00>

Input(10): housing['median_income'].hist()
Output(10): <matplotlib.axes._subplots.AxesSubplot at 0x7f264523cb00>

Input(11):housing.plot(kind='scatter', x='longitude', y='latitude', alpha=0.1)


Output(11): <matplotlib.axes._subplots.AxesSubplot at 0x7f2645224e10>
Input(12):
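# Pearson's r, aka the standard correlation coefficient, for every pair of numeric columns.
# The text column ocean_proximity is excluded; recent pandas releases raise an error otherwise.
corr_matrix = housing.select_dtypes(include="number").corr()
# Check how much each attribute correlates with the median house value
corr_matrix['median_house_value'].sort_values(ascending=False)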
Output(12):
median_house_value 1.000000
median_income 0.687160
total_rooms 0.135097
housing_median_age 0.114110
households 0.064506
total_bedrooms 0.047689
population -0.026920
longitude -0.047432
latitude -0.142724
Name: median_house_value, dtype: float64
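
The listing above stops at the correlation analysis. The remaining steps (fit a linear model on different feature subsets and tune a hyperparameter) could be sketched as follows. This is a minimal sketch under stated assumptions, not the manual's own method: the data are synthetic stand-ins for two of the housing columns, the model is closed-form ridge regression written in NumPy, and the hyperparameter being tuned is the ridge penalty `alpha`. The helper names `fit_ridge`, `predict`, and `rmse` are hypothetical.

```python
import numpy as np

# Synthetic stand-in for the housing table (assumption: illustrative values only).
rng = np.random.default_rng(42)
n = 500
columns = {
    "median_income": rng.uniform(0.5, 10.0, n),
    "housing_median_age": rng.uniform(1.0, 52.0, n),
}
median_house_value = (40000.0 * columns["median_income"]
                      + 500.0 * columns["housing_median_age"]
                      + 20000.0 + rng.normal(0.0, 30000.0, n))

# Hold out the last 20% of the (already random) rows for validation.
split = int(0.8 * n)

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression on standardized features; the intercept is not penalized."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Z = np.column_stack([np.ones(len(X)), (X - mu) / sigma])
    penalty = alpha * np.eye(Z.shape[1])
    penalty[0, 0] = 0.0
    weights = np.linalg.solve(Z.T @ Z + penalty, Z.T @ y)
    return mu, sigma, weights

def predict(model, X):
    """Apply the training-set scaling and the learned weights to new rows."""
    mu, sigma, weights = model
    return np.column_stack([np.ones(len(X)), (X - mu) / sigma]) @ weights

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

feature_sets = [("median_income",), ("housing_median_age",),
                ("median_income", "housing_median_age")]
alphas = [0.0, 1.0, 100.0]

# Validation RMSE for every (feature set, alpha) combination.
results = {}
for names in feature_sets:
    X = np.column_stack([columns[name] for name in names])
    for alpha in alphas:
        model = fit_ridge(X[:split], median_house_value[:split], alpha)
        results[(names, alpha)] = rmse(median_house_value[split:], predict(model, X[split:]))

best = min(results, key=results.get)
for key, value in sorted(results.items(), key=lambda item: item[1]):
    print(key, round(value))
print("Best combination:", best)
```

With the real dataset, the `columns` dictionary would be replaced by columns of the `housing` DataFrame, and the final model would be scored once on a separate test set.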

Result:
Thus the implementation of linear regression with a real dataset, the experimentation with different
features in building a model, and the tuning of the model's hyperparameters were executed successfully.

Ex.No:02 Implement A Classification Model That Answers A Binary Question Such As "Are
Houses In This Neighborhood Above A Certain Price?" (Use Data From Exercise 1).
Modify The Classification Threshold And Determine How That Modification
Influences The Model. Experiment With Different Classification
Metrics To Determine Your Model's Effectiveness.

Aim :
To build a binary classification model that predicts whether houses in a neighborhood
exceed a specified price threshold, to explore how adjusting the classification threshold affects
model performance, and to evaluate effectiveness using different classification metrics.

Procedures:

1. Import necessary libraries (e.g., pandas, train_test_split, TfidfVectorizer or other feature
extractors, classifiers like LogisticRegression, and metrics such as accuracy_score, precision,
recall, and ROC-AUC).
2. Load the dataset from exercise 1, ensuring it contains housing prices and relevant features.
3. Define the binary target by setting a price threshold (e.g., create a new column that is 1 if the
house price exceeds the threshold, 0 otherwise).
4. Preprocess the dataset (clean missing values, standardize/normalize features, encode
categorical variables if needed).
5. Split the data into training and test sets using train_test_split.
6. Train a binary classification model (e.g., Logistic Regression) on the training data.
7. Obtain probability predictions on the test set using the model’s predict_proba method.
8. Modify the classification threshold by setting different probability cutoffs (e.g., 0.3, 0.5, 0.7)
and generate binary predictions accordingly.
9. Evaluate the model’s performance at each threshold using various metrics (accuracy, precision,
recall, F1-score, ROC-AUC).
10. Analyze the deltas in model performance as the threshold changes to understand trade-offs
between false positives and false negatives.
11. Compare and select the threshold that balances the classification metrics best for your specific
use case.
12. Document findings and discuss how threshold adjustments influence the model's effectiveness.
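
Steps 8 to 10 can be illustrated without any library. The sketch below uses made-up labels and predicted probabilities (assumptions, not results from the housing data) and hypothetical helper names; it shows how raising the cutoff trades recall for fewer false positives.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when probabilities >= threshold are predicted as class 1."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def metrics_at(y_true, y_prob, threshold):
    """Accuracy, precision, and recall at one classification threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Illustrative labels and model probabilities (assumed values).
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.10, 0.20, 0.35, 0.40, 0.45, 0.60, 0.70, 0.80, 0.95]

sweep = {t: metrics_at(y_true, y_prob, t) for t in (0.3, 0.5, 0.7)}
for threshold, scores in sweep.items():
    print(threshold, scores)
```

In this toy data the lowest cutoff finds every positive example at the cost of three false positives, while the highest cutoff misses half of the positives.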

Program:

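import numpy as np
import pandas as pd
import tensorflow as tf

train_df = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv")
test_df = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/california_housing_test.csv")
# shuffle the training set
train_df = train_df.reindex(np.random.permutation(train_df.index))
# Z-score normalization. Assumption: the listing uses train_df_norm and
# test_df_norm without defining them, so they are derived here from the
# training-set statistics.
train_df_mean = train_df.mean()
train_df_std = train_df.std()
train_df_norm = (train_df - train_df_mean) / train_df_std
test_df_norm = (test_df - train_df_mean) / train_df_std
threshold = 265000 # This is the 75th percentile for median house values.
# The label is 1.0 when the raw (unnormalized) median house value exceeds the threshold, else 0.0.
train_df_norm["median_house_value_is_high"] = (train_df["median_house_value"] > threshold).astype(float)
test_df_norm["median_house_value_is_high"] = (test_df["median_house_value"] > threshold).astype(float)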
# Print out a few example cells from the beginning and
# middle of the training set, just to make sure that
# your code created only 0s and 1s in the newly created
# median_house_value_is_high column
train_df_norm["median_house_value_is_high"].head(8000)
inputs = {
# Features used to train the model on.
'median_income': tf.keras.Input(shape=(1,)),
'total_rooms': tf.keras.Input(shape=(1,))
}
# The following variables are the hyperparameters.
learning_rate = 0.001
epochs = 20
batch_size = 100
classification_threshold = 0.35
label_name = "median_house_value_is_high"
# Modify the following definition of METRICS to generate
# not only accuracy and precision, but also recall:
METRICS = [
    tf.keras.metrics.BinaryAccuracy(name='accuracy',
                                    threshold=classification_threshold),
    tf.keras.metrics.Precision(thresholds=classification_threshold,
                               name='precision'),
    tf.keras.metrics.Recall(thresholds=classification_threshold,
                            name='recall'),
]
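# Establish the model's topography.
# Note: create_model, train_model, and plot_curve are helper functions that
# this listing does not define; they must be defined before this point.
my_model = create_model(inputs, learning_rate, METRICS)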
# Train the model on the training set.
epochs, hist = train_model(my_model, train_df_norm, epochs,
label_name, batch_size)
# Plot metrics vs. Epochs
list_of_metrics_to_plot = ['accuracy', 'precision', 'recall']
plot_curve(epochs, hist, list_of_metrics_to_plot)
OUTPUT:

Result:

Thus the implementation of binary classification model was executed successfully.


Ex.No:03 Classification With Nearest Neighbours. In This Question, You Will Use
Scikit-Learn's KNN Classifier To Classify Real Vs. Fake News Headlines. The Aim Of This
Question Is For You To Read The Scikit-Learn API And Get Comfortable With
Training/Validation Splits. Use California Housing Dataset.
AIM:
To write a program that implements classification with K-Nearest Neighbors. The aim of this
question is to read the scikit-learn API and get comfortable with training/validation splits. Use
the California Housing dataset.

Procedure:

1.Import necessary libraries like pandas, train_test_split, TfidfVectorizer, KNeighborsClassifier,
accuracy_score, and classification_report.

2.Load the dataset containing news headlines and labels (1 for real, 0 for fake) using
pandas.read_csv().

3.Remove missing values and convert all text to lowercase for consistency.

4.Convert text data into numerical form using TF-IDF Vectorization.

5.Split the dataset into 80% training and 20% validation using train_test_split().

6.Instantiate KNeighborsClassifier(n_neighbors=k) and fit the model using knn.fit(X_train, y_train).

7.Use knn.predict(X_test) to classify headlines.

8.Compute accuracy using accuracy_score(y_test, y_pred) and generate a classification report.

9.Experiment with different values of k (e.g., 3, 5, 7) and compare accuracy.

10.Improve performance by normalizing text features, trying different distance metrics, or using
dimensionality reduction.
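
The procedure above maps onto a handful of scikit-learn calls. The following is a minimal sketch, assuming a tiny invented list of headlines and labels (the real exercise would load them with pandas.read_csv()); it is not the program used in this manual. Placing the vectorizer inside a pipeline means the TF-IDF vocabulary is learned from the training split only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Invented toy data (assumption): 1 = real headline, 0 = fake headline.
headlines = [
    "government announces new budget for schools",
    "city council approves road repair plan",
    "central bank keeps interest rates unchanged",
    "scientists publish study on ocean temperatures",
    "local team wins regional football final",
    "parliament votes on healthcare reform bill",
    "aliens secretly control the stock market",
    "miracle fruit cures every disease overnight",
    "celebrity reveals the moon is made of cheese",
    "secret potion makes people live forever",
    "time traveler warns about next week's lottery",
    "dragon spotted flying over the capital city",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# 75% training, 25% validation, keeping the class balance in both parts.
X_train, X_val, y_train, y_val = train_test_split(
    headlines, labels, test_size=0.25, random_state=0, stratify=labels)

# Compare several values of k on the validation split.
results = {}
for k in (1, 3, 5):
    model = make_pipeline(TfidfVectorizer(lowercase=True),
                          KNeighborsClassifier(n_neighbors=k))
    model.fit(X_train, y_train)
    results[k] = accuracy_score(y_val, model.predict(X_val))

for k, acc in results.items():
    print(f"k={k}: validation accuracy = {acc:.2f}")
```

With only three validation headlines the accuracies are very coarse; the point of the sketch is the order of the API calls, not the numbers.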

Program:

import csv
import random
import math
import operator

def loadDataset(filename, split, trainingSet, testSet):
    # Read the CSV file and randomly assign each row to the training or test set.
    with open(filename, 'r', newline='') as csvfile:
        lines = csv.reader(csvfile)
        dataset = list(lines)
        # The last row is skipped, as in the original listing.
        for x in range(len(dataset)-1):
            # The first four columns are numeric features; the last column is the class label.
            for y in range(4):
                dataset[x][y] = float(dataset[x][y])
            if random.random() < split:
                trainingSet.append(dataset[x])
            else:
                testSet.append(dataset[x])

def euclideanDistance(instance1, instance2, length):
    # Euclidean distance over the first `length` attributes.
    distance = 0
    for x in range(length):
        distance += pow((instance1[x] - instance2[x]), 2)
    return math.sqrt(distance)

def getNeighbors(trainingSet, testInstance, k):
    # Return the k training rows closest to testInstance.
    distances = []
    length = len(testInstance)-1
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet[x], length)
        distances.append((trainingSet[x], dist))
    distances.sort(key=operator.itemgetter(1))
    neighbors = []
    for x in range(k):
        neighbors.append(distances[x][0])
    return neighbors

def getResponse(neighbors):
    # Majority vote over the class labels of the neighbors.
    classVotes = {}
    for x in range(len(neighbors)):
        response = neighbors[x][-1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]

def getAccuracy(testSet, predictions):
    # Percentage of test rows whose label matches the prediction.
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:
            correct += 1
    return (correct/float(len(testSet))) * 100.0

def main():
    # prepare data
    trainingSet = []
    testSet = []
    split = 0.67
    loadDataset('knndat.data', split, trainingSet, testSet)
    print('Train set: ' + repr(len(trainingSet)))
    print('Test set: ' + repr(len(testSet)))
    # generate predictions
    predictions = []
    k = 3
    for x in range(len(testSet)):
        neighbors = getNeighbors(trainingSet, testSet[x], k)
        result = getResponse(neighbors)
        predictions.append(result)
        print('> predicted=' + repr(result) + ', actual=' + repr(testSet[x][-1]))
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy: ' + repr(accuracy) + '%')

main()

Output :
Confusion matrix is as follows:
[[11  0  0]
 [ 0  9  1]
 [ 0  1  8]]
Accuracy metrics:
             precision  recall  f1-score  support
0            1.00       1.00    1.00      11
1            0.90       0.90    0.90      10
2            0.89       0.89    0.89      9
Avg/Total    0.93       0.93    0.93      30

Result:
Thus the implementation of classification with K-Nearest Neighbors, including reading the
scikit-learn API and working with training/validation splits on the California Housing dataset,
was executed successfully.

Ex.No:04 In this exercise, you will experiment with validation sets and test sets using the
dataset. Split a training set into a smaller training set and a validation set. Analyze
deltas between training set and validation set results. Test the trained model with
a test set to determine whether your trained model is overfitting. Detect and fix a
common training problem.

Aim:
To implement the experiment with validation sets and test sets using the datasets.

Procedure:

1. Import necessary libraries like pandas, train_test_split, TfidfVectorizer, KNeighborsClassifier,
accuracy_score, and classification_report.

2. Load the dataset containing news headlines and labels using pandas.read_csv().

3. Remove missing values and convert all text to lowercase for consistency.

4. Convert text data into numerical form using TF-IDF Vectorization.

5. Split the dataset into training (80%) and test (20%) sets using train_test_split().

6. Further split the training set into a smaller training set (70%) and a validation set (30%)
using train_test_split().

7. Train the KNeighborsClassifier(n_neighbors=k) model using the smaller training set.

8. Evaluate the model on the validation set using accuracy_score() and classification_report().

9. Compare training and validation set performance to analyze deltas. If validation accuracy is
significantly lower than training accuracy, overfitting might be occurring.

10. Test the trained model on the separate test set to check for overfitting.

11. If overfitting is detected, apply solutions like reducing k, feature selection, regularization, or
using a different distance metric.

12. Retrain and reevaluate the model to confirm improvements.
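
Steps 5 to 10 (split, train, compare training and validation error, then confirm on the test set) can be sketched with NumPy alone. The example below is an illustration under stated assumptions, not the manual's program: the data are a synthetic noisy sine curve, the models are polynomials fitted with np.polyfit, and model complexity (the degree) plays the role that k plays for a KNN classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): noisy samples of one sine period.
n = 60
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=n)

# 60% training, 20% validation, 20% test, after shuffling the indices.
idx = rng.permutation(n)
train, val, test = idx[:36], idx[36:48], idx[48:]

def rmse(coefs, xs, ys):
    """Root-mean-squared error of a fitted polynomial on the given points."""
    return float(np.sqrt(np.mean((np.polyval(coefs, xs) - ys) ** 2)))

# Fit models of increasing complexity on the training part only.
report = {}
fitted = {}
for degree in (1, 3, 9):
    coefs = np.polyfit(x[train], y[train], degree)
    fitted[degree] = coefs
    report[degree] = (rmse(coefs, x[train], y[train]), rmse(coefs, x[val], y[val]))

for degree, (train_rmse, val_rmse) in report.items():
    print(f"degree={degree}: train RMSE={train_rmse:.3f}, "
          f"validation RMSE={val_rmse:.3f}, delta={val_rmse - train_rmse:.3f}")

# Choose the degree with the lowest validation error, then score it once on the test set.
best_degree = min(report, key=lambda d: report[d][1])
test_rmse = rmse(fitted[best_degree], x[test], y[test])
print("Selected degree:", best_degree, "test RMSE:", round(test_rmse, 3))
```

Training error can only go down as the degree grows, so a validation error that rises while training error keeps falling is the overfitting signal described in step 9.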

Program:

import matplotlib.pyplot as plt


import numpy as np
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
np.random.seed(42)
# Generate data and plot
N = 300
x = np.linspace(0, 7*np.pi, N)
smooth = 1 + 0.5*np.sin(x)
y = smooth + 0.2*np.random.randn(N)
plt.plot(x, y)
plt.plot(x, smooth)
plt.xlabel("x")
plt.ylabel("y")
plt.ylim(0,2)
plt.show()
# Train-test split, intentionally use shuffle=False
X = x.reshape(-1,1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, shuffle=False)
# Create two models: Polynomial and linear regression
degree = 2
polyreg = make_pipeline(PolynomialFeatures(degree), LinearRegression(fit_intercept=False))
linreg = LinearRegression()
# Cross-validation
scoring = "neg_root_mean_squared_error"
polyscores = cross_validate(polyreg, X_train, y_train, scoring=scoring, return_estimator=True)
linscores = cross_validate(linreg, X_train, y_train, scoring=scoring, return_estimator=True)
# Which one is better? Linear and polynomial
print("Linear regression score:", linscores["test_score"].mean())
print("Polynomial regression score:", polyscores["test_score"].mean())
print("Difference:", linscores["test_score"].mean() - polyscores["test_score"].mean())
print("Coefficients of polynomial regression and linear regression:")
# Show the coefficients of the polynomial regression fitted on the first fold.
# They start from the constant term and are in ascending order of powers.
print(polyscores["estimator"][0].steps[1][1].coef_)
# Show the intercept and coefficient of the linear regression fitted on the first fold
print(linscores["estimator"][0].intercept_, linscores["estimator"][0].coef_)
# Plot and compare
plt.plot(x, y)
plt.plot(x, smooth)
plt.plot(x, polyscores["estimator"][0].predict(X))
plt.plot(x, linscores["estimator"][0].predict(X))
plt.ylim(0,2)
plt.xlabel("x")
plt.ylabel("y")
plt.show()
# Retrain the model and evaluate
from sklearn.base import clone
linreg = clone(linreg)
linreg.fit(X_train, y_train)
# Take the square root of the MSE explicitly; the squared=False argument is not available in recent scikit-learn releases.
print("Test set RMSE:", np.sqrt(mean_squared_error(y_test, linreg.predict(X_test))))
print("Mean validation RMSE:", -linscores["test_score"].mean())
OUTPUT:

Result:
Thus the implementation of the experiment with validation sets and test sets using the
datasets was executed successfully.
Ex.No:05 Implement the k-means algorithm

Aim:
To implement the k-means clustering algorithm using the Codon Usage Dataset from the UCI
Machine Learning Repository and analyze how codon usage varies across different organisms.

Procedure:

1.Load the dataset from the UCI Machine Learning Repository using pandas. Read the data file and
assign column names accordingly.

2.Preprocess the data by removing non-numeric columns such as species names and handling
missing values if present.

3.Standardize the features using StandardScaler to ensure all variables contribute equally to the
clustering process.

4.Determine the optimal number of clusters (k) by applying the elbow method. Run K-Means for
different values of k, store the inertia values, and plot them to identify the optimal k.

5.Apply the K-Means algorithm using the optimal k value, initialize cluster centroids randomly,
assign data points to the nearest centroid, compute new centroids, and repeat until convergence.

6.Assign cluster labels to each species and store them in the dataset.

7.Evaluate the clustering performance using the silhouette score, which measures the cohesion
and separation of clusters.

8.Visualize the clustering results by printing the assigned clusters for a few species and
interpreting the patterns observed.
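
Step 5 describes the iteration that KMeans performs internally. A minimal NumPy sketch of that loop (Lloyd's algorithm) is given below for illustration; the function name `kmeans_lloyd` and the two synthetic blobs are assumptions, and the program in this exercise uses scikit-learn's KMeans instead.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: random initial centroids, assign, recompute, repeat."""
    rng = np.random.default_rng(seed)
    # Initialize the centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its points (keep it if the cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids no longer move
        centroids = new_centroids
    # Final assignment and inertia (sum of squared distances to the assigned centroid).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    inertia = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, inertia

# Two well-separated synthetic blobs (assumed data for the demonstration).
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

labels, centroids, inertia = kmeans_lloyd(X, k=2)
print("Centroids:\n", centroids)
print("Inertia:", round(inertia, 2))
```

Running the function for several values of k and plotting the returned inertia gives the elbow curve from step 4.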

Program:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

from sklearn.cluster import KMeans

from sklearn.preprocessing import StandardScaler

from sklearn.metrics import silhouette_score

# Load dataset

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/codon/codon.data"
column_names = ["Species"] + [f"Codon_{i}" for i in range(1, 65)] + ["GC", "GC3s"]

df = pd.read_csv(url, delimiter=",", names=column_names, header=None)

# Handle missing values first so that the rows of df and X stay aligned

df = df.dropna().reset_index(drop=True)

# Drop non-numeric columns (Species)

X = df.iloc[:, 1:] # Excluding the first column (Species)

# Standardize the data

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

# Choose the number of clusters using the elbow method

inertia = []

K_range = range(2, 10)

for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(X_scaled)
    inertia.append(kmeans.inertia_)

# Plot the elbow curve

plt.plot(K_range, inertia, marker='o')

plt.xlabel('Number of Clusters')

plt.ylabel('Inertia')

plt.title('Elbow Method for Optimal K')

plt.show()

# Apply K-Means with optimal K (Assume 4 from elbow method)

optimal_k = 4

kmeans = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)

df['Cluster'] = kmeans.fit_predict(X_scaled)

# Evaluate the clustering

silhouette_avg = silhouette_score(X_scaled, df['Cluster'])


print(f'Silhouette Score: {silhouette_avg:.4f}')

# Display first few rows with clusters

print(df[['Species', 'Cluster']].head())

Output:

Elbow Curve

A plot that helps determine the optimal k value.

Silhouette Score

A metric to evaluate clustering performance.

Cluster Assignments

Species Cluster

Bacteria_1 0

Virus_2 2

Mammal_3 1

Fungi_4 3

Plant_5 0

Result:

Thus the implementation of the k-means clustering algorithm using the Codon Usage Dataset from
the UCI Machine Learning Repository was executed successfully.

Ex.No:06 Implement the Naïve Bayes Classifier.

Aim

To develop a Naïve Bayes Classifier that accurately classifies individuals based on their gait
parameters using the Gait Classification dataset.

Procedure:

1. Download the Gait Classification dataset from the UCI Machine Learning Repository.

2.Load the dataset into a Pandas DataFrame.

3.Check for missing values and remove any incomplete data.

4.Separate the dataset into features (X) and labels (y).

5.Split the data into training and testing sets (70% train, 30% test).

6.Select the Gaussian Naïve Bayes classifier for numerical data.

7. Train the classifier using the training data.

8.Use the trained model to predict labels for the test set.

9.Evaluate model performance using accuracy, precision, recall, and F1-score.

10.Analyze the classification report to understand the model’s effectiveness.
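
To make step 6 concrete, the sketch below shows what a Gaussian Naïve Bayes classifier computes: a per-class mean and variance for each feature, then the class with the highest log prior plus summed log likelihood. It is a simplified NumPy illustration on synthetic data, with hypothetical function names, and it adds a small fixed constant to the variances, which differs from how scikit-learn's GaussianNB scales its var_smoothing term.

```python
import numpy as np

def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate class priors and per-class feature means and variances."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + var_smoothing
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, variances, priors

def predict_gaussian_nb(model, X):
    """Pick the class maximizing log p(c) + sum_j log N(x_j | mean_cj, var_cj)."""
    classes, means, variances, priors = model
    diff = X[:, None, :] - means[None, :, :]           # shape (samples, classes, features)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :]).sum(axis=2)
    scores = np.log(priors)[None, :] + log_lik          # shape (samples, classes)
    return classes[np.argmax(scores, axis=1)]

# Synthetic two-class data (assumption): two features, well-separated class means.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(10.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

model = fit_gaussian_nb(X, y)
predictions = predict_gaussian_nb(model, X)
accuracy = float(np.mean(predictions == y))
print("Training accuracy:", accuracy)
```

The "naïve" part is the sum over features: each feature contributes its own one-dimensional Gaussian term, as if the features were independent given the class.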

Program:

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.naive_bayes import GaussianNB

from sklearn.metrics import accuracy_score, classification_report

# Load the dataset

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00561/gaitDataset.csv'

data = pd.read_csv(url)

# Display basic information about the dataset

print(data.info())
# Handle missing values by removing rows with missing data

data_cleaned = data.dropna()

# Separate features and target variable

X = data_cleaned.drop('label', axis=1)

y = data_cleaned['label']

# Split the data into training and testing sets (70% train, 30% test)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Gaussian Naïve Bayes classifier

gnb = GaussianNB()

# Train the classifier

gnb.fit(X_train, y_train)

# Predict the labels for the test set

y_pred = gnb.predict(X_test)

# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)

print(f'Accuracy: {accuracy:.2f}')

print('Classification Report:')

print(classification_report(y_test, y_pred))

Output:

<class 'pandas.core.frame.DataFrame'>

Int64Index: 48 entries, 0 to 47

Columns: 321 entries, feature1 to label

dtypes: float64(320), object(1)

memory usage: 120.5+ KB

Accuracy: 0.85
Classification Report:

              precision    recall  f1-score   support

           0       0.88      0.82      0.85        11
           1       0.83      0.90      0.86        14

    accuracy                           0.85        25
   macro avg       0.85      0.85      0.85        25
weighted avg       0.85      0.85      0.85        25

Result:

Thus the implementation of the Naïve Bayes Classifier was executed successfully.


Ex.No.7 Build an Artificial Neural Network by implementing the Back propagation
Algorithm and test the same using appropriate data sets.

Aim :
To Build an Artificial Neural Network by implementing the Back propagation Algorithm and
test the same using appropriate data sets.

Procedure
1. Load the data set.
2. Assign all network inputs and outputs.
3. Initialize all weights with small random numbers, typically between -1 and 1.
repeat
    for every pattern in the training set
        Present the pattern to the network
        // Propagate the input forward through the network
        for each layer in the network
            for every node in the layer
                1. Calculate the weighted sum of the inputs to the node
                2. Add the threshold to the sum
                3. Calculate the activation for the node
            end
        end
        // Propagate the errors backward through the network
        for every node in the output layer
            Calculate the error signal
        end
        for all hidden layers
            for every node in the layer
                1. Calculate the node's error signal
                2. Update each node's weight in the network
            end
        end
        // Calculate the global error
        Calculate the Error Function
    end
while ((number of iterations < specified maximum) AND (Error Function > specified threshold))

Program:
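import numpy as np
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X, axis=0) # scale each column by its maximum
y = y/100 # scale the targets to the range [0, 1]
#Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))
#Derivative of Sigmoid Function, expressed in terms of the sigmoid output
def derivatives_sigmoid(x):
    return x * (1 - x)
epoch = 7000 # number of training iterations
lr = 0.1 # learning rate
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1
# Weight and bias initialization. Assumption: the listing omits this step, so
# uniform random values in [0, 1) are used here.
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))
for i in range(epoch):
    #Forward propagation (also missing from the listing; reconstructed from the variable names used below)
    hlayer_act = sigmoid(np.dot(X, wh) + bh)
    output = sigmoid(np.dot(hlayer_act, wout) + bout)
    #Backpropagation
    EO = y-output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    #how much hidden layer wts contributed to error
    d_hiddenlayer = EH * hiddengrad
    # dot product of next-layer error and current-layer output
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    # hidden-layer bias update (commented out in the original listing)
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr
print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)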
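
The update rules applied in the backpropagation part of the program can be written compactly. The following is a sketch that assumes a squared-error loss $E = \tfrac{1}{2}\sum (y - O)^2$ (the listing does not name its loss) and uses $H$ for hlayer_act, $O$ for output, $\eta$ for lr, and $\odot$ for element-wise multiplication:

```latex
\begin{aligned}
\delta_{\text{out}} &= (y - O) \odot O \odot (1 - O) \\
\delta_{\text{hid}} &= \left(\delta_{\text{out}}\, W_{\text{out}}^{\top}\right) \odot H \odot (1 - H) \\
W_{\text{out}} &\leftarrow W_{\text{out}} + \eta\, H^{\top} \delta_{\text{out}}, \qquad
b_{\text{out}} \leftarrow b_{\text{out}} + \eta \sum_{n} \delta_{\text{out}}^{(n)} \\
W_{h} &\leftarrow W_{h} + \eta\, X^{\top} \delta_{\text{hid}}
\end{aligned}
```

The factors $O \odot (1 - O)$ and $H \odot (1 - H)$ are the sigmoid derivative written in terms of the activation itself, which is why derivatives_sigmoid is called on the layer outputs rather than on the weighted sums.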

Output:
Input:
[[0.66666667 1.        ]
 [0.33333333 0.55555556]
 [1.         0.66666667]]
Actual Output:
[[0.92]
 [0.86]
 [0.89]]
Predicted Output:
[[0.89559591]
 [0.88142069]
 [0.8928407 ]]
Result:
Thus an Artificial Neural Network was built by implementing the Backpropagation Algorithm and
tested using appropriate data sets.
Ex.No:8 Implementation of multilayer perceptron for classification

Aim:
To implement the Multilayer Perceptron for Classification.

Procedure:

1. Import the scikit-learn library for the implementation.
2. Load the dataset for the classification model.
3. Split the dataset into training and test sets.
4. Use StandardScaler to standardize the feature data by scaling each feature to have zero
mean and unit variance.
5. Create the MLPClassifier model.
6. Train the model.
7. After training, the model uses the learned weights and biases to make predictions on the test
data. Then evaluate the model.
8. The classification_report function provides a detailed report on the classification performance,
including precision, recall, F1-score, and support for each class.
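
The standardization in step 4 is simple arithmetic, and it matters that the statistics come from the training set only. The short NumPy sketch below illustrates this on synthetic values (an assumption for the demonstration); it mirrors what fit_transform on the training data followed by transform on the test data does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic features on very different scales (assumed values).
X_train = rng.normal(loc=[50.0, 0.1], scale=[10.0, 0.02], size=(200, 2))
X_test = rng.normal(loc=[50.0, 0.1], scale=[10.0, 0.02], size=(50, 2))

# "Fit": learn the mean and standard deviation from the training data only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# "Transform": apply the same training statistics to both sets.
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print("Train mean after scaling:", X_train_scaled.mean(axis=0).round(6))
print("Train std after scaling: ", X_train_scaled.std(axis=0).round(6))
print("Test mean after scaling: ", X_test_scaled.mean(axis=0).round(3))
```

The scaled training features have exactly zero mean and unit variance, while the test features are only approximately standardized because they never influenced the statistics.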

Program:
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
cancer_data = load_breast_cancer()
X, y = cancer_data.data, cancer_data.target
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
mlp = MLPClassifier(hidden_layer_sizes=(64, 32),
max_iter=1000, random_state=42)
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy*100:.2f}%")
class_report = classification_report(y_test, y_pred)
print("Classification Report:\n", class_report)

Output:

Model training
Accuracy: 97.37%

Classification report :

Result:

Thus the Implementation of the Multilayer Perceptron for Classification was executed successfully.
Ex.No:9 Demonstrate A Simple Application For Image Classification Using CNN

Aim:

To implement a Convolutional Neural Network (CNN) for image classification.

Procedure:

1.Import TensorFlow, Keras, NumPy, and Matplotlib

2.Load the CIFAR-10 dataset, which contains 60,000 images across 10 categories.

3.Normalize pixel values between 0 and 1 and convert labels to numerical format.

4.Define a CNN model with convolutional layers, max pooling, and fully connected layers.

5.Compile the model using the Adam optimizer, cross-entropy loss function, and accuracy as a
metric.

6.Train the model using the training dataset and validate it using the test dataset.

7.Evaluate the model’s performance by measuring accuracy and loss on test data.

8.Make predictions on test images and display the results with corresponding labels.
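
Step 4 is easier to follow once the layer shapes are worked out by hand. The sketch below does that arithmetic in plain Python for a network with three 3x3 convolutions (32, 64, 64 filters, no padding), two 2x2 max-pooling layers, and dense layers of 64 and 10 units on 32x32x3 inputs. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter for a square-kernel convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels

size, channels = 32, 3          # CIFAR-10 images are 32x32 with 3 color channels
trace = []
total_params = 0

for kind, arg in [("conv", 32), ("pool", 2), ("conv", 64), ("pool", 2), ("conv", 64)]:
    if kind == "conv":
        total_params += conv_params(3, channels, arg)
        size, channels = conv_out(size, 3), arg
    else:
        size = size // arg      # 2x2 max pooling halves each dimension (rounding down)
    trace.append((kind, size, size, channels))

flat = size * size * channels   # length of the vector produced by Flatten
total_params += flat * 64 + 64  # Dense(64)
total_params += 64 * 10 + 10    # Dense(10)

for row in trace:
    print(row)
print("Flattened length:", flat)
print("Total trainable parameters:", total_params)
```

The printed shapes and parameter total can be compared against the output of model.summary() for a model with this layer stack.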

Program:

# Import TensorFlow

import tensorflow as tf

from tensorflow.keras import datasets, layers, models

import matplotlib.pyplot as plt

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()


# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',

'dog', 'frog', 'horse', 'ship', 'truck']

plt.figure(figsize=(8,8))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i])
    # The CIFAR labels happen to be arrays,
    # which is why we need the extra index
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()

model = models.Sequential()

model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))

model.add(layers.MaxPooling2D((2,2)))

model.add(layers.Conv2D(64, (3, 3), activation='relu'))

model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3), activation='relu'))

model.summary()

model.add(layers.Flatten())

model.add(layers.Dense(64, activation='relu'))

model.add(layers.Dense(10))

model.summary()

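# Compile the model with the Adam optimizer, cross-entropy loss, and accuracy metric
# (procedure step 5). The last Dense layer outputs logits, hence from_logits=True.
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# An epoch means training the neural network with all the
# training data for one cycle. Here 10 epochs are used.
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))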

plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')
plt.show()

test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)

print('Test Accuracy is',test_acc)


OUTPUT:

313/313 - 2s - loss: 0.8843 - accuracy: 0.7049


Test Accuracy is 0.7049000263214111

Result:

Thus the simple application for image classification using a CNN was executed successfully.
EX.NO:10 Design a bidirectional RNN with multiple hidden layers

Aim:
To design a bidirectional RNN with multiple hidden layers .

Procedure:
1. Import the required libraries and load the dataset.
2. Pad the sequences. Padding is a common technique used in natural language processing (NLP) to
ensure all input sequences have the same length.
3. Build the model.
4. Train the defined model with the imported data.
5. Evaluate the accuracy.
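
The padding in step 2 can be shown on a few short sequences. The function below is a plain-Python imitation written for illustration (the name `pad_pre` is hypothetical); it follows the default behaviour of Keras pad_sequences, which pads at the front and, for sequences that are too long, keeps the last maxlen items. It assumes maxlen is at least 1.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad each sequence with `value`; keep only the last `maxlen` items of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                       # drop items from the front if too long
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

# Word-index sequences of different lengths (made-up values).
reviews = [[1, 2, 3], [4, 5, 6, 7, 8, 9], [10]]
batch = pad_pre(reviews, maxlen=4)
for row in batch:
    print(row)
```

After padding, every row has the same length, so the batch can be stored as a rectangular integer array and fed to the Embedding layer.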

Program:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
# Load the IMDB Reviews dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=10000)
# Pad the sequences to have equal length
max_len = 500
x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_len)
x_test = tf.keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_len)
# Set the input and output dimensions
input_dim = 10000
output_dim = 1
# Create the input layer
inputs = tf.keras.Input(shape=(None,), dtype="int32")

# Create the model


x = tf.keras.layers.Embedding(input_dim, 128)(inputs)
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True))(x)
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))(x)
outputs = tf.keras.layers.Dense(output_dim, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)

# Compile the model


model.compile("adam", "binary_crossentropy", metrics=["accuracy"])
# Train the model
batch_size = 32
epochs = 5
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                    validation_data=(x_test, y_test))
# Plot the accuracy
fig = plt.plot(history.history['accuracy'])
title = plt.title("History")
xlabel = plt.xlabel("Epochs")
ylabel = plt.ylabel("Accuracy")
Output:
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 [==============================] - 0s 0us/step

Epoch 1/5
782/782 [==============================] - 84s 94ms/step - loss: 0.4303 - accuracy: 0.8042 - val_loss: 0.3972 - val_accuracy: 0.8420
Epoch 2/5
782/782 [==============================] - 71s 91ms/step - loss: 0.2350 - accuracy: 0.9084 - val_loss: 0.3151 - val_accuracy: 0.8731
Epoch 3/5
782/782 [==============================] - 71s 91ms/step - loss: 0.1568 - accuracy: 0.9430 - val_loss: 0.3594 - val_accuracy: 0.8613
Epoch 4/5
782/782 [==============================] - 72s 92ms/step - loss: 0.1262 - accuracy: 0.9562 - val_loss: 0.4886 - val_accuracy: 0.8560
Epoch 5/5
782/782 [==============================] - 71s 91ms/step - loss: 0.0901 - accuracy: 0.9697 - val_loss: 0.4440 - val_accuracy: 0.8563

Result:
Thus the implementation of a bidirectional RNN with multiple hidden layers was
executed successfully.
