Manual - V1

Uploaded by

begumelisa14
Exp. No.: 1 Implementation of Uninformed search algorithms (BFS, DFS)

AIM
To implement the uninformed search algorithms (BFS, DFS) using Python code.

Breadth-First Search (BFS) Algorithm:


BFS explores a graph level by level, visiting all neighbors of a node before moving on to the next level.
Step 1: Create a queue and enqueue the starting node.
Step 2: Mark the starting node as visited.
Step 3: While the queue is not empty:
a. Dequeue a node from the front of the queue.
b. Process the dequeued node.
c. Enqueue all unvisited neighbors of the dequeued node and mark them as visited.
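
The steps above can be sketched as a plain traversal that records the order in which nodes are processed. This is a minimal illustration, not the experiment's path-finding code: the function name `bfs_order` and the sample graph are assumptions chosen for the example.

```python
from collections import deque

def bfs_order(graph, start):
    """Return nodes in the order BFS processes them (Steps 1-3)."""
    queue = deque([start])      # Step 1: enqueue the starting node
    visited = {start}           # Step 2: mark the starting node as visited
    order = []
    while queue:                # Step 3: repeat until the queue is empty
        node = queue.popleft()  # 3a: dequeue from the front
        order.append(node)      # 3b: "processing" here just records the visit
        for neighbor in graph.get(node, []):
            if neighbor not in visited:  # 3c: enqueue and mark unvisited neighbors
                visited.add(neighbor)
                queue.append(neighbor)
    return order

# Hypothetical sample graph given as adjacency lists
sample_graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F'],
    'D': [],
    'E': ['F'],
    'F': []
}

print(bfs_order(sample_graph, 'A'))
```

Marking a node as visited when it is enqueued, as Step 3c says, keeps each node in the queue at most once.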

Depth-First Search (DFS) Algorithm:


DFS explores a graph by going as deep as possible along each branch before backtracking.
Step 1: Create a stack and push the starting node onto the stack.
Step 2: Mark the starting node as visited.
Step 3: While the stack is not empty:
a. Pop a node from the top of the stack.
b. Process the popped node.
c. Push all unvisited neighbors of the popped node onto the stack and mark them as visited.
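
The stack-based steps above can be sketched in the same way. The function name `dfs_order` and the sample graph are assumptions for illustration. Neighbors are pushed in reverse so that the leftmost neighbor is popped first; because nodes are marked when pushed, the order can differ from a recursive DFS on some graphs.

```python
def dfs_order(graph, start):
    """Return nodes in the order an iterative DFS processes them (Steps 1-3)."""
    stack = [start]          # Step 1: push the starting node
    visited = {start}        # Step 2: mark the starting node as visited
    order = []
    while stack:             # Step 3: repeat until the stack is empty
        node = stack.pop()   # 3a: pop from the top
        order.append(node)   # 3b: "processing" here just records the visit
        # 3c: push unvisited neighbors (reversed so the first neighbor is popped first)
        for neighbor in reversed(graph.get(node, [])):
            if neighbor not in visited:
                visited.add(neighbor)
                stack.append(neighbor)
    return order

# Hypothetical sample graph given as adjacency lists
sample_graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F'],
    'D': [],
    'E': ['F'],
    'F': []
}

print(dfs_order(sample_graph, 'A'))
```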
[Figure: flowcharts of the Breadth-First Search (BFS) and Depth-First Search (DFS) algorithms]

BFS Code:
from collections import deque

def bfs(graph, start, goal):
    visited = set()
    queue = deque([[start]])

    if start == goal:
        return [start]

    while queue:
        path = queue.popleft()  # Get the first path from the queue
        node = path[-1]         # Get the last node from the path

        if node not in visited:
            neighbors = graph.get(node, [])
            for neighbor in neighbors:
                new_path = list(path)
                new_path.append(neighbor)
                queue.append(new_path)

                if neighbor == goal:
                    return new_path
            visited.add(node)

    return None  # If no path is found

Input :
graph = {
'A': ['B', 'C'],
'B': ['D', 'E'],
'C': ['F'],
'D': [],
'E': ['F'],
'F': []
}

result = bfs(graph, 'A', 'F')


print("Path from A to F:", result)

Output :

DFS Code:
def dfs(graph, start, goal):
    visited = set()
    stack = [[start]]

    if start == goal:
        return [start]

    while stack:
        path = stack.pop()  # Get the last path from the stack
        node = path[-1]     # Get the last node from the path

        if node not in visited:
            neighbors = graph.get(node, [])
            for neighbor in reversed(neighbors):  # Reverse to maintain left-to-right order
                new_path = list(path)
                new_path.append(neighbor)
                stack.append(new_path)
                if neighbor == goal:
                    return new_path

            visited.add(node)

    return None  # If no path is found

Input:

graph = {
'A': ['B', 'C'],
'B': ['D', 'E'],
'C': ['F'],
'D': [],
'E': ['F'],
'F': []
}

result = dfs(graph, 'A', 'F')


print("Path from A to F:", result)

Output:

Result:
The breadth-first search and depth-first search algorithms were implemented, and the output
was verified.
Exp. No.: 2 Implementation of Informed search algorithms (A*, memory-bounded A*)

Aim :
To implement the Informed search algorithms (A*, memory-bounded A*)

Algorithm:
1: Place the starting node into OPEN and compute its f(n) value.
2: Remove the node with the smallest f(n) value from OPEN. If it is a goal node, stop and
return success.
3: Otherwise, find all successors of the removed node.
4: Compute the f(n) value of each successor, place the successors into OPEN, and place the
removed node into CLOSED.
5: Go to Step 2.
6: Exit.
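
In these steps f(n) = g(n) + h(n), where g(n) is the cost paid to reach n and h(n) is the heuristic estimate from n to the goal. A* is only guaranteed to return a cheapest path when h never overestimates the true remaining cost. The sketch below checks that condition on a small graph; the helper names `true_costs_to_goal` and `overestimating_nodes` are assumptions for illustration, and the graph and heuristic values are copied from the A* input used in this experiment.

```python
import heapq

def true_costs_to_goal(graph, goal):
    """Cheapest cost from each node to `goal`, via Dijkstra on the reversed graph."""
    reverse = {node: {} for node in graph}
    for node, edges in graph.items():
        for neighbor, weight in edges.items():
            reverse.setdefault(neighbor, {})[node] = weight
    dist = {goal: 0}
    heap = [(0, goal)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float('inf')):
            continue  # stale heap entry
        for prev, weight in reverse.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(prev, float('inf')):
                dist[prev] = nd
                heapq.heappush(heap, (nd, prev))
    return dist  # nodes that cannot reach the goal are absent

def overestimating_nodes(graph, heuristic, goal):
    """Nodes whose heuristic value exceeds the true cheapest cost to the goal."""
    costs = true_costs_to_goal(graph, goal)
    return sorted(n for n, h in heuristic.items() if n in costs and h > costs[n])

graph = {
    'A': {'B': 1, 'C': 4},
    'B': {'D': 5, 'E': 1},
    'C': {'F': 3},
    'D': {},
    'E': {'F': 1},
    'F': {}
}
heuristic = {'A': 6, 'B': 4, 'C': 2, 'D': 6, 'E': 2, 'F': 0}

print(overestimating_nodes(graph, heuristic, 'F'))
```

An empty list would mean the heuristic is admissible on this graph. A non-empty list does not mean A* must fail on the example, only that optimality is no longer guaranteed in general.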

Flow chart:
A* Code:
import heapq

def astar(graph, start, goal, heuristic):
    open_list = []
    heapq.heappush(open_list, (0 + heuristic[start], [start]))  # (f, path)
    visited = set()

    while open_list:
        cost, path = heapq.heappop(open_list)
        node = path[-1]

        if node in visited:
            continue

        visited.add(node)

        if node == goal:
            return path

        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                g = cost - heuristic[node] + weight  # actual cost so far
                f = g + heuristic[neighbor]
                new_path = list(path)
                new_path.append(neighbor)
                heapq.heappush(open_list, (f, new_path))

    return None

graph = {
'A': {'B': 1, 'C': 4},
'B': {'D': 5, 'E': 1},
'C': {'F': 3},
'D': {},
'E': {'F': 1},
'F': {}
}

# Heuristic values: estimated cost from each node to the goal 'F'
heuristic = {
'A': 6,
'B': 4,
'C': 2,
'D': 6,
'E': 2,
'F': 0
}

Input:

result = astar(graph, 'A', 'F', heuristic)


print("Path from A to F:", result)

Output:

Memory-bounded A* Code:

import heapq

class Node:
    def __init__(self, state, path, g, h):
        self.state = state
        self.path = path
        self.g = g  # Cost to reach this node
        self.h = h  # Heuristic estimate to goal

    def f(self):
        return self.g + self.h

    def __lt__(self, other):
        # heapq orders nodes by their f value
        return self.f() < other.f()

def a_star(graph, start, goal, heuristic):
    open_list = []
    heapq.heappush(open_list, Node(start, [start], 0, heuristic[start]))
    closed_set = set()

    while open_list:
        current = heapq.heappop(open_list)

        if current.state == goal:
            return current.path

        if current.state in closed_set:
            continue

        closed_set.add(current.state)
        for neighbor, cost in graph.get(current.state, []):
            if neighbor not in closed_set:
                g_new = current.g + cost
                h_new = heuristic[neighbor]
                new_node = Node(neighbor, current.path + [neighbor], g_new, h_new)
                heapq.heappush(open_list, new_node)

        # NOTE: as written this is plain A*. To apply memory bounds (SMA*),
        # limit the size of open_list and prune the least promising nodes.

    return None

graph = {
'A': [('B', 1), ('C', 4)],
'B': [('D', 5), ('E', 2)],
'C': [('F', 3)],
'D': [],
'E': [('F', 1)],
'F': []
}

heuristic = {
'A': 7,
'B': 6,
'C': 5,
'D': 3,
'E': 2,
'F': 0
}

Input:
result = a_star(graph, 'A', 'F', heuristic)
print("Path from A to F:", result)

Output:

Result:
A* and memory-bounded A* algorithms were implemented using Python code.
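
The `a_star` function in this experiment keeps every generated node in `open_list`; its closing comment only describes how a memory bound could be added. A minimal sketch of that idea follows, under clearly stated assumptions: the name `bounded_a_star` and the parameter `max_open` are hypothetical, and the pruning rule simply discards the entries with the largest f value. This is a simplification and not full SMA*, which also backs up the f values of pruned nodes so they can be regenerated later.

```python
import heapq

def bounded_a_star(graph, start, goal, heuristic, max_open=3):
    """A* whose open list is capped at `max_open` entries (simplified, not full SMA*)."""
    open_list = [(heuristic[start], 0, [start])]  # entries are (f, g, path)
    closed = set()
    while open_list:
        f, g, path = heapq.heappop(open_list)
        node = path[-1]
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for neighbor, cost in graph.get(node, []):
            if neighbor not in closed:
                g_new = g + cost
                entry = (g_new + heuristic[neighbor], g_new, path + [neighbor])
                heapq.heappush(open_list, entry)
        if len(open_list) > max_open:
            # Keep only the most promising entries; a sorted list is a valid heap
            open_list = heapq.nsmallest(max_open, open_list)
    return None  # no path found, possibly because the needed node was pruned

# Graph and heuristic copied from the memory-bounded A* input in this experiment
graph = {
    'A': [('B', 1), ('C', 4)],
    'B': [('D', 5), ('E', 2)],
    'C': [('F', 3)],
    'D': [],
    'E': [('F', 1)],
    'F': []
}
heuristic = {'A': 7, 'B': 6, 'C': 5, 'D': 3, 'E': 2, 'F': 0}

print(bounded_a_star(graph, 'A', 'F', heuristic, max_open=2))
```

Because pruned nodes are lost for good, this variant can miss a path that exists when `max_open` is too small; that trade-off is the price of the fixed memory budget.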
Exp. No.: 3 Implement naïve Bayes models
Aim:
To implement the naïve Bayes models using Python.

Algorithm:

Step 1 – Importing the Libraries
Step 2 – Loading the Dataset
Step 3 – Splitting the dataset into the Training set and Test set
Step 4 – Feature Scaling
Step 5 – Training the Naive Bayes Classification model on the Training Set
Step 6 – Predicting the Test set results
Step 7 – Confusion Matrix and Accuracy
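
To show what the training and prediction steps compute, here is a minimal Gaussian naive Bayes written in plain Python. It is a sketch under assumptions, not the scikit-learn implementation: the function names and the toy data are made up for illustration. Each class gets a prior plus a per-feature mean and variance, and prediction picks the class with the highest log-posterior score.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior and per-feature mean/variance for every class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # The small constant avoids division by zero for constant features
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the largest log prior + sum of log Gaussian likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for value, mean, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: two well-separated classes with two features each
X_toy = [[1.0, 1.2], [0.8, 1.0], [1.2, 0.9], [5.0, 5.1], [4.8, 5.3], [5.2, 4.9]]
y_toy = ['low', 'low', 'low', 'high', 'high', 'high']

nb_model = fit_gaussian_nb(X_toy, y_toy)
print(predict_gaussian_nb(nb_model, [1.1, 1.0]))
print(predict_gaussian_nb(nb_model, [5.1, 5.0]))
```

The "naive" part is the inner loop: the per-feature log-likelihoods are simply added, which assumes the features are independent given the class.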

Flowchart:
Code:

from sklearn.model_selection import train_test_split


from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report
import pandas as pd
import numpy as np

# Load iris dataset


iris = pd.read_csv("iris.csv")
# Features and target
X = iris.drop('species', axis=1)
y = iris['species']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Naive Bayes classifier


model = GaussianNB()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluation
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Output:

Result:
The naïve Bayes models have been implemented with Python code.
Exp. No.: 4 Implement Bayesian Networks

Aim:
To implement Bayesian Networks using Python code.

Algorithm:
Step 1 – Import the required libraries
Step 2 – Define the structure of the Bayesian Network
Step 3 – Define Conditional Probability Tables (CPTs)
Step 4 – Create the Bayesian Model
Step 5 – Add CPDs to the model
Step 6 – Validate the model
Step 7 – Perform Inference on the model
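
The inference step can be checked by hand with plain enumeration. The sketch below computes P(Rain | WetGrass = 1) for the Rain, Sprinkler, WetGrass network by summing the joint distribution over Sprinkler. The probabilities are copied from the CPDs in this experiment's code, assuming pgmpy's column convention in which evidence states are enumerated with the last evidence variable changing fastest; the function names are illustrative only.

```python
# P(Rain)
p_rain = {0: 0.7, 1: 0.3}

# P(Sprinkler | Rain), indexed as [rain][sprinkler]
p_sprinkler_given_rain = {
    0: {0: 0.8, 1: 0.2},
    1: {0: 0.2, 1: 0.8},
}

# P(WetGrass = 1 | Rain, Sprinkler), indexed by (rain, sprinkler)
p_wet_given = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}

def joint(rain, sprinkler, wet):
    """Joint probability from the chain rule of the network."""
    p_wet1 = p_wet_given[(rain, sprinkler)]
    p_wet = p_wet1 if wet == 1 else 1.0 - p_wet1
    return p_rain[rain] * p_sprinkler_given_rain[rain][sprinkler] * p_wet

def posterior_rain_given_wet(wet=1):
    """P(Rain | WetGrass = wet): sum out Sprinkler, then normalize."""
    unnormalized = {r: sum(joint(r, s, wet) for s in (0, 1)) for r in (0, 1)}
    total = sum(unnormalized.values())
    return {r: p / total for r, p in unnormalized.items()}

posterior = posterior_rain_given_wet(wet=1)
print(posterior)
```

Under the stated column convention, the value for Rain = 1 should come out near 0.698, which gives a quick sanity check against the table printed by the variable-elimination query.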

Bayesian Networks Code:


# Run once in a shell before executing this script: pip install pgmpy
# Step 1 – Import the required libraries
from pgmpy.models import DiscreteBayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Step 2 – Define the structure of the Bayesian Network


model = DiscreteBayesianNetwork([
('Rain', 'Sprinkler'),
('Rain', 'WetGrass'),
('Sprinkler', 'WetGrass')
])

# Step 3 – Define the Conditional Probability Tables (CPTs)


cpd_rain = TabularCPD(variable='Rain', variable_card=2, values=[[0.7], [0.3]])

cpd_sprinkler = TabularCPD(
variable='Sprinkler', variable_card=2,
values=[[0.8, 0.2],
[0.2, 0.8]],
evidence=['Rain'],
evidence_card=[2]
)

cpd_wetgrass = TabularCPD(
variable='WetGrass', variable_card=2,
values=[[1.0, 0.1, 0.1, 0.01],
[0.0, 0.9, 0.9, 0.99]],
evidence=['Rain', 'Sprinkler'],
evidence_card=[2, 2]
)

# Step 4 – Add CPDs to the model


model.add_cpds(cpd_rain, cpd_sprinkler, cpd_wetgrass)

# Step 5 – Check if the model is valid


assert model.check_model()

# Step 6 – Perform inference


infer = VariableElimination(model)

# Step 7 – Query the model


result = infer.query(variables=['Rain'], evidence={'WetGrass': 1})
print(result)

Output:

Result:
Thus, the Bayesian Networks were implemented using Python Code.
Exp. No.: 4 Implement Bayesian Networks

Aim:
To implement Bayesian Networks using Python code.

Algorithm:
Step 1 – Import Libraries
Step 2 – Load and Preprocess Dataset
Step 3 – Define the Network Structure
Step 4 – Learn the CPDs from Data
Step 5 – Build and Validate the Model
Step 6 – Perform Inference

Bayesian Networks Code:


# Run once in a shell before executing this script: pip install pgmpy

# Step 1: Import libraries


import pandas as pd
from pgmpy.models import DiscreteBayesianNetwork
from pgmpy.estimators import BayesianEstimator
from pgmpy.inference import VariableElimination
from sklearn.preprocessing import KBinsDiscretizer

# Step 2: Load dataset


df = pd.read_csv('heart_disease_uci.csv') # Adjust path as needed

# Step 3: Select and preprocess features


features = ['age', 'sex', 'cp', 'exang', 'ca', 'thal', 'num']
df = df[features].copy()

# Convert numerical target into binary: 0 = no disease, 1 = disease


df['num'] = df['num'].apply(lambda x: 1 if x > 0 else 0)

# Discretize 'age' into 3 bins


age_discretizer = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='quantile')
df['age'] = age_discretizer.fit_transform(df[['age']]).astype(int)
# Convert all columns to string type (pgmpy expects categorical values as strings)
df = df.astype(str)

# Step 4: Print unique values for reference


print("Unique values per column for evidence reference:")
for col in df.columns:
    print(f"{col}: {df[col].unique()}")

# Step 5: Define Bayesian Network structure


model = DiscreteBayesianNetwork([
('age', 'cp'),
('sex', 'cp'),
('cp', 'num'),
('exang', 'num'),
('ca', 'num'),
('thal', 'num')
])

# Step 6: Learn CPDs using Bayesian Estimator


model.fit(df, estimator=BayesianEstimator, prior_type='BDeu')

# Step 7: Validate model


assert model.check_model()

# Step 8: Perform inference


inference = VariableElimination(model)

# Use valid evidence values from printed categories


evidence = {
'cp': 'asymptomatic', # valid cp value
'exang': 'True' # valid exang value
}

# Step 9: Query probability of heart disease


result = inference.query(variables=['num'], evidence=evidence)
print("\nPredicted probability of heart disease given evidence:")
print(result)
Output:

Result:
Thus, the Bayesian Networks were implemented using Python code with the help of the
heart disease data set.
Exp. No.: 5 Build Regression models

Aim:
a. To implement the linear regression model using Python code
b. To implement the logistic regression model using Python code

Algorithm:
Step 1 – Import Libraries
Step 2 – Load and Preprocess Dataset
Step 3 – Define the Regression Model
Step 4 – Learn the Coefficients from Data
Step 5 – Build and Validate the Model
Step 6 – Perform Inference
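
For a single feature, the coefficients that linear regression learns have a closed form, and logistic regression passes the same kind of linear score through a sigmoid to obtain a probability. The sketch below illustrates both in plain Python; the function names and the toy data are assumptions for illustration, not the scikit-learn internals.

```python
import math

def fit_simple_linear_regression(xs, ys):
    """Least-squares slope and intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def sigmoid(z):
    """Logistic function: maps a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept, r_squared(xs, ys, slope, intercept))
print(sigmoid(0.0))  # a score of zero sits on the 0.5 decision threshold
```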

Linear Regression Code:


import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Load the dataset


data = pd.read_csv("california_housing_test.csv") # Make sure the path to your CSV file is correct

# Inspect the columns to identify the target and feature


print(data.columns)

# Now using 'median_income' as the feature and 'median_house_value' as the target


X = data[['median_income']] # Feature (median income)
y = data['median_house_value'] # Target (median house value)

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Linear Regression model


model = LinearRegression()

# Train the model


model.fit(X_train, y_train)
# Predict on the test set
y_pred = model.predict(X_test)

# Evaluate the model


mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print the results


print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")

# Plot the results


plt.scatter(X_test, y_test, color='blue', label='Actual values')
plt.plot(X_test, y_pred, color='red', label='Predicted values')
plt.xlabel('Median Income')
plt.ylabel('Median House Value')
plt.title('Linear Regression on California Housing Data')
plt.legend()
plt.show()

Output:
Logistic Regression Code:

# Step 1 – Import Libraries


import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt

# Step 2 – Load and Preprocess Dataset


# Load the dataset
data = pd.read_csv("california_housing_test.csv")

# Handle missing values (if any)


data = data.dropna()

# Select the feature (e.g., median_income) and convert the target to binary classification
X = data[['median_income']] # Feature
# Binary target: 1 if house value > 200000, else 0
y = (data['median_house_value'] > 200000).astype(int)

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3 – Define and Train the Logistic Regression Model


# Initialize the Logistic Regression model
log_reg_model = LogisticRegression()

# Train the model on the training set


log_reg_model.fit(X_train, y_train)

# Step 4 – Make Predictions and Evaluate the Model


# Make predictions on the test data
y_pred = log_reg_model.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

# Print the evaluation metrics


print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

# Step 5 – Visualization (Optional)


# Optional: Visualize the confusion matrix
plt.figure(figsize=(6,6))
plt.imshow(conf_matrix, cmap='Blues', interpolation='nearest')
plt.title('Confusion Matrix')
plt.colorbar()
plt.xlabel('Predicted')
plt.ylabel('True')
plt.xticks([0, 1], ['Below 200k', 'Above 200k'])
plt.yticks([0, 1], ['Below 200k', 'Above 200k'])
plt.show()

Output:

Result:
The linear regression and logistic regression were implemented using Python code.
Exp. No.: 6 Build decision trees and random forests

Aim:
a. To implement the decision tree model using Python code
b. To implement the random forest model using Python code

Algorithm:
Step 1 – Import Libraries
Step 2 – Load and Preprocess Dataset
Step 3 – Split the Dataset
Step 4 – Define the Model
Step 5 – Train the Model
Step 6 – Validate the Model
Step 7 – Perform Inference
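
The decision tree in this experiment is built with `criterion='gini'`. As a rough illustration of what that criterion measures, the sketch below computes the Gini impurity of a set of labels and the weighted impurity of a candidate split; the function names and sample labels are assumptions for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a split, weighting each side by its share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical labels at a node before and after a perfect split
parent = ['setosa', 'setosa', 'versicolor', 'versicolor']
print(gini(parent))
print(weighted_gini(['setosa', 'setosa'], ['versicolor', 'versicolor']))
```

A tree learner evaluates many candidate splits and prefers the one with the lowest weighted impurity; a pure node has impurity 0.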

Decision Tree Code:


# Step 1 – Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load iris dataset


iris = pd.read_csv("iris.csv")

# Features and target


X = iris.drop('species', axis=1)
y = iris['species']
# Encode class labels if they are strings
le = LabelEncoder()
y_encoded = le.fit_transform(y)
# Step 3 – Split the Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4 – Define the Decision Tree Model


model = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)

# Step 5 – Train the Model


model.fit(X_train, y_train)

# Step 6 – Validate the Model


y_pred = model.predict(X_test)
# Step 7 – Evaluate the Model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

# Step 7 – Visualize the Tree


plt.figure(figsize=(12, 8))
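# Pass plain lists: some scikit-learn versions reject a pandas Index or NumPy array here
plot_tree(model,
          feature_names=list(X.columns),
          class_names=list(le.classes_),
          filled=True)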
plt.title("Decision Tree Visualization")
plt.show()

Output:
Random Forest:

# Step 1 – Import Libraries


import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# Step 2 – Load Dataset


df = pd.read_csv("iris.csv") # Replace with your actual file path

# Features and target


X = df.drop('species', axis=1)
y = df['species']

# Encode class labels if they are strings


le = LabelEncoder()
y_encoded = le.fit_transform(y)
# Step 4 – Split the Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.3, random_state=42)

# Step 5 – Define and Train the Random Forest Model


model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
model.fit(X_train, y_train)

# Step 6 – Evaluate the Model


y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

# Plot Feature Importance


importances = model.feature_importances_
features = X.columns
sns.barplot(x=importances, y=features)
plt.title("Feature Importance in Random Forest")
plt.show()

Output:
Result:
The decision tree and random forest algorithms were implemented using Python code.
Exp. No.: 7 BUILD SUPPORT VECTOR MACHINE MODELS

Aim:
To implement the support vector machine model in machine learning

Algorithm:

Step 1 – Import Libraries


Step 2 – Load and Preprocess Dataset
Step 3 – Split the Dataset
Step 4 – Define the SVC Model
Step 5 – Train the Model
Step 6 – Evaluate the Model
Step 7 – Perform Inference
Step 8 (Optional) – Hyperparameter Tuning
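
Both models in this experiment rely on the RBF kernel, and the regression variant additionally uses an epsilon-insensitive loss. The sketch below shows these two building blocks in plain Python; the function names and the default `gamma` value are assumptions for illustration and do not reproduce scikit-learn's defaults.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF similarity: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.2):
    """Errors smaller than epsilon cost nothing; larger ones cost the excess."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))   # identical points
print(rbf_kernel([1.0, 2.0], [4.0, 6.0]))   # distant points
print(epsilon_insensitive_loss(1.0, 1.1))   # inside the epsilon tube
print(epsilon_insensitive_loss(1.0, 2.0))   # outside the epsilon tube
```

Because the kernel depends on distances between feature vectors, features on very different scales can dominate it, which is why the code standardizes the inputs before training.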

Support Vector Machine:

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset


# Make sure the path to your CSV file is correct
data = pd.read_csv("california_housing_test.csv")

# Inspect the columns to identify the target and feature


print(data.columns)

# Now using 'median_income' as the feature and 'median_house_value' as the target


X = data[['median_income']] # Feature (median income)
y = data['median_house_value'] # Target (median house value)

# Step 4: Scale features


scaler_X = StandardScaler()
X_scaled = scaler_X.fit_transform(X)
# Step 5: Scale target values
scaler_y = StandardScaler()
y_scaled = scaler_y.fit_transform(y.values.reshape(-1, 1)).flatten()

# Step 6: Train-test split


X_train, X_test, y_train, y_test = train_test_split(
X_scaled, y_scaled, test_size=0.2, random_state=42
)

# Step 7: Train SVR model (using RBF kernel)


svr = SVR(C=10, epsilon=0.2, kernel='rbf')
svr.fit(X_train, y_train)

# Step 8: Predict and inverse scale the target


y_pred_scaled = svr.predict(X_test)
y_pred = scaler_y.inverse_transform(y_pred_scaled.reshape(-1, 1)).flatten()
y_test_actual = scaler_y.inverse_transform(y_test.reshape(-1, 1)).flatten()

# Step 9: Evaluate the model


r2 = r2_score(y_test_actual, y_pred)
mse = mean_squared_error(y_test_actual, y_pred)

print(f"R² Score: {r2:.4f}")


print(f"Mean Squared Error: {mse:.4f}")

# Step 10: Plot predicted vs actual


plt.figure(figsize=(8, 6))
plt.scatter(y_test_actual, y_pred, alpha=0.4, color='green')
plt.plot([y_test_actual.min(), y_test_actual.max()],
[y_test_actual.min(), y_test_actual.max()], 'k--')
plt.xlabel("Actual House Value")
plt.ylabel("Predicted Value")
plt.title("SVR Prediction (California Housing)")
plt.grid(True)
plt.show()

Output:
Support Vector Classification:

# Step 1 – Import libraries


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.preprocessing import LabelEncoder, label_binarize
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Step 2 – Load data
df = pd.read_csv("iris.csv") # Replace with your CSV file

# Step 3 – Features and target


X = df.drop("species", axis=1)
y = df["species"]

# Step 4 – Label encode and binarize target


le = LabelEncoder()
y_encoded = le.fit_transform(y)
y_bin = label_binarize(y_encoded, classes=[0, 1, 2]) # Needed for ROC

# Step 5 – Split dataset


X_train, X_test, y_train_bin, y_test_bin = train_test_split(X, y_bin, test_size=0.3,
random_state=42)

# Step 6 – Scale features (important for SVM)


scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Step 7 – Train One-vs-Rest SVC


classifier = OneVsRestClassifier(SVC(kernel='rbf', probability=True, random_state=42))
classifier.fit(X_train, y_train_bin)

# Step 8 – Predict probabilities


y_score = classifier.predict_proba(X_test)

# Step 9 – Compute ROC curve and AUC for each class


fpr = dict()
tpr = dict()
roc_auc = dict()
n_classes = y_bin.shape[1]

for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Step 10 – Plot all ROC curves


plt.figure(figsize=(8, 6))
colors = ['blue', 'green', 'red']
for i in range(n_classes):
    plt.plot(fpr[i], tpr[i], color=colors[i], lw=2,
             label=f"ROC curve for {le.classes_[i]} (AUC = {roc_auc[i]:.2f})")

plt.plot([0, 1], [0, 1], 'k--', lw=2)


plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Multiclass ROC Curve (Iris Dataset)")
plt.legend(loc="lower right")
plt.grid()
plt.show()
# Step 11 – Predict class labels (not probabilities)
y_test_labels = np.argmax(y_test_bin, axis=1)
y_pred_labels = classifier.predict(X_test)
y_pred_labels = np.argmax(y_pred_labels, axis=1)

# Step 12 – Accuracy and classification report


print("Accuracy Score:", accuracy_score(y_test_labels, y_pred_labels))
print("\nClassification Report:\n", classification_report(y_test_labels, y_pred_labels,
target_names=le.classes_))

# Step 13 – Confusion matrix


conf_matrix = confusion_matrix(y_test_labels, y_pred_labels)
print("\nConfusion Matrix:\n", conf_matrix)

Output:
RESULT:
The support vector machine was implemented using Python code.
Exp. No.: 8 IMPLEMENT ENSEMBLING TECHNIQUES

AIM:
To provide a Python code to demonstrate the use of ensemble techniques in machine learning

Algorithm:
1. Import Necessary Libraries
2. Load and Understand the Dataset
3. Preprocess the Data
4. Split the Dataset
5. Select an Ensemble Method
A. Bagging (Bootstrap Aggregating)
B. Boosting
C. Stacking
6. Train the Ensemble Model
7. Make Predictions
8. Evaluate the Model
9. Tune Hyperparameters (Optional)
10. Deploy or Use the Model
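
The aggregation idea behind bagging can be sketched without any library: several models each predict a label for every sample, and the ensemble returns the most common label per sample. The function name `majority_vote` and the prediction lists below are made up for illustration.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by majority vote per sample."""
    votes_per_sample = zip(*predictions_per_model)
    return [Counter(votes).most_common(1)[0][0] for votes in votes_per_sample]

# Hypothetical predictions from three models on four samples
model_predictions = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
print(majority_vote(model_predictions))
```

Using an odd number of models avoids ties for binary labels. Boosting and stacking combine models differently: boosting weights models trained in sequence, and stacking trains a meta-learner on the base models' outputs.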

Ensembling Code:
# Step 1 – Import Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Step 2 – Load Dataset


df = pd.read_csv('tips.csv')

# Step 3 – Create Target Variable: 1 if tip > average tip, else 0


df['high_tip'] = (df['tip'] > df['tip'].mean()).astype(int)

# Step 4 – Encode Categorical Columns


le = LabelEncoder()
df['sex'] = le.fit_transform(df['sex'])
df['smoker'] = le.fit_transform(df['smoker'])
df['day'] = le.fit_transform(df['day'])
df['time'] = le.fit_transform(df['time'])

# Step 5 – Define Features and Target


X = df[['total_bill', 'sex', 'smoker', 'day', 'time', 'size']]
y = df['high_tip']

# Step 6 – Train-Test Split


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 7 – Scale Features


scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Step 8 – Bagging (Random Forest)


rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)

# Step 9 – Boosting (Gradient Boosting)


gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
gb.fit(X_train, y_train)
gb_pred = gb.predict(X_test)

# Step 10 – Stacking
base_learners = [
('dt', DecisionTreeClassifier(random_state=42)),
('svm', SVC(probability=True, random_state=42))
]
meta_learner = LogisticRegression()
stack = StackingClassifier(estimators=base_learners, final_estimator=meta_learner)
stack.fit(X_train, y_train)
stack_pred = stack.predict(X_test)

# Step 11 – Evaluate All Models


print("Random Forest Accuracy:", accuracy_score(y_test, rf_pred))
print("Gradient Boosting Accuracy:", accuracy_score(y_test, gb_pred))
print("Stacking Accuracy:", accuracy_score(y_test, stack_pred))

print("\nClassification Report for Stacking:\n", classification_report(y_test, stack_pred))


print("Confusion Matrix for Stacking:\n", confusion_matrix(y_test, stack_pred))

Output:
RESULT:

Python code to demonstrate the use of ensemble techniques in machine learning was implemented.
Exp. No.: 9 IMPLEMENT CLUSTERING ALGORITHMS

AIM:
To implement the clustering algorithm to group data points based on certain similarities.

Algorithm:
1. Import Required Libraries
2. Load the Dataset
3. Preprocess the Data
4. Select Clustering Algorithm
5. Choose Number of Clusters (if required)
6. Fit the Clustering Algorithm
7. Analyze Clustering Results
8. Visualize Clusters
9. Evaluate Clustering Quality
10. Interpret and Use Clusters
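
The fitting step for k-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. A minimal sketch follows, assuming a deterministic start from the first k points so the result is reproducible; library implementations usually choose initial centroids differently. The function name and toy points are illustrative.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means (Lloyd's algorithm) with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        for i, point in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
            labels[i] = distances.index(min(distances))
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy points forming two well-separated groups
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.9, 1.1), (5.1, 4.9), (4.8, 5.2)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```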

Code for Clustering:


# Step 1 – Import Libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt
import seaborn as sns

# Step 2 – Load Dataset


df = pd.read_csv('diabetes.csv') # Make sure 'diabetes.csv' is in your working directory

# Step 3 – Preprocess the Data


# Drop target variable for clustering
X = df.drop(columns=['Outcome'])

# Normalize the data


scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 4 – KMeans Clustering


# Elbow Method to find optimal k
inertia = []
K = range(1, 11)
for k in K:
    km = KMeans(n_clusters=k, random_state=42)
    km.fit(X_scaled)
    inertia.append(km.inertia_)

# Plot Elbow Curve


plt.figure(figsize=(8, 5))
plt.plot(K, inertia, 'bo-')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.title('Elbow Method for Optimal k')
plt.grid(True)
plt.show()

# Fit KMeans with k=2


kmeans = KMeans(n_clusters=2, random_state=42)
kmeans_labels = kmeans.fit_predict(X_scaled)
df['KMeans_Cluster'] = kmeans_labels

# Step 5 – DBSCAN Clustering


dbscan = DBSCAN(eps=1.5, min_samples=5)
dbscan_labels = dbscan.fit_predict(X_scaled)
df['DBSCAN_Cluster'] = dbscan_labels

# Step 6 – Evaluation
# KMeans Silhouette Score
print("KMeans Silhouette Score:", silhouette_score(X_scaled, kmeans_labels))

# DBSCAN Silhouette Score (excluding noise)


mask = df['DBSCAN_Cluster'] != -1
X_dbscan = X_scaled[mask]
labels_dbscan = dbscan_labels[mask]

if len(set(labels_dbscan)) > 1:
    print("DBSCAN Silhouette Score (excluding noise):",
          silhouette_score(X_dbscan, labels_dbscan))
else:
    print("DBSCAN clustering did not find multiple clusters (excluding noise).")

# Step 7 – Visualization

# KMeans Cluster Visualization


plt.figure(figsize=(8, 5))
sns.scatterplot(x=X_scaled[:, 0], y=X_scaled[:, 1], hue=kmeans_labels, palette='Set1')
plt.title("KMeans Clustering (2 Features)")
plt.xlabel("Feature 1 (scaled)")
plt.ylabel("Feature 2 (scaled)")
plt.show()
# DBSCAN Cluster Visualization
plt.figure(figsize=(8, 5))
sns.scatterplot(x=X_scaled[:, 0], y=X_scaled[:, 1], hue=dbscan_labels, palette='Set2')
plt.title("DBSCAN Clustering (2 Features)")
plt.xlabel("Feature 1 (scaled)")
plt.ylabel("Feature 2 (scaled)")
plt.show()

Output:
Result:
The K-means and DBSCAN clustering algorithms were implemented, and similar points were
grouped.
Exp. No.: 10 IMPLEMENT EM FOR BAYESIAN NETWORKS

AIM:
To implement the Expectation Maximization (EM) algorithm for Bayesian networks

Algorithm:
Step 1 – Define the Structure of the Bayesian Network
Step 2 – Initialize the Parameters
Step 3 – Expectation Step (E-Step)
Step 4 – Maximization Step (M-Step)
Step 5 – Check for Convergence
Step 6 – Use the Learned Model
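
The E-step and M-step can be made concrete on the smallest possible network, A -> B with binary states, where some values of A are missing. The sketch below is a plain-Python illustration under assumptions: the function name, the initial parameter guesses, and the toy rows are invented for the example, and it does not use or mirror any pgmpy API.

```python
def em_two_node(rows, iterations=50):
    """EM for a network A -> B with binary states; A may be None (missing)."""
    p_a = 0.5                # initial guess for P(A = 1)
    p_b = {0: 0.4, 1: 0.6}   # initial guess for P(B = 1 | A = a)
    for _ in range(iterations):
        # E-step: responsibility = P(A = 1 | row) under the current parameters
        resp = []
        for a, b in rows:
            if a is not None:
                resp.append(float(a))  # observed value, no uncertainty
                continue
            like1 = p_a * (p_b[1] if b == 1 else 1.0 - p_b[1])
            like0 = (1.0 - p_a) * (p_b[0] if b == 1 else 1.0 - p_b[0])
            resp.append(like1 / (like1 + like0))
        # M-step: re-estimate parameters from expected counts
        n1 = sum(resp)
        n0 = len(rows) - n1
        p_a = n1 / len(rows)
        p_b = {
            1: sum(r * b for r, (_a, b) in zip(resp, rows)) / n1,
            0: sum((1.0 - r) * b for r, (_a, b) in zip(resp, rows)) / n0,
        }
    return p_a, p_b

# Hypothetical rows (A, B); None marks a missing value of A
rows_with_missing = [(1, 1), (1, 1), (0, 0), (0, 1), (None, 1), (None, 0)]
print(em_two_node(rows_with_missing))

# With no missing values, EM reduces to ordinary maximum-likelihood counting
rows_complete = [(1, 1), (1, 0), (0, 0), (0, 0)]
print(em_two_node(rows_complete))
```

The sketch assumes both states of A are observed at least once, so neither expected count in the M-step is zero. A convergence check (Step 5) could compare the parameters between iterations instead of running a fixed number of rounds.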

Expectation Maximization (EM) Algorithm Code:

import pandas as pd
import numpy as np
from pgmpy.models import DiscreteBayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Step 1 – Create dataset with missing values


data = pd.DataFrame([
['yes', 'yes', 'yes'],
['yes', 'no', 'yes'],
['no', 'yes', 'yes'],
['no', 'no', 'no'],
['yes', None, 'yes'],
[None, 'yes', 'yes'],
['no', None, 'no'],
], columns=['Rain', 'Sprinkler', 'WetGrass'])

# Step 2 – Convert categorical data to numeric with consistent labels


mapping = {'no': 0, 'yes': 1}
data_encoded = data.replace(mapping)

# Convert to float to handle NaNs


data_encoded = data_encoded.astype('float')

# Step 3 – Define the Bayesian Network structure


model = DiscreteBayesianNetwork([('Rain', 'Sprinkler'), ('Sprinkler', 'WetGrass')])

# Step 4 – Learn parameters with MaximumLikelihoodEstimator
# (this fits the CPDs from observed counts; the call itself does not run EM iterations)


model.fit(data_encoded, estimator=MaximumLikelihoodEstimator)
# Step 5 – Inference on learned model
inference = VariableElimination(model)

# Step 6 – Query example: P(Rain | WetGrass = yes)


result = inference.query(variables=['Rain'], evidence={'WetGrass': 1})
print("P(Rain | WetGrass = yes):\n", result)

# Step 7 – Display learned CPDs


for cpd in model.get_cpds():
    print(f"\nCPD of {cpd.variable}:\n{cpd}")

Output:
RESULT:

The Expectation Maximization (EM) algorithm for Bayesian networks was implemented, and the
result was verified.
Exp. No.: 11 BUILD SIMPLE NN MODELS

AIM:
To implement the gradient descent model (NN Model) to find the local minimum

Algorithm:
Step 1: Initialize Parameters (Weights and Biases)
Step 2: Choose a Loss Function
Step 3: Choose an Optimization Method (Gradient Descent)
Step 4: Forward Propagation
Step 5: Calculate the Loss
Step 6: Backpropagation (Calculate Gradients)
Step 7: Update Parameters Using Gradient Descent
Step 8: Repeat the Process (Iterate)
Step 9: Monitor the Loss and Adjust Learning Rate
Step 10: Evaluate the Model
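
For the linear model used in this experiment, Steps 5 to 7 can be written out explicitly. The block below is a sketch that assumes a mean-squared-error loss with a factor of 1/2 (a common convention that makes the gradient come out as the average of the errors times the inputs); here m is the number of samples, X_b is the input matrix with a leading column of ones, and alpha is the learning rate.

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m}\,\lVert X_b\,\theta - y \rVert^{2}
  && \text{(loss, Step 5)} \\
\nabla_{\theta} J(\theta) &= \frac{1}{m}\, X_b^{\top}\,(X_b\,\theta - y)
  && \text{(gradient, Step 6)} \\
\theta &\leftarrow \theta - \alpha\, \nabla_{\theta} J(\theta)
  && \text{(update, Step 7)}
\end{aligned}
```

Because this loss is convex in theta, the minimum that gradient descent approaches is also the global minimum for this model.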

Gradient Descent Model (NN Model) Code:


import numpy as np
import matplotlib.pyplot as plt

# Generate random data for a linear relationship (you can replace this with your own data)
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Add a bias term to the input features (X_b = [1, X])


X_b = np.c_[np.ones((100, 1)), X] # Add a column of ones to X for the bias term

# Initialize random coefficients (theta) for both intercept and coefficient


theta = np.random.randn(2, 1)

# Set hyperparameters
learning_rate = 0.01
num_iterations = 1000

# Gradient Descent
for iteration in range(num_iterations):
    # Calculate predictions
    predictions = X_b.dot(theta)

    # Calculate errors
    errors = predictions - y

    # Calculate gradients
    gradients = X_b.T.dot(errors) / len(X_b)

    # Update parameters using gradients and learning rate
    theta -= learning_rate * gradients

# Print the learned parameters


print("Learned Parameters:")
print("Intercept:", theta[0][0])
print("Coefficient:", theta[1][0])

# Plot the original data and the learned regression line


plt.scatter(X, y, label='Data')
plt.plot(X, X_b.dot(theta), color='red', label='Linear Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

Output:

RESULT:
The gradient descent model was implemented, and local minima, intercepts, and coefficients were
found.
Exp. No.: 12 BUILD DEEP LEARNING NN MODELS

AIM:
To implement the rectified linear unit (ReLU) function to solve the vanishing gradients issue to
build deep learning NN models

Algorithm:
1. Import Required Libraries
2. Generate or Load Dataset
3. Split the Dataset
4. Initialize the Model
5. Add Layers to the Model
6. Compile the Model
7. Train the Model:
8. Evaluate the Model:
9. Make Predictions:
10. Visualize Training Performance
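
The reason ReLU helps with vanishing gradients can be seen by multiplying activation derivatives across layers, which is what backpropagation does along a path through the network. The sketch below is a simplified illustration that ignores the weights and uses the same made-up pre-activation value at every layer; all names are assumptions for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid; never larger than 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def chained_gradient(grad_fn, pre_activations):
    """Product of activation derivatives along one path through the layers."""
    product = 1.0
    for z in pre_activations:
        product *= grad_fn(z)
    return product

# Hypothetical pre-activation value of 1.0 at each of 10 layers
pre_activations = [1.0] * 10
print(chained_gradient(sigmoid_grad, pre_activations))  # shrinks toward zero
print(chained_gradient(relu_grad, pre_activations))     # stays at 1.0
```

With sigmoid activations the product shrinks geometrically with depth, while ReLU passes the gradient through unchanged for positive inputs. ReLU units with negative inputs pass no gradient at all, so the choice is a trade-off rather than a complete fix.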

Code for Deep Learning Model:

# Run once in a shell before executing this script: pip install tensorflow


import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons # For generating toy dataset (binary classification)

# Step 1: Generate a toy dataset (you can use your own dataset here)
X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)

# Step 2: Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Build the deep learning neural network model


model = Sequential()

# Input Layer
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu')) # First hidden layer

# Additional Hidden Layers


model.add(Dense(128, activation='relu')) # Second hidden layer
model.add(Dropout(0.2)) # Dropout for regularization
model.add(Dense(64, activation='relu')) # Third hidden layer

# Output Layer (Binary Classification)


model.add(Dense(1, activation='sigmoid')) # Output layer with sigmoid activation

# Step 4: Compile the model


model.compile(optimizer=Adam(learning_rate=0.01),
loss='binary_crossentropy', # Loss function for binary classification
metrics=['accuracy']) # Metrics to evaluate the model

# Step 5: Train the model


history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))

# Step 6: Evaluate the model on test data


test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f'Test Loss: {test_loss}')
print(f'Test Accuracy: {test_accuracy}')

# Step 7: Make predictions


predictions = model.predict(X_test)
predictions = (predictions > 0.5).astype(int) # Convert probabilities to binary (0 or 1)

# Display some predictions


print(f"Predictions: {predictions[:10]}")
print(f"True Labels: {y_test[:10]}")

# Optional: Plot training history


import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc='upper left')
plt.show()

Output:
Result:

The ReLU activation function was used in a deep neural network to address the
vanishing-gradient issue.
