
Experiment no: 1-a    DATASET CREATION - CATEGORICAL DATASET
Date:

Aim:
The Objective of the experiment is to create a categorical dataset (BMI records with a BMI category column) as a CSV file using the csv module in Python.

Algorithm:
Step 1: Start

Step 2: Import the `csv` module

Step 3: Define the file path for the CSV file

Step 4: Open the CSV file in write mode and define field names

Step 5: Write the header row to the CSV file

Step 6: Begin a loop to collect user data

Step 7: Prompt the user to enter their name

Step 8: Check if the user wants to exit

Step 9: If not, prompt the user to enter their age, height, and weight

Step 10: Calculate BMI using the formula: weight / (height ** 2)

Step 11: Categorize the BMI into one of four categories

Step 12: Write the user's data (name, age, height, weight, BMI, BMI category) to the CSV file

Step 13: Repeat the loop until the user chooses to exit

Step 14: Display a message indicating that the data has been written to the CSV file

Step 15: Stop

Program:

import csv

csv_file_path = 'data_with_bm.csv'

# Open the CSV file for writing
with open(csv_file_path, 'w', newline='') as csvfile:
    fieldnames = ['Name', 'Age', 'Height', 'Weight', 'BMI', 'BMI_Category']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    # Write header
    writer.writeheader()
Output:

Enter your name (or type 'exit' to finish): akalya

Enter your age: 20

Enter your height in meters: 1.70

Enter your weight in kilograms: 54

Enter your name (or type 'exit' to finish): nandhini

Enter your age: 20

Enter your height in meters: 1.56

Enter your weight in kilograms: 45

Enter your name (or type 'exit' to finish): priya

Enter your age: 34

Enter your height in meters: 1.55

Enter your weight in kilograms: 22

Enter your name (or type 'exit' to finish): vinu

Enter your age: 20

Enter your height in meters: 1.78

Enter your weight in kilograms: 170

Enter your name (or type 'exit' to finish): exit

Data has been written to data_with_bm.csv


    while True:
        # Get user input for each entry
        name = input("Enter your name (or type 'exit' to finish): ")

        if name.lower() == 'exit':
            break

        age = int(input("Enter your age: "))
        height = float(input("Enter your height in meters: "))
        weight = float(input("Enter your weight in kilograms: "))

        if height <= 0 or weight <= 0:
            print("Height and weight must be positive numbers. Please try again.")
            continue

        # Calculate BMI
        bmi = weight / (height ** 2)

        # Categorize BMI
        if bmi < 18.5:
            bmi_category = 'Underweight'
        elif 18.5 <= bmi < 24.9:
            bmi_category = 'Normal weight'
        elif 24.9 <= bmi < 29.9:
            bmi_category = 'Overweight'
        else:
            bmi_category = 'Obese'

        # Write user data to CSV
        writer.writerow({'Name': name, 'Age': age, 'Height': height, 'Weight': weight,
                         'BMI': bmi, 'BMI_Category': bmi_category})

print(f"Data has been written to {csv_file_path}.")

Result:

Thus the categorical dataset has been created successfully using the csv module in Python.
Experiment no: 1-b    DATASET CREATION - CONTINUOUS DATASET
Date:

Aim:
The Objective of the experiment is to create a continuous dataset (BMI values) as a CSV file using the csv module in Python.

Algorithm:
Step 1: Start

Step 2: Import the `csv` module

Step 3: Define the file path for the CSV file

Step 4: Open the CSV file in write mode and define field names (Name, Age, Height, Weight, BMI)

Step 5: Write the header row to the CSV file

Step 6: Begin a loop to collect user data

Step 7: Prompt the user to enter their name

Step 8: Check if the user wants to exit

Step 9: If not, prompt the user to enter their age, height, and weight

Step 10: Calculate BMI using the formula: weight / (height ** 2)

Step 11: Write the user's data (name, age, height, weight, BMI) to the CSV file

Step 12: Repeat the loop until the user chooses to exit

Step 13: Display a message indicating that the data has been written to the CSV file

Step 14: Stop

PROGRAM:

import csv

csv_file_path = 'bmi.csv'

# Open the CSV file for writing
with open(csv_file_path, 'w', newline='') as csvfile:
    fieldnames = ['Name', 'Age', 'Height', 'Weight', 'BMI']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
OUTPUT:

Enter your name (or type 'exit' to finish): akalya

Enter your age: 20

Enter your height in meters: 1.69

Enter your weight in kilograms: 55

Enter your name (or type 'exit' to finish): nandhini

Enter your age: 22

Enter your height in meters: 1.55

Enter your weight in kilograms: 34

Enter your name (or type 'exit' to finish): priya

Enter your age: 25

Enter your height in meters: 1.70

Enter your weight in kilograms: 110

Enter your name (or type 'exit' to finish): exit

Data has been written to bmi.csv


    # Write header
    writer.writeheader()

    while True:
        # Get user input for each entry
        name = input("Enter your name (or type 'exit' to finish): ")

        if name.lower() == 'exit':
            break

        age = int(input("Enter your age: "))
        height = float(input("Enter your height in meters: "))
        weight = float(input("Enter your weight in kilograms: "))

        # Calculate BMI
        bmi = weight / (height ** 2)

        # Write user data to CSV
        writer.writerow({'Name': name, 'Age': age, 'Height': height, 'Weight': weight, 'BMI': bmi})

print(f"Data has been written to {csv_file_path}.")

RESULT:

Thus the Python program to create a continuous dataset has been executed and verified successfully.
Experiment no: 2-a    SIMPLE LINEAR REGRESSION USING LIBRARY FUNCTION
Date:

Aim:
The Objective of this experiment is to implement simple linear regression using library functions.

Algorithm:
Step 1: Start

Step 2: Import necessary libraries

Step 3: Load dataset and split into training and testing sets

Step 4: Train Linear Regression model

Step 5: Make predictions and define a threshold for binary classification

Step 6: Evaluate model performance

Step 7: Prompt user for input (years of experience)

Step 8: Predict salary and convert to binary label

Step 9: Print predicted salary

Step 10: Plot regression line with data points and threshold

Step 11: Stop

Program:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (mean_squared_error, r2_score,
                             precision_recall_fscore_support, accuracy_score)
import matplotlib.pyplot as plt

# Load the dataset from CSV file


file_path = "Salary_Data.csv" # Update this with your CSV file path
data = pd.read_csv(file_path)

# Assuming 'years_of_experience' is the feature and 'salary' is the target variable


X = data[['YearsExperience']].values
y = data['Salary'].values
SAMPLE DATASET:

OUTPUT:

Precision: 1.0

Recall: 0.6666666666666666

F1 Score: 0.8

Accuracy: 0.8333333333333334

Enter years of experience:4

Predicted Salary: 63016.84430390072

Text(0.5, 1.0, 'Linear Regression: Years of Experience vs Salary')


# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training the linear regression model


model = LinearRegression()
model.fit(X_train, y_train)

# Making predictions on the test set


y_pred = model.predict(X_test)

# Set the threshold for binary classification


threshold = 80000 # Adjust this threshold as needed

# Convert predictions to binary labels based on the threshold


y_pred_binary = (y_pred > threshold).astype(int)
y_test_binary = (y_test > threshold).astype(int)

# Evaluate the model using classification metrics


precision, recall, f1, _ = precision_recall_fscore_support(y_test_binary, y_pred_binary,
average='binary')
accuracy = accuracy_score(y_test_binary, y_pred_binary)

print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Accuracy:", accuracy)

# Get input from the user for Years of Experience


years_of_experience = float(input("Enter years of experience:"))

# Create a NumPy array with the user input


years_of_experience_array = np.array([[years_of_experience]])

# Predict salary using the trained linear regression model


predicted_salary = model.predict(years_of_experience_array)

# Convert the predicted salary to binary label based on the threshold


predicted_salary_binary = (predicted_salary > threshold).astype(int)

print("Predicted Salary:", predicted_salary[0])

# Plotting the regression line along with test data points


plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, y_pred, color='blue', linewidth=3)
plt.axhline(y=threshold, color='red', linestyle='--', label='Threshold')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.title('Linear Regression: Years of Experience vs Salary')
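The program imports mean_squared_error and r2_score, which are the natural regression metrics here, but does not use them. A minimal optional check (not part of the recorded program), assuming y_test and y_pred from above:

# Optional: evaluate the fit with regression metrics (assumes y_test and y_pred from above)
mse = mean_squared_error(y_test, y_pred)   # average squared prediction error
r2 = r2_score(y_test, y_pred)              # proportion of variance explained
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)

When the script is run outside a notebook, plt.show() is also needed at the end to display the regression plot.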

Result:

Thus simple linear regression using library functions has been implemented and executed successfully.
Experiment no: 2-b    SIMPLE LINEAR REGRESSION WITHOUT USING LIBRARY FUNCTION
Date:

Aim:
The Objective of this experiment is to implement simple linear regression without using library
functions.

Algorithm:
Step 1: Start

Step 2: Import the matplotlib.pyplot library

Step 3: Load the dataset from the CSV file specified in 'file_path'

Step 4: Read the CSV file without using pandas, extracting lines of data

Step 5: Extract data from CSV and format it as a list of lists

Step 6: Define features ('years_of_experience') and target variable ('salary')

Step 7: Split the data into training and testing sets

Step 8: Train the linear regression model using analytical solution

Step 9: Make predictions on the test set

Step 10: Evaluate the model's performance using Mean Squared Error (MSE) and R-squared (R^2)
Score

Step 11: Prompt the user to enter years of experience for predicting salary

Step 12: Predict salary using the trained linear regression model

Step 13: Print the predicted salary

Step 14: Plot the regression line along with test data points

Step 15: Show the plot

Step 16: Stop

Program:
import matplotlib.pyplot as plt

# Load the dataset from CSV file


file_path = "Salary_Data.csv" # Update this with your CSV file path

# Read the CSV file without using pandas


with open(file_path, 'r') as file:
    lines = file.readlines()

# Extract data from CSV


data = [list(map(float, line.strip().split(','))) for line in lines[1:]] # Skip header

# Assuming 'years_of_experience' is the feature and 'salary' is the target variable


X = [[1, row[0]] for row in data] # Add a bias term
y = [row[1] for row in data]

# Splitting the data into training and testing sets


split_index = int(0.8 * len(data))
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

# Training the linear regression model


mean_X1 = sum(x[1] for x in X_train) / len(X_train)
mean_y = sum(y_train) / len(y_train)

b1 = (sum((x[1] - mean_X1) * (y_train[i] - mean_y) for i, x in enumerate(X_train))
      / sum((x[1] - mean_X1) ** 2 for x in X_train))
b0 = mean_y - b1 * mean_X1

# Making predictions on the test set


y_pred = [b0 + b1 * x[1] for x in X_test]

# Evaluating the model


n = len(y_test)
ss_res = sum((y_test[i] - y_pred[i]) ** 2 for i in range(n))
mse = ss_res / n
mean_y_test = sum(y_test) / n
ss_total = sum((y_test[i] - mean_y_test) ** 2 for i in range(n))
r2 = 1 - (ss_res / ss_total)  # R^2 uses the residual sum of squares, not the MSE

print("Mean Squared Error:", mse)


print("R^2 Score:", r2)

# Get input from the user for Hours Studied


years_of_experience = float(input("Enter year of experience: "))

# Predict exam scores using the trained linear regression model


predicted_salary = b0 + b1 * years_of_experience

print("Predicted salary:", predicted_salary)

# Plotting the regression line along with test data points


plt.scatter([x[1] for x in X_test], y_test, color='black')
SAMPLE DATASET:

Output:
Mean Squared Error: 35766738.2396581
R^2 Score: 0.8450481599189925
Enter year of experience: 4
Predicted salary: 63870.993342864815

plt.plot([x[1] for x in X_test], y_pred, color='blue', linewidth=3)


plt.xlabel('year of experience')
plt.ylabel('salary')
plt.title('Linear Regression: year of experience vs salary')
plt.show()
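As a quick optional cross-check (not part of the recorded program), the slope and intercept computed by hand can be compared against numpy.polyfit, assuming the lists built above:

# Optional cross-check of b0 and b1 using NumPy (assumes X_train and y_train from above)
import numpy as np

slope, intercept = np.polyfit([x[1] for x in X_train], y_train, 1)
print("NumPy slope:", slope, "intercept:", intercept)  # should match b1 and b0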

Result:
Thus simple linear regression without using library functions has been implemented and executed successfully.
Experiment no: 2-c    MULTIPLE LINEAR REGRESSION USING LIBRARY FUNCTION
Date:

Aim:
The Objective of this experiment is to implement multiple linear regression using library functions.

Algorithm:
Step 1: Start

Step 2: Import necessary libraries

Step 3: Load dataset from CSV file

Step 4: Split data into training and testing sets

Step 5: Train Linear Regression model

Step 6: Make predictions on the test set

Step 7: Evaluate model performance

Step 8: Prompt user for input (TV, Radio, Newspaper budgets)

Step 9: Predict sales using the trained model

Step 10: Print predicted sales

Step 11: Plot actual vs predicted sales

Step 12: Stop

Program:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Load the dataset from CSV file


file_path = "Advertising.csv" # Update this with your CSV file path
data = pd.read_csv(file_path)

# Assuming 'TV', 'Radio', and 'Newspaper' are features and 'Sales' is the target variable
X = data[['TV', 'Radio', 'Newspaper']].values
y = data['Sales'].values
SAMPLE DATASET:

Output:
Mean Squared Error: 2.9077569102710923
R^2 Score: 0.9059011844150826
Enter TV advertising budget: 6677
Enter Radio advertising budget: 6488
Enter Newspaper advertising budget: 3566
Predicted Sales: 1039.0705215552014
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training the linear regression model


model = LinearRegression()
model.fit(X_train, y_train)

# Making predictions on the test set


y_pred = model.predict(X_test)

# Evaluating the model


mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)


print("R^2 Score:", r2)

# Get input from the user for Radio, TV, and Newspaper advertising budgets
tv_budget = float(input("Enter TV advertising budget: "))
radio_budget = float(input("Enter Radio advertising budget: "))
newspaper_budget = float(input("Enter Newspaper advertising budget: "))

# Create a NumPy array with the user input


user_input = np.array([[tv_budget, radio_budget, newspaper_budget]])

# Predict sales using the trained linear regression model


predicted_sales = model.predict(user_input)

print("Predicted Sales:", predicted_sales[0])

# Plotting actual vs predicted sales on the test set


plt.scatter(y_test, y_pred, color='blue')
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color='red', linewidth=3)
plt.xlabel('Actual Sales')
plt.ylabel('Predicted Sales')
plt.title('Actual vs Predicted Sales')
plt.show()
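A short optional addition (not part of the recorded program): the fitted coefficients can be printed to see how each advertising channel contributes to predicted sales, assuming the model trained above:

# Optional: inspect the fitted model (assumes model from the program above)
print("Intercept:", model.intercept_)
for name, coef in zip(['TV', 'Radio', 'Newspaper'], model.coef_):
    print(f"Coefficient for {name}: {coef}")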

Result:
Thus multiple linear regression using library functions has been implemented and executed successfully.
Experiment no: 2-d    MULTIPLE LINEAR REGRESSION WITHOUT USING LIBRARY FUNCTION
Date:

Aim:
The Objective of this experiment is to implement multiple linear regression without using library functions.

Algorithm:
Step 1: Start

Step 2: Import the matplotlib.pyplot library

Step 3: Load the dataset from the CSV file specified in 'file_path'

Step 4: Read the CSV file without using pandas, extracting lines of data

Step 5: Extract data from CSV and format it as a list of lists

Step 6: Define features ('TV', 'Radio', 'Newspaper') and target variable ('Sales')

Step 7: Split the data into training and testing sets

Step 8: Create polynomial features for training and testing sets

Step 9: Train the linear regression model with polynomial features

Step 10: Make predictions on the test set with polynomial features

Step 11: Evaluate the model's performance using Mean Squared Error (MSE) and R-squared (R^2)
Score

Step 12: Prompt the user to enter advertising budgets for TV, Radio, and Newspaper

Step 13: Predict sales using the trained model

Step 14: Print the predicted sales

Step 15: Plot actual vs predicted sales on the test set

Step 16: Stop

Program:
import matplotlib.pyplot as plt

# Load the dataset from CSV file


file_path = "Advertising.csv" # Update this with your CSV file path

# Read the CSV file without using pandas


with open(file_path, 'r') as file:
    lines = file.readlines()

# Extract data from CSV


data = []
for line in lines[1:]:  # Skip the header line
    values = line.strip().split(',')
    data.append([float(values[0]), float(values[1]), float(values[2]), float(values[3])])

# Assuming 'TV', 'Radio', and 'Newspaper' are features and 'Sales' is the target variable
X = [[row[0], row[1], row[2]] for row in data] # Features: TV, Radio, Newspaper
y = [row[3] for row in data] # Target variable: Sales

# Splitting the data into training and testing sets


split_index = int(0.8 * len(data))
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

# Create polynomial features


X_poly_train = [[1, x[0], x[1], x[2], x[0] * x[0], x[1] * x[1], x[2] * x[2]] for x in X_train]
X_poly_test = [[1, x[0], x[1], x[2], x[0] * x[0], x[1] * x[1], x[2] * x[2]] for x in X_test]

# Training the linear regression model with polynomial features


X_poly_train_transposed = list(map(list, zip(*X_poly_train)))
mean_X_poly = [sum(feature) / len(feature) for feature in X_poly_train_transposed]
mean_y_poly = sum(y_train) / len(y_train)

# Add a small constant to avoid division by zero


epsilon = 1e-10

# Compute coefficients with added constant


coefficients_poly = [
sum((X_poly_train[i][j] - mean_X_poly[j]) * (y_train[i] - mean_y_poly) for i in
range(len(X_poly_train))) /
(sum((X_poly_train[i][j] - mean_X_poly[j])**2 for i in range(len(X_poly_train))) + epsilon)
for j in range(len(mean_X_poly))
]

# Making predictions on the test set with polynomial features


y_pred_poly = [sum(coefficients_poly[k] * x[k] for k in range(len(coefficients_poly))) for x in
X_poly_test]

# Evaluating the model with polynomial features


n = len(y_test)  # Define 'n' as the length of the test set
ss_res_poly = sum((y_test[i] - y_pred_poly[i]) ** 2 for i in range(n))
mse_poly = ss_res_poly / n
mean_y_test_poly = sum(y_test) / n
ss_total_poly = sum((y_test[i] - mean_y_test_poly) ** 2 for i in range(n))
r2_poly = 1 - (ss_res_poly / ss_total_poly)  # R^2 uses the residual sum of squares, not the MSE

print("Mean Squared Error (Polynomial Regression):", mse_poly)


print("R^2 Score (Polynomial Regression):", r2_poly)
SAMPLE DATASET:

Output:
Mean Squared Error (Polynomial Regression): 59.046372645941645
R^2 Score (Polynomial Regression): 0.9454034508470807
Enter TV advertising budget: 5466
Enter Radio advertising budget: 3123
Enter Newspaper advertising budget:3424
Predicted Sales: 34212.556222982384
# Get input from the user for Radio, TV, and Newspaper advertising budgets
tv_budget = float(input("Enter TV advertising budget: "))
radio_budget = float(input("Enter Radio advertising budget: "))
newspaper_budget = float(input("Enter Newspaper advertising budget:"))

# Predict sales using the trained linear regression model


user_input = [1, tv_budget, radio_budget, newspaper_budget, tv_budget * tv_budget, radio_budget
* radio_budget, newspaper_budget * newspaper_budget]
predicted_sales = sum(coefficients_poly[k] * user_input[k] for k in range(len(coefficients_poly)))

print("Predicted Sales:", predicted_sales)

# Sort based on the actual sales values
sorted_indices = sorted(range(len(y_test)), key=lambda k: y_test[k])
y_test_sorted = [y_test[i] for i in sorted_indices]
y_pred_poly_sorted = [y_pred_poly[i] for i in sorted_indices]

plt.scatter(y_test_sorted, y_pred_poly_sorted, color='blue') # Scatter plot


plt.plot([min(y_test_sorted), max(y_test_sorted)], [min(y_test_sorted), max(y_test_sorted)],
color='red', linewidth=3) # Linear line
plt.xlabel('Actual Sales')
plt.ylabel('Predicted Sales')
plt.title('Actual vs Predicted Sales')
plt.show()
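The coefficients above are computed for each feature independently, which is only an approximation when the features are correlated. A minimal optional sketch (not part of the recorded program) of the standard least-squares solution for the same polynomial design matrix, using numpy.linalg.lstsq:

# Optional: ordinary least squares on the same polynomial design matrix
# (assumes X_poly_train, y_train, X_poly_test, y_test from the program above)
import numpy as np

A = np.array(X_poly_train, dtype=float)
b = np.array(y_train, dtype=float)
beta, *_ = np.linalg.lstsq(A, b, rcond=None)   # solves min ||A @ beta - b||^2
y_pred_ols = np.array(X_poly_test, dtype=float) @ beta
print("OLS MSE:", np.mean((np.array(y_test) - y_pred_ols) ** 2))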

Result:
Thus multiple linear regression without using library functions has been implemented and executed successfully.
Experiment no: 2-e    IMPLEMENTATION OF LOGISTIC REGRESSION
Date:

Aim:
The Objective of this experiment is to implement logistic regression using library functions.

Algorithm:
Step 1: Start

Step 2: Import necessary libraries: pandas, sklearn's train_test_split, LogisticRegression, accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, LabelEncoder, and matplotlib.pyplot

Step 3: Load the dataset from the CSV file 'optimal_features_dataset.csv' using pandas

Step 4: Drop the 'url' column from the dataset

Step 5: Encode the categorical variable 'status' using LabelEncoder

Step 6: Drop the original 'status' column and separate features (X) and target variable (y)

Step 7: Split the dataset into training and testing sets

Step 8: Initialize Logistic Regression model

Step 9: Train the model on the training set

Step 10: Predict on the test set

Step 11: Evaluate the model's performance using accuracy, precision, recall, F1 score, and confusion
matrix

Step 12: Plot the distribution of the ratio of digits in the URL for legitimate and phishing URLs

Step 13: Show the plot

Step 14: Stop

Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv('optimal_features_dataset.csv')
df.drop('url', axis=1, inplace=True)
# Encoding categorical variable 'status'
label_encoder = LabelEncoder()
df['status_encoded'] = label_encoder.fit_transform(df['status'])

# Drop the original 'status' column


df.drop('status', axis=1, inplace=True)

# Separate features and target variable


X = df.drop('status_encoded', axis=1)
y = df['status_encoded']

# Split dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Logistic Regression model


model = LogisticRegression()

# Train the model


model.fit(X_train, y_train)

# Predict on the test set


y_pred = model.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)
df.head()

plt.figure(figsize=(8, 6))
legitimate_data = df[df['status_encoded'] == 0]
phishing_data = df[df['status_encoded'] == 1]
plt.hist(legitimate_data['ratio_digits_url'], bins=20, alpha=0.5, label='Legitimate URLs')
plt.hist(phishing_data['ratio_digits_url'], bins=20, alpha=0.5, label='Phishing URLs')
plt.xlabel('Ratio of Digits in URL')
plt.ylabel('Frequency')
plt.title('Distribution of Ratio of Digits in URL')
plt.legend()
plt.show()
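Logistic regression is sensitive to feature scale, and the accuracy reported below is modest, so standardizing the features is a common optional refinement. A minimal sketch (not part of the recorded program), assuming the split created above:

# Optional: logistic regression with feature scaling
# (assumes X_train, y_train, X_test, y_test from the program above)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_train, y_train)
print("Scaled-model accuracy:", scaled_model.score(X_test, y_test))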
SAMPLE DATASET:

Output:
Accuracy: 0.6399825021872266
Precision: 0.7101648351648352
Recall: 0.4579273693534101
F1 Score: 0.5568120624663435
Confusion Matrix:
[[946 211]
[612 517]]
Result:
Thus logistic regression using library functions has been implemented and executed successfully.
Experiment no: 3-a    NAIVE BAYES ALGORITHM WITH LIBRARY FUNCTION
Date:

Aim:
The Objective of this experiment is to implement Naïve Bayes algorithm with library functions.

Algorithm:
Step 1: Start

Step 2: Import necessary libraries: pandas, numpy, sklearn's CategoricalNB, LabelEncoder, accuracy_score, and classification_report

Step 3: Load dataset from CSV file 'naive.csv' using pandas

Step 4: Separate data into features (X) and labels (y)

Step 5: Define a custom function to encode boolean values as 1 or 0

Step 6: Use LabelEncoder for all features, including the boolean one, and encode categorical
variables

Step 7: Train a Naive Bayes classifier (CategoricalNB)

Step 8: Define a function to predict outcomes based on user input

Step 9: Prompt the user to input values for Outlook, Temperature, Humidity, and Windy

Step 10: Transform user input to numerical form using the trained LabelEncoders

Step 11: Predict the outcome based on user input using the trained classifier

Step 12: Print the predicted outcome

Step 13: Call the predict_outcome function

Step 14: Stop

Program:
import pandas as pd
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report

# Load dataset from CSV file


df = pd.read_csv('naive.csv')

# Separate data into features (X) and labels


SAMPLE DATASET:

Output:
Enter Outlook (e.g., 'Sunny', 'Overcast', 'Rainy'): Sunny
Enter Temperature (e.g., 'Hot', 'Mild', 'Cool'): Cool
Enter Humidity (e.g., 'High', 'Normal'): High
Enter Windy (True/False): False
Predicted Outcome: No
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

# Custom encoding function for boolean values


def encode_boolean(value):
    return 1 if str(value).lower() == 'true' else 0

# Use LabelEncoder for all features, including the boolean one


label_encoders = [LabelEncoder() for _ in range(X.shape[1])]
X_encoded = [le.fit_transform(X[:, i]) if i != X.shape[1] - 1 else np.array([encode_boolean(val) for val
in X[:, i]]) for i, le in enumerate(label_encoders)]
X_encoded = np.array(list(zip(*X_encoded)))

# Train a Naive Bayes classifier


clf = CategoricalNB()
clf.fit(X_encoded, y)

# Function to predict based on user input


def predict_outcome():
    outlook = input("Enter Outlook (e.g., 'Sunny', 'Overcast', 'Rainy'): ").capitalize()
    temperature = input("Enter Temperature (e.g., 'Hot', 'Mild', 'Cool'): ").capitalize()
    humidity = input("Enter Humidity (e.g., 'High', 'Normal'): ").capitalize()
    windy = input("Enter Windy (True/False): ").capitalize()

    # Transform user input to numerical form using the trained LabelEncoders
    user_input_encoded = [le.transform([val])[0] if i != X.shape[1] - 1 else encode_boolean(val)
                          for i, (le, val) in enumerate(zip(label_encoders,
                                                            [outlook, temperature, humidity, windy]))]
    user_input_encoded = np.array(user_input_encoded).reshape(1, -1)

    # Predict the outcome
    prediction = clf.predict(user_input_encoded)[0]
    print(f'Predicted Outcome: {prediction}')

# Call the predict_outcome function


predict_outcome()
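The program imports accuracy_score and classification_report but does not use them; a short optional check of the classifier on the encoded training data (not part of the recorded program), assuming clf, X_encoded, and y from above:

# Optional: evaluate the classifier on the encoded training data
y_train_pred = clf.predict(X_encoded)
print("Training accuracy:", accuracy_score(y, y_train_pred))
print(classification_report(y, y_train_pred))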

Result:
Thus the Naive Bayes algorithm with library functions has been executed and implemented
successfully.
Experiment no: 3-b    NAIVE BAYES ALGORITHM WITHOUT USING LIBRARY FUNCTION
Date:

Aim:
The Objective of this experiment is to implement Naïve Bayes algorithm without using library
functions.

Algorithm:
1. Start

2. Import pandas and numpy.

3. Read the dataset from the CSV file 'naive2.csv' using pandas.

4. Calculate the count of each class in the target variable 'Play Golf'.

5. Calculate the conditional probabilities for each attribute given the target variable.

6. Define a function `predict(X)` to predict the target variable based on the input attributes.

7. Prompt the user to input values for Outlook, Temperature, Humidity, and Windy.

8. Predict the outcome based on the user input using the `predict` function.

9. Define a function `accuracy()` to calculate the accuracy of the model.

10. Print the predicted outcome and accuracy.

11. Stop.

Program:
import pandas as pd
import numpy as np

df = pd.read_csv('naive2.csv')
#df = df.drop(df.columns[df.columns.str.contains('Unnamed')], axis=1)
op = dict(df['Play Golf'].value_counts())
rows = df.shape[0]
prob = {}
for column in df.columns[:-1]:
    column_d = {}
    for class_ in df[column].unique():
        t_d = {}
        for output in df['Play Golf'].unique():
            occurrences = ((df[column] == class_) & (df['Play Golf'] == output)).sum()
            t_d[output] = occurrences / op[output]
        prob[str(class_)] = t_d
SAMPLE DATASET:

Output:
Enter Outlook (e.g., 'Sunny', 'Overcast', 'Rainy'): Sunny
Enter Temperature (e.g., 'Hot', 'Mild', 'Cool'): Mild
Enter Humidity (e.g., 'High', 'Normal'): High
Enter Windy (True/False): True
No
5
13
None
dataset:
Outlook Temperature Humidity Windy Play Golf
0 Sunny Hot High False No
1 Sunny Hot High True No
2 Overcast Hot High False Yes
3 Rainy Mild High False Yes
4 Rainy Cool Normal False Yes
5 Rainy Cool Normal True No
6 Overcast Cool Normal True Yes
7 Sunny Mild High False No
8 Sunny Cool Normal False Yes
9 Rainy Mild Normal False Yes
10 Sunny Mild Normal True Yes
11 Overcast Mild High True Yes
12 Overcast Hot Normal False Yes
13 Rainy Mild High True No
prob_df = pd.DataFrame(prob)

def predict(X):
    p_yes = op['Yes'] / rows
    p_no = op['No'] / rows
    for attr in X:
        p_yes *= prob_df[attr]['Yes']
        p_no *= prob_df[attr]['No']
    P_yes = round(p_yes / (p_yes + p_no), 2)
    P_no = round(p_no / (p_yes + p_no), 2)

    return "Yes" if P_yes >= P_no else "No"

outlook = input("Enter Outlook (e.g., 'Sunny', 'Overcast', 'Rainy'): ").capitalize()


temperature = input("Enter Temperature (e.g., 'Hot', 'Mild', 'Cool'): ").capitalize()
humidity = input("Enter Humidity (e.g., 'High', 'Normal'): ").capitalize()
windy = input("Enter Windy (True/False): ").capitalize()
print(predict([outlook,temperature,humidity,windy]))

def accuracy():
    count = 0
    for index, row in df.iterrows():
        r = list(map(str, row))
        if predict(r[:-1]) == r[-1]:
            count += 1
        else:
            print(index)
    print(count)

print(accuracy())

print("dataset: ")
print(df)
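As a worked check against the dataset shown above, for the sample input (Sunny, Mild, High, True): the class priors are P(Yes) = 9/14 and P(No) = 5/14, and the conditional counts give P(Sunny|Yes) = 2/9, P(Mild|Yes) = 4/9, P(High|Yes) = 3/9, P(True|Yes) = 3/9, versus P(Sunny|No) = 3/5, P(Mild|No) = 2/5, P(High|No) = 4/5, P(True|No) = 3/5. The unnormalized scores are therefore about 0.0071 for Yes and 0.0411 for No, so the classifier predicts No, matching the output shown.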

Result:
Thus the Naive Bayes algorithm without library functions has been executed and implemented
successfully.
Experiment no: 4-a    IMPLEMENTATION OF DECISION TREE USING ID3 ALGORITHM
Date:

Aim:
The Objective of this experiment is to implement decision tree using ID3 algorithm with inbuilt
libraries in python.

Algorithm:
Step 1: Start

Step 2: Import necessary libraries: pandas, DecisionTreeClassifier, train_test_split, accuracy_score, precision_score, recall_score, f1_score, and confusion_matrix

Step 3: Load the dataset from '/content/optimal_feature.csv' using pandas

Step 4: Separate features (X) and target variable (y)

Step 5: Split the dataset into train and test sets

Step 6: Initialize a DecisionTreeClassifier with criterion='entropy' for ID3

Step 7: Train the model using the training data

Step 8: Make predictions on the test set

Step 9: Evaluate the model's performance using accuracy, precision, recall, F1 score, and confusion
matrix

Step 10: Print the evaluation metrics

Step 11: Stop

Program:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

# Load dataset
df = pd.read_csv('/content/optimal_feature.csv')

# Separate features and target variable


X = df.drop('status', axis=1)
y = df['status']

# Split dataset into train and test sets


SAMPLE DATASET:

Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[2 0]
[0 1]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize DecisionTreeClassifier with criterion='entropy' for ID3


model = DecisionTreeClassifier(criterion='entropy')

# Train the model


model.fit(X_train, y_train)

# Predict on the test set


y_pred = model.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)

# Calculate precision
precision = precision_score(y_test, y_pred, average='weighted')

# Calculate recall
recall = recall_score(y_test, y_pred, average='weighted')

# Calculate F1-score
f1 = f1_score(y_test, y_pred, average='weighted')

# Calculate confusion matrix


conf_matrix = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)
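A small optional addition (not part of the recorded program): the trained tree can be visualized with sklearn's plot_tree, assuming the model and DataFrame from above:

# Optional: visualize the trained decision tree (assumes model and X from above)
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(12, 6))
plot_tree(model, feature_names=list(X.columns), filled=True)
plt.show()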

Result:
Thus the decision tree using ID3 algorithm with inbuilt libraries in python has been implemented and
executed successfully.
Experiment no: 4-b    IMPLEMENTATION OF DECISION TREE USING C4.5 ALGORITHM
Date:

Aim:
The Objective of this experiment is to implement decision tree using C4.5 algorithm with inbuilt
libraries in python.

Algorithm:
Step 1: Start

Step 2: Import necessary libraries: pandas, train_test_split, DecisionTreeClassifier, accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, and numpy

Step 3: Define a function to calculate entropy

Step 4: Define a function to calculate information gain

Step 5: Define a function to find the best split based on information gain

Step 6: Load the dataset from '/content/optimal_feature.csv' using pandas

Step 7: Separate features (X) and target variable (y)

Step 8: Handle missing values by replacing NaN with a placeholder value ('missing')

Step 9: Convert categorical features to numeric using one-hot encoding

Step 10: Split the dataset into train and test sets

Step 11: Initialize a DecisionTreeClassifier with criterion='entropy' for the C4.5-like algorithm

Step 12: Train the model using the training data

Step 13: Make predictions on the test set

Step 14: Evaluate the model's performance using accuracy, precision, recall, F1 score, and confusion
matrix

Step 15: Print the evaluation metrics

Step 16: Stop

Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)
import numpy as np

# Function to calculate entropy


def entropy(labels):
    unique_labels, label_counts = np.unique(labels, return_counts=True)
    probabilities = label_counts / len(labels)
    entropy = -np.sum(probabilities * np.log2(probabilities))
    return entropy

# Function to calculate information gain


def information_gain(data, feature_index, labels):
    # Calculate entropy before splitting
    entropy_before_split = entropy(labels)

    # Calculate weighted sum of entropies after splitting
    unique_values = np.unique(data[:, feature_index])
    entropy_after_split = 0
    for value in unique_values:
        subset_indices = np.where(data[:, feature_index] == value)
        subset_labels = labels[subset_indices]
        subset_entropy = entropy(subset_labels)
        weight = len(subset_labels) / len(labels)
        entropy_after_split += weight * subset_entropy

    # Calculate information gain
    information_gain = entropy_before_split - entropy_after_split
    return information_gain

# Function to find the best split based on information gain


def find_best_split(data, labels):
    num_features = data.shape[1]
    best_gain = -1
    best_feature = None
    for feature_index in range(num_features):
        gain = information_gain(data, feature_index, labels)
        if gain > best_gain:
            best_gain = gain
            best_feature = feature_index
    return best_feature

# Load dataset
df = pd.read_csv('/content/optimal_feature.csv')

# Separate features and target variable


X = df.drop(columns=['status']) # Drop the 'status' column
y = df['status']

# Handle missing values (replace NaN with a placeholder value)


X.fillna('missing', inplace=True)

# Convert categorical features to numeric using one-hot encoding


SAMPLE DATASET:

Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[2 0]
[0 1]]
X = pd.get_dummies(X)

# Split dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize DecisionTreeClassifier with criterion='entropy' for C4.5-like algorithm


model = DecisionTreeClassifier(criterion='entropy')

# Train the model


model.fit(X_train, y_train)

# Predict on the test set


y_pred = model.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
conf_matrix = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)
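The entropy, information_gain, and find_best_split helpers defined above are not called in the main flow, since DecisionTreeClassifier performs the splitting internally. A brief optional check of the hand-written helpers (not part of the recorded program), assuming X_train and y_train from the program:

# Optional: exercise the hand-written helpers on the training data
labels = y_train.to_numpy()
features = X_train.to_numpy()
print("Entropy of training labels:", entropy(labels))
print("Index of best first split:", find_best_split(features, labels))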

Result:
Thus the decision tree using C4.5 algorithm with inbuilt libraries in python has been implemented
and executed successfully.
Experiment no: 4-c    IMPLEMENTATION OF DECISION TREE USING CART ALGORITHM
Date:

Aim:
The Objective of this experiment is to implement decision tree using CART algorithm with inbuilt
libraries in python.

Algorithm:
Step 1: Start

Step 2: Import necessary libraries: pandas, train_test_split, DecisionTreeClassifier, accuracy_score, precision_score, recall_score, f1_score, and confusion_matrix

Step 3: Load the dataset from '/content/optimal_feature.csv' using pandas

Step 4: Separate features (X) and target variable (y)

Step 5: Split the dataset into train and test sets

Step 6: Initialize a DecisionTreeClassifier with criterion='gini' for the CART algorithm

Step 7: Train the model using the training data

Step 8: Make predictions on the test set

Step 9: Evaluate the model's performance using accuracy, precision, recall, F1 score, and confusion
matrix

Step 10: Print the evaluation metrics

Step 11: Stop

Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Load dataset
df = pd.read_csv('/content/optimal_feature.csv')

# Separate features and target variable


X = df.drop('status', axis=1)
y = df['status']

# Split dataset into train and test sets


SAMPLE DATASET:

Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[2 0]
[0 1]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize DecisionTreeClassifier with criterion='gini' for CART algorithm


model = DecisionTreeClassifier(criterion='gini')

# Train the model


model.fit(X_train, y_train)

# Predict on the test set


y_pred = model.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
conf_matrix = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)

Result:
Thus the decision tree using the CART algorithm with inbuilt libraries in python has been implemented and executed successfully.
Experiment no: 5    IMPLEMENTATION OF K-MEANS CLUSTERING
Date:

Aim:
The Objective of this experiment is to implement K means clustering algorithm with
inbuilt libraries in python.

Algorithm:

Step 1: Start

Step 2: Import necessary libraries: numpy, matplotlib.pyplot, datasets from sklearn, KMeans from
sklearn.cluster, silhouette_score, davies_bouldin_score, and calinski_harabasz_score from
sklearn.metrics

Step 3: Load the Iris dataset using datasets.load_iris()

Step 4: Separate features (X) and target labels (y)

Step 5: Initialize KMeans clustering algorithm with 3 clusters

Step 6: Fit KMeans to the data

Step 7: Get the cluster centers and labels

Step 8: Plot the clusters and centroids

Step 9: Calculate and print the silhouette score

Step 10: Calculate and print the Davies–Bouldin index

Step 11: Calculate and print the Calinski-Harabasz score

Step 12: Stop

Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score, calinski_harabasz_score

# Load the Iris dataset


iris = datasets.load_iris()
X = iris.data # Features
y = iris.target # Target labels

# Initialize KMeans with 3 clusters (since there are 3 types of flowers)


kmeans = KMeans(n_clusters=3)

# Fit KMeans to the data


kmeans.fit(X)

# Get the cluster centers and labels


cluster_centers = kmeans.cluster_centers_
cluster_labels = kmeans.labels_

# Plot the clusters


plt.figure(figsize=(12, 8))

# Define marker shapes and colors for each cluster


markers = ['o', 's', '^']
colors = ['blue', 'green', 'red']

# Plot the data points with colors corresponding to the clusters


for i in range(len(X)):
    plt.scatter(X[i, 0], X[i, 1], color=colors[cluster_labels[i]], marker=markers[cluster_labels[i]])

# Plot the centroids of the clusters


plt.scatter(cluster_centers[:, 0], cluster_centers[:, 1], color='black', marker='x', s=100,
label='Centroids')

plt.xlabel('Sepal Length (cm)')


plt.ylabel('Sepal Width (cm)')
plt.title('K-means Clustering of Iris Dataset')

# Add legend for cluster labels


for i, label in enumerate(iris.target_names):
    plt.scatter([], [], color=colors[i], marker=markers[i], label=label)

plt.legend()
plt.show()

# Calculate silhouette score


silhouette_avg = silhouette_score(X, cluster_labels)
print("Silhouette Score:", silhouette_avg)

# Calculate Davies–Bouldin index


db_index = davies_bouldin_score(X, cluster_labels)
print("Davies–Bouldin Index:", db_index)

# Calculate Calinski-Harabasz score


ch_score = calinski_harabasz_score(X, cluster_labels)
print("Calinski-Harabasz Score:", ch_score)
Output:

Silhouette Score: 0.5528190123564095


Davies–Bouldin Index: 0.6619715465007465
Calinski-Harabasz Score: 561.62775662962
Result:
Thus the k means clustering algorithm using inbuilt library functions has been implemented and
executed successfully.
Experiment no: 6-a    SUPPORT VECTOR MACHINE - SUPPORT VECTOR CLASSIFICATION
Date:

Aim:
The Objective of this experiment is to implement the Support Vector Classification using inbuilt
libraries in python.
Algorithm:
Step 1: Import necessary libraries including pandas, train_test_split, SVC, accuracy_score,
precision_score, recall_score, f1_score, confusion_matrix from sklearn, and matplotlib.pyplot.
Step 2: Load the dataset from the specified CSV file.
Step 3: Encode the categorical variable 'status' using LabelEncoder.
Step 4: Drop the original 'status' column from the DataFrame.
Step 5: Separate features (X) and the target variable (y).
Step 6: Split the dataset into training and testing sets using train_test_split.
Step 7: Initialize an SVC classifier with a linear kernel and regularization parameter C=1.0.
Step 8: Train the SVC classifier using the training data.
Step 9: Predict the target variable for the test dataset.
Step 10: Evaluate the model using accuracy, precision, recall, F1-score, and confusion matrix.
Step 11: Plot the confusion matrix.
Step 12: Train an SVM model with a linear kernel and regularization parameter C=1.0 (repeated for
demonstration).
Step 13: Make predictions on the testing set (repeated for demonstration).
Step 14: Print the confusion matrix (repeated for demonstration).
Step 15: Calculate accuracy, precision, recall, and F1-score (repeated for demonstration).
Step 16: Define a function to plot the decision boundary for an SVM model.
Step 17: Plot the decision boundary for the first two features of the dataset.
Step 18: Stop.
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Load dataset
df = pd.read_csv('/content/optimal_feature.csv')

# Encoding categorical variable 'status'


label_encoder = LabelEncoder()
df['status_encoded'] = label_encoder.fit_transform(df['status'])
# Drop the original 'status' column
df.drop('status', axis=1, inplace=True)

# Separate features and target variable


X = df.drop('status_encoded', axis=1)
y = df['status_encoded']

# Split dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features if needed

# Create SVC classifier object


svm_classifier = SVC(kernel='linear', C=1.0) # Adjust parameters as needed

# Train SVC classifier


svm_classifier.fit(X_train, y_train)

# Predict the response for test dataset


y_pred = svm_classifier.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)

# Plot confusion matrix


plt.figure(figsize=(8, 6))
plt.imshow(conf_matrix, interpolation='nearest', cmap=plt.cm.Blues)
plt.title('Confusion Matrix')
plt.colorbar()
classes = ['Legitimate', 'Phishing']
plt.xticks([0, 1], classes)
plt.yticks([0, 1], classes)
plt.ylabel('Actual')
plt.xlabel('Predicted')
for i in range(len(conf_matrix)):
    for j in range(len(conf_matrix[i])):
        plt.text(j, i, str(conf_matrix[i][j]), horizontalalignment='center', color='black')
plt.show()

# Step 6: Train an SVM model with a linear kernel and regularization parameter C=1.0
svm_model = SVC(kernel='linear', C=1.0)
svm_model.fit(X_train, y_train)

# Step 7: Make predictions on the testing set


y_pred = svm_model.predict(X_test)

# Step 8: Print the confusion matrix


conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

# Step 9: Calculate accuracy, precision, recall, and F1-score


accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print accuracy, precision, recall, and F1-score


print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)

# Step 10: Plot the SVM decision boundary


def plot_decision_boundary(X, y, model):
    # Get support vectors and their corresponding labels
    support_vectors = model.support_vectors_
    y_reset_index = y.reset_index(drop=True)  # Reset index to ensure alignment
    support_vectors_labels = y_reset_index[model.support_]

    # Plot data points for each class with different colors
    plt.scatter(X[y == 0, 0], X[y == 0, 1], c='blue', label='Non-malicious',
                cmap=plt.cm.coolwarm, s=20, edgecolors='k')
    plt.scatter(X[y == 1, 0], X[y == 1, 1], c='red', label='Malicious',
                cmap=plt.cm.coolwarm, s=20, edgecolors='k')

    # Plot support vectors
    plt.scatter(support_vectors[:, 0], support_vectors[:, 1], s=100, linewidth=1,
                facecolors='none', edgecolors='k')

    # Create mesh grid to plot decision boundary
    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 50),
                         np.linspace(ylim[0], ylim[1], 50))
    Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])

    # Plot decision boundary and margins
    Z = Z.reshape(xx.shape)
    plt.contour(xx, yy, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
SAMPLE DATASET:

Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[2 0]
[0 1]]

Confusion Matrix:
[[2 0]
[0 1]]
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1-score: 1.0
<ipython-input-16-b1a6a5941cd6>:98: UserWarning: No data for colormapping
provided via 'c'. Parameters 'cmap' will be ignored
plt.scatter(X[y == 0, 0], X[y == 0, 1], c='blue', label='Non-malicious',
cmap=plt.cm.coolwarm, s=20, edgecolors='k')
<ipython-input-16-b1a6a5941cd6>:99: UserWarning: No data for colormapping
provided via 'c'. Parameters 'cmap' will be ignored
plt.scatter(X[y == 1, 0], X[y == 1, 1], c='red', label='Malicious',
cmap=plt.cm.coolwarm, s=20, edgecolors='k')
<ipython-input-16-b1a6a5941cd6>:117: UserWarning: No data for colormapping
provided via 'c'. Parameters 'cmap' will be ignored
plt.scatter(X[model.decision_function(X) > 0][:, 0],
X[model.decision_function(X) > 0][:, 1], c='green', label='Above Positive
Hyperplane', cmap=plt.cm.coolwarm, s=20, edgecolors='k')
<ipython-input-16-b1a6a5941cd6>:118: UserWarning: No data for colormapping
provided via 'c'. Parameters 'cmap' will be ignored
plt.scatter(X[model.decision_function(X) < 0][:, 0],
X[model.decision_function(X) < 0][:, 1], c='orange', label='Below Negative
Hyperplane', cmap=plt.cm.coolwarm, s=20, edgecolors='k')

    # Plot data points above positive hyperplane and below negative hyperplane
    plt.scatter(X[model.decision_function(X) > 0][:, 0], X[model.decision_function(X) > 0][:, 1],
                c='green', label='Above Positive Hyperplane', cmap=plt.cm.coolwarm, s=20, edgecolors='k')
    plt.scatter(X[model.decision_function(X) < 0][:, 0], X[model.decision_function(X) < 0][:, 1],
                c='orange', label='Below Negative Hyperplane', cmap=plt.cm.coolwarm, s=20, edgecolors='k')

    plt.title('SVM Decision Boundary')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.legend()
    plt.show()
# For illustration purpose, we'll plot decision boundary for the first two features
X_train_2d = X_train.iloc[:, :2].values # Convert DataFrame to NumPy array
svm_model_2d = SVC(kernel='linear', C=1.0)
svm_model_2d.fit(X_train_2d, y_train)

# Plot decision boundary


plot_decision_boundary(X_train_2d, y_train, svm_model_2d)
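One optional refinement (not part of the recorded program): SVMs are sensitive to feature scale, so standardizing the inputs before fitting is a common step. A minimal sketch, assuming X_train, y_train, X_test, and y_test from above:

# Optional: SVC with standardized features
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

scaled_svm = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1.0))
scaled_svm.fit(X_train, y_train)
print("Scaled SVC accuracy:", scaled_svm.score(X_test, y_test))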

Result:
Thus the Support Vector Classification using inbuilt libraries in python has been implemented and
executed successfully.
Experiment no: 6-b    SUPPORT VECTOR MACHINE - SUPPORT VECTOR REGRESSION
Date:

Aim:
The Objective of this experiment is to implement the Support Vector Regression using inbuilt
libraries in python.
Algorithm:
Step 1: Start
Step 2: Load necessary libraries
Step 3: Load dataset
Step 4: Separate features and target variable
Step 5: Split dataset into train and test sets
Step 6: Create SVR model object
Step 7: Train SVR model
Step 8: Predict response for test dataset
Step 9: Evaluate model (calculate MSE and R^2)
Step 10: Plot actual vs. predicted values
Step 11: Stop
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv('dress_sales_prediction.csv')

# Separate features and target variable


X = df.drop('sales', axis=1)
y = df['sales']

# Split dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create SVR model object


svr_model = SVR(kernel='linear', C=1.0) # Adjust parameters as needed

# Train SVR model


svr_model.fit(X_train, y_train)

# Predict the response for test dataset


y_pred = svr_model.predict(X_test)

# Evaluate the model


SAMPLE DATASET:

Output:
Mean Squared Error: 8.043392151715722
R^2 Score: 0.9574175861521746
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)


print("R^2 Score:", r2)

# Plot actual vs. predicted values


plt.scatter(y_test, y_pred)
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.title("Actual vs. Predicted values (SVR)")
plt.show()
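An optional variation (not part of the recorded program): SVR with a non-linear kernel can capture curved relationships; the RBF kernel is a common first choice. A minimal sketch using the same split:

# Optional: SVR with an RBF kernel (assumes X_train, y_train, X_test, y_test from above)
svr_rbf = SVR(kernel='rbf', C=100.0, gamma='scale')
svr_rbf.fit(X_train, y_train)
y_pred_rbf = svr_rbf.predict(X_test)
print("RBF-kernel R^2 Score:", r2_score(y_test, y_pred_rbf))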

Result:
Thus the Support Vector Regression using inbuilt libraries in python has been implemented and
executed successfully
Experiment no: 7-a    ENSEMBLE LEARNING - STACKING
Date:

Aim:
The Objective of this experiment is to implement stacking using inbuilt libraries in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset and encode the categorical 'status' variable
Step 4: Separate features and target variable
Step 5: Split dataset into train and test sets
Step 6: Define base models (Random Forest, Gradient Boosting) and the meta-model (Logistic Regression)
Step 7: For each cross-validation fold, train the base models on the training fold and build meta-features from their predicted probabilities on the validation fold
Step 8: Train the meta-model on the meta-features
Step 9: Train the base models on the full training set, build meta-features for the test set, and predict with the meta-model
Step 10: Evaluate the stacked predictions (accuracy, precision, recall, F1 score)
Step 11: Stop
Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.base import clone
from sklearn.preprocessing import LabelEncoder

# Load your dataset


data = pd.read_csv('optimal_feature.csv')

label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])

# Drop the original 'status' column


data.drop('status', axis=1, inplace=True)

# Separate features and target variable


X = data.drop('status_encoded', axis=1)
y = data['status_encoded']

# Split the dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base models


base_models = [
SAMPLE DATASET:

Output:
Stacking accuracy: 0.6666666666666666
Precision: 0.0
Recall: 0.0
F1 Score: 0.0

/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning:
Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to
control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Stacking accuracy: 0.3333333333333333
Precision: 0.3333333333333333
Recall: 1.0
F1 Score: 0.5

Stacking accuracy: 1.0


Precision: 1.0
Recall: 1.0
F1 Score: 1.0

Stacking accuracy: 0.3333333333333333


Precision: 0.3333333333333333
Recall: 1.0
F1 Score: 0.5

Stacking accuracy: 1.0


Precision: 1.0
Recall: 1.0
F1 Score: 1.0
('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
('gb', GradientBoostingClassifier(n_estimators=100, random_state=42))
]

# Define meta-model
meta_model = LogisticRegression()

# Implement cross-validated stacking


kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for train_idx, val_idx in kfold.split(X_train, y_train):
    train_fold_X, val_fold_X = X_train.iloc[train_idx], X_train.iloc[val_idx]
    train_fold_y, val_fold_y = y_train.iloc[train_idx], y_train.iloc[val_idx]

    # Create empty arrays to store meta-features
    meta_train = pd.DataFrame()
    meta_test = pd.DataFrame()

    # Fit base models on the training fold and predict on the validation fold
    for name, model in base_models:
        cloned_model = clone(model)
        cloned_model.fit(train_fold_X, train_fold_y)
        meta_train[name] = cloned_model.predict_proba(val_fold_X)[:, 1]

    # Fit the meta-model on the validation fold predictions
    meta_model.fit(meta_train, val_fold_y)

    # Fit base models on the entire training set and predict on the test set
    for name, model in base_models:
        cloned_model = clone(model)
        cloned_model.fit(X_train, y_train)
        meta_test[name] = cloned_model.predict_proba(X_test)[:, 1]

    # Make predictions on the test set using the meta-model
    meta_pred = meta_model.predict(meta_test)
    print("Stacking accuracy:", accuracy_score(y_test, meta_pred))
    print("Precision:", precision_score(y_test, meta_pred))
    print("Recall:", recall_score(y_test, meta_pred))
    print("F1 Score:", f1_score(y_test, meta_pred))
    print()
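scikit-learn also provides a built-in StackingClassifier that performs this cross-validated stacking in a single estimator. A minimal optional sketch (not part of the recorded program), assuming the same base_models, X_train, X_test, y_train, and y_test:

# Optional: equivalent stacking with sklearn's StackingClassifier
from sklearn.ensemble import StackingClassifier

stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(),
                           cv=5)
stack.fit(X_train, y_train)
print("StackingClassifier accuracy:", stack.score(X_test, y_test))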

Result:
Thus stacking has been implemented and executed successfully.
Experiment no: 7-b    ENSEMBLE LEARNING - BAGGING (MAX VOTING)
Date:

Aim:
The Objective of this experiment is to implement Max Vote bagging method using inbuilt
libraries in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset
Step 4: Encode categorical variable 'status'
Step 5: Drop the original 'status' column
Step 6: Separate features and target variable
Step 7: Split the dataset into train and test sets
Step 8: Define base model as RandomForestClassifier with 100 estimators
Step 9: Create a list to hold base models
Step 10: Set the number of base models to create as 5
Step 11: Train and clone base models:
a. Iterate over the number of base models
b. Clone the base model
c. Fit the cloned model on the training set
d. Append the cloned model to the list of models
Step 12: Make predictions on the test set using each base model
Step 13: Take majority vote for each instance
Step 14: Calculate accuracy, precision, recall, and F1 score
Step 15: Print bagging with majority voting metrics: accuracy, precision, recall, and F1 score
Step 16: Stop
Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.base import clone
from sklearn.preprocessing import LabelEncoder

# Load your dataset


data = pd.read_csv('/content/optimal_feature.csv')

# Encode categorical variable 'status'


label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])

# Drop the original 'status' column


data.drop('status', axis=1, inplace=True)
# Separate features and target variable
X = data.drop('status_encoded', axis=1)
y = data['status_encoded']

# Split the dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base model


base_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Create list to hold base models


models = []
# Number of base models to create
num_models = 5
# Train and clone base models
for i in range(num_models):
    model = clone(base_model)
    model.fit(X_train, y_train)
    models.append(model)

# Make predictions on the test set using each base model
predictions = []
for model in models:
    predictions.append(model.predict(X_test))

# Take majority vote
final_predictions = []
for i in range(len(X_test)):
    votes = [prediction[i] for prediction in predictions]
    final_predictions.append(max(set(votes), key=votes.count))
# Calculate accuracy
accuracy = accuracy_score(y_test, final_predictions)
precision = precision_score(y_test, final_predictions)
recall = recall_score(y_test, final_predictions)
f1 = f1_score(y_test, final_predictions)
print("Bagging with Majority Voting Metrics:")
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print()

Result:
Thus Bagging with max (majority) voting has been implemented and executed successfully.
SAMPLE DATASET:
Output:
Bagging with Majority Voting Metrics:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
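
Note: an equivalent majority-vote ensemble can also be written with scikit-learn's VotingClassifier using hard voting. The sketch below is illustrative only and assumes the same X_train/X_test split as the program above; the three heterogeneous base models are example choices, not taken from the recorded program.

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Hard voting: each model casts one vote and the majority class wins
voting_clf = VotingClassifier(
    estimators=[('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
                ('dt', DecisionTreeClassifier(random_state=42)),
                ('knn', KNeighborsClassifier())],
    voting='hard')

voting_clf.fit(X_train, y_train)  # assumes X_train, y_train from the split above
print("Hard-voting accuracy:", accuracy_score(y_test, voting_clf.predict(X_test)))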
Experiment no:7-c ENSEMBLE LEARNING
Date: BAGGING-AVERAGE VOTING

Aim:
The Objective of this experiment is to implement Average Voting bagging method using inbuilt
libraries in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset
Step 4: Encode categorical variable 'status'
Step 5: Drop the original 'status' column
Step 6: Separate features and target variable
Step 7: Split the dataset into train and test sets
Step 8: Define base model as RandomForestClassifier with 100 estimators
Step 9: Create a list to hold base models
Step 10: Set the number of base models to create as 5
Step 11: Train and clone base models:
a. Iterate over the number of base models
b. Clone the base model
c. Fit the cloned model on the training set
d. Append the cloned model to the list of models
Step 12: Make predictions on the test set using each base model
Step 13: Take probabilities of predictions for each base model
Step 14: Calculate average probabilities for each class across all base models
Step 15: Convert average probabilities to predicted labels by selecting the class with the highest
probability
Step 16: Calculate evaluation metrics (accuracy, precision, recall, F1 score)
Step 17: Print bagging with average voting metrics: accuracy, precision, recall, and F1 score
Step 18: Stop
Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.base import clone
from sklearn.preprocessing import LabelEncoder
import numpy as np

# Load your dataset


data = pd.read_csv('/content/optimal_feature.csv')

# Encode categorical variable 'status'


label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])

# Drop the original 'status' column


data.drop('status', axis=1, inplace=True)

# Separate features and target variable


X = data.drop('status_encoded', axis=1)
y = data['status_encoded']

# Split the dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base model


base_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Create list to hold base models


models = []

# Number of base models to create


num_models = 5

# Train and clone base models


for i in range(num_models):
    model = clone(base_model)
    model.fit(X_train, y_train)
    models.append(model)

# Make predictions on the test set using each base model
predictions = []
for model in models:
    predictions.append(model.predict_proba(X_test))

# Take average of probabilities


avg_probabilities = np.mean(predictions, axis=0)

# Convert probabilities to labels


final_predictions = np.argmax(avg_probabilities, axis=1)

# Calculate evaluation metrics


accuracy = accuracy_score(y_test, final_predictions)
precision = precision_score(y_test, final_predictions)
recall = recall_score(y_test, final_predictions)
f1 = f1_score(y_test, final_predictions)

# Print performance metrics


print("Bagging with Average Voting Metrics:")
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
Result:
Thus Bagging with average voting has been implemented and executed successfully.
SAMPLE DATASET:
Output:
Bagging with Average Voting Metrics:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
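
To make the averaging step in Steps 13-15 concrete, the small self-contained numpy illustration below uses hypothetical predict_proba outputs (not taken from the dataset) for three models and two test samples:

import numpy as np

# Hypothetical class-probability outputs of three models (columns: class 0, class 1)
p1 = np.array([[0.9, 0.1], [0.4, 0.6]])
p2 = np.array([[0.7, 0.3], [0.2, 0.8]])
p3 = np.array([[0.6, 0.4], [0.5, 0.5]])

avg = np.mean([p1, p2, p3], axis=0)  # element-wise average across the models
labels = np.argmax(avg, axis=1)      # pick the class with the highest mean probability

print(avg)     # [[0.733 0.267] [0.367 0.633]] (rounded)
print(labels)  # [0 1]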
Experiment no:7-d ENSEMBLE LEARNING
Date: BAGGING-WEIGHTED AVERAGE

Aim:
The Objective of this experiment is to implement the Weighted Average voting bagging method using
inbuilt libraries in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset
Step 4: Encode categorical variable 'status'
Step 5: Drop the original 'status' column
Step 6: Separate features and target variable
Step 7: Split the dataset into train and test sets
Step 8: Define base model as RandomForestClassifier with 100 estimators
Step 9: Create a list to hold base models
Step 10: Set the number of base models to create as 5
Step 11: Train and clone base models:
a. Iterate over the number of base models
b. Clone the base model
c. Fit the cloned model on the training set
d. Append the cloned model to the list of models
Step 12: Make predictions on the test set using each base model
Step 13: Assign weights to each base model's predictions (example weights are provided)
Step 14: Calculate the weighted average of predictions from all base models
Step 15: Convert weighted average probabilities to predicted labels by selecting the class with the
highest probability
Step 16: Calculate evaluation metrics (accuracy, precision, recall, F1 score)
Step 17: Print bagging with weighted average voting metrics: accuracy, precision, recall, and F1 score
Step 18: Stop
Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.base import clone
from sklearn.preprocessing import LabelEncoder
import numpy as np

# Load your dataset


data = pd.read_csv('/content/optimal_feature.csv')

# Encode categorical variable 'status'


label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])
# Drop the original 'status' column
data.drop('status', axis=1, inplace=True)

# Separate features and target variable


X = data.drop('status_encoded', axis=1)
y = data['status_encoded']

# Split the dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base model


base_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Create list to hold base models


models = []

# Number of base models to create


num_models = 5

# Train and clone base models


for i in range(num_models):
    model = clone(base_model)
    model.fit(X_train, y_train)
    models.append(model)

# Make predictions on the test set using each base model
predictions = []
for model in models:
    predictions.append(model.predict_proba(X_test))

# Assign weights to each base model's predictions
weights = [0.2, 0.3, 0.2, 0.1, 0.2]  # Example weights (you can adjust these)
weighted_predictions = np.zeros_like(predictions[0])
for i, pred in enumerate(predictions):
    weighted_predictions += weights[i] * pred

# Convert probabilities to labels


final_predictions = np.argmax(weighted_predictions, axis=1)

# Calculate evaluation metrics


accuracy = accuracy_score(y_test, final_predictions)
precision = precision_score(y_test, final_predictions)
recall = recall_score(y_test, final_predictions)
f1 = f1_score(y_test, final_predictions)

# Print performance metrics


print("Bagging with Weighted Average Voting Metrics:")
print("Accuracy:", accuracy)

SAMPLE DATASET:
Output:
Bagging with Weighted Average Voting Metrics:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0

print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)

Result:
Thus Bagging with weighted average voting has been implemented and executed successfully.
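
A library-based alternative: scikit-learn's VotingClassifier supports soft voting with per-model weights, which corresponds to the weighted average computed above. This is a minimal sketch, assuming the same X_train/X_test split; the base models and weights shown are illustrative, not the ones recorded in the program.

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Soft voting with weights applied to each model's predicted probabilities
weighted_clf = VotingClassifier(
    estimators=[('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
                ('gb', GradientBoostingClassifier(n_estimators=100, random_state=42)),
                ('lr', LogisticRegression(max_iter=1000))],
    voting='soft',
    weights=[0.5, 0.3, 0.2])  # example weights, adjust as needed

weighted_clf.fit(X_train, y_train)  # assumes X_train, y_train from the split above
print("Weighted-voting accuracy:", accuracy_score(y_test, weighted_clf.predict(X_test)))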
Experiment no:7-e ENSEMBLE LEARNING
Date: BOOSTING – ADAPTIVE BOOSTING

Aim:
The Objective of this experiment is to implement Adaptive boosting using inbuilt libraries in
python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset
Step 4: Encode categorical variable 'status'
Step 5: Drop the original 'status' column
Step 6: Separate features and target variable
Step 7: Split the dataset into training and testing sets
Step 8: Initialize AdaBoost Classifier with Decision Tree as base estimator and 50 estimators
Step 9: Train the AdaBoost Classifier on the training set
Step 10: Make predictions on the testing set
Step 11: Calculate evaluation metrics (accuracy, precision, recall, F1 score)
Step 12: Print accuracy, precision, recall, and F1 score
Step 13: Print confusion matrix
Step 14: Stop
Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.preprocessing import LabelEncoder

# Load your dataset


data = pd.read_csv('/content/optimal_feature.csv')

label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])

# Drop the original 'status' column


data.drop('status', axis=1, inplace=True)

# Separate features and target variable


X = data.drop('status_encoded', axis=1)
y = data['status_encoded']

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
SAMPLE DATASET:

Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0

Confusion Matrix:
[[2 0]
[0 1]]
# Initialize AdaBoost Classifier with Decision Tree as base estimator
adaboost_classifier = AdaBoostClassifier(n_estimators=50, random_state=42)

# Train the AdaBoost Classifier


adaboost_classifier.fit(X_train, y_train)

# Make predictions on the testing set


y_pred = adaboost_classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

Result:
Thus adaptive boosting has been implemented and executed successfully.
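
Step 8 of the algorithm names a decision tree as the base estimator; the program relies on AdaBoost's default, which is a depth-1 decision tree (a stump). The sketch below makes that choice explicit. It assumes the same X_train/X_test split; note that scikit-learn versions before 1.2 use the keyword base_estimator instead of estimator.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Explicit depth-1 decision tree as the weak learner (AdaBoost's default)
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=50,
                         learning_rate=1.0,
                         random_state=42)

ada.fit(X_train, y_train)  # assumes X_train, y_train from the split above
print("AdaBoost accuracy:", accuracy_score(y_test, ada.predict(X_test)))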
Experiment no:7-f ENSEMBLE LEARNING
Date: BOOSTING – GRADIENT BOOSTING

Aim:
The Objective of this experiment is to implement Gradient boosting using inbuilt libraries in
python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset
Step 4: Encode categorical variable 'status'
Step 5: Drop the original 'status' column
Step 6: Separate features and target variable
Step 7: Split the dataset into training and testing sets
Step 8: Initialize Gradient Boosting Classifier with 100 estimators
Step 9: Train the Gradient Boosting Classifier on the training set
Step 10: Make predictions on the testing set
Step 11: Calculate evaluation metrics (accuracy, precision, recall, F1 score)
Step 12: Print accuracy, precision, recall, and F1 score
Step 13: Print confusion matrix
Step 14: Stop

Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.preprocessing import LabelEncoder

# Load your dataset


data = pd.read_csv('/content/optimal_feature.csv')

label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])

# Drop the original 'status' column


data.drop('status', axis=1, inplace=True)

# Separate features and target variable


X = data.drop('status_encoded', axis=1)
y = data['status_encoded']
SAMPLE DATASET:

Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0

Confusion Matrix:
[[2 0]
[0 1]]
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Gradient Boosting Classifier


gradient_boosting_classifier = GradientBoostingClassifier(n_estimators=100, random_state=42)

# Train the Gradient Boosting Classifier


gradient_boosting_classifier.fit(X_train, y_train)

# Make predictions on the testing set


y_pred = gradient_boosting_classifier.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)


precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

Result:
Thus gradient boosting has been implemented and executed successfully.
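
The program uses GradientBoostingClassifier with its default settings apart from n_estimators. The sketch below shows the main tuning knobs (shrinkage, tree depth, row subsampling); the values are examples, not recorded results, and it assumes the same X_train/X_test split as above.

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

gb = GradientBoostingClassifier(n_estimators=100,
                                learning_rate=0.1,  # shrinkage applied to each tree's contribution
                                max_depth=3,        # depth of the individual regression trees
                                subsample=0.8,      # fraction of rows used to fit each tree
                                random_state=42)

gb.fit(X_train, y_train)  # assumes X_train, y_train from the split above
print("Gradient boosting accuracy:", accuracy_score(y_test, gb.predict(X_test)))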
Experiment no:7-g ENSEMBLE LEARNING
Date: BOOSTING – CATEGORICAL BOOSTING

Aim:
The Objective of this experiment is to implement Categorical boosting using inbuilt libraries in
python.
Algorithm:
Step 1: Start

Step 2: Import necessary libraries

Step 3: Load dataset

Step 4: Encode categorical variable 'status'

Step 5: Drop the original 'status' column

Step 6: Separate features and target variable

Step 7: Split the dataset into training and testing sets

Step 8: Initialize CatBoostClassifier

Step 9: Train the CatBoostClassifier on the training set

Step 10: Make predictions on the testing set

Step 11: Calculate evaluation metrics (accuracy, precision, recall, F1 score)

Step 12: Print accuracy, precision, recall, and F1 score

Step 13: Print confusion matrix

Step 14: Stop

Program:
import pandas as pd
from sklearn.model_selection import train_test_split
import catboost as ctb
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.preprocessing import LabelEncoder

# Load your dataset


data = pd.read_csv('/content/optimal_feature.csv')

label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])
SAMPLE DATASET:

Output:
Learning rate set to 0.001502
0: learn: 0.6922122 total: 1.08ms remaining: 1.08s
1: learn: 0.6908270 total: 6.1ms remaining: 3.04s
2: learn: 0.6898993 total: 7.48ms remaining: 2.48s
3: learn: 0.6888169 total: 9.22ms remaining: 2.29s
4: learn: 0.6869733 total: 12.2ms remaining: 2.43s
5: learn: 0.6856654 total: 14.6ms remaining: 2.42s
6: learn: 0.6844488 total: 16.2ms remaining: 2.29s
7: learn: 0.6834330 total: 16.9ms remaining: 2.09s
8: learn: 0.6817790 total: 19.4ms remaining: 2.14s
9: learn: 0.6802639 total: 19.9ms remaining: 1.97s
10: learn: 0.6789032 total: 23.8ms remaining: 2.14s
11: learn: 0.6772619 total: 24.6ms remaining: 2.02s
12: learn: 0.6762064 total: 25.8ms remaining: 1.96s
13: learn: 0.6751169 total: 27.3ms remaining: 1.92s
14: learn: 0.6741816 total: 28.9ms remaining: 1.9s
15: learn: 0.6732487 total: 31.2ms remaining: 1.92s
16: learn: 0.6719068 total: 32.5ms remaining: 1.88s
17: learn: 0.6702825 total: 33.6ms remaining: 1.83s
18: learn: 0.6689514 total: 34.6ms remaining: 1.79s
19: learn: 0.6677421 total: 42.3ms remaining: 2.07s
20: learn: 0.6665820 total: 44ms remaining: 2.05s
21: learn: 0.6657759 total: 45.4ms remaining: 2.02s
And so on…………………
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0

Confusion Matrix:
[[1 0]
[0 3]]
# Drop the original 'status' column
data.drop('status', axis=1, inplace=True)

# Separate features and target variable


x = data.drop('status_encoded', axis=1)
y = data['status_encoded']

X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.25)


catb = ctb.CatBoostClassifier()
catb_model = catb.fit(X_train,Y_train)

# Make predictions on the testing set


Y_pred = catb_model.predict(X_test)

accuracy = accuracy_score(Y_test, Y_pred)


precision = precision_score(Y_test, Y_pred)
recall = recall_score(Y_test, Y_pred)
f1 = f1_score(Y_test, Y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("\nConfusion Matrix:\n", confusion_matrix(Y_test, Y_pred))

Result:
Thus categorical boosting has been implemented and executed successfully.
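
The long per-iteration log in the output comes from CatBoost's default verbosity. A minimal sketch of a quieter configuration is shown below; the hyperparameter values are examples, and cat_features would only be needed if raw (non-encoded) categorical columns were passed, which is not the case here since 'status' is already label-encoded.

import catboost as ctb

# verbose=0 suppresses the per-iteration training log shown above
catb_quiet = ctb.CatBoostClassifier(iterations=200, learning_rate=0.1, depth=4, verbose=0)
catb_quiet.fit(X_train, Y_train)   # assumes the X_train, Y_train split from the program above
print("CatBoost accuracy:", catb_quiet.score(X_test, Y_test))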
Experiment no:8 REINFORCEMENT LEARNING
Date: Q-LEARNING ALGORITHM (Maze Game)

Aim:
The Objective of this experiment is to implement Q-learning algorithm of reinforcement
learning in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Define the `QLearningMaze` class with methods to initialize Q-learning parameters, choose
actions, update Q-values, train, take actions, and solve the maze
Step 4: Define the `MazeGame` class with methods to initialize the maze game, draw the maze, draw
the player, handle player movements, and check for game completion
Step 5: Create a custom maze function to get user input for maze dimensions and values
Step 6: Get custom maze input from the user
Step 7: Specify maze parameters such as width, height, and cell size
Step 8: Specify starting and ending positions in the maze
Step 9: Mark starting and ending positions in the maze
Step 10: Create the maze game window and run the game loop
Step 11: Stop
Program:
import numpy as np
import tkinter as tk
from tkinter import messagebox, simpledialog

class QLearningMaze:
    def __init__(self, maze, start, end, learning_rate=0.1, discount_factor=0.9,
                 exploration_rate=0.1, epochs=1000):
        self.maze = maze
        self.start = start
        self.end = end
        self.learning_rate = learning_rate
        self.discount_factor = discount_factor
        self.exploration_rate = exploration_rate
        self.epochs = epochs
        # Q-values for each action (up, down, left, right)
        self.q_values = np.zeros((len(maze), len(maze[0]), 4))

    def choose_action(self, state):
        if np.random.uniform(0, 1) < self.exploration_rate:
            # Explore: choose a random action
            return np.random.choice(4)
        else:
            # Exploit: choose the action with the highest Q-value
            return np.argmax(self.q_values[state[0], state[1]])

    def update_q_values(self, state, action, reward, next_state):
        # Bellman equation: Q(s, a) = Q(s, a) + α * (reward + γ * max(Q(s', a')) - Q(s, a))
        self.q_values[state[0], state[1], action] += self.learning_rate * \
            (reward + self.discount_factor * np.max(self.q_values[next_state[0], next_state[1]]) -
             self.q_values[state[0], state[1], action])

    def train(self):
        for _ in range(self.epochs):
            current_state = self.start
            while current_state != self.end:
                action = self.choose_action(current_state)
                next_state, reward = self.take_action(current_state, action)
                self.update_q_values(current_state, action, reward, next_state)
                current_state = next_state

    def take_action(self, state, action):
        # Move in the chosen direction (action)
        if action == 0:  # Up
            next_state = (max(0, state[0] - 1), state[1])
        elif action == 1:  # Down
            next_state = (min(len(self.maze) - 1, state[0] + 1), state[1])
        elif action == 2:  # Left
            next_state = (state[0], max(0, state[1] - 1))
        elif action == 3:  # Right
            next_state = (state[0], min(len(self.maze[0]) - 1, state[1] + 1))

        # Determine reward based on the next state
        if next_state == self.end:
            reward = 100
        elif self.maze[next_state[0]][next_state[1]] == 1:
            reward = -100
            next_state = state  # Stay in the same state if hitting a wall
        else:
            reward = -1

        # If the agent is at the destination but can't move towards it, return to the start
        if next_state == state and state == self.end:
            next_state = self.start
            reward = -100  # Penalize for being stuck at the destination

        return next_state, reward

    def solve(self):
        current_state = self.start
        path = [current_state]
        while current_state != self.end:
            action = np.argmax(self.q_values[current_state[0], current_state[1]])
            next_state, _ = self.take_action(current_state, action)
            path.append(next_state)
            current_state = next_state
        return path
class MazeGame:
def __init__(self, master, maze):
self.master = master
self.master.title("Maze Game")
self.canvas = tk.Canvas(master, width=maze_width, height=maze_height)
self.canvas.pack()
self.maze = maze
self.player_pos = (0, 0) # Starting position of the player
self.draw_maze()
self.player = None # To store player's shape ID
self.game_over = False

self.master.bind('<Up>', self.move_up)
self.master.bind('<Down>', self.move_down)
self.master.bind('<Left>', self.move_left)
self.master.bind('<Right>', self.move_right)

def draw_maze(self):
for y in range(len(self.maze)):
for x in range(len(self.maze[y])):
if self.maze[y][x] == 1:
self.canvas.create_rectangle(x * cell_size, y * cell_size,
(x + 1) * cell_size, (y + 1) * cell_size,
fill="black")
elif self.maze[y][x] == 0:
self.canvas.create_rectangle(x * cell_size, y * cell_size,
(x + 1) * cell_size, (y + 1) * cell_size,
fill="white")
elif self.maze[y][x] == 2: # Starting position
self.canvas.create_rectangle(x * cell_size, y * cell_size,
(x + 1) * cell_size, (y + 1) * cell_size,
fill="green")
elif self.maze[y][x] == 3: # Ending position
self.canvas.create_rectangle(x * cell_size, y * cell_size,
(x + 1) * cell_size, (y + 1) * cell_size,
fill="blue")

def draw_player(self):
if self.player:
self.canvas.delete(self.player)
player_x = self.player_pos[1] * cell_size
player_y = self.player_pos[0] * cell_size
self.player = self.canvas.create_oval(player_x, player_y, player_x + cell_size, player_y + cell_size,
fill="red")

def move_up(self, event):


if not self.game_over:
new_y = max(0, self.player_pos[0] - 1)
if self.maze[new_y][self.player_pos[1]] != 1:
self.player_pos = (new_y, self.player_pos[1])
self.update_player()
self.check_win()

def move_down(self, event):


if not self.game_over:
new_y = min(len(self.maze) - 1, self.player_pos[0] + 1)
if self.maze[new_y][self.player_pos[1]] != 1:
self.player_pos = (new_y, self.player_pos[1])
self.update_player()
self.check_win()

def move_left(self, event):


if not self.game_over:
new_x = max(0, self.player_pos[1] - 1)
if self.maze[self.player_pos[0]][new_x] != 1:
self.player_pos = (self.player_pos[0], new_x)
self.update_player()
self.check_win()

def move_right(self, event):


if not self.game_over:
new_x = min(len(self.maze[0]) - 1, self.player_pos[1] + 1)
if self.maze[self.player_pos[0]][new_x] != 1:
self.player_pos = (self.player_pos[0], new_x)
self.update_player()
self.check_win()

def update_player(self):
self.draw_player()

def check_win(self):
if self.player_pos == ending_position:
self.game_over = True
messagebox.showinfo("Congratulations!", "You won!")
self.master.quit()

def create_custom_maze():
    rows = simpledialog.askinteger("Input", "Enter the number of rows:")
    cols = simpledialog.askinteger("Input", "Enter the number of columns:")
    maze = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            maze[i][j] = simpledialog.askinteger("Input",
                f"Enter value for cell ({i}, {j}): 0 (empty) or 1 (blocked)")
    return maze

# Get custom maze input from the user


maze = create_custom_maze()

# Define maze parameters


maze_width = 20 * len(maze[0])
maze_height = 20 * len(maze)
Output:
cell_size = 20

# Specify starting position and ending position


starting_position = (0, 0) # Top-left corner
ending_position = (len(maze) - 1, len(maze[0]) - 1) # Bottom-right corner

# Mark starting and ending positions in the maze


maze[starting_position[0]][starting_position[1]] = 2 # Represented by 2
maze[ending_position[0]][ending_position[1]] = 3 # Represented by 3

# Create the maze game


root = tk.Tk()
game = MazeGame(root, maze)
root.mainloop()

Result:
Thus q-learning of reinforcement learning has been implemented and executed successfully.
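
Note that the main script above only launches the interactive MazeGame window; the QLearningMaze agent itself is defined but never trained there. The sketch below shows how the agent could be exercised on a small hard-coded maze (an assumed example layout, 0 = free cell, 1 = wall); after training, solve() should return a greedy path from start to end.

# Assumed 4x4 example maze: 0 = free cell, 1 = wall
demo_maze = [
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

agent = QLearningMaze(demo_maze, start=(0, 0), end=(3, 3),
                      learning_rate=0.1, discount_factor=0.9,
                      exploration_rate=0.1, epochs=1000)
agent.train()          # fill the Q-table by running repeated episodes
print(agent.solve())   # greedy path as a list of (row, col) tuples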
Experiment no:9 MULTI LAYER PERCEPTRON
Date:

Aim:
The Objective of this experiment is to implement Multilayer perceptron using inbuilt libraries in
python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Define the `MLP` class with methods for forward pass, backward pass, training, evaluation,
and auxiliary functions for sigmoid activation and its derivative
Step 4: Get user inputs for the number of patterns and input features per pattern
Step 5: Generate XOR dataset based on user inputs
Step 6: Define an MLP object with specified input, hidden, and output sizes
Step 7: Train the MLP using the generated XOR dataset
Step 8: Evaluate the trained MLP using the same dataset
Step 9: Print evaluation metrics such as accuracy, precision, recall, F1 score, and confusion matrix
Step 10: Plot the training loss over epochs and the confusion matrix
Step 11: Stop
Program:
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
import matplotlib.pyplot as plt

class MLP:
def __init__(self, input_size, hidden_size, output_size):
self.input_size = input_size
self.hidden_size = hidden_size
self.output_size = output_size

# Initialize weights and biases


self.weights_input_hidden = np.random.randn(input_size, hidden_size)
self.bias_input_hidden = np.zeros((1, hidden_size))

self.weights_hidden_output = np.random.randn(hidden_size, output_size)


self.bias_hidden_output = np.zeros((1, output_size))

# Store loss values during training


self.loss_history = []

def forward(self, inputs):


# Forward pass
self.hidden_layer_activation = np.dot(inputs, self.weights_input_hidden) + self.bias_input_hidden
self.hidden_layer_output = self.sigmoid(self.hidden_layer_activation)
self.output_layer_activation = np.dot(self.hidden_layer_output, self.weights_hidden_output) + self.bias_hidden_output
self.output = self.sigmoid(self.output_layer_activation)

return self.output

def sigmoid(self, x):


return 1 / (1 + np.exp(-x))

def sigmoid_derivative(self, x):


return x * (1 - x)

def backward(self, inputs, targets, learning_rate):


# Backward pass
# Calculate output layer error
output_error = targets - self.output
output_delta = output_error * self.sigmoid_derivative(self.output)

# Calculate hidden layer error


hidden_error = output_delta.dot(self.weights_hidden_output.T)
hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_layer_output)

# Update weights and biases


self.weights_hidden_output += self.hidden_layer_output.T.dot(output_delta) * learning_rate
self.bias_hidden_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate

self.weights_input_hidden += inputs.T.dot(hidden_delta) * learning_rate


self.bias_input_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate

def train(self, inputs, targets, learning_rate, epochs, validation_data=None, patience=100):


best_loss = np.inf
no_improvement_count = 0

for epoch in range(epochs):


# Forward pass
outputs = self.forward(inputs)

# Backward pass
self.backward(inputs, targets, learning_rate)

# Calculate and store loss


loss = np.mean(np.square(targets - outputs))
self.loss_history.append(loss)

if epoch % 1000 == 0:
print(f"Epoch {epoch}, Loss: {loss}")

# Early stopping based on validation loss


if validation_data:
val_inputs, val_targets = validation_data
val_outputs = self.forward(val_inputs)
val_loss = np.mean(np.square(val_targets - val_outputs))

if val_loss < best_loss:


best_loss = val_loss
no_improvement_count = 0
else:
no_improvement_count += 1

if no_improvement_count >= patience:


print(f"Stopping early at epoch {epoch} due to no improvement in validation loss.")
break

def evaluate(self, inputs, targets):


predictions = np.round(self.forward(inputs))
accuracy = accuracy_score(targets, predictions)
precision = precision_score(targets, predictions)
recall = recall_score(targets, predictions)
f1 = f1_score(targets, predictions)
confusion_mat = confusion_matrix(targets, predictions)
return accuracy, precision, recall, f1, confusion_mat

# User Inputs
num_patterns = int(input("Enter the number of patterns in your dataset: "))
input_size = int(input("Enter the number of input features per pattern: "))

# Generate XOR dataset automatically


X_train = np.random.randint(2, size=(num_patterns, input_size))
y_train = np.array([[(x[0] ^ x[1])] for x in X_train]) # Calculate XOR output automatically

# Use the same data for training and testing


X_test, y_test = X_train, y_train

# Define MLP
mlp = MLP(input_size=input_size, hidden_size=4, output_size=1)

# Train MLP
mlp.train(X_train, y_train, learning_rate=0.01, epochs=10000, validation_data=(X_test, y_test),
patience=500)

# Evaluate MLP
accuracy, precision, recall, f1, confusion_mat = mlp.evaluate(X_test, y_test)

# Print evaluation metrics


print("\nEvaluation Metrics:")
print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 Score: {f1}")
Output:
Enter the number of patterns in your dataset: 100
Enter the number of input features per pattern: 2
Epoch 0, Loss: 0.36609605648209426
Epoch 1000, Loss: 0.015983581207552385
Epoch 2000, Loss: 0.004408591411027319
Epoch 3000, Loss: 0.002388007046487561
Epoch 4000, Loss: 0.0016026034795046582
Epoch 5000, Loss: 0.0011936939190098374
Epoch 6000, Loss: 0.000945475101628567
Epoch 7000, Loss: 0.0007797834456328132
Epoch 8000, Loss: 0.0006617921746716564
Epoch 9000, Loss: 0.0005737365743443018

Evaluation Metrics:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0

Confusion Matrix:
[[50 0]
[ 0 50]]

print("\nConfusion Matrix:")
print(confusion_mat)

# Plot the training loss over epochs


plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(mlp.loss_history)
plt.title('Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')

# Plot the confusion matrix


plt.subplot(1, 2, 2)
plt.imshow(confusion_mat, cmap='Blues', interpolation='nearest')
plt.title('Confusion Matrix')
plt.colorbar()
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.xticks([0, 1], ['0', '1'])
plt.yticks([0, 1], ['0', '1'])
for i in range(confusion_mat.shape[0]):
for j in range(confusion_mat.shape[1]):
plt.text(j, i, confusion_mat[i, j], ha='center', va='center', color='red')

plt.tight_layout()
plt.show()

Result:
Thus the multi layer perceptron has been implemented and executed successfully.
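
For comparison, the same XOR task can be handled by scikit-learn's MLPClassifier instead of the hand-written network. This is a minimal sketch, not part of the recorded program; with so few distinct input patterns it may need more iterations or a different random_state to converge.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# XOR data generated the same way as above: 2 binary inputs, XOR target
X = np.random.randint(2, size=(100, 2))
y = X[:, 0] ^ X[:, 1]

clf = MLPClassifier(hidden_layer_sizes=(8,), activation='logistic',
                    solver='adam', max_iter=5000, random_state=42)
clf.fit(X, y)
print("MLPClassifier accuracy on XOR:", accuracy_score(y, clf.predict(X)))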
FEATURE EXTRACTION:

import re

from urllib.parse import urlparse, parse_qs, unquote

import pandas as pd

def extract_url_length(url):

return len(url)

def has_https(url):

return url.startswith("https")

def has_suspicious_keywords(url):

suspicious_keywords = ['phishing', 'malware', 'fraud', 'login', 'password', 'bank', 'paypal']

for keyword in suspicious_keywords:

if keyword in url:

return 1

return 0

def extract_url_syntax(url):

return len(url)

def extract_host_name(url):

parsed_url = urlparse(url)

return parsed_url.hostname

def extract_path_length(url):

parsed_url = urlparse(url)

return len(parsed_url.path)

def has_favicon(url):

return 0

def has_ssl_certificate(url):

return 1

def extract_page_rank(url):

return 0

def has_double_slash_redirection(url):

return "//" in url

def extract_folder_name(url):
parsed_url = urlparse(url)

return parsed_url.path.split('/')[-1]

def has_special_characters(url):

special_characters = set("!@#$%^&*()+=-[]{}|\\<>?,.")

return any(char in special_characters for char in url)

def extract_ccTLD(url):

parsed_url = urlparse(url)

return parsed_url.netloc.split('.')[-1]

def has_subdomains(url):

parsed_url = urlparse(url)

return len(parsed_url.netloc.split('.')) > 2

def extract_features(url):

features = {

'url_length': extract_url_length(url),

'path_length': extract_path_length(url),

'ip_present': 0,

'number_of_digits': sum(c.isdigit() for c in url),

'number_of_parameters': 0,

'number_of_fragments': 0,

'is_url_encoded': 0,

'number_encoded_char': 0,

'host_length': 0,

'number_of_periods': 0,

'has_client': 0,

'has_admin': 0,

'has_server': 0,

'has_login': 0,

'number_of_dashes': 0,

'have_at_sign': 0,

'is_tiny_url': 0,

'number_of_spaces': 0,
'number_of_zeros': 0,

'number_of_uppercase': 0,

'number_of_lowercase': 0,

'number_of_double_slashes': 0,

'HTTPS Usage': has_https(url),

'Presence of Suspicious Keywords': has_suspicious_keywords(url),

'URL Syntax': extract_url_syntax(url),

'Presence of Favicon': has_favicon(url),

'SSL Certificate': has_ssl_certificate(url),

'Page Rank': extract_page_rank(url),

'Double Slash Redirection': has_double_slash_redirection(url),

'Folder Name': extract_folder_name(url),

'Use of Special Characters': has_special_characters(url),

'Country Code Top-Level Domain (ccTLD)': extract_ccTLD(url),

'Presence of Subdomains': has_subdomains(url)

}

parsed_url = urlparse(url)

if re.match(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', parsed_url.netloc):

features['ip_present'] = 1

if parsed_url.fragment:

features['number_of_fragments'] = len(parsed_url.fragment)

if url != unquote(url):

features['is_url_encoded'] = 1

features['number_encoded_char'] = len(re.findall('%[0-9A-Fa-f]{2}', url))

features['host_length'] = len(parsed_url.netloc)

features['number_of_periods'] = parsed_url.netloc.count('.')

keywords = ['client', 'admin', 'server', 'login']

for keyword in keywords:

features['has_' + keyword] = 1 if keyword in url else 0

features['number_of_dashes'] = url.count('-')
features['have_at_sign'] = 1 if '@' in url else 0

features['is_tiny_url'] = 1 if len(url) < 25 else 0

features['number_of_spaces'] = url.count(' ')

features['number_of_zeros'] = url.count('0')

features['number_of_uppercase'] = sum(c.isupper() for c in url)

features['number_of_lowercase'] = sum(c.islower() for c in url)

features['number_of_double_slashes'] = url.count('//')

return features

def extract_features_for_dataset(dataset):

extracted_features = []

for url in dataset['URL']:

features = extract_features(url)

extracted_features.append(features)

return pd.DataFrame(extracted_features)

data = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/datasets/dataset_with_url.csv')

extracted_features_df = extract_features_for_dataset(data)

data_with_features = pd.concat([data, extracted_features_df], axis=1)

data_with_features.to_csv('/content/drive/MyDrive/Colab Notebooks/datasets/feature_extracted_dataset.csv', index=False)

df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/datasets/feature_extracted_dataset.csv')

print("extracted features dataset:\n")

print(df.head())
output:
extracted features dataset:
URL status url_length \
0 http://www.crestonwood.com/router.php legitimate
37
1 http://shadetreetechnology.com/V4/validation/a... phishing
77
2 https://support-appleld.com.secureupdate.duila... phishing
126
3 http://rgipt.ac.in legitimate
18
4 http://www.iracing.com/tracks/gateway-motorspo... legitimate
55

path_length ip_present number_of_digits number_of_parameters \


0 11 0 0 0
1 47 0 17 0
2 20 0 19 0
3 0 0 0 0
4 33 0 0 0

number_of_fragments is_url_encoded number_encoded_char ... \


0 0 0 0 ...
1 0 0 0 ...
2 0 0 0 ...
3 0 0 0 ...
4 0 0 0 ...

Presence of Suspicious Keywords URL Syntax Presence of Favicon \


0 0 37 0
1 0 77 0
2 0 126 0
3 0 18 0
4 0 55 0

SSL Certificate Page Rank Double Slash Redirection \


0 1 0 True
1 1 0 True
2 1 0 True
3 1 0 True
4 1 0 True

Folder Name Use of Special Characters \


0 router.php True
1 a111aedc8ae390eabcfa130e041a10a4 True
2 NaN True
3 NaN True
4 NaN True

Country Code Top-Level Domain (ccTLD) Presence of Subdomains


0 com True
1 com False
2 com True
3 in True
4 com True

[5 rows x 35 columns]
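
A quick way to sanity-check the extractor before running it over the whole dataset is to call extract_features on a single URL. The URL below is hypothetical, used only for illustration:

sample = extract_features("https://login.example-bank.com/secure/update?id=123")
print(sample['url_length'], sample['HTTPS Usage'], sample['Presence of Suspicious Keywords'])
print(sample['host_length'], sample['has_login'], sample['Presence of Subdomains'])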
OPTIMIZATION USING ANT COLONY ALGORITHM:

import numpy as np
import pandas as pd

# Load the dataset


data = pd.read_csv('malicious_url.csv')

# Extract URLs and labels


urls = data['url'].values
labels = data['type'].values

# Define parameters
num_ants = 10
num_iterations = 100
pheromone_decay = 0.5
pheromone_increase = 1.0

# Initialize pheromone levels


num_urls = len(urls)
pheromones = np.ones(num_urls)

# Define objective function to evaluate solutions


def objective_function(solution):
# Dummy function, replace with actual scoring logic
return np.sum(solution)

# Ant Colony Optimization


best_solution = None
best_score = float('-inf')

for iteration in range(num_iterations):
    for ant in range(num_ants):
        # Construct solution probabilistically based on pheromone levels
        probabilities = pheromones / np.sum(pheromones) if np.sum(pheromones) != 0 else np.ones(num_urls) / num_urls
        # Select each URL independently, with a chance proportional to its pheromone share
        # (np.random.choice cannot be used here because p must match the length of [0, 1])
        solution = (np.random.rand(num_urls) < probabilities * num_urls * 0.5).astype(int)

        # Evaluate solution
        score = objective_function(solution)

        # Update best solution
        if score > best_score:
            best_solution = solution
            best_score = score

    # Update pheromone levels
    pheromones *= pheromone_decay
    pheromones[best_solution == 1] += pheromone_increase

# Process the best solution to identify malicious URLs


malicious_urls_indices = np.where(best_solution == 1)[0]
malicious_urls = urls[malicious_urls_indices]

# Save the filtered dataset containing malicious URLs


malicious_data = data.iloc[malicious_urls_indices]
malicious_data.to_csv('malicious_urls_output.csv', index=False)
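
The objective_function above is only a placeholder that sums the selection vector, so the search has nothing meaningful to optimise. A sketch of a more informative objective is given below; it assumes the 'type' column contains a 'benign' class for non-malicious URLs (as in common malicious-URL datasets) and rewards selections that concentrate non-benign labels while lightly penalising very large subsets. To take effect it would have to replace the placeholder definition before the optimisation loop runs.

def objective_function(solution):
    selected = labels[solution == 1]   # labels of the URLs picked by this ant
    if len(selected) == 0:
        return float('-inf')           # empty selections are never best
    malicious_ratio = np.mean(selected != 'benign')
    return malicious_ratio - 0.001 * len(selected)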

SAMPLE DATASET:
