
Experiment no: 1-a    DATASET CREATION - CATEGORICAL DATASET
Date:

Aim:
The Objective of the experiment is to create a categorical dataset (BMI records with a BMI category column) as a CSV file using the csv module in Python.

Algorithm:
Step 1: Start

Step 2: Import the `csv` module

Step 3: Define the file path for the CSV file

Step 4: Open the CSV file in write mode and define field names

Step 5: Write the header row to the CSV file

Step 6: Begin a loop to collect user data

Step 7: Prompt the user to enter their name

Step 8: Check if the user wants to exit

Step 9: If not, prompt the user to enter their age, height, and weight

Step 10: Calculate BMI using the formula: weight / (height ** 2)

Step 11: Categorize the BMI into one of four categories

Step 12: Write the user's data (name, age, height, weight, BMI, BMI category) to the CSV file

Step 13: Repeat the loop until the user chooses to exit

Step 14: Display a message indicating that the data has been written to the CSV file

Step 15: Stop

Program:

import csv

csv_file_path = 'data_with_bm.csv'

# Open the CSV file for writing
with open(csv_file_path, 'w', newline='') as csvfile:
    fieldnames = ['Name', 'Age', 'Height', 'Weight', 'BMI', 'BMI_Category']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    # Write header
    writer.writeheader()
Output:

Enter your name (or type 'exit' to finish): akalya

Enter your age: 20

Enter your height in meters: 1.70

Enter your weight in kilograms: 54

Enter your name (or type 'exit' to finish): nandhini

Enter your age: 20

Enter your height in meters: 1.56

Enter your weight in kilograms: 45

Enter your name (or type 'exit' to finish): priya

Enter your age: 34

Enter your height in meters: 1.55

Enter your weight in kilograms: 22

Enter your name (or type 'exit' to finish): vinu

Enter your age: 20

Enter your height in meters: 1.78

Enter your weight in kilograms: 170

Enter your name (or type 'exit' to finish): exit

Data has been written to data_with_bm.csv


    while True:
        # Get user input for each entry
        name = input("Enter your name (or type 'exit' to finish): ")

        if name.lower() == 'exit':
            break

        age = int(input("Enter your age: "))
        height = float(input("Enter your height in meters: "))
        weight = float(input("Enter your weight in kilograms: "))

        if height <= 0 or weight <= 0:
            print("Height and weight must be positive numbers. Please try again.")
            continue

        # Calculate BMI
        bmi = weight / (height ** 2)

        # Categorize BMI
        if bmi < 18.5:
            bmi_category = 'Underweight'
        elif 18.5 <= bmi < 24.9:
            bmi_category = 'Normal weight'
        elif 24.9 <= bmi < 29.9:
            bmi_category = 'Overweight'
        else:
            bmi_category = 'Obese'

        # Write user data to CSV
        writer.writerow({'Name': name, 'Age': age, 'Height': height, 'Weight': weight,
                         'BMI': bmi, 'BMI_Category': bmi_category})

print(f"Data has been written to {csv_file_path}.")

Result:

Thus the categorical dataset has been created successfully using the csv module in Python.
Experiment no: 1-b    DATASET CREATION - CONTINUOUS DATASET
Date:

Aim:
The Objective of the experiment is to create a continuous dataset (BMI values) as a CSV file using the csv module in Python.

Algorithm:
Step 1: Start

Step 2: Import the `csv` module

Step 3: Define the file path for the CSV file

Step 4: Open the CSV file in write mode and define field names (Name, Age, Height, Weight, BMI)

Step 5: Write the header row to the CSV file

Step 6: Begin a loop to collect user data

Step 7: Prompt the user to enter their name

Step 8: Check if the user wants to exit

Step 9: If not, prompt the user to enter their age, height, and weight

Step 10: Calculate BMI using the formula: weight / (height ** 2)

Step 11: Write the user's data (name, age, height, weight, BMI) to the CSV file

Step 12: Repeat the loop until the user chooses to exit

Step 13: Display a message indicating that the data has been written to the CSV file

Step 14: Stop

PROGRAM:

import csv

csv_file_path = 'bmi.csv'

# Open the CSV file for writing
with open(csv_file_path, 'w', newline='') as csvfile:
    fieldnames = ['Name', 'Age', 'Height', 'Weight', 'BMI']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
OUTPUT:

Enter your name (or type 'exit' to finish): akalya

Enter your age: 20

Enter your height in meters: 1.69

Enter your weight in kilograms: 55

Enter your name (or type 'exit' to finish): nandhini

Enter your age: 22

Enter your height in meters: 1.55

Enter your weight in kilograms: 34

Enter your name (or type 'exit' to finish): priya

Enter your age: 25

Enter your height in meters: 1.70

Enter your weight in kilograms: 110

Enter your name (or type 'exit' to finish): exit

Data has been written to bmi.csv


    # Write header
    writer.writeheader()

    while True:
        # Get user input for each entry
        name = input("Enter your name (or type 'exit' to finish): ")

        if name.lower() == 'exit':
            break

        age = int(input("Enter your age: "))
        height = float(input("Enter your height in meters: "))
        weight = float(input("Enter your weight in kilograms: "))

        # Calculate BMI
        bmi = weight / (height ** 2)

        # Write user data to CSV
        writer.writerow({'Name': name, 'Age': age, 'Height': height, 'Weight': weight, 'BMI': bmi})

print(f"Data has been written to {csv_file_path}.")

RESULT:

Thus the Python program to create a continuous dataset has been executed and verified successfully.
Experiment no: 2-a    SIMPLE LINEAR REGRESSION USING LIBRARY FUNCTION
Date:

Aim:
The Objective of this experiment is to implement simple linear regression using library functions.

Algorithm:
Step 1: Start

Step 2: Import necessary libraries

Step 3: Load dataset and split into training and testing sets

Step 4: Train Linear Regression model

Step 5: Make predictions and define a threshold for binary classification

Step 6: Evaluate model performance

Step 7: Prompt user for input (years of experience)

Step 8: Predict salary and convert to binary label

Step 9: Print predicted salary

Step 10: Plot regression line with data points and threshold

Step 11: Stop

Program:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (mean_squared_error, r2_score,
                             precision_recall_fscore_support, accuracy_score)
import matplotlib.pyplot as plt

# Load the dataset from CSV file


file_path = "Salary_Data.csv" # Update this with your CSV file path
data = pd.read_csv(file_path)

# Assuming 'years_of_experience' is the feature and 'salary' is the target variable


X = data[['YearsExperience']].values
y = data['Salary'].values
SAMPLE DATASET:

OUTPUT:

Precision: 1.0

Recall: 0.6666666666666666

F1 Score: 0.8

Accuracy: 0.8333333333333334

Enter years of experience:4

Predicted Salary: 63016.84430390072

Text(0.5, 1.0, 'Linear Regression: Years of Experience vs Salary')


# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training the linear regression model


model = LinearRegression()
model.fit(X_train, y_train)

# Making predictions on the test set


y_pred = model.predict(X_test)

# Set the threshold for binary classification


threshold = 80000 # Adjust this threshold as needed

# Convert predictions to binary labels based on the threshold


y_pred_binary = (y_pred > threshold).astype(int)
y_test_binary = (y_test > threshold).astype(int)

# Evaluate the model using classification metrics


precision, recall, f1, _ = precision_recall_fscore_support(y_test_binary, y_pred_binary,
average='binary')
accuracy = accuracy_score(y_test_binary, y_pred_binary)

print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Accuracy:", accuracy)

# Get input from the user for Years of Experience


years_of_experience = float(input("Enter years of experience:"))

# Create a NumPy array with the user input


years_of_experience_array = np.array([[years_of_experience]])

# Predict salary using the trained linear regression model


predicted_salary = model.predict(years_of_experience_array)

# Convert the predicted salary to binary label based on the threshold


predicted_salary_binary = (predicted_salary > threshold).astype(int)

print("Predicted Salary:", predicted_salary[0])

# Plotting the regression line along with test data points


plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, y_pred, color='blue', linewidth=3)
plt.axhline(y=threshold, color='red', linestyle='--', label='Threshold')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.title('Linear Regression: Years of Experience vs Salary')
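The program imports mean_squared_error and r2_score, which are the natural regression metrics here, but does not use them. A minimal optional check (not part of the recorded program), assuming y_test and y_pred from above:

# Optional: evaluate the fit with regression metrics (assumes y_test and y_pred from above)
mse = mean_squared_error(y_test, y_pred)   # average squared prediction error
r2 = r2_score(y_test, y_pred)              # proportion of variance explained
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)

When the script is run outside a notebook, plt.show() is also needed at the end to display the regression plot.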

Result:

Thus simple linear regression using library functions has been implemented and executed successfully.
Experiment no: 2-b    SIMPLE LINEAR REGRESSION WITHOUT USING LIBRARY FUNCTION
Date:

Aim:
The Objective of this experiment is to implement simple linear regression without using library
functions.

Algorithm:
Step 1: Start

Step 2: Import the matplotlib.pyplot library

Step 3: Load the dataset from the CSV file specified in 'file_path'

Step 4: Read the CSV file without using pandas, extracting lines of data

Step 5: Extract data from CSV and format it as a list of lists

Step 6: Define features ('years_of_experience') and target variable ('salary')

Step 7: Split the data into training and testing sets

Step 8: Train the linear regression model using analytical solution

Step 9: Make predictions on the test set

Step 10: Evaluate the model's performance using Mean Squared Error (MSE) and R-squared (R^2)
Score

Step 11: Prompt the user to enter years of experience for predicting salary

Step 12: Predict salary using the trained linear regression model

Step 13: Print the predicted salary

Step 14: Plot the regression line along with test data points

Step 15: Show the plot

Step 16: Stop

Program:
import matplotlib.pyplot as plt

# Load the dataset from CSV file


file_path = "Salary_Data.csv" # Update this with your CSV file path

# Read the CSV file without using pandas


with open(file_path, 'r') as file:
    lines = file.readlines()

# Extract data from CSV


data = [list(map(float, line.strip().split(','))) for line in lines[1:]] # Skip header

# Assuming 'years_of_experience' is the feature and 'salary' is the target variable


X = [[1, row[0]] for row in data] # Add a bias term
y = [row[1] for row in data]

# Splitting the data into training and testing sets


split_index = int(0.8 * len(data))
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

# Training the linear regression model


mean_X1 = sum(x[1] for x in X_train) / len(X_train)
mean_y = sum(y_train) / len(y_train)

b1 = (sum((x[1] - mean_X1) * (y_train[i] - mean_y) for i, x in enumerate(X_train))
      / sum((x[1] - mean_X1) ** 2 for x in X_train))
b0 = mean_y - b1 * mean_X1

# Making predictions on the test set


y_pred = [b0 + b1 * x[1] for x in X_test]

# Evaluating the model


n = len(y_test)
ss_res = sum((y_test[i] - y_pred[i]) ** 2 for i in range(n))
mse = ss_res / n
mean_y_test = sum(y_test) / n
ss_total = sum((y_test[i] - mean_y_test) ** 2 for i in range(n))
r2 = 1 - (ss_res / ss_total)  # R^2 uses the residual sum of squares, not the MSE

print("Mean Squared Error:", mse)


print("R^2 Score:", r2)

# Get input from the user for Hours Studied


years_of_experience = float(input("Enter year of experience: "))

# Predict exam scores using the trained linear regression model


predicted_salary = b0 + b1 * years_of_experience

print("Predicted salary:", predicted_salary)

# Plotting the regression line along with test data points


plt.scatter([x[1] for x in X_test], y_test, color='black')
SAMPLE DATASET:

Output:
Mean Squared Error: 35766738.2396581
R^2 Score: 0.8450481599189925
Enter year of experience: 4
Predicted salary: 63870.993342864815

plt.plot([x[1] for x in X_test], y_pred, color='blue', linewidth=3)


plt.xlabel('year of experience')
plt.ylabel('salary')
plt.title('Linear Regression: year of experience vs salary')
plt.show()
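As a quick optional cross-check (not part of the recorded program), the slope and intercept computed by hand can be compared against numpy.polyfit, assuming the lists built above:

# Optional cross-check of b0 and b1 using NumPy (assumes X_train and y_train from above)
import numpy as np

slope, intercept = np.polyfit([x[1] for x in X_train], y_train, 1)
print("NumPy slope:", slope, "intercept:", intercept)  # should match b1 and b0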

Result:
Thus simple linear regression without using library functions has been implemented and executed successfully.
Experiment no: 2-c    MULTIPLE LINEAR REGRESSION USING LIBRARY FUNCTION
Date:

Aim:
The Objective of this experiment is to implement multiple linear regression using library functions.

Algorithm:
Step 1: Start

Step 2: Import necessary libraries

Step 3: Load dataset from CSV file

Step 4: Split data into training and testing sets

Step 5: Train Linear Regression model

Step 6: Make predictions on the test set

Step 7: Evaluate model performance

Step 8: Prompt user for input (TV, Radio, Newspaper budgets)

Step 9: Predict sales using the trained model

Step 10: Print predicted sales

Step 11: Plot actual vs predicted sales

Step 12: Stop

Program:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Load the dataset from CSV file


file_path = "Advertising.csv" # Update this with your CSV file path
data = pd.read_csv(file_path)

# Assuming 'TV', 'Radio', and 'Newspaper' are features and 'Sales' is the target variable
X = data[['TV', 'Radio', 'Newspaper']].values
y = data['Sales'].values
SAMPLE DATASET:

Output:
Mean Squared Error: 2.9077569102710923
R^2 Score: 0.9059011844150826
Enter TV advertising budget: 6677
Enter Radio advertising budget: 6488
Enter Newspaper advertising budget: 3566
Predicted Sales: 1039.0705215552014
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training the linear regression model


model = LinearRegression()
model.fit(X_train, y_train)

# Making predictions on the test set


y_pred = model.predict(X_test)

# Evaluating the model


mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)


print("R^2 Score:", r2)

# Get input from the user for Radio, TV, and Newspaper advertising budgets
tv_budget = float(input("Enter TV advertising budget: "))
radio_budget = float(input("Enter Radio advertising budget: "))
newspaper_budget = float(input("Enter Newspaper advertising budget: "))

# Create a NumPy array with the user input


user_input = np.array([[tv_budget, radio_budget, newspaper_budget]])

# Predict sales using the trained linear regression model


predicted_sales = model.predict(user_input)

print("Predicted Sales:", predicted_sales[0])

# Plotting actual vs predicted sales on the test set


plt.scatter(y_test, y_pred, color='blue')
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color='red', linewidth=3)
plt.xlabel('Actual Sales')
plt.ylabel('Predicted Sales')
plt.title('Actual vs Predicted Sales')
plt.show()
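A short optional addition (not part of the recorded program): the fitted coefficients can be printed to see how each advertising channel contributes to predicted sales, assuming the model trained above:

# Optional: inspect the fitted model (assumes model from the program above)
print("Intercept:", model.intercept_)
for name, coef in zip(['TV', 'Radio', 'Newspaper'], model.coef_):
    print(f"Coefficient for {name}: {coef}")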

Result:
Thus multiple linear regression using library functions has been implemented and executed successfully.
Experiment no: 2-d    MULTIPLE LINEAR REGRESSION WITHOUT USING LIBRARY FUNCTION
Date:

Aim:
The Objective of this experiment is to implement multiple linear regression without using library functions.

Algorithm:
Step 1: Start

Step 2: Import the matplotlib.pyplot library

Step 3: Load the dataset from the CSV file specified in 'file_path'

Step 4: Read the CSV file without using pandas, extracting lines of data

Step 5: Extract data from CSV and format it as a list of lists

Step 6: Define features ('TV', 'Radio', 'Newspaper') and target variable ('Sales')

Step 7: Split the data into training and testing sets

Step 8: Create polynomial features for training and testing sets

Step 9: Train the linear regression model with polynomial features

Step 10: Make predictions on the test set with polynomial features

Step 11: Evaluate the model's performance using Mean Squared Error (MSE) and R-squared (R^2)
Score

Step 12: Prompt the user to enter advertising budgets for TV, Radio, and Newspaper

Step 13: Predict sales using the trained model

Step 14: Print the predicted sales

Step 15: Plot actual vs predicted sales on the test set

Step 16: Stop

Program:
import matplotlib.pyplot as plt

# Load the dataset from CSV file


file_path = "Advertising.csv" # Update this with your CSV file path

# Read the CSV file without using pandas


with open(file_path, 'r') as file:
    lines = file.readlines()

# Extract data from CSV


data = []
for line in lines[1:]:  # Skip the header line
    values = line.strip().split(',')
    data.append([float(values[0]), float(values[1]), float(values[2]), float(values[3])])

# Assuming 'TV', 'Radio', and 'Newspaper' are features and 'Sales' is the target variable
X = [[row[0], row[1], row[2]] for row in data] # Features: TV, Radio, Newspaper
y = [row[3] for row in data] # Target variable: Sales

# Splitting the data into training and testing sets


split_index = int(0.8 * len(data))
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

# Create polynomial features


X_poly_train = [[1, x[0], x[1], x[2], x[0] * x[0], x[1] * x[1], x[2] * x[2]] for x in X_train]
X_poly_test = [[1, x[0], x[1], x[2], x[0] * x[0], x[1] * x[1], x[2] * x[2]] for x in X_test]

# Training the linear regression model with polynomial features


X_poly_train_transposed = list(map(list, zip(*X_poly_train)))
mean_X_poly = [sum(feature) / len(feature) for feature in X_poly_train_transposed]
mean_y_poly = sum(y_train) / len(y_train)

# Add a small constant to avoid division by zero


epsilon = 1e-10

# Compute coefficients with added constant


coefficients_poly = [
sum((X_poly_train[i][j] - mean_X_poly[j]) * (y_train[i] - mean_y_poly) for i in
range(len(X_poly_train))) /
(sum((X_poly_train[i][j] - mean_X_poly[j])**2 for i in range(len(X_poly_train))) + epsilon)
for j in range(len(mean_X_poly))
]

# Making predictions on the test set with polynomial features


y_pred_poly = [sum(coefficients_poly[k] * x[k] for k in range(len(coefficients_poly))) for x in
X_poly_test]

# Evaluating the model with polynomial features


n = len(y_test)  # Define 'n' as the length of the test set
ss_res_poly = sum((y_test[i] - y_pred_poly[i]) ** 2 for i in range(n))
mse_poly = ss_res_poly / n
mean_y_test_poly = sum(y_test) / n
ss_total_poly = sum((y_test[i] - mean_y_test_poly) ** 2 for i in range(n))
r2_poly = 1 - (ss_res_poly / ss_total_poly)  # R^2 uses the residual sum of squares, not the MSE

print("Mean Squared Error (Polynomial Regression):", mse_poly)


print("R^2 Score (Polynomial Regression):", r2_poly)
SAMPLE DATASET:

Output:
Mean Squared Error (Polynomial Regression): 59.046372645941645
R^2 Score (Polynomial Regression): 0.9454034508470807
Enter TV advertising budget: 5466
Enter Radio advertising budget: 3123
Enter Newspaper advertising budget:3424
Predicted Sales: 34212.556222982384
# Get input from the user for Radio, TV, and Newspaper advertising budgets
tv_budget = float(input("Enter TV advertising budget: "))
radio_budget = float(input("Enter Radio advertising budget: "))
newspaper_budget = float(input("Enter Newspaper advertising budget:"))

# Predict sales using the trained linear regression model


user_input = [1, tv_budget, radio_budget, newspaper_budget, tv_budget * tv_budget, radio_budget
* radio_budget, newspaper_budget * newspaper_budget]
predicted_sales = sum(coefficients_poly[k] * user_input[k] for k in range(len(coefficients_poly)))

print("Predicted Sales:", predicted_sales)

# Sort based on the actual sales values
sorted_indices = sorted(range(len(y_test)), key=lambda k: y_test[k])
y_test_sorted = [y_test[i] for i in sorted_indices]
y_pred_poly_sorted = [y_pred_poly[i] for i in sorted_indices]

plt.scatter(y_test_sorted, y_pred_poly_sorted, color='blue') # Scatter plot


plt.plot([min(y_test_sorted), max(y_test_sorted)], [min(y_test_sorted), max(y_test_sorted)],
color='red', linewidth=3) # Linear line
plt.xlabel('Actual Sales')
plt.ylabel('Predicted Sales')
plt.title('Actual vs Predicted Sales')
plt.show()
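The coefficients above are computed for each feature independently, which is only an approximation when the features are correlated. A minimal optional sketch (not part of the recorded program) of the standard least-squares solution for the same polynomial design matrix, using numpy.linalg.lstsq:

# Optional: ordinary least squares on the same polynomial design matrix
# (assumes X_poly_train, y_train, X_poly_test, y_test from the program above)
import numpy as np

A = np.array(X_poly_train, dtype=float)
b = np.array(y_train, dtype=float)
beta, *_ = np.linalg.lstsq(A, b, rcond=None)   # solves min ||A @ beta - b||^2
y_pred_ols = np.array(X_poly_test, dtype=float) @ beta
print("OLS MSE:", np.mean((np.array(y_test) - y_pred_ols) ** 2))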

Result:
Thus multiple linear regression without using library functions has been implemented and executed successfully.
Experiment no: 2-e    IMPLEMENTATION OF LOGISTIC REGRESSION
Date:

Aim:
The Objective of this experiment is to implement logistic regression using library functions.

Algorithm:
Step 1: Start

Step 2: Import necessary libraries: pandas, sklearn's train_test_split, LogisticRegression, accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, LabelEncoder, and matplotlib.pyplot

Step 3: Load the dataset from the CSV file 'optimal_features_dataset.csv' using pandas

Step 4: Drop the 'url' column from the dataset

Step 5: Encode the categorical variable 'status' using LabelEncoder

Step 6: Drop the original 'status' column and separate features (X) and target variable (y)

Step 7: Split the dataset into training and testing sets

Step 8: Initialize Logistic Regression model

Step 9: Train the model on the training set

Step 10: Predict on the test set

Step 11: Evaluate the model's performance using accuracy, precision, recall, F1 score, and confusion
matrix

Step 12: Plot the distribution of the ratio of digits in the URL for legitimate and phishing URLs

Step 13: Show the plot

Step 14: Stop

Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv('optimal_features_dataset.csv')
df.drop('url', axis=1, inplace=True)
# Encoding categorical variable 'status'
label_encoder = LabelEncoder()
df['status_encoded'] = label_encoder.fit_transform(df['status'])

# Drop the original 'status' column


df.drop('status', axis=1, inplace=True)

# Separate features and target variable


X = df.drop('status_encoded', axis=1)
y = df['status_encoded']

# Split dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Logistic Regression model


model = LogisticRegression()

# Train the model


model.fit(X_train, y_train)

# Predict on the test set


y_pred = model.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)
df.head()

plt.figure(figsize=(8, 6))
legitimate_data = df[df['status_encoded'] == 0]
phishing_data = df[df['status_encoded'] == 1]
plt.hist(legitimate_data['ratio_digits_url'], bins=20, alpha=0.5, label='Legitimate URLs')
plt.hist(phishing_data['ratio_digits_url'], bins=20, alpha=0.5, label='Phishing URLs')
plt.xlabel('Ratio of Digits in URL')
plt.ylabel('Frequency')
plt.title('Distribution of Ratio of Digits in URL')
plt.legend()
plt.show()
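Logistic regression is sensitive to feature scale, and the accuracy reported below is modest, so standardizing the features is a common optional refinement. A minimal sketch (not part of the recorded program), assuming the split created above:

# Optional: logistic regression with feature scaling
# (assumes X_train, y_train, X_test, y_test from the program above)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_train, y_train)
print("Scaled-model accuracy:", scaled_model.score(X_test, y_test))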
SAMPLE DATASET:

Output:
Accuracy: 0.6399825021872266
Precision: 0.7101648351648352
Recall: 0.4579273693534101
F1 Score: 0.5568120624663435
Confusion Matrix:
[[946 211]
[612 517]]
Result:
Thus logistic regression using library functions has been implemented and executed successfully.
Experiment no: 3-a    NAIVE BAYES ALGORITHM WITH LIBRARY FUNCTION
Date:

Aim:
The Objective of this experiment is to implement Naïve Bayes algorithm with library functions.

Algorithm:
Step 1: Start

Step 2: Import necessary libraries: pandas, numpy, sklearn's CategoricalNB, LabelEncoder, accuracy_score, and classification_report

Step 3: Load dataset from CSV file 'naive.csv' using pandas

Step 4: Separate data into features (X) and labels (y)

Step 5: Define a custom function to encode boolean values as 1 or 0

Step 6: Use LabelEncoder for all features, including the boolean one, and encode categorical
variables

Step 7: Train a Naive Bayes classifier (CategoricalNB)

Step 8: Define a function to predict outcomes based on user input

Step 9: Prompt the user to input values for Outlook, Temperature, Humidity, and Windy

Step 10: Transform user input to numerical form using the trained LabelEncoders

Step 11: Predict the outcome based on user input using the trained classifier

Step 12: Print the predicted outcome

Step 13: Call the predict_outcome function

Step 14: Stop

Program:
import pandas as pd
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report

# Load dataset from CSV file


df = pd.read_csv('naive.csv')

# Separate data into features (X) and labels


SAMPLE DATASET:

Output:
Enter Outlook (e.g., 'Sunny', 'Overcast', 'Rainy'): Sunny
Enter Temperature (e.g., 'Hot', 'Mild', 'Cool'): Cool
Enter Humidity (e.g., 'High', 'Normal'): High
Enter Windy (True/False): False
Predicted Outcome: No
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

# Custom encoding function for boolean values


def encode_boolean(value):
    return 1 if str(value).lower() == 'true' else 0

# Use LabelEncoder for all features, including the boolean one


label_encoders = [LabelEncoder() for _ in range(X.shape[1])]
X_encoded = [le.fit_transform(X[:, i]) if i != X.shape[1] - 1 else np.array([encode_boolean(val) for val
in X[:, i]]) for i, le in enumerate(label_encoders)]
X_encoded = np.array(list(zip(*X_encoded)))

# Train a Naive Bayes classifier


clf = CategoricalNB()
clf.fit(X_encoded, y)

# Function to predict based on user input


def predict_outcome():
    outlook = input("Enter Outlook (e.g., 'Sunny', 'Overcast', 'Rainy'): ").capitalize()
    temperature = input("Enter Temperature (e.g., 'Hot', 'Mild', 'Cool'): ").capitalize()
    humidity = input("Enter Humidity (e.g., 'High', 'Normal'): ").capitalize()
    windy = input("Enter Windy (True/False): ").capitalize()

    # Transform user input to numerical form using the trained LabelEncoders
    user_input_encoded = [le.transform([val])[0] if i != X.shape[1] - 1 else encode_boolean(val)
                          for i, (le, val) in enumerate(zip(label_encoders,
                                                            [outlook, temperature, humidity, windy]))]
    user_input_encoded = np.array(user_input_encoded).reshape(1, -1)

    # Predict the outcome
    prediction = clf.predict(user_input_encoded)[0]
    print(f'Predicted Outcome: {prediction}')

# Call the predict_outcome function


predict_outcome()
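The program imports accuracy_score and classification_report but does not use them; a short optional check of the classifier on the encoded training data (not part of the recorded program), assuming clf, X_encoded, and y from above:

# Optional: evaluate the classifier on the encoded training data
y_train_pred = clf.predict(X_encoded)
print("Training accuracy:", accuracy_score(y, y_train_pred))
print(classification_report(y, y_train_pred))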

Result:
Thus the Naive Bayes algorithm with library functions has been executed and implemented
successfully.
Experiment no: 3-b    NAIVE BAYES ALGORITHM WITHOUT USING LIBRARY FUNCTION
Date:

Aim:
The Objective of this experiment is to implement Naïve Bayes algorithm without using library
functions.

Algorithm:
1. Start

2. Import pandas and numpy.

3. Read the dataset from the CSV file 'naive2.csv' using pandas.

4. Calculate the count of each class in the target variable 'Play Golf'.

5. Calculate the conditional probabilities for each attribute given the target variable.

6. Define a function `predict(X)` to predict the target variable based on the input attributes.

7. Prompt the user to input values for Outlook, Temperature, Humidity, and Windy.

8. Predict the outcome based on the user input using the `predict` function.

9. Define a function `accuracy()` to calculate the accuracy of the model.

10. Print the predicted outcome and accuracy.

11. Stop.

Program:
import pandas as pd
import numpy as np

df = pd.read_csv('naive2.csv')
#df = df.drop(df.columns[df.columns.str.contains('Unnamed')], axis=1)
op = dict(df['Play Golf'].value_counts())
rows = df.shape[0]
prob = {}
for column in df.columns[:-1]:
    column_d = {}
    for class_ in df[column].unique():
        t_d = {}
        for output in df['Play Golf'].unique():
            occurrences = ((df[column] == class_) & (df['Play Golf'] == output)).sum()
            t_d[output] = occurrences / op[output]
        prob[str(class_)] = t_d
SAMPLE DATASET:

Output:
Enter Outlook (e.g., 'Sunny', 'Overcast', 'Rainy'): Sunny
Enter Temperature (e.g., 'Hot', 'Mild', 'Cool'): Mild
Enter Humidity (e.g., 'High', 'Normal'): High
Enter Windy (True/False): True
No
5
13
None
dataset:
Outlook Temperature Humidity Windy Play Golf
0 Sunny Hot High False No
1 Sunny Hot High True No
2 Overcast Hot High False Yes
3 Rainy Mild High False Yes
4 Rainy Cool Normal False Yes
5 Rainy Cool Normal True No
6 Overcast Cool Normal True Yes
7 Sunny Mild High False No
8 Sunny Cool Normal False Yes
9 Rainy Mild Normal False Yes
10 Sunny Mild Normal True Yes
11 Overcast Mild High True Yes
12 Overcast Hot Normal False Yes
13 Rainy Mild High True No
prob_df = pd.DataFrame(prob)

def predict(X):
    p_yes = op['Yes'] / rows
    p_no = op['No'] / rows
    for attr in X:
        p_yes *= prob_df[attr]['Yes']
        p_no *= prob_df[attr]['No']
    P_yes = round(p_yes / (p_yes + p_no), 2)
    P_no = round(p_no / (p_yes + p_no), 2)

    return "Yes" if P_yes >= P_no else "No"

outlook = input("Enter Outlook (e.g., 'Sunny', 'Overcast', 'Rainy'): ").capitalize()


temperature = input("Enter Temperature (e.g., 'Hot', 'Mild', 'Cool'): ").capitalize()
humidity = input("Enter Humidity (e.g., 'High', 'Normal'): ").capitalize()
windy = input("Enter Windy (True/False): ").capitalize()
print(predict([outlook,temperature,humidity,windy]))

def accuracy():
    count = 0
    for index, row in df.iterrows():
        r = list(map(str, row))
        if predict(r[:-1]) == r[-1]:
            count += 1
        else:
            print(index)
    print(count)

print(accuracy())

print("dataset: ")
print(df)
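As a worked check against the dataset shown above, for the sample input (Sunny, Mild, High, True): the class priors are P(Yes) = 9/14 and P(No) = 5/14, and the conditional counts give P(Sunny|Yes) = 2/9, P(Mild|Yes) = 4/9, P(High|Yes) = 3/9, P(True|Yes) = 3/9, versus P(Sunny|No) = 3/5, P(Mild|No) = 2/5, P(High|No) = 4/5, P(True|No) = 3/5. The unnormalized scores are therefore about 0.0071 for Yes and 0.0411 for No, so the classifier predicts No, matching the output shown.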

Result:
Thus the Naive Bayes algorithm without library functions has been executed and implemented
successfully.
Experiment no: 4-a    IMPLEMENTATION OF DECISION TREE USING ID3 ALGORITHM
Date:

Aim:
The Objective of this experiment is to implement decision tree using ID3 algorithm with inbuilt
libraries in python.

Algorithm:
Step 1: Start

Step 2: Import necessary libraries: pandas, DecisionTreeClassifier, train_test_split, accuracy_score, precision_score, recall_score, f1_score, and confusion_matrix

Step 3: Load the dataset from '/content/optimal_feature.csv' using pandas

Step 4: Separate features (X) and target variable (y)

Step 5: Split the dataset into train and test sets

Step 6: Initialize a DecisionTreeClassifier with criterion='entropy' for ID3

Step 7: Train the model using the training data

Step 8: Make predictions on the test set

Step 9: Evaluate the model's performance using accuracy, precision, recall, F1 score, and confusion
matrix

Step 10: Print the evaluation metrics

Step 11: Stop

Program:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

# Load dataset
df = pd.read_csv('/content/optimal_feature.csv')

# Separate features and target variable


X = df.drop('status', axis=1)
y = df['status']

# Split dataset into train and test sets


SAMPLE DATASET:

Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[2 0]
[0 1]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize DecisionTreeClassifier with criterion='entropy' for ID3


model = DecisionTreeClassifier(criterion='entropy')

# Train the model


model.fit(X_train, y_train)

# Predict on the test set


y_pred = model.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)

# Calculate precision
precision = precision_score(y_test, y_pred, average='weighted')

# Calculate recall
recall = recall_score(y_test, y_pred, average='weighted')

# Calculate F1-score
f1 = f1_score(y_test, y_pred, average='weighted')

# Calculate confusion matrix


conf_matrix = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)
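A small optional addition (not part of the recorded program): the trained tree can be visualized with sklearn's plot_tree, assuming the model and DataFrame from above:

# Optional: visualize the trained decision tree (assumes model and X from above)
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(12, 6))
plot_tree(model, feature_names=list(X.columns), filled=True)
plt.show()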

Result:
Thus the decision tree using ID3 algorithm with inbuilt libraries in python has been implemented and
executed successfully.
Experiment no: 4-b    IMPLEMENTATION OF DECISION TREE USING C4.5 ALGORITHM
Date:

Aim:
The Objective of this experiment is to implement decision tree using C4.5 algorithm with inbuilt
libraries in python.

Algorithm:
Step 1: Start

Step 2: Import necessary libraries: pandas, train_test_split, DecisionTreeClassifier, accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, and numpy

Step 3: Define a function to calculate entropy

Step 4: Define a function to calculate information gain

Step 5: Define a function to find the best split based on information gain

Step 6: Load the dataset from '/content/optimal_feature.csv' using pandas

Step 7: Separate features (X) and target variable (y)

Step 8: Handle missing values by replacing NaN with a placeholder value ('missing')

Step 9: Convert categorical features to numeric using one-hot encoding

Step 10: Split the dataset into train and test sets

Step 11: Initialize a DecisionTreeClassifier with criterion='entropy' for the C4.5-like algorithm

Step 12: Train the model using the training data

Step 13: Make predictions on the test set

Step 14: Evaluate the model's performance using accuracy, precision, recall, F1 score, and confusion
matrix

Step 15: Print the evaluation metrics

Step 16: Stop

Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)
import numpy as np

# Function to calculate entropy


def entropy(labels):
    unique_labels, label_counts = np.unique(labels, return_counts=True)
    probabilities = label_counts / len(labels)
    entropy = -np.sum(probabilities * np.log2(probabilities))
    return entropy

# Function to calculate information gain


def information_gain(data, feature_index, labels):
    # Calculate entropy before splitting
    entropy_before_split = entropy(labels)

    # Calculate weighted sum of entropies after splitting
    unique_values = np.unique(data[:, feature_index])
    entropy_after_split = 0
    for value in unique_values:
        subset_indices = np.where(data[:, feature_index] == value)
        subset_labels = labels[subset_indices]
        subset_entropy = entropy(subset_labels)
        weight = len(subset_labels) / len(labels)
        entropy_after_split += weight * subset_entropy

    # Calculate information gain
    information_gain = entropy_before_split - entropy_after_split
    return information_gain

# Function to find the best split based on information gain


def find_best_split(data, labels):
    num_features = data.shape[1]
    best_gain = -1
    best_feature = None
    for feature_index in range(num_features):
        gain = information_gain(data, feature_index, labels)
        if gain > best_gain:
            best_gain = gain
            best_feature = feature_index
    return best_feature

# Load dataset
df = pd.read_csv('/content/optimal_feature.csv')

# Separate features and target variable


X = df.drop(columns=['status']) # Drop the 'status' column
y = df['status']

# Handle missing values (replace NaN with a placeholder value)


X.fillna('missing', inplace=True)

# Convert categorical features to numeric using one-hot encoding


SAMPLE DATASET:

Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[2 0]
[0 1]]
X = pd.get_dummies(X)

# Split dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize DecisionTreeClassifier with criterion='entropy' for C4.5-like algorithm


model = DecisionTreeClassifier(criterion='entropy')

# Train the model


model.fit(X_train, y_train)

# Predict on the test set


y_pred = model.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
conf_matrix = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)
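The entropy, information_gain, and find_best_split helpers defined above are not called in the main flow, since DecisionTreeClassifier performs the splitting internally. A brief optional check of the hand-written helpers (not part of the recorded program), assuming X_train and y_train from the program:

# Optional: exercise the hand-written helpers on the training data
labels = y_train.to_numpy()
features = X_train.to_numpy()
print("Entropy of training labels:", entropy(labels))
print("Index of best first split:", find_best_split(features, labels))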

Result:
Thus the decision tree using C4.5 algorithm with inbuilt libraries in python has been implemented
and executed successfully.
Experiment no: 4-c    IMPLEMENTATION OF DECISION TREE USING CART ALGORITHM
Date:

Aim:
The Objective of this experiment is to implement decision tree using CART algorithm with inbuilt
libraries in python.

Algorithm:
Step 1: Start

Step 2: Import necessary libraries: pandas, train_test_split, DecisionTreeClassifier, accuracy_score, precision_score, recall_score, f1_score, and confusion_matrix

Step 3: Load the dataset from '/content/optimal_feature.csv' using pandas

Step 4: Separate features (X) and target variable (y)

Step 5: Split the dataset into train and test sets

Step 6: Initialize a DecisionTreeClassifier with criterion='gini' for the CART algorithm

Step 7: Train the model using the training data

Step 8: Make predictions on the test set

Step 9: Evaluate the model's performance using accuracy, precision, recall, F1 score, and confusion
matrix

Step 10: Print the evaluation metrics

Step 11: Stop

Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Load dataset
df = pd.read_csv('/content/optimal_feature.csv')

# Separate features and target variable


X = df.drop('status', axis=1)
y = df['status']

# Split dataset into train and test sets


SAMPLE DATASET:

Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[2 0]
[0 1]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize DecisionTreeClassifier with criterion='gini' for CART algorithm


model = DecisionTreeClassifier(criterion='gini')

# Train the model


model.fit(X_train, y_train)

# Predict on the test set


y_pred = model.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
conf_matrix = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)

Result:
Thus the decision tree using the CART algorithm with inbuilt libraries in python has been implemented and executed successfully.
Experiment no: 5    IMPLEMENTATION OF K-MEANS CLUSTERING
Date:

Aim:
The Objective of this experiment is to implement K means clustering algorithm with
inbuilt libraries in python.

Algorithm:

Step 1: Start

Step 2: Import necessary libraries: numpy, matplotlib.pyplot, datasets from sklearn, KMeans from
sklearn.cluster, silhouette_score, davies_bouldin_score, and calinski_harabasz_score from
sklearn.metrics

Step 3: Load the Iris dataset using datasets.load_iris()

Step 4: Separate features (X) and target labels (y)

Step 5: Initialize KMeans clustering algorithm with 3 clusters

Step 6: Fit KMeans to the data

Step 7: Get the cluster centers and labels

Step 8: Plot the clusters and centroids

Step 9: Calculate and print the silhouette score

Step 10: Calculate and print the Davies–Bouldin index

Step 11: Calculate and print the Calinski-Harabasz score

Step 12: Stop

Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score, calinski_harabasz_score

# Load the Iris dataset


iris = datasets.load_iris()
X = iris.data # Features
y = iris.target # Target labels

# Initialize KMeans with 3 clusters (since there are 3 types of flowers)


kmeans = KMeans(n_clusters=3)

# Fit KMeans to the data


kmeans.fit(X)

# Get the cluster centers and labels


cluster_centers = kmeans.cluster_centers_
cluster_labels = kmeans.labels_

# Plot the clusters


plt.figure(figsize=(12, 8))

# Define marker shapes and colors for each cluster


markers = ['o', 's', '^']
colors = ['blue', 'green', 'red']

# Plot the data points with colors corresponding to the clusters


for i in range(len(X)):
    plt.scatter(X[i, 0], X[i, 1], color=colors[cluster_labels[i]], marker=markers[cluster_labels[i]])

# Plot the centroids of the clusters


plt.scatter(cluster_centers[:, 0], cluster_centers[:, 1], color='black', marker='x', s=100,
label='Centroids')

plt.xlabel('Sepal Length (cm)')


plt.ylabel('Sepal Width (cm)')
plt.title('K-means Clustering of Iris Dataset')

# Add legend for cluster labels


for i, label in enumerate(iris.target_names):
    plt.scatter([], [], color=colors[i], marker=markers[i], label=label)

plt.legend()
plt.show()

# Calculate silhouette score


silhouette_avg = silhouette_score(X, cluster_labels)
print("Silhouette Score:", silhouette_avg)

# Calculate Davies–Bouldin index


db_index = davies_bouldin_score(X, cluster_labels)
print("Davies–Bouldin Index:", db_index)

# Calculate Calinski-Harabasz score


ch_score = calinski_harabasz_score(X, cluster_labels)
print("Calinski-Harabasz Score:", ch_score)
Output:

Silhouette Score: 0.5528190123564095


Davies–Bouldin Index: 0.6619715465007465
Calinski-Harabasz Score: 561.62775662962
Result:
Thus the k means clustering algorithm using inbuilt library functions has been implemented and
executed successfully.
Experiment no: 6-a    SUPPORT VECTOR MACHINE - SUPPORT VECTOR CLASSIFICATION
Date:

Aim:
The Objective of this experiment is to implement the Support Vector Classification using inbuilt
libraries in python.
Algorithm:
Step 1: Import necessary libraries including pandas, train_test_split, SVC, accuracy_score,
precision_score, recall_score, f1_score, confusion_matrix from sklearn, and matplotlib.pyplot.
Step 2: Load the dataset from the specified CSV file.
Step 3: Encode the categorical variable 'status' using LabelEncoder.
Step 4: Drop the original 'status' column from the DataFrame.
Step 5: Separate features (X) and the target variable (y).
Step 6: Split the dataset into training and testing sets using train_test_split.
Step 7: Initialize an SVC classifier with a linear kernel and regularization parameter C=1.0.
Step 8: Train the SVC classifier using the training data.
Step 9: Predict the target variable for the test dataset.
Step 10: Evaluate the model using accuracy, precision, recall, F1-score, and confusion matrix.
Step 11: Plot the confusion matrix.
Step 12: Train an SVM model with a linear kernel and regularization parameter C=1.0 (repeated for
demonstration).
Step 13: Make predictions on the testing set (repeated for demonstration).
Step 14: Print the confusion matrix (repeated for demonstration).
Step 15: Calculate accuracy, precision, recall, and F1-score (repeated for demonstration).
Step 16: Define a function to plot the decision boundary for an SVM model.
Step 17: Plot the decision boundary for the first two features of the dataset.
Step 18: Stop.
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Load dataset
df = pd.read_csv('/content/optimal_feature.csv')

# Encoding categorical variable 'status'


label_encoder = LabelEncoder()
df['status_encoded'] = label_encoder.fit_transform(df['status'])
# Drop the original 'status' column
df.drop('status', axis=1, inplace=True)

# Separate features and target variable


X = df.drop('status_encoded', axis=1)
y = df['status_encoded']

# Split dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features if needed

# Create SVC classifier object


svm_classifier = SVC(kernel='linear', C=1.0) # Adjust parameters as needed

# Train SVC classifier


svm_classifier.fit(X_train, y_train)

# Predict the response for test dataset


y_pred = svm_classifier.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)

# Plot confusion matrix


plt.figure(figsize=(8, 6))
plt.imshow(conf_matrix, interpolation='nearest', cmap=plt.cm.Blues)
plt.title('Confusion Matrix')
plt.colorbar()
classes = ['Legitimate', 'Phishing']
plt.xticks([0, 1], classes)
plt.yticks([0, 1], classes)
plt.ylabel('Actual')
plt.xlabel('Predicted')
for i in range(len(conf_matrix)):
    for j in range(len(conf_matrix[i])):
        plt.text(j, i, str(conf_matrix[i][j]), horizontalalignment='center', color='black')
plt.show()

# Step 6: Train an SVM model with a linear kernel and regularization parameter C=1.0
svm_model = SVC(kernel='linear', C=1.0)
svm_model.fit(X_train, y_train)

# Step 7: Make predictions on the testing set


y_pred = svm_model.predict(X_test)

# Step 8: Print the confusion matrix


conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

# Step 9: Calculate accuracy, precision, recall, and F1-score


accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print accuracy, precision, recall, and F1-score


print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)

# Step 10: Plot the SVM decision boundary


def plot_decision_boundary(X, y, model):
    # Get support vectors and their corresponding labels
    support_vectors = model.support_vectors_
    y_reset_index = y.reset_index(drop=True)  # Reset index to ensure alignment
    support_vectors_labels = y_reset_index[model.support_]

    # Plot data points for each class with different colors
    plt.scatter(X[y == 0, 0], X[y == 0, 1], c='blue', label='Non-malicious',
                cmap=plt.cm.coolwarm, s=20, edgecolors='k')
    plt.scatter(X[y == 1, 0], X[y == 1, 1], c='red', label='Malicious',
                cmap=plt.cm.coolwarm, s=20, edgecolors='k')

    # Plot support vectors
    plt.scatter(support_vectors[:, 0], support_vectors[:, 1], s=100, linewidth=1,
                facecolors='none', edgecolors='k')

    # Create mesh grid to plot decision boundary
    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 50),
                         np.linspace(ylim[0], ylim[1], 50))
    Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])

    # Plot decision boundary and margins
    Z = Z.reshape(xx.shape)
    plt.contour(xx, yy, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
SAMPLE DATASET:

Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[2 0]
[0 1]]

Confusion Matrix:
[[2 0]
[0 1]]
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1-score: 1.0
<ipython-input-16-b1a6a5941cd6>:98: UserWarning: No data for colormapping
provided via 'c'. Parameters 'cmap' will be ignored
plt.scatter(X[y == 0, 0], X[y == 0, 1], c='blue', label='Non-malicious',
cmap=plt.cm.coolwarm, s=20, edgecolors='k')
<ipython-input-16-b1a6a5941cd6>:99: UserWarning: No data for colormapping
provided via 'c'. Parameters 'cmap' will be ignored
plt.scatter(X[y == 1, 0], X[y == 1, 1], c='red', label='Malicious',
cmap=plt.cm.coolwarm, s=20, edgecolors='k')
<ipython-input-16-b1a6a5941cd6>:117: UserWarning: No data for colormapping
provided via 'c'. Parameters 'cmap' will be ignored
plt.scatter(X[model.decision_function(X) > 0][:, 0],
X[model.decision_function(X) > 0][:, 1], c='green', label='Above Positive
Hyperplane', cmap=plt.cm.coolwarm, s=20, edgecolors='k')
<ipython-input-16-b1a6a5941cd6>:118: UserWarning: No data for colormapping
provided via 'c'. Parameters 'cmap' will be ignored
plt.scatter(X[model.decision_function(X) < 0][:, 0],
X[model.decision_function(X) < 0][:, 1], c='orange', label='Below Negative
Hyperplane', cmap=plt.cm.coolwarm, s=20, edgecolors='k')

    # Plot data points above positive hyperplane and below negative hyperplane
    plt.scatter(X[model.decision_function(X) > 0][:, 0], X[model.decision_function(X) > 0][:, 1],
                c='green', label='Above Positive Hyperplane', cmap=plt.cm.coolwarm, s=20, edgecolors='k')
    plt.scatter(X[model.decision_function(X) < 0][:, 0], X[model.decision_function(X) < 0][:, 1],
                c='orange', label='Below Negative Hyperplane', cmap=plt.cm.coolwarm, s=20, edgecolors='k')

    plt.title('SVM Decision Boundary')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.legend()
    plt.show()
# For illustration purpose, we'll plot decision boundary for the first two features
X_train_2d = X_train.iloc[:, :2].values # Convert DataFrame to NumPy array
svm_model_2d = SVC(kernel='linear', C=1.0)
svm_model_2d.fit(X_train_2d, y_train)

# Plot decision boundary


plot_decision_boundary(X_train_2d, y_train, svm_model_2d)
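One optional refinement (not part of the recorded program): SVMs are sensitive to feature scale, so standardizing the inputs before fitting is a common step. A minimal sketch, assuming X_train, y_train, X_test, and y_test from above:

# Optional: SVC with standardized features
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

scaled_svm = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1.0))
scaled_svm.fit(X_train, y_train)
print("Scaled SVC accuracy:", scaled_svm.score(X_test, y_test))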

Result:
Thus the Support Vector Classification using inbuilt libraries in python has been implemented and
executed successfully.
Experiment no: 6-b    SUPPORT VECTOR MACHINE - SUPPORT VECTOR REGRESSION
Date:

Aim:
The Objective of this experiment is to implement the Support Vector Regression using inbuilt
libraries in python.
Algorithm:
Step 1: Start
Step 2: Load necessary libraries
Step 3: Load dataset
Step 4: Separate features and target variable
Step 5: Split dataset into train and test sets
Step 6: Create SVR model object
Step 7: Train SVR model
Step 8: Predict response for test dataset
Step 9: Evaluate model (calculate MSE and R^2)
Step 10: Plot actual vs. predicted values
Step 11: Stop
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv('dress_sales_prediction.csv')

# Separate features and target variable


X = df.drop('sales', axis=1)
y = df['sales']

# Split dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create SVR model object


svr_model = SVR(kernel='linear', C=1.0) # Adjust parameters as needed

# Train SVR model


svr_model.fit(X_train, y_train)

# Predict the response for test dataset


y_pred = svr_model.predict(X_test)

# Evaluate the model


SAMPLE DATASET:

Output:
Mean Squared Error: 8.043392151715722
R^2 Score: 0.9574175861521746
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)


print("R^2 Score:", r2)

# Plot actual vs. predicted values


plt.scatter(y_test, y_pred)
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.title("Actual vs. Predicted values (SVR)")
plt.show()
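An optional variation (not part of the recorded program): SVR with a non-linear kernel can capture curved relationships; the RBF kernel is a common first choice. A minimal sketch using the same split:

# Optional: SVR with an RBF kernel (assumes X_train, y_train, X_test, y_test from above)
svr_rbf = SVR(kernel='rbf', C=100.0, gamma='scale')
svr_rbf.fit(X_train, y_train)
y_pred_rbf = svr_rbf.predict(X_test)
print("RBF-kernel R^2 Score:", r2_score(y_test, y_pred_rbf))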

Result:
Thus the Support Vector Regression using inbuilt libraries in python has been implemented and
executed successfully
Experiment no: 7-a    ENSEMBLE LEARNING - STACKING
Date:

Aim:
The Objective of this experiment is to implement stacking using inbuilt libraries in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset and encode the categorical 'status' variable
Step 4: Separate features and target variable
Step 5: Split dataset into train and test sets
Step 6: Define base models (Random Forest, Gradient Boosting) and the meta-model (Logistic Regression)
Step 7: For each cross-validation fold, train the base models on the training fold and build meta-features from their predicted probabilities on the validation fold
Step 8: Train the meta-model on the meta-features
Step 9: Train the base models on the full training set, build meta-features for the test set, and predict with the meta-model
Step 10: Evaluate the stacked predictions (accuracy, precision, recall, F1 score)
Step 11: Stop
Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.base import clone
from sklearn.preprocessing import LabelEncoder

# Load your dataset


data = pd.read_csv('optimal_feature.csv')

label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])

# Drop the original 'status' column


data.drop('status', axis=1, inplace=True)

# Separate features and target variable


X = data.drop('status_encoded', axis=1)
y = data['status_encoded']

# Split the dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base models


base_models = [
SAMPLE DATASET:

Output:
Stacking accuracy: 0.6666666666666666
Precision: 0.0
Recall: 0.0
F1 Score: 0.0

/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning:
Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to
control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Stacking accuracy: 0.3333333333333333
Precision: 0.3333333333333333
Recall: 1.0
F1 Score: 0.5

Stacking accuracy: 1.0


Precision: 1.0
Recall: 1.0
F1 Score: 1.0

Stacking accuracy: 0.3333333333333333


Precision: 0.3333333333333333
Recall: 1.0
F1 Score: 0.5

Stacking accuracy: 1.0


Precision: 1.0
Recall: 1.0
F1 Score: 1.0
('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
('gb', GradientBoostingClassifier(n_estimators=100, random_state=42))
]

# Define meta-model
meta_model = LogisticRegression()

# Implement cross-validated stacking


kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for train_idx, val_idx in kfold.split(X_train, y_train):
    train_fold_X, val_fold_X = X_train.iloc[train_idx], X_train.iloc[val_idx]
    train_fold_y, val_fold_y = y_train.iloc[train_idx], y_train.iloc[val_idx]

    # Create empty arrays to store meta-features
    meta_train = pd.DataFrame()
    meta_test = pd.DataFrame()

    # Fit base models on the training fold and predict on the validation fold
    for name, model in base_models:
        cloned_model = clone(model)
        cloned_model.fit(train_fold_X, train_fold_y)
        meta_train[name] = cloned_model.predict_proba(val_fold_X)[:, 1]

    # Fit the meta-model on the validation fold predictions
    meta_model.fit(meta_train, val_fold_y)

    # Fit base models on the entire training set and predict on the test set
    for name, model in base_models:
        cloned_model = clone(model)
        cloned_model.fit(X_train, y_train)
        meta_test[name] = cloned_model.predict_proba(X_test)[:, 1]

    # Make predictions on the test set using the meta-model
    meta_pred = meta_model.predict(meta_test)
    print("Stacking accuracy:", accuracy_score(y_test, meta_pred))
    print("Precision:", precision_score(y_test, meta_pred))
    print("Recall:", recall_score(y_test, meta_pred))
    print("F1 Score:", f1_score(y_test, meta_pred))
    print()
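scikit-learn also provides a built-in StackingClassifier that performs this cross-validated stacking in a single estimator. A minimal optional sketch (not part of the recorded program), assuming the same base_models, X_train, X_test, y_train, and y_test:

# Optional: equivalent stacking with sklearn's StackingClassifier
from sklearn.ensemble import StackingClassifier

stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(),
                           cv=5)
stack.fit(X_train, y_train)
print("StackingClassifier accuracy:", stack.score(X_test, y_test))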

Result:
Thus stacking has been implemented and executed successfully.
Experiment no: 7-b    ENSEMBLE LEARNING - BAGGING (MAX VOTING)
Date:

Aim:
The Objective of this experiment is to implement Max Vote bagging method using inbuilt
libraries in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset
Step 4: Encode categorical variable 'status'
Step 5: Drop the original 'status' column
Step 6: Separate features and target variable
Step 7: Split the dataset into train and test sets
Step 8: Define base model as RandomForestClassifier with 100 estimators
Step 9: Create a list to hold base models
Step 10: Set the number of base models to create as 5
Step 11: Train and clone base models:
a. Iterate over the number of base models
b. Clone the base model
c. Fit the cloned model on the training set
d. Append the cloned model to the list of models
Step 12: Make predictions on the test set using each base model
Step 13: Take majority vote for each instance
Step 14: Calculate accuracy, precision, recall, and F1 score
Step 15: Print bagging with majority voting metrics: accuracy, precision, recall, and F1 score
Step 16: Stop
Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.base import clone
from sklearn.preprocessing import LabelEncoder

# Load your dataset


data = pd.read_csv('/content/optimal_feature.csv')

# Encode categorical variable 'status'


label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])

# Drop the original 'status' column


data.drop('status', axis=1, inplace=True)
# Separate features and target variable
X = data.drop('status_encoded', axis=1)
y = data['status_encoded']

# Split the dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base model


base_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Create list to hold base models


models = []
# Number of base models to create
num_models = 5
# Train and clone base models
for i in range(num_models):
    model = clone(base_model)
    model.fit(X_train, y_train)
    models.append(model)

# Make predictions on the test set using each base model
predictions = []
for model in models:
    predictions.append(model.predict(X_test))

# Take majority vote
final_predictions = []
for i in range(len(X_test)):
    votes = [prediction[i] for prediction in predictions]
    final_predictions.append(max(set(votes), key=votes.count))
# Calculate accuracy
accuracy = accuracy_score(y_test, final_predictions)
precision = precision_score(y_test, final_predictions)
recall = recall_score(y_test, final_predictions)
f1 = f1_score(y_test, final_predictions)
print("Bagging with Majority Voting Metrics:")
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print()

Result:
Thus Bagging with max (majority) voting has been implemented and executed successfully.
SAMPLE DATASET:
Output:
Bagging with Majority Voting Metrics:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
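
Note: an equivalent majority-vote ensemble can also be written with scikit-learn's VotingClassifier using hard voting. The sketch below is illustrative only and assumes the same X_train/X_test split as the program above; the three heterogeneous base models are example choices, not taken from the recorded program.

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Hard voting: each model casts one vote and the majority class wins
voting_clf = VotingClassifier(
    estimators=[('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
                ('dt', DecisionTreeClassifier(random_state=42)),
                ('knn', KNeighborsClassifier())],
    voting='hard')

voting_clf.fit(X_train, y_train)  # assumes X_train, y_train from the split above
print("Hard-voting accuracy:", accuracy_score(y_test, voting_clf.predict(X_test)))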
Experiment no:7-c ENSEMBLE LEARNING
Date: BAGGING-AVERAGE VOTING

Aim:
The Objective of this experiment is to implement Average Voting bagging method using inbuilt
libraries in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset
Step 4: Encode categorical variable 'status'
Step 5: Drop the original 'status' column
Step 6: Separate features and target variable
Step 7: Split the dataset into train and test sets
Step 8: Define base model as RandomForestClassifier with 100 estimators
Step 9: Create a list to hold base models
Step 10: Set the number of base models to create as 5
Step 11: Train and clone base models:
a. Iterate over the number of base models
b. Clone the base model
c. Fit the cloned model on the training set
d. Append the cloned model to the list of models
Step 12: Make predictions on the test set using each base model
Step 13: Take probabilities of predictions for each base model
Step 14: Calculate average probabilities for each class across all base models
Step 15: Convert average probabilities to predicted labels by selecting the class with the highest
probability
Step 16: Calculate evaluation metrics (accuracy, precision, recall, F1 score)
Step 17: Print bagging with average voting metrics: accuracy, precision, recall, and F1 score
Step 18: Stop
Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.base import clone
from sklearn.preprocessing import LabelEncoder
import numpy as np

# Load your dataset


data = pd.read_csv('/content/optimal_feature.csv')

# Encode categorical variable 'status'


label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])

# Drop the original 'status' column


data.drop('status', axis=1, inplace=True)

# Separate features and target variable


X = data.drop('status_encoded', axis=1)
y = data['status_encoded']

# Split the dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base model


base_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Create list to hold base models


models = []

# Number of base models to create


num_models = 5

# Train and clone base models


for i in range(num_models):
    model = clone(base_model)
    model.fit(X_train, y_train)
    models.append(model)

# Make predictions on the test set using each base model
predictions = []
for model in models:
    predictions.append(model.predict_proba(X_test))

# Take average of probabilities


avg_probabilities = np.mean(predictions, axis=0)

# Convert probabilities to labels


final_predictions = np.argmax(avg_probabilities, axis=1)

# Calculate evaluation metrics


accuracy = accuracy_score(y_test, final_predictions)
precision = precision_score(y_test, final_predictions)
recall = recall_score(y_test, final_predictions)
f1 = f1_score(y_test, final_predictions)

# Print performance metrics


print("Bagging with Average Voting Metrics:")
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
Result:
Thus Bagging with average voting has been implemented and executed successfully.
SAMPLE DATASET:
Output:
Bagging with Average Voting Metrics:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
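
To make the averaging step in Steps 13-15 concrete, the small self-contained numpy illustration below uses hypothetical predict_proba outputs (not taken from the dataset) for three models and two test samples:

import numpy as np

# Hypothetical class-probability outputs of three models (columns: class 0, class 1)
p1 = np.array([[0.9, 0.1], [0.4, 0.6]])
p2 = np.array([[0.7, 0.3], [0.2, 0.8]])
p3 = np.array([[0.6, 0.4], [0.5, 0.5]])

avg = np.mean([p1, p2, p3], axis=0)  # element-wise average across the models
labels = np.argmax(avg, axis=1)      # pick the class with the highest mean probability

print(avg)     # [[0.733 0.267] [0.367 0.633]] (rounded)
print(labels)  # [0 1]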
Experiment no:7-d ENSEMBLE LEARNING
Date: BAGGING-WEIGHTED AVERAGE

Aim:
The Objective of this experiment is to implement the Weighted Average voting bagging method using
inbuilt libraries in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset
Step 4: Encode categorical variable 'status'
Step 5: Drop the original 'status' column
Step 6: Separate features and target variable
Step 7: Split the dataset into train and test sets
Step 8: Define base model as RandomForestClassifier with 100 estimators
Step 9: Create a list to hold base models
Step 10: Set the number of base models to create as 5
Step 11: Train and clone base models:
a. Iterate over the number of base models
b. Clone the base model
c. Fit the cloned model on the training set
d. Append the cloned model to the list of models
Step 12: Make predictions on the test set using each base model
Step 13: Assign weights to each base model's predictions (example weights are provided)
Step 14: Calculate the weighted average of predictions from all base models
Step 15: Convert weighted average probabilities to predicted labels by selecting the class with the
highest probability
Step 16: Calculate evaluation metrics (accuracy, precision, recall, F1 score)
Step 17: Print bagging with weighted average voting metrics: accuracy, precision, recall, and F1 score
Step 18: Stop
Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.base import clone
from sklearn.preprocessing import LabelEncoder
import numpy as np

# Load your dataset


data = pd.read_csv('/content/optimal_feature.csv')

# Encode categorical variable 'status'


label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])
# Drop the original 'status' column
data.drop('status', axis=1, inplace=True)

# Separate features and target variable


X = data.drop('status_encoded', axis=1)
y = data['status_encoded']

# Split the dataset into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base model


base_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Create list to hold base models


models = []

# Number of base models to create


num_models = 5

# Train and clone base models


for i in range(num_models):
    model = clone(base_model)
    model.fit(X_train, y_train)
    models.append(model)

# Make predictions on the test set using each base model
predictions = []
for model in models:
    predictions.append(model.predict_proba(X_test))

# Assign weights to each base model's predictions
weights = [0.2, 0.3, 0.2, 0.1, 0.2]  # Example weights (you can adjust these)
weighted_predictions = np.zeros_like(predictions[0])
for i, pred in enumerate(predictions):
    weighted_predictions += weights[i] * pred

# Convert probabilities to labels


final_predictions = np.argmax(weighted_predictions, axis=1)

# Calculate evaluation metrics


accuracy = accuracy_score(y_test, final_predictions)
precision = precision_score(y_test, final_predictions)
recall = recall_score(y_test, final_predictions)
f1 = f1_score(y_test, final_predictions)

# Print performance metrics


print("Bagging with Weighted Average Voting Metrics:")
print("Accuracy:", accuracy)

SAMPLE DATASET:
Output:
Bagging with Weighted Average Voting Metrics:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0

print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)

Result:
Thus Bagging with weighted average voting has been implemented and executed successfully.
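
A library-based alternative: scikit-learn's VotingClassifier supports soft voting with per-model weights, which corresponds to the weighted average computed above. This is a minimal sketch, assuming the same X_train/X_test split; the base models and weights shown are illustrative, not the ones recorded in the program.

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Soft voting with weights applied to each model's predicted probabilities
weighted_clf = VotingClassifier(
    estimators=[('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
                ('gb', GradientBoostingClassifier(n_estimators=100, random_state=42)),
                ('lr', LogisticRegression(max_iter=1000))],
    voting='soft',
    weights=[0.5, 0.3, 0.2])  # example weights, adjust as needed

weighted_clf.fit(X_train, y_train)  # assumes X_train, y_train from the split above
print("Weighted-voting accuracy:", accuracy_score(y_test, weighted_clf.predict(X_test)))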
Experiment no:7-e ENSEMBLE LEARNING
Date: BOOSTING – ADAPTIVE BOOSTING

Aim:
The Objective of this experiment is to implement Adaptive boosting using inbuilt libraries in
python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset
Step 4: Encode categorical variable 'status'
Step 5: Drop the original 'status' column
Step 6: Separate features and target variable
Step 7: Split the dataset into training and testing sets
Step 8: Initialize AdaBoost Classifier with Decision Tree as base estimator and 50 estimators
Step 9: Train the AdaBoost Classifier on the training set
Step 10: Make predictions on the testing set
Step 11: Calculate evaluation metrics (accuracy, precision, recall, F1 score)
Step 12: Print accuracy, precision, recall, and F1 score
Step 13: Print confusion matrix
Step 14: Stop
Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.preprocessing import LabelEncoder

# Load your dataset


data = pd.read_csv('/content/optimal_feature.csv')

label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])

# Drop the original 'status' column


data.drop('status', axis=1, inplace=True)

# Separate features and target variable


X = data.drop('status_encoded', axis=1)
y = data['status_encoded']

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
SAMPLE DATASET:

Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0

Confusion Matrix:
[[2 0]
[0 1]]
# Initialize AdaBoost Classifier with Decision Tree as base estimator
adaboost_classifier = AdaBoostClassifier(n_estimators=50, random_state=42)

# Train the AdaBoost Classifier


adaboost_classifier.fit(X_train, y_train)

# Make predictions on the testing set


y_pred = adaboost_classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

Result:
Thus adaptive boosting has been implemented and executed successfully.
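
Step 8 of the algorithm names a decision tree as the base estimator; the program relies on AdaBoost's default, which is a depth-1 decision tree (a stump). The sketch below makes that choice explicit. It assumes the same X_train/X_test split; note that scikit-learn versions before 1.2 use the keyword base_estimator instead of estimator.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Explicit depth-1 decision tree as the weak learner (AdaBoost's default)
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=50,
                         learning_rate=1.0,
                         random_state=42)

ada.fit(X_train, y_train)  # assumes X_train, y_train from the split above
print("AdaBoost accuracy:", accuracy_score(y_test, ada.predict(X_test)))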
Experiment no:7-f ENSEMBLE LEARNING
Date: BOOSTING – GRADIENT BOOSTING

Aim:
The Objective of this experiment is to implement Gradient boosting using inbuilt libraries in
python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset
Step 4: Encode categorical variable 'status'
Step 5: Drop the original 'status' column
Step 6: Separate features and target variable
Step 7: Split the dataset into training and testing sets
Step 8: Initialize Gradient Boosting Classifier with 100 estimators
Step 9: Train the Gradient Boosting Classifier on the training set
Step 10: Make predictions on the testing set
Step 11: Calculate evaluation metrics (accuracy, precision, recall, F1 score)
Step 12: Print accuracy, precision, recall, and F1 score
Step 13: Print confusion matrix
Step 14: Stop

Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.preprocessing import LabelEncoder

# Load your dataset


data = pd.read_csv('/content/optimal_feature.csv')

label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])

# Drop the original 'status' column


data.drop('status', axis=1, inplace=True)

# Separate features and target variable


X = data.drop('status_encoded', axis=1)
y = data['status_encoded']
SAMPLE DATASET:

Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0

Confusion Matrix:
[[2 0]
[0 1]]
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Gradient Boosting Classifier


gradient_boosting_classifier = GradientBoostingClassifier(n_estimators=100, random_state=42)

# Train the Gradient Boosting Classifier


gradient_boosting_classifier.fit(X_train, y_train)

# Make predictions on the testing set


y_pred = gradient_boosting_classifier.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)


precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

Result:
Thus gradient boosting has been implemented and executed successfully.
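
The program uses GradientBoostingClassifier with its default settings apart from n_estimators. The sketch below shows the main tuning knobs (shrinkage, tree depth, row subsampling); the values are examples, not recorded results, and it assumes the same X_train/X_test split as above.

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

gb = GradientBoostingClassifier(n_estimators=100,
                                learning_rate=0.1,  # shrinkage applied to each tree's contribution
                                max_depth=3,        # depth of the individual regression trees
                                subsample=0.8,      # fraction of rows used to fit each tree
                                random_state=42)

gb.fit(X_train, y_train)  # assumes X_train, y_train from the split above
print("Gradient boosting accuracy:", accuracy_score(y_test, gb.predict(X_test)))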
Experiment no:7-g ENSEMBLE LEARNING
Date: BOOSTING – CATEGORICAL BOOSTING

Aim:
The Objective of this experiment is to implement Categorical boosting using inbuilt libraries in
python.
Algorithm:
Step 1: Start

Step 2: Import necessary libraries

Step 3: Load dataset

Step 4: Encode categorical variable 'status'

Step 5: Drop the original 'status' column

Step 6: Separate features and target variable

Step 7: Split the dataset into training and testing sets

Step 8: Initialize CatBoostClassifier

Step 9: Train the CatBoostClassifier on the training set

Step 10: Make predictions on the testing set

Step 11: Calculate evaluation metrics (accuracy, precision, recall, F1 score)

Step 12: Print accuracy, precision, recall, and F1 score

Step 13: Print confusion matrix

Step 14: Stop

Program:
import pandas as pd
from sklearn.model_selection import train_test_split
import catboost as ctb
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.preprocessing import LabelEncoder

# Load your dataset


data = pd.read_csv('/content/optimal_feature.csv')

label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])
SAMPLE DATASET:

Output:
Learning rate set to 0.001502
0: learn: 0.6922122 total: 1.08ms remaining: 1.08s
1: learn: 0.6908270 total: 6.1ms remaining: 3.04s
2: learn: 0.6898993 total: 7.48ms remaining: 2.48s
3: learn: 0.6888169 total: 9.22ms remaining: 2.29s
4: learn: 0.6869733 total: 12.2ms remaining: 2.43s
5: learn: 0.6856654 total: 14.6ms remaining: 2.42s
6: learn: 0.6844488 total: 16.2ms remaining: 2.29s
7: learn: 0.6834330 total: 16.9ms remaining: 2.09s
8: learn: 0.6817790 total: 19.4ms remaining: 2.14s
9: learn: 0.6802639 total: 19.9ms remaining: 1.97s
10: learn: 0.6789032 total: 23.8ms remaining: 2.14s
11: learn: 0.6772619 total: 24.6ms remaining: 2.02s
12: learn: 0.6762064 total: 25.8ms remaining: 1.96s
13: learn: 0.6751169 total: 27.3ms remaining: 1.92s
14: learn: 0.6741816 total: 28.9ms remaining: 1.9s
15: learn: 0.6732487 total: 31.2ms remaining: 1.92s
16: learn: 0.6719068 total: 32.5ms remaining: 1.88s
17: learn: 0.6702825 total: 33.6ms remaining: 1.83s
18: learn: 0.6689514 total: 34.6ms remaining: 1.79s
19: learn: 0.6677421 total: 42.3ms remaining: 2.07s
20: learn: 0.6665820 total: 44ms remaining: 2.05s
21: learn: 0.6657759 total: 45.4ms remaining: 2.02s
And so on…………………
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0

Confusion Matrix:
[[1 0]
[0 3]]
# Drop the original 'status' column
data.drop('status', axis=1, inplace=True)

# Separate features and target variable


x = data.drop('status_encoded', axis=1)
y = data['status_encoded']

X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.25)


catb = ctb.CatBoostClassifier()
catb_model = catb.fit(X_train,Y_train)

# Make predictions on the testing set


Y_pred = catb_model.predict(X_test)

accuracy = accuracy_score(Y_test, Y_pred)


precision = precision_score(Y_test, Y_pred)
recall = recall_score(Y_test, Y_pred)
f1 = f1_score(Y_test, Y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("\nConfusion Matrix:\n", confusion_matrix(Y_test, Y_pred))

Result:
Thus categorical boosting has been implemented and executed successfully.
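
The long per-iteration log in the output comes from CatBoost's default verbosity. A minimal sketch of a quieter configuration is shown below; the hyperparameter values are examples, and cat_features would only be needed if raw (non-encoded) categorical columns were passed, which is not the case here since 'status' is already label-encoded.

import catboost as ctb

# verbose=0 suppresses the per-iteration training log shown above
catb_quiet = ctb.CatBoostClassifier(iterations=200, learning_rate=0.1, depth=4, verbose=0)
catb_quiet.fit(X_train, Y_train)   # assumes the X_train, Y_train split from the program above
print("CatBoost accuracy:", catb_quiet.score(X_test, Y_test))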
Experiment no:8 REINFORCEMENT LEARNING
Date: Q-LEARNING ALGORITHM (Maze Game)

Aim:
The Objective of this experiment is to implement Q-learning algorithm of reinforcement
learning in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Define the `QLearningMaze` class with methods to initialize Q-learning parameters, choose
actions, update Q-values, train, take actions, and solve the maze
Step 4: Define the `MazeGame` class with methods to initialize the maze game, draw the maze, draw
the player, handle player movements, and check for game completion
Step 5: Create a custom maze function to get user input for maze dimensions and values
Step 6: Get custom maze input from the user
Step 7: Specify maze parameters such as width, height, and cell size
Step 8: Specify starting and ending positions in the maze
Step 9: Mark starting and ending positions in the maze
Step 10: Create the maze game window and run the game loop
Step 11: Stop
Program:
import numpy as np
import tkinter as tk
from tkinter import messagebox, simpledialog

class QLearningMaze:
    def __init__(self, maze, start, end, learning_rate=0.1, discount_factor=0.9,
                 exploration_rate=0.1, epochs=1000):
        self.maze = maze
        self.start = start
        self.end = end
        self.learning_rate = learning_rate
        self.discount_factor = discount_factor
        self.exploration_rate = exploration_rate
        self.epochs = epochs
        # Q-values for each action (up, down, left, right)
        self.q_values = np.zeros((len(maze), len(maze[0]), 4))

    def choose_action(self, state):
        if np.random.uniform(0, 1) < self.exploration_rate:
            # Explore: choose a random action
            return np.random.choice(4)
        else:
            # Exploit: choose the action with the highest Q-value
            return np.argmax(self.q_values[state[0], state[1]])

    def update_q_values(self, state, action, reward, next_state):
        # Bellman equation: Q(s, a) = Q(s, a) + α * (reward + γ * max(Q(s', a')) - Q(s, a))
        self.q_values[state[0], state[1], action] += self.learning_rate * \
            (reward + self.discount_factor * np.max(self.q_values[next_state[0], next_state[1]]) -
             self.q_values[state[0], state[1], action])

    def train(self):
        for _ in range(self.epochs):
            current_state = self.start
            while current_state != self.end:
                action = self.choose_action(current_state)
                next_state, reward = self.take_action(current_state, action)
                self.update_q_values(current_state, action, reward, next_state)
                current_state = next_state

    def take_action(self, state, action):
        # Move in the chosen direction (action)
        if action == 0:  # Up
            next_state = (max(0, state[0] - 1), state[1])
        elif action == 1:  # Down
            next_state = (min(len(self.maze) - 1, state[0] + 1), state[1])
        elif action == 2:  # Left
            next_state = (state[0], max(0, state[1] - 1))
        elif action == 3:  # Right
            next_state = (state[0], min(len(self.maze[0]) - 1, state[1] + 1))

        # Determine reward based on the next state
        if next_state == self.end:
            reward = 100
        elif self.maze[next_state[0]][next_state[1]] == 1:
            reward = -100
            next_state = state  # Stay in the same state if hitting a wall
        else:
            reward = -1

        # If the agent is at the destination but can't move towards it, return to the start
        if next_state == state and state == self.end:
            next_state = self.start
            reward = -100  # Penalize for being stuck at the destination

        return next_state, reward

    def solve(self):
        current_state = self.start
        path = [current_state]
        while current_state != self.end:
            action = np.argmax(self.q_values[current_state[0], current_state[1]])
            next_state, _ = self.take_action(current_state, action)
            path.append(next_state)
            current_state = next_state
        return path
class MazeGame:
def __init__(self, master, maze):
self.master = master
self.master.title("Maze Game")
self.canvas = tk.Canvas(master, width=maze_width, height=maze_height)
self.canvas.pack()
self.maze = maze
self.player_pos = (0, 0) # Starting position of the player
self.draw_maze()
self.player = None # To store player's shape ID
self.game_over = False

self.master.bind('<Up>', self.move_up)
self.master.bind('<Down>', self.move_down)
self.master.bind('<Left>', self.move_left)
self.master.bind('<Right>', self.move_right)

def draw_maze(self):
for y in range(len(self.maze)):
for x in range(len(self.maze[y])):
if self.maze[y][x] == 1:
self.canvas.create_rectangle(x * cell_size, y * cell_size,
(x + 1) * cell_size, (y + 1) * cell_size,
fill="black")
elif self.maze[y][x] == 0:
self.canvas.create_rectangle(x * cell_size, y * cell_size,
(x + 1) * cell_size, (y + 1) * cell_size,
fill="white")
elif self.maze[y][x] == 2: # Starting position
self.canvas.create_rectangle(x * cell_size, y * cell_size,
(x + 1) * cell_size, (y + 1) * cell_size,
fill="green")
elif self.maze[y][x] == 3: # Ending position
self.canvas.create_rectangle(x * cell_size, y * cell_size,
(x + 1) * cell_size, (y + 1) * cell_size,
fill="blue")

def draw_player(self):
if self.player:
self.canvas.delete(self.player)
player_x = self.player_pos[1] * cell_size
player_y = self.player_pos[0] * cell_size
self.player = self.canvas.create_oval(player_x, player_y, player_x + cell_size, player_y + cell_size,
fill="red")

def move_up(self, event):


if not self.game_over:
new_y = max(0, self.player_pos[0] - 1)
if self.maze[new_y][self.player_pos[1]] != 1:
self.player_pos = (new_y, self.player_pos[1])
self.update_player()
self.check_win()

def move_down(self, event):


if not self.game_over:
new_y = min(len(self.maze) - 1, self.player_pos[0] + 1)
if self.maze[new_y][self.player_pos[1]] != 1:
self.player_pos = (new_y, self.player_pos[1])
self.update_player()
self.check_win()

def move_left(self, event):


if not self.game_over:
new_x = max(0, self.player_pos[1] - 1)
if self.maze[self.player_pos[0]][new_x] != 1:
self.player_pos = (self.player_pos[0], new_x)
self.update_player()
self.check_win()

def move_right(self, event):


if not self.game_over:
new_x = min(len(self.maze[0]) - 1, self.player_pos[1] + 1)
if self.maze[self.player_pos[0]][new_x] != 1:
self.player_pos = (self.player_pos[0], new_x)
self.update_player()
self.check_win()

def update_player(self):
self.draw_player()

def check_win(self):
if self.player_pos == ending_position:
self.game_over = True
messagebox.showinfo("Congratulations!", "You won!")
self.master.quit()

def create_custom_maze():
    rows = simpledialog.askinteger("Input", "Enter the number of rows:")
    cols = simpledialog.askinteger("Input", "Enter the number of columns:")
    maze = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            maze[i][j] = simpledialog.askinteger("Input",
                f"Enter value for cell ({i}, {j}): 0 (empty) or 1 (blocked)")
    return maze

# Get custom maze input from the user


maze = create_custom_maze()

# Define maze parameters


maze_width = 20 * len(maze[0])
maze_height = 20 * len(maze)
Output:
cell_size = 20

# Specify starting position and ending position


starting_position = (0, 0) # Top-left corner
ending_position = (len(maze) - 1, len(maze[0]) - 1) # Bottom-right corner

# Mark starting and ending positions in the maze


maze[starting_position[0]][starting_position[1]] = 2 # Represented by 2
maze[ending_position[0]][ending_position[1]] = 3 # Represented by 3

# Create the maze game


root = tk.Tk()
game = MazeGame(root, maze)
root.mainloop()

Result:
Thus q-learning of reinforcement learning has been implemented and executed successfully.
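
Note that the main script above only launches the interactive MazeGame window; the QLearningMaze agent itself is defined but never trained there. The sketch below shows how the agent could be exercised on a small hard-coded maze (an assumed example layout, 0 = free cell, 1 = wall); after training, solve() should return a greedy path from start to end.

# Assumed 4x4 example maze: 0 = free cell, 1 = wall
demo_maze = [
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

agent = QLearningMaze(demo_maze, start=(0, 0), end=(3, 3),
                      learning_rate=0.1, discount_factor=0.9,
                      exploration_rate=0.1, epochs=1000)
agent.train()          # fill the Q-table by running repeated episodes
print(agent.solve())   # greedy path as a list of (row, col) tuples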
Experiment no:9 MULTI LAYER PERCEPTRON
Date:

Aim:
The Objective of this experiment is to implement Multilayer perceptron using inbuilt libraries in
python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Define the `MLP` class with methods for forward pass, backward pass, training, evaluation,
and auxiliary functions for sigmoid activation and its derivative
Step 4: Get user inputs for the number of patterns and input features per pattern
Step 5: Generate XOR dataset based on user inputs
Step 6: Define an MLP object with specified input, hidden, and output sizes
Step 7: Train the MLP using the generated XOR dataset
Step 8: Evaluate the trained MLP using the same dataset
Step 9: Print evaluation metrics such as accuracy, precision, recall, F1 score, and confusion matrix
Step 10: Plot the training loss over epochs and the confusion matrix
Step 11: Stop
Program:
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
import matplotlib.pyplot as plt

class MLP:
def __init__(self, input_size, hidden_size, output_size):
self.input_size = input_size
self.hidden_size = hidden_size
self.output_size = output_size

# Initialize weights and biases


self.weights_input_hidden = np.random.randn(input_size, hidden_size)
self.bias_input_hidden = np.zeros((1, hidden_size))

self.weights_hidden_output = np.random.randn(hidden_size, output_size)


self.bias_hidden_output = np.zeros((1, output_size))

# Store loss values during training


self.loss_history = []

def forward(self, inputs):


# Forward pass
self.hidden_layer_activation = np.dot(inputs, self.weights_input_hidden) + self.bias_input_hidden
self.hidden_layer_output = self.sigmoid(self.hidden_layer_activation)
self.output_layer_activation = np.dot(self.hidden_layer_output, self.weights_hidden_output) + self.bias_hidden_output
self.output = self.sigmoid(self.output_layer_activation)

return self.output

def sigmoid(self, x):


return 1 / (1 + np.exp(-x))

def sigmoid_derivative(self, x):


return x * (1 - x)

def backward(self, inputs, targets, learning_rate):


# Backward pass
# Calculate output layer error
output_error = targets - self.output
output_delta = output_error * self.sigmoid_derivative(self.output)

# Calculate hidden layer error


hidden_error = output_delta.dot(self.weights_hidden_output.T)
hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_layer_output)

# Update weights and biases


self.weights_hidden_output += self.hidden_layer_output.T.dot(output_delta) * learning_rate
self.bias_hidden_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate

self.weights_input_hidden += inputs.T.dot(hidden_delta) * learning_rate


self.bias_input_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate

def train(self, inputs, targets, learning_rate, epochs, validation_data=None, patience=100):


best_loss = np.inf
no_improvement_count = 0

for epoch in range(epochs):


# Forward pass
outputs = self.forward(inputs)

# Backward pass
self.backward(inputs, targets, learning_rate)

# Calculate and store loss


loss = np.mean(np.square(targets - outputs))
self.loss_history.append(loss)

if epoch % 1000 == 0:
print(f"Epoch {epoch}, Loss: {loss}")

# Early stopping based on validation loss


if validation_data:
val_inputs, val_targets = validation_data
val_outputs = self.forward(val_inputs)
val_loss = np.mean(np.square(val_targets - val_outputs))

if val_loss < best_loss:


best_loss = val_loss
no_improvement_count = 0
else:
no_improvement_count += 1

if no_improvement_count >= patience:


print(f"Stopping early at epoch {epoch} due to no improvement in validation loss.")
break

def evaluate(self, inputs, targets):


predictions = np.round(self.forward(inputs))
accuracy = accuracy_score(targets, predictions)
precision = precision_score(targets, predictions)
recall = recall_score(targets, predictions)
f1 = f1_score(targets, predictions)
confusion_mat = confusion_matrix(targets, predictions)
return accuracy, precision, recall, f1, confusion_mat

# User Inputs
num_patterns = int(input("Enter the number of patterns in your dataset: "))
input_size = int(input("Enter the number of input features per pattern: "))

# Generate XOR dataset automatically


X_train = np.random.randint(2, size=(num_patterns, input_size))
y_train = np.array([[(x[0] ^ x[1])] for x in X_train]) # Calculate XOR output automatically

# Use the same data for training and testing


X_test, y_test = X_train, y_train

# Define MLP
mlp = MLP(input_size=input_size, hidden_size=4, output_size=1)

# Train MLP
mlp.train(X_train, y_train, learning_rate=0.01, epochs=10000, validation_data=(X_test, y_test),
patience=500)

# Evaluate MLP
accuracy, precision, recall, f1, confusion_mat = mlp.evaluate(X_test, y_test)

# Print evaluation metrics


print("\nEvaluation Metrics:")
print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 Score: {f1}")
Output:
Enter the number of patterns in your dataset: 100
Enter the number of input features per pattern: 2
Epoch 0, Loss: 0.36609605648209426
Epoch 1000, Loss: 0.015983581207552385
Epoch 2000, Loss: 0.004408591411027319
Epoch 3000, Loss: 0.002388007046487561
Epoch 4000, Loss: 0.0016026034795046582
Epoch 5000, Loss: 0.0011936939190098374
Epoch 6000, Loss: 0.000945475101628567
Epoch 7000, Loss: 0.0007797834456328132
Epoch 8000, Loss: 0.0006617921746716564
Epoch 9000, Loss: 0.0005737365743443018

Evaluation Metrics:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0

Confusion Matrix:
[[50 0]
[ 0 50]]

print("\nConfusion Matrix:")
print(confusion_mat)

# Plot the training loss over epochs


plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(mlp.loss_history)
plt.title('Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')

# Plot the confusion matrix


plt.subplot(1, 2, 2)
plt.imshow(confusion_mat, cmap='Blues', interpolation='nearest')
plt.title('Confusion Matrix')
plt.colorbar()
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.xticks([0, 1], ['0', '1'])
plt.yticks([0, 1], ['0', '1'])
for i in range(confusion_mat.shape[0]):
for j in range(confusion_mat.shape[1]):
plt.text(j, i, confusion_mat[i, j], ha='center', va='center', color='red')

plt.tight_layout()
plt.show()

Result:
Thus the multi layer perceptron has been implemented and executed successfully.
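
For comparison, the same XOR task can be handled by scikit-learn's MLPClassifier instead of the hand-written network. This is a minimal sketch, not part of the recorded program; with so few distinct input patterns it may need more iterations or a different random_state to converge.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# XOR data generated the same way as above: 2 binary inputs, XOR target
X = np.random.randint(2, size=(100, 2))
y = X[:, 0] ^ X[:, 1]

clf = MLPClassifier(hidden_layer_sizes=(8,), activation='logistic',
                    solver='adam', max_iter=5000, random_state=42)
clf.fit(X, y)
print("MLPClassifier accuracy on XOR:", accuracy_score(y, clf.predict(X)))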
FEATURE EXTRACTION:

import re

from urllib.parse import urlparse, parse_qs, unquote

import pandas as pd

def extract_url_length(url):

return len(url)

def has_https(url):

return url.startswith("https")

def has_suspicious_keywords(url):

suspicious_keywords = ['phishing', 'malware', 'fraud', 'login', 'password', 'bank', 'paypal']

for keyword in suspicious_keywords:

if keyword in url:

return 1

return 0

def extract_url_syntax(url):

return len(url)

def extract_host_name(url):

parsed_url = urlparse(url)

return parsed_url.hostname

def extract_path_length(url):

parsed_url = urlparse(url)

return len(parsed_url.path)

def has_favicon(url):

return 0

def has_ssl_certificate(url):

return 1

def extract_page_rank(url):

return 0

def has_double_slash_redirection(url):

return "//" in url

def extract_folder_name(url):
parsed_url = urlparse(url)

return parsed_url.path.split('/')[-1]

def has_special_characters(url):

special_characters = set("!@#$%^&*()+=-[]{}|\\<>?,.")

return any(char in special_characters for char in url)

def extract_ccTLD(url):

parsed_url = urlparse(url)

return parsed_url.netloc.split('.')[-1]

def has_subdomains(url):

parsed_url = urlparse(url)

return len(parsed_url.netloc.split('.')) > 2

def extract_features(url):

features = {

'url_length': extract_url_length(url),

'path_length': extract_path_length(url),

'ip_present': 0,

'number_of_digits': sum(c.isdigit() for c in url),

'number_of_parameters': 0,

'number_of_fragments': 0,

'is_url_encoded': 0,

'number_encoded_char': 0,

'host_length': 0,

'number_of_periods': 0,

'has_client': 0,

'has_admin': 0,

'has_server': 0,

'has_login': 0,

'number_of_dashes': 0,

'have_at_sign': 0,

'is_tiny_url': 0,

'number_of_spaces': 0,
'number_of_zeros': 0,

'number_of_uppercase': 0,

'number_of_lowercase': 0,

'number_of_double_slashes': 0,

'HTTPS Usage': has_https(url),

'Presence of Suspicious Keywords': has_suspicious_keywords(url),

'URL Syntax': extract_url_syntax(url),

'Presence of Favicon': has_favicon(url),

'SSL Certificate': has_ssl_certificate(url),

'Page Rank': extract_page_rank(url),

'Double Slash Redirection': has_double_slash_redirection(url),

'Folder Name': extract_folder_name(url),

'Use of Special Characters': has_special_characters(url),

'Country Code Top-Level Domain (ccTLD)': extract_ccTLD(url),

'Presence of Subdomains': has_subdomains(url)

}

parsed_url = urlparse(url)

if re.match(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', parsed_url.netloc):

features['ip_present'] = 1

if parsed_url.fragment:

features['number_of_fragments'] = len(parsed_url.fragment)

if url != unquote(url):

features['is_url_encoded'] = 1

features['number_encoded_char'] = len(re.findall('%[0-9A-Fa-f]{2}', url))

features['host_length'] = len(parsed_url.netloc)

features['number_of_periods'] = parsed_url.netloc.count('.')

keywords = ['client', 'admin', 'server', 'login']

for keyword in keywords:

features['has_' + keyword] = 1 if keyword in url else 0

features['number_of_dashes'] = url.count('-')
features['have_at_sign'] = 1 if '@' in url else 0

features['is_tiny_url'] = 1 if len(url) < 25 else 0

features['number_of_spaces'] = url.count(' ')

features['number_of_zeros'] = url.count('0')

features['number_of_uppercase'] = sum(c.isupper() for c in url)

features['number_of_lowercase'] = sum(c.islower() for c in url)

features['number_of_double_slashes'] = url.count('//')

return features

def extract_features_for_dataset(dataset):

extracted_features = []

for url in dataset['URL']:

features = extract_features(url)

extracted_features.append(features)

return pd.DataFrame(extracted_features)

data = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/datasets/dataset_with_url.csv')

extracted_features_df = extract_features_for_dataset(data)

data_with_features = pd.concat([data, extracted_features_df], axis=1)

data_with_features.to_csv('/content/drive/MyDrive/Colab Notebooks/datasets/feature_extracted_dataset.csv', index=False)

df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/datasets/feature_extracted_dataset.csv')

print("extracted features dataset:\n")

print(df.head())
output:
extracted features dataset:
URL status url_length \
0 http://www.crestonwood.com/router.php legitimate
37
1 http://shadetreetechnology.com/V4/validation/a... phishing
77
2 https://support-appleld.com.secureupdate.duila... phishing
126
3 http://rgipt.ac.in legitimate
18
4 http://www.iracing.com/tracks/gateway-motorspo... legitimate
55

path_length ip_present number_of_digits number_of_parameters \


0 11 0 0 0
1 47 0 17 0
2 20 0 19 0
3 0 0 0 0
4 33 0 0 0

number_of_fragments is_url_encoded number_encoded_char ... \


0 0 0 0 ...
1 0 0 0 ...
2 0 0 0 ...
3 0 0 0 ...
4 0 0 0 ...

Presence of Suspicious Keywords URL Syntax Presence of Favicon \


0 0 37 0
1 0 77 0
2 0 126 0
3 0 18 0
4 0 55 0

SSL Certificate Page Rank Double Slash Redirection \


0 1 0 True
1 1 0 True
2 1 0 True
3 1 0 True
4 1 0 True

Folder Name Use of Special Characters \


0 router.php True
1 a111aedc8ae390eabcfa130e041a10a4 True
2 NaN True
3 NaN True
4 NaN True

Country Code Top-Level Domain (ccTLD) Presence of Subdomains


0 com True
1 com False
2 com True
3 in True
4 com True

[5 rows x 35 columns]
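
A quick way to sanity-check the extractor before running it over the whole dataset is to call extract_features on a single URL. The URL below is hypothetical, used only for illustration:

sample = extract_features("https://login.example-bank.com/secure/update?id=123")
print(sample['url_length'], sample['HTTPS Usage'], sample['Presence of Suspicious Keywords'])
print(sample['host_length'], sample['has_login'], sample['Presence of Subdomains'])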
OPTIMIZATION USING ANT COLONY ALGORITHM:

import numpy as np
import pandas as pd

# Load the dataset


data = pd.read_csv('malicious_url.csv')

# Extract URLs and labels


urls = data['url'].values
labels = data['type'].values

# Define parameters
num_ants = 10
num_iterations = 100
pheromone_decay = 0.5
pheromone_increase = 1.0

# Initialize pheromone levels


num_urls = len(urls)
pheromones = np.ones(num_urls)

# Define objective function to evaluate solutions


def objective_function(solution):
# Dummy function, replace with actual scoring logic
return np.sum(solution)

# Ant Colony Optimization


best_solution = None
best_score = float('-inf')

for iteration in range(num_iterations):
    for ant in range(num_ants):
        # Construct solution probabilistically based on pheromone levels
        probabilities = pheromones / np.sum(pheromones) if np.sum(pheromones) != 0 else np.ones(num_urls) / num_urls
        # Select each URL independently, with a chance proportional to its pheromone share
        # (np.random.choice cannot be used here because p must match the length of [0, 1])
        solution = (np.random.rand(num_urls) < probabilities * num_urls * 0.5).astype(int)

        # Evaluate solution
        score = objective_function(solution)

        # Update best solution
        if score > best_score:
            best_solution = solution
            best_score = score

    # Update pheromone levels
    pheromones *= pheromone_decay
    pheromones[best_solution == 1] += pheromone_increase

# Process the best solution to identify malicious URLs


malicious_urls_indices = np.where(best_solution == 1)[0]
malicious_urls = urls[malicious_urls_indices]

# Save the filtered dataset containing malicious URLs


malicious_data = data.iloc[malicious_urls_indices]
malicious_data.to_csv('malicious_urls_output.csv', index=False)
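
The objective_function above is only a placeholder that sums the selection vector, so the search has nothing meaningful to optimise. A sketch of a more informative objective is given below; it assumes the 'type' column contains a 'benign' class for non-malicious URLs (as in common malicious-URL datasets) and rewards selections that concentrate non-benign labels while lightly penalising very large subsets. To take effect it would have to replace the placeholder definition before the optimisation loop runs.

def objective_function(solution):
    selected = labels[solution == 1]   # labels of the URLs picked by this ant
    if len(selected) == 0:
        return float('-inf')           # empty selections are never best
    malicious_ratio = np.mean(selected != 'benign')
    return malicious_ratio - 0.001 * len(selected)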

SAMPLE DATASET:
