Aim:
The objective of the experiment is to create a categorical dataset using the numpy and pandas
packages in python.
Algorithm:
Step 1: Start
Step 2: Import the csv module and define the CSV file path
Step 3: Open the CSV file in write mode and define field names
Step 4: Write the header row to the CSV file
Step 5: Prompt the user to enter their name, and exit the loop if they type 'exit'
Step 6: If not, prompt the user to enter their age, height, and weight
Step 7: Calculate the BMI and determine the BMI category
Step 8: Write the user's data (name, age, height, weight, BMI, BMI category) to the CSV file
Step 9: Repeat the loop until the user chooses to exit
Step 10: Display a message indicating that the data has been written to the CSV file
Step 11: Stop
Program:
import csv

csv_file_path = 'data_with_bm.csv'
fieldnames = ['Name', 'Age', 'Height', 'Weight', 'BMI', 'BMI Category']

with open(csv_file_path, 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    # Write header
    writer.writeheader()
    while True:
        name = input("Enter your name (or type 'exit' to finish): ")
        if name.lower() == 'exit':
            break
        age = int(input("Enter your age: "))
        height = float(input("Enter your height in metres: "))
        weight = float(input("Enter your weight in kg: "))
        # Calculate BMI
        bmi = weight / (height ** 2)
        # Categorize BMI (standard cut-offs: 18.5, 25, 30)
        if bmi < 18.5:
            bmi_category = 'Underweight'
        elif bmi < 25:
            bmi_category = 'Normal weight'
        elif bmi < 30:
            bmi_category = 'Overweight'
        else:
            bmi_category = 'Obese'
        writer.writerow({'Name': name, 'Age': age, 'Height': height, 'Weight': weight,
                         'BMI': bmi, 'BMI Category': bmi_category})
print("Data has been written to", csv_file_path)
Output:
Result:
Thus the categorical dataset has been created successfully using the numpy and pandas packages in python.
Experiment no:1-b DATASET CREATION
Date: CONTINUOUS DATASET
Aim:
The objective of the experiment is to create a continuous dataset using the numpy and pandas packages
in python.
Algorithm:
Step 1: Start
Step 2: Import the csv module and define the CSV file path
Step 3: Open the CSV file in write mode and define field names (Name, Age, Height, Weight, BMI)
Step 4: Write the header row to the CSV file
Step 5: Prompt the user to enter their name, and exit the loop if they type 'exit'
Step 6: If not, prompt the user to enter their age, height, and weight
Step 7: Calculate the BMI
Step 8: Write the user's data (name, age, height, weight, BMI) to the CSV file
Step 9: Repeat the loop until the user chooses to exit
Step 10: Display a message indicating that the data has been written to the CSV file
Step 11: Stop
PROGRAM:
import csv

csv_file_path = 'bmi.csv'
fieldnames = ['Name', 'Age', 'Height', 'Weight', 'BMI']

with open(csv_file_path, 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    while True:
        # Get user input for each entry
        name = input("Enter your name (or type 'exit' to finish): ")
        if name.lower() == 'exit':
            break
        age = int(input("Enter your age: "))
        height = float(input("Enter your height in metres: "))
        weight = float(input("Enter your weight in kg: "))
        # Calculate BMI
        bmi = weight / (height ** 2)
        writer.writerow({'Name': name, 'Age': age, 'Height': height, 'Weight': weight, 'BMI': bmi})
print("Data has been written to", csv_file_path)
RESULT:
Thus the python program to create the dataset has been executed and verified successfully.
Experiment no:2-a SIMPLE LINEAR REGRESSION USING
Date: LIBRARY FUNCTION
Aim:
The Objective of this experiment is to implement simple linear regression using library functions.
Algorithm:
Step 1: Start
Step 2: Import the required libraries
Step 3: Load dataset and split into training and testing sets
Step 4: Train a LinearRegression model on the training set
Step 5: Predict the target values for the test set
Step 6: Apply a threshold to the predictions to obtain class labels and compute precision, recall, F1 score, and accuracy
Step 7: Plot regression line with data points and threshold
Step 8: Stop
Program:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score, precision_recall_fscore_support, accuracy_score
import matplotlib.pyplot as plt
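Only the imports and the print statements survive in the record. A minimal sketch of the omitted body is given below; the file name, column names, and the thresholding rule are assumptions, not the original code:

# Hedged sketch of the omitted body (file name, column names, and threshold are assumptions)
data = pd.read_csv('Salary_Data.csv')
X = data[['YearsExperience']].values
y = data['Salary'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Turn the continuous values into binary labels with a threshold
threshold = np.median(y)  # assumption: median of the target as the class boundary
y_test_cls = (y_test >= threshold).astype(int)
y_pred_cls = (y_pred >= threshold).astype(int)
precision, recall, f1, _ = precision_recall_fscore_support(y_test_cls, y_pred_cls, average='binary')
accuracy = accuracy_score(y_test_cls, y_pred_cls)

# Plot regression line with data points and threshold
plt.scatter(X_test, y_test, label='Actual')
plt.plot(X_test, y_pred, color='red', label='Regression line')
plt.axhline(threshold, linestyle='--', label='Threshold')
plt.legend()
plt.show()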
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Accuracy:", accuracy)
OUTPUT:
Precision: 1.0
Recall: 0.6666666666666666
F1 Score: 0.8
Accuracy: 0.8333333333333334
Result:
Thus simple linear regression with built-in functions has been implemented and executed
successfully.
Experiment no:2-b SIMPLE LINEAR REGRESSION WITHOUT
Date: USING LIBRARY FUNCTION
Aim:
The Objective of this experiment is to implement simple linear regression without using library
functions.
Algorithm:
Step 1: Start
Step 2: Load the dataset from the CSV file specified in 'file_path'
Step 3: Read the CSV file without using pandas, extracting lines of data
Step 4: Parse the years-of-experience and salary values from each line
Step 5: Compute the slope and intercept of the regression line using the least-squares formulas
Step 6: Evaluate the model's performance using Mean Squared Error (MSE) and R-squared (R^2) score
Step 7: Prompt the user to enter years of experience for predicting salary
Step 8: Predict salary using the trained linear regression model
Step 9: Plot the regression line along with test data points
Step 10: Stop
Program:
import matplotlib.pyplot as plt
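Only the matplotlib import survives in the record. A minimal from-scratch sketch of the rest is given below, assuming a hypothetical 'Salary_Data.csv' with a header row and YearsExperience,Salary columns (for brevity the line is fitted on the full data rather than a separate training split):

file_path = 'Salary_Data.csv'  # assumption: file name and column order
with open(file_path) as f:
    lines = f.read().strip().split('\n')[1:]  # skip the header row
data = [tuple(map(float, line.split(','))) for line in lines]
x = [row[0] for row in data]
y = [row[1] for row in data]

# Least-squares estimates: slope = sum((xi-mx)(yi-my)) / sum((xi-mx)^2)
n = len(x)
mx = sum(x) / n
my = sum(y) / n
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx

y_pred = [intercept + slope * xi for xi in x]
mse = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred)) / n
ss_res = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))
ss_tot = sum((yi - my) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)

exp = float(input("Enter year of experience: "))
print("Predicted salary:", intercept + slope * exp)

plt.scatter(x, y, label='Data points')
plt.plot(x, y_pred, color='red', label='Regression line')
plt.legend()
plt.show()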
Output:
Mean Squared Error: 35766738.2396581
R^2 Score: 0.8450481599189925
Enter year of experience: 4
Predicted salary: 63870.993342864815
Result:
Thus simple linear regression without built-in functions has been implemented and executed
successfully.
Experiment no:2-c MULTI LINEAR REGRESSION USING
Date: LIBRARY FUNCTION
Aim:
The Objective of this experiment is to implement Multi linear regression using library
functions.
Algorithm:
Step 1: Start
Step 2: Import the required libraries
Step 3: Load the advertising dataset; 'TV', 'Radio', and 'Newspaper' are the features and 'Sales' is the target
Step 4: Split the data into training and testing sets
Step 5: Train a LinearRegression model on the training set
Step 6: Evaluate the model using Mean Squared Error (MSE) and R-squared (R^2) score
Step 7: Prompt the user for TV, Radio, and Newspaper advertising budgets and predict the sales
Step 8: Stop
Program:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
# Assuming 'TV', 'Radio', and 'Newspaper' are features and 'Sales' is the target variable
X = data[['TV', 'Radio', 'Newspaper']].values
y = data['Sales'].values
SAMPLE DATASET:
Output:
Mean Squared Error: 2.9077569102710923
R^2 Score: 0.9059011844150826
Enter TV advertising budget: 6677
Enter Radio advertising budget: 6488
Enter Newspaper advertising budget: 3566
Predicted Sales: 1039.0705215552014
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Get input from the user for Radio, TV, and Newspaper advertising budgets
tv_budget = float(input("Enter TV advertising budget: "))
radio_budget = float(input("Enter Radio advertising budget: "))
newspaper_budget = float(input("Enter Newspaper advertising budget: "))
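The data loading, model fitting, and prediction lines are not reproduced in the listing above; a minimal sketch follows (all names beyond those already shown are assumptions):

# Hedged sketch: 'data' above is assumed to come from pd.read_csv on the advertising CSV
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)
# Predict sales for the user-supplied budgets
predicted_sales = model.predict([[tv_budget, radio_budget, newspaper_budget]])
print("Predicted Sales:", predicted_sales[0])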
Result:
Thus multiple linear regression with built-in functions has been implemented and executed
successfully.
Experiment no:2-d MULTI LINEAR REGRESSION WITHOUT
Date: USING LIBRARY FUNCTION
Aim:
The Objective of this experiment is to implement Multi linear regression without using library
functions.
Algorithm:
Step 1: Start
Step 2: Load the dataset from the CSV file specified in 'file_path'
Step 3: Read the CSV file without using pandas, extracting lines of data
Step 4: Define features ('TV', 'Radio', 'Newspaper') and target variable ('Sales')
Step 5: Expand the features with polynomial terms and split the data into training and testing sets
Step 6: Fit the regression coefficients on the training data
Step 7: Make predictions on the test set with polynomial features
Step 8: Evaluate the model's performance using Mean Squared Error (MSE) and R-squared (R^2) score
Step 9: Prompt the user to enter advertising budgets for TV, Radio, and Newspaper and predict the sales
Step 10: Stop
Program:
import matplotlib.pyplot as plt
# Assuming 'TV', 'Radio', and 'Newspaper' are features and 'Sales' is the target variable
X = [[row[0], row[1], row[2]] for row in data] # Features: TV, Radio, Newspaper
y = [row[3] for row in data] # Target variable: Sales
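Only the import and the feature/target definitions survive in the record. A minimal from-scratch sketch of the fitting step via the normal equations (X^T X) b = X^T y follows, using plain-Python matrix helpers; the polynomial expansion mentioned in the algorithm is omitted for brevity, and the rows of `data` are assumed to be already parsed as floats:

# Hedged sketch: plain-Python normal-equations fit (no numpy, matching the "no library" aim)
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    # Gaussian elimination with partial pivoting on the augmented matrix
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

# Design matrix with a bias column
Xb = [[1.0] + row for row in X]
XtX = matmul(transpose(Xb), Xb)
Xty = [sum(Xb[r][c] * y[r] for r in range(len(y))) for c in range(len(Xb[0]))]
coef = solve(XtX, Xty)  # [intercept, b_TV, b_Radio, b_Newspaper]
y_pred = [sum(c * xi for c, xi in zip(coef, [1.0] + row)) for row in X]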
Output:
Mean Squared Error (Polynomial Regression): 59.046372645941645
R^2 Score (Polynomial Regression): 0.9454034508470807
Enter TV advertising budget: 5466
Enter Radio advertising budget: 3123
Enter Newspaper advertising budget:3424
Predicted Sales: 34212.556222982384
# Get input from the user for Radio, TV, and Newspaper advertising budgets
tv_budget = float(input("Enter TV advertising budget: "))
radio_budget = float(input("Enter Radio advertising budget: "))
newspaper_budget = float(input("Enter Newspaper advertising budget:"))
Result:
Thus multiple linear regression without using inbuilt functions has been implemented and executed
successfully.
Experiment no:2-e IMPLEMENTATION OF LOGISTIC
Date: REGRESSION
Aim:
The Objective of this experiment is to implement logistic regression using library functions.
Algorithm:
Step 1: Start
Step 2: Import the required libraries
Step 3: Load the dataset from the CSV file 'optimal_features_dataset.csv' using pandas
Step 4: Drop the 'url' column and encode the categorical 'status' column with LabelEncoder
Step 5: Drop the original 'status' column and separate features (X) and target variable (y)
Step 6: Split the dataset into training and testing sets, train a LogisticRegression model, and predict on the test set
Step 7: Evaluate the model's performance using accuracy, precision, recall, F1 score, and confusion matrix
Step 8: Plot the distribution of the ratio of digits in the URL for legitimate and phishing URLs
Step 9: Stop
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
# Load dataset
df = pd.read_csv('optimal_features_dataset.csv')
df.drop('url', axis=1, inplace=True)
# Encoding categorical variable 'status'
label_encoder = LabelEncoder()
df['status_encoded'] = label_encoder.fit_transform(df['status'])
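The train/test split, model fitting, and metric computations are not reproduced in the record; a minimal sketch of that middle section follows (variable names beyond those shown are assumptions):

# Hedged sketch of the omitted middle section
df = df.drop('status', axis=1)  # drop the original 'status' column
X = df.drop('status_encoded', axis=1)
y = df['status_encoded']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000)  # max_iter raised to help convergence (assumption)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)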
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)
df.head()
plt.figure(figsize=(8, 6))
legitimate_data = df[df['status_encoded'] == 0]
phishing_data = df[df['status_encoded'] == 1]
plt.hist(legitimate_data['ratio_digits_url'], bins=20, alpha=0.5, label='Legitimate URLs')
plt.hist(phishing_data['ratio_digits_url'], bins=20, alpha=0.5, label='Phishing URLs')
plt.xlabel('Ratio of Digits in URL')
plt.ylabel('Frequency')
plt.title('Distribution of Ratio of Digits in URL')
plt.legend()
plt.show()
SAMPLE DATASET:
Output:
Accuracy: 0.6399825021872266
Precision: 0.7101648351648352
Recall: 0.4579273693534101
F1 Score: 0.5568120624663435
Confusion Matrix:
[[946 211]
[612 517]]
Result:
Thus logistic regression using library functions has been implemented and executed successfully.
Experiment no:3-a NAIVE BAYES ALGORITHM WITH
Date: LIBRARY FUNCTION
Aim:
The Objective of this experiment is to implement Naïve Bayes algorithm with library functions.
Algorithm:
Step 1: Start
Step 2: Import the required libraries
Step 3: Load the dataset and separate features (X) and target variable (y)
Step 4: Use LabelEncoder for all features, including the boolean one, and encode categorical variables
Step 5: Train a CategoricalNB classifier on the encoded features
Step 6: Prompt the user to input values for Outlook, Temperature, Humidity, and Windy
Step 7: Transform user input to numerical form using the trained LabelEncoders
Step 8: Predict the outcome based on user input using the trained classifier
Step 9: Stop
Program:
import pandas as pd
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report
Output:
Enter Outlook (e.g., 'Sunny', 'Overcast', 'Rainy'): Sunny
Enter Temperature (e.g., 'Hot', 'Mild', 'Cool'): Cool
Enter Humidity (e.g., 'High', 'Normal'): High
Enter Windy (True/False): False
Predicted Outcome: No
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
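Only the imports and the feature/target lines survive in the record. A minimal sketch of the omitted encoding, training, and prediction steps follows; `df` is assumed to be loaded earlier (the file name 'naive2.csv' is borrowed from Experiment 3-b), and all other names are assumptions:

# Hedged sketch; df is assumed loaded with df = pd.read_csv('naive2.csv')
encoders = [LabelEncoder() for _ in range(X.shape[1])]
X_enc = np.column_stack([enc.fit_transform(X[:, i].astype(str))
                         for i, enc in enumerate(encoders)])
target_encoder = LabelEncoder()
y_enc = target_encoder.fit_transform(y)
clf = CategoricalNB()
clf.fit(X_enc, y_enc)

outlook = input("Enter Outlook (e.g., 'Sunny', 'Overcast', 'Rainy'): ")
temperature = input("Enter Temperature (e.g., 'Hot', 'Mild', 'Cool'): ")
humidity = input("Enter Humidity (e.g., 'High', 'Normal'): ")
windy = input("Enter Windy (True/False): ")
sample = [[encoders[i].transform([v])[0]
           for i, v in enumerate([outlook, temperature, humidity, windy])]]
prediction = clf.predict(sample)
print("Predicted Outcome:", target_encoder.inverse_transform(prediction)[0])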
Result:
Thus the Naive Bayes algorithm with library functions has been implemented and executed
successfully.
Experiment no:3-b NAIVE BAYES ALGORITHM WITHOUT
Date: USING LIBRARY FUNCTION
Aim:
The Objective of this experiment is to implement Naïve Bayes algorithm without using library
functions.
Algorithm:
1. Start
2. Import the pandas and numpy libraries.
3. Read the dataset from the CSV file 'naive2.csv' using pandas.
4. Calculate the count of each class in the target variable 'Play Golf'.
5. Calculate the conditional probabilities for each attribute value given the target variable.
6. Define a function `predict(X)` to predict the target variable based on the input attributes.
7. Prompt the user to input values for Outlook, Temperature, Humidity, and Windy.
8. Predict the outcome based on the user input using the `predict` function.
9. Define a function `accuracy()` that applies `predict` to each row of the dataset and counts the correct predictions.
10. Print the accuracy and the dataset.
11. Stop.
Program:
import pandas as pd
import numpy as np
df = pd.read_csv('naive2.csv')
#df = df.drop(df.columns[df.columns.str.contains('Unnamed')], axis=1)
op = dict(df['Play Golf'].value_counts())
rows = df.shape[0]
prob = {}
# Conditional probability of each attribute value given each class
for column in df.columns[:-1]:
    for class_ in df[column].unique():
        t_d = {}
        for output in df['Play Golf'].unique():
            occurrences = ((df[column] == class_) & (df['Play Golf'] == output)).sum()
            t_d[output] = occurrences / op[output]
        prob[str(class_)] = t_d
SAMPLE DATASET:
Output:
Enter Outlook (e.g., 'Sunny', 'Overcast', 'Rainy'): Sunny
Enter Temperature (e.g., 'Hot', 'Mild', 'Cool'): Mild
Enter Humidity (e.g., 'High', 'Normal'): High
Enter Windy (True/False): True
No
5
13
None
dataset:
Outlook Temperature Humidity Windy Play Golf
0 Sunny Hot High False No
1 Sunny Hot High True No
2 Overcast Hot High False Yes
3 Rainy Mild High False Yes
4 Rainy Cool Normal False Yes
5 Rainy Cool Normal True No
6 Overcast Cool Normal True Yes
7 Sunny Mild High False No
8 Sunny Cool Normal False Yes
9 Rainy Mild Normal False Yes
10 Sunny Mild Normal True Yes
11 Overcast Mild High True Yes
12 Overcast Hot Normal False Yes
13 Rainy Mild High True No
prob_df = pd.DataFrame(prob)
def predict(X):
    p_yes = op['Yes'] / rows
    p_no = op['No'] / rows
    for attr in X:
        p_yes *= prob_df[attr]['Yes']
        p_no *= prob_df[attr]['No']
    P_yes = round(p_yes / (p_yes + p_no), 2)
    P_no = round(p_no / (p_yes + p_no), 2)
    return 'Yes' if P_yes > P_no else 'No'  # return the more probable class
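The interactive prompts shown in the output are not reproduced in the listing; a minimal sketch of that portion follows (variable names are assumptions):

outlook = input("Enter Outlook (e.g., 'Sunny', 'Overcast', 'Rainy'): ")
temperature = input("Enter Temperature (e.g., 'Hot', 'Mild', 'Cool'): ")
humidity = input("Enter Humidity (e.g., 'High', 'Normal'): ")
windy = input("Enter Windy (True/False): ")
print(predict([outlook, temperature, humidity, windy]))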
def accuracy():
    count = 0
    for index, row in df.iterrows():
        r = list(map(str, row))
        if predict(r[:-1]) == r[-1]:
            count += 1
        else:
            print(index)
    print(count)

print(accuracy())
print("dataset: ")
print(df)
Result:
Thus the Naive Bayes algorithm without library functions has been implemented and executed
successfully.
Experiment no:4-a IMPLEMENTATION OF DECISION TREE
Date: USING ID3 ALGORITHM
Aim:
The Objective of this experiment is to implement decision tree using ID3 algorithm with inbuilt
libraries in python.
Algorithm:
Step 1: Start
Step 2: Import the required libraries
Step 3: Load the dataset and separate features (X) and target variable (y)
Step 4: Split the dataset into train and test sets
Step 5: Initialize a DecisionTreeClassifier with criterion='entropy' (information gain, as in ID3)
Step 6: Train the classifier and predict on the test set
Step 7: Evaluate the model's performance using accuracy, precision, recall, F1 score, and confusion matrix
Step 8: Stop
Program:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix
# Load dataset
df = pd.read_csv('/content/optimal_feature.csv')
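The feature/target separation, classifier training, and the accuracy and confusion-matrix computations are not reproduced in the record (the train/test split and the other metric lines appear further below). A minimal sketch of the omitted lines, assuming 'status' is the target column as in the other experiments on this dataset:

# Hedged sketch (target column name is an assumption)
X = df.drop('status', axis=1)
y = df['status']
# (after the train_test_split shown further below:)
clf = DecisionTreeClassifier(criterion='entropy', random_state=42)  # entropy = ID3-style information gain
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)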
Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[2 0]
[0 1]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Calculate precision
precision = precision_score(y_test, y_pred, average='weighted')
# Calculate recall
recall = recall_score(y_test, y_pred, average='weighted')
# Calculate F1-score
f1 = f1_score(y_test, y_pred, average='weighted')
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)
Result:
Thus the decision tree using ID3 algorithm with inbuilt libraries in python has been implemented and
executed successfully.
Experiment no:4-b IMPLEMENTATION OF DECISION TREE
Date: USING C4.5 ALGORITHM
Aim:
The Objective of this experiment is to implement decision tree using C4.5 algorithm with inbuilt
libraries in python.
Algorithm:
Step 1: Start
Step 2: Import the required libraries and load the dataset
Step 3: Define a function to find the best split based on information gain
Step 4: Handle missing values by replacing NaN with a placeholder value ('missing')
Step 5: Separate features (X) and target variable (y), and one-hot encode the categorical features
Step 6: Split the dataset into train and test sets
Step 7: Initialize a DecisionTreeClassifier with criterion='entropy' for the C4.5-like algorithm
Step 8: Train the classifier and predict on the test set
Step 9: Evaluate the model's performance using accuracy, precision, recall, F1 score, and confusion matrix
Step 10: Stop
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
import numpy as np
# Load dataset
df = pd.read_csv('/content/optimal_feature.csv')
Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[2 0]
[0 1]]
X = pd.get_dummies(X)
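Only fragments of the program body survive; a minimal sketch of the lines around the fragment above follows (assuming 'status' is the target column):

# Hedged sketch; before the pd.get_dummies line above:
X = df.drop('status', axis=1).fillna('missing')  # replace NaN with the placeholder 'missing'
y = df['status']
# After it:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(criterion='entropy', random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
conf_matrix = confusion_matrix(y_test, y_pred)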
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)
Result:
Thus the decision tree using C4.5 algorithm with inbuilt libraries in python has been implemented
and executed successfully.
Experiment no:4-c IMPLEMENTATION OF DECISION TREE
Date: USING CART ALGORITHM
Aim:
The Objective of this experiment is to implement decision tree using CART algorithm with inbuilt
libraries in python.
Algorithm:
Step 1: Start
Step 2: Import the required libraries
Step 3: Load the dataset and separate features (X) and target variable (y)
Step 4: Split the dataset into train and test sets
Step 5: Initialize a DecisionTreeClassifier with the default criterion='gini' (the CART splitting criterion)
Step 6: Train the classifier and predict on the test set
Step 7: Evaluate the model's performance using accuracy, precision, recall, F1 score, and confusion matrix
Step 8: Stop
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
# Load dataset
df = pd.read_csv('/content/optimal_feature.csv')
Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[2 0]
[0 1]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
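The remaining omitted lines can be sketched as follows (the target column name is an assumption, and the two feature/target lines belong before the split shown above):

# Hedged sketch: X = df.drop('status', axis=1); y = df['status']  (before the split above)
clf = DecisionTreeClassifier(criterion='gini', random_state=42)  # CART splits on the Gini index
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
conf_matrix = confusion_matrix(y_test, y_pred)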
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)
Result:
Thus the decision tree using the CART algorithm with inbuilt libraries in python has been implemented
and executed successfully.
Experiment no:5 IMPLEMENTATION OF K-MEANS
Date: CLUSTERING
Aim:
The Objective of this experiment is to implement K means clustering algorithm with
inbuilt libraries in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries: numpy, matplotlib.pyplot, datasets from sklearn, KMeans from
sklearn.cluster, and silhouette_score, davies_bouldin_score, and calinski_harabasz_score from
sklearn.metrics
Step 3: Load a dataset and select the features to cluster
Step 4: Fit a KMeans model with a chosen number of clusters
Step 5: Evaluate the clustering using the silhouette, Davies-Bouldin, and Calinski-Harabasz scores
Step 6: Plot the clustered points and the cluster centres
Step 7: Stop
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score, calinski_harabasz_score
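The body of the program is not reproduced in the record; a minimal sketch follows, using the Iris dataset (an assumption consistent with the sklearn datasets import) and three clusters:

# Hedged sketch of the omitted body (dataset and cluster count are assumptions)
iris = datasets.load_iris()
X = iris.data
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print("Silhouette Score:", silhouette_score(X, labels))
print("Davies-Bouldin Score:", davies_bouldin_score(X, labels))
print("Calinski-Harabasz Score:", calinski_harabasz_score(X, labels))
# Plot the first two features coloured by cluster, with centroids marked
for c in range(3):
    plt.scatter(X[labels == c, 0], X[labels == c, 1], label=f'Cluster {c}')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            marker='x', s=100, label='Centroids')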
plt.legend()
plt.show()
Result:
Thus the K-means clustering algorithm with inbuilt libraries in python has been implemented and executed successfully.
Experiment no:6-a SUPPORT VECTOR MACHINE
Date: SUPPORT VECTOR CLASSIFICATION
Aim:
The Objective of this experiment is to implement the Support Vector Classification using inbuilt
libraries in python.
Algorithm:
Step 1: Import necessary libraries including pandas, train_test_split, SVC, accuracy_score,
precision_score, recall_score, f1_score, confusion_matrix from sklearn, and matplotlib.pyplot.
Step 2: Load the dataset from the specified CSV file.
Step 3: Encode the categorical variable 'status' using LabelEncoder.
Step 4: Drop the original 'status' column from the DataFrame.
Step 5: Separate features (X) and the target variable (y).
Step 6: Split the dataset into training and testing sets using train_test_split.
Step 7: Initialize an SVC classifier with a linear kernel and regularization parameter C=1.0.
Step 8: Train the SVC classifier using the training data.
Step 9: Predict the target variable for the test dataset.
Step 10: Evaluate the model using accuracy, precision, recall, F1-score, and confusion matrix.
Step 11: Plot the confusion matrix.
Step 12: Train an SVM model with a linear kernel and regularization parameter C=1.0 (repeated for
demonstration).
Step 13: Make predictions on the testing set (repeated for demonstration).
Step 14: Print the confusion matrix (repeated for demonstration).
Step 15: Calculate accuracy, precision, recall, and F1-score (repeated for demonstration).
Step 16: Define a function to plot the decision boundary for an SVM model.
Step 17: Plot the decision boundary for the first two features of the dataset.
Step 18: Stop.
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import LabelEncoder
# Load dataset
df = pd.read_csv('/content/optimal_feature.csv')
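The encoding, train/test split, and metric computations are not reproduced in the record; a minimal sketch follows (column names follow the other experiments on this dataset and are assumptions):

# Hedged sketch of the omitted preparation and evaluation lines
label_encoder = LabelEncoder()
df['status_encoded'] = label_encoder.fit_transform(df['status'])
df.drop('status', axis=1, inplace=True)
X = df.drop('status_encoded', axis=1).values
y = df['status_encoded'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
classifier = SVC(kernel='linear', C=1.0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)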
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)
# Step 6: Train an SVM model with a linear kernel and regularization parameter C=1.0
svm_model = SVC(kernel='linear', C=1.0)
svm_model.fit(X_train, y_train)
Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[2 0]
[0 1]]
Confusion Matrix:
[[2 0]
[0 1]]
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1-score: 1.0
<ipython-input-16-b1a6a5941cd6>:98: UserWarning: No data for colormapping
provided via 'c'. Parameters 'cmap' will be ignored
plt.scatter(X[y == 0, 0], X[y == 0, 1], c='blue', label='Non-malicious',
cmap=plt.cm.coolwarm, s=20, edgecolors='k')
<ipython-input-16-b1a6a5941cd6>:99: UserWarning: No data for colormapping
provided via 'c'. Parameters 'cmap' will be ignored
plt.scatter(X[y == 1, 0], X[y == 1, 1], c='red', label='Malicious',
cmap=plt.cm.coolwarm, s=20, edgecolors='k')
<ipython-input-16-b1a6a5941cd6>:117: UserWarning: No data for colormapping
provided via 'c'. Parameters 'cmap' will be ignored
plt.scatter(X[model.decision_function(X) > 0][:, 0],
X[model.decision_function(X) > 0][:, 1], c='green', label='Above Positive
Hyperplane', cmap=plt.cm.coolwarm, s=20, edgecolors='k')
<ipython-input-16-b1a6a5941cd6>:118: UserWarning: No data for colormapping
provided via 'c'. Parameters 'cmap' will be ignored
plt.scatter(X[model.decision_function(X) < 0][:, 0],
X[model.decision_function(X) < 0][:, 1], c='orange', label='Below Negative
Hyperplane', cmap=plt.cm.coolwarm, s=20, edgecolors='k')
# Plot data points above positive hyperplane and below negative hyperplane
plt.scatter(X[model.decision_function(X) > 0][:, 0], X[model.decision_function(X) > 0][:, 1],
c='green', label='Above Positive Hyperplane', cmap=plt.cm.coolwarm, s=20, edgecolors='k')
plt.scatter(X[model.decision_function(X) < 0][:, 0], X[model.decision_function(X) < 0][:, 1],
c='orange', label='Below Negative Hyperplane', cmap=plt.cm.coolwarm, s=20, edgecolors='k')
Result:
Thus the Support Vector Classification using inbuilt libraries in python has been implemented and
executed successfully.
Experiment no:6-b SUPPORT VECTOR MACHINE
Date: SUPPORT VECTOR REGRESSION
Aim:
The Objective of this experiment is to implement the Support Vector Regression using inbuilt
libraries in python.
Algorithm:
Step 1: Start
Step 2: Load necessary libraries
Step 3: Load dataset
Step 4: Separate features and target variable
Step 5: Split dataset into train and test sets
Step 6: Create SVR model object
Step 7: Train SVR model
Step 8: Predict response for test dataset
Step 9: Evaluate model (calculate MSE and R^2)
Step 10: Plot actual vs. predicted values
Step 11: Stop
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
# Load dataset
df = pd.read_csv('dress_sales_prediction.csv')
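The feature/target separation, split, training, and plotting lines are not reproduced; a minimal sketch follows (the target column name 'Sales' is guessed from the file name, and the RBF kernel is the SVR default, both assumptions):

# Hedged sketch of the omitted body
X = df.drop('Sales', axis=1).values  # assumption: 'Sales' is the target column
y = df['Sales'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
svr = SVR(kernel='rbf')
svr.fit(X_train, y_train)
y_pred = svr.predict(X_test)
# Plot actual vs. predicted values
plt.scatter(y_test, y_pred)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()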
Output:
Mean Squared Error: 8.043392151715722
R^2 Score: 0.9574175861521746
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
Result:
Thus the Support Vector Regression using inbuilt libraries in python has been implemented and
executed successfully.
Experiment no:7-a ENSEMBLE LEARNING
Date: STACKING
Aim:
The Objective of this experiment is to implement stacking using inbuilt libraries in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load the dataset and encode the categorical 'status' column
Step 4: Separate features and target variable, and split the dataset into train and test sets
Step 5: Define base models (RandomForestClassifier and GradientBoostingClassifier) and a LogisticRegression meta-model
Step 6: Use StratifiedKFold to produce out-of-fold base-model predictions on the training set (the meta-features)
Step 7: Train the meta-model on the out-of-fold predictions
Step 8: Refit the base models on the full training set and predict on the test set to build the test meta-features
Step 9: Predict with the meta-model and evaluate using accuracy, precision, recall, and F1 score
Step 10: Stop
Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.base import clone
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])
Output:
Stacking accuracy: 0.6666666666666666
Precision: 0.0
Recall: 0.0
F1 Score: 0.0
/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning:
Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to
control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Stacking accuracy: 0.3333333333333333
Precision: 0.3333333333333333
Recall: 1.0
F1 Score: 0.5
# Define meta-model
meta_model = LogisticRegression()
# Fit base models on the training fold and predict on the validation fold
for name, model in base_models:
    cloned_model = clone(model)
    cloned_model.fit(train_fold_X, train_fold_y)
    meta_train[name] = cloned_model.predict_proba(val_fold_X)[:, 1]
# Fit base models on the entire training set and predict on the test set
for name, model in base_models:
    cloned_model = clone(model)
    cloned_model.fit(X_train, y_train)
    meta_test[name] = cloned_model.predict_proba(X_test)[:, 1]
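Only fragments of the stacking program survive above. A self-contained sketch of the whole procedure under the same design is given below; it uses numpy arrays for the meta-features, and every name not shown in the fragments is an assumption:

# Hedged sketch of the complete stacking pipeline
import numpy as np

X = data.drop(['status', 'status_encoded'], axis=1)  # assumption: feature/target layout
y = data['status_encoded']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = [('rf', RandomForestClassifier(random_state=42)),
               ('gb', GradientBoostingClassifier(random_state=42))]
meta_model = LogisticRegression()

# Out-of-fold base-model probabilities become the meta-features
meta_train = np.zeros((len(X_train), len(base_models)))
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(X_train, y_train):
    train_fold_X, val_fold_X = X_train.iloc[train_idx], X_train.iloc[val_idx]
    train_fold_y = y_train.iloc[train_idx]
    for j, (name, model) in enumerate(base_models):
        cloned_model = clone(model)
        cloned_model.fit(train_fold_X, train_fold_y)
        meta_train[val_idx, j] = cloned_model.predict_proba(val_fold_X)[:, 1]
meta_model.fit(meta_train, y_train)

# Base models refit on the full training set produce the test meta-features
meta_test = np.zeros((len(X_test), len(base_models)))
for j, (name, model) in enumerate(base_models):
    cloned_model = clone(model)
    cloned_model.fit(X_train, y_train)
    meta_test[:, j] = cloned_model.predict_proba(X_test)[:, 1]

y_pred = meta_model.predict(meta_test)
print("Stacking accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, zero_division=0))
print("Recall:", recall_score(y_test, y_pred, zero_division=0))
print("F1 Score:", f1_score(y_test, y_pred, zero_division=0))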
Result:
Thus stacking has been implemented and executed successfully.
Experiment no:7-b ENSEMBLE LEARNING
Date: BAGGING-MAX VOTE
Aim:
The Objective of this experiment is to implement Max Vote bagging method using inbuilt
libraries in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset
Step 4: Encode categorical variable 'status'
Step 5: Drop the original 'status' column
Step 6: Separate features and target variable
Step 7: Split the dataset into train and test sets
Step 8: Define base model as RandomForestClassifier with 100 estimators
Step 9: Create a list to hold base models
Step 10: Set the number of base models to create as 5
Step 11: Train and clone base models:
a. Iterate over the number of base models
b. Clone the base model
c. Fit the cloned model on the training set
d. Append the cloned model to the list of models
Step 12: Make predictions on the test set using each base model
Step 13: Take majority vote for each instance
Step 14: Calculate accuracy, precision, recall, and F1 score
Step 15: Print bagging with majority voting metrics: accuracy, precision, recall, and F1 score
Step 16: Stop
Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.base import clone
from sklearn.preprocessing import LabelEncoder
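Only the imports survive in the record. A minimal sketch of the remaining steps listed in the algorithm follows; the file name, column names, and per-model random seeds are assumptions:

# Hedged sketch of the omitted body
import numpy as np

df = pd.read_csv('optimal_feature.csv')
label_encoder = LabelEncoder()
df['status_encoded'] = label_encoder.fit_transform(df['status'])
df.drop('status', axis=1, inplace=True)
X = df.drop('status_encoded', axis=1)
y = df['status_encoded']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_model = RandomForestClassifier(n_estimators=100, random_state=42)
models = []
num_models = 5
for i in range(num_models):
    cloned = clone(base_model)
    cloned.set_params(random_state=i)  # vary the seed so the clones differ
    cloned.fit(X_train, y_train)
    models.append(cloned)

# Majority vote across the base models for each test instance
all_preds = np.array([m.predict(X_test) for m in models])
y_pred = np.apply_along_axis(lambda votes: np.bincount(votes).argmax(), 0, all_preds)

print("Bagging with Majority Voting Metrics:")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))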
SAMPLE DATASET:
Output:
Bagging with Majority Voting Metrics:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Result:
Thus bagging with majority voting has been implemented and executed successfully.
Experiment no:7-c ENSEMBLE LEARNING
Date: BAGGING-AVERAGE VOTING
Aim:
The Objective of this experiment is to implement Average Voting bagging method using inbuilt
libraries in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset
Step 4: Encode categorical variable 'status'
Step 5: Drop the original 'status' column
Step 6: Separate features and target variable
Step 7: Split the dataset into train and test sets
Step 8: Define base model as RandomForestClassifier with 100 estimators
Step 9: Create a list to hold base models
Step 10: Set the number of base models to create as 5
Step 11: Train and clone base models:
a. Iterate over the number of base models
b. Clone the base model
c. Fit the cloned model on the training set
d. Append the cloned model to the list of models
Step 12: Make predictions on the test set using each base model
Step 13: Take probabilities of predictions for each base model
Step 14: Calculate average probabilities for each class across all base models
Step 15: Convert average probabilities to predicted labels by selecting the class with the highest
probability
Step 16: Calculate evaluation metrics (accuracy, precision, recall, F1 score)
Step 17: Print bagging with average voting metrics: accuracy, precision, recall, and F1 score
Step 18: Stop
Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.base import clone
from sklearn.preprocessing import LabelEncoder
import numpy as np
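Only the imports survive for this experiment; the steps mirror Experiment 7-b, with probability averaging in place of majority voting. A minimal sketch of the averaging step follows, reusing the `models` and `X_test` names from the 7-b sketch (assumptions):

# Average the predicted class probabilities across the base models
avg_probas = np.mean([m.predict_proba(X_test) for m in models], axis=0)
# The predicted label is the class with the highest average probability
y_pred = avg_probas.argmax(axis=1)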
Experiment no:7-d ENSEMBLE LEARNING
Date: BAGGING-WEIGHTED AVERAGE VOTING
Aim:
The Objective of this experiment is to implement the Weighted Average Voting bagging method using inbuilt
libraries in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset
Step 4: Encode categorical variable 'status'
Step 5: Drop the original 'status' column
Step 6: Separate features and target variable
Step 7: Split the dataset into train and test sets
Step 8: Define base model as RandomForestClassifier with 100 estimators
Step 9: Create a list to hold base models
Step 10: Set the number of base models to create as 5
Step 11: Train and clone base models:
a. Iterate over the number of base models
b. Clone the base model
c. Fit the cloned model on the training set
d. Append the cloned model to the list of models
Step 12: Make predictions on the test set using each base model
Step 13: Assign weights to each base model's predictions (example weights are provided)
Step 14: Calculate the weighted average of predictions from all base models
Step 15: Convert weighted average probabilities to predicted labels by selecting the class with the
highest probability
Step 16: Calculate evaluation metrics (accuracy, precision, recall, F1 score)
Step 17: Print bagging with weighted average voting metrics: accuracy, precision, recall, and F1 score
Step 18: Stop
Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.base import clone
from sklearn.preprocessing import LabelEncoder
import numpy as np
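As with the previous experiment, only the imports survive; a minimal sketch of the weighted-average step follows, reusing the `models` and `X_test` names from the earlier sketches (the weights are example values, as Step 13 of the algorithm notes):

weights = [0.3, 0.25, 0.2, 0.15, 0.1]  # example weights, one per base model
probas = np.array([m.predict_proba(X_test) for m in models])
weighted_avg = np.average(probas, axis=0, weights=weights)
# The predicted label is the class with the highest weighted average probability
y_pred = weighted_avg.argmax(axis=1)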
SAMPLE DATASET:
Output:
Bagging with Weighted Average Voting Metrics:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
Result:
Thus bagging with weighted average voting has been implemented and executed successfully.
Experiment no:7-e ENSEMBLE LEARNING
Date: BOOSTING – ADAPTIVE BOOSTING
Aim:
The Objective of this experiment is to implement Adaptive boosting using inbuilt libraries in
python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset
Step 4: Encode categorical variable 'status'
Step 5: Drop the original 'status' column
Step 6: Separate features and target variable
Step 7: Split the dataset into training and testing sets
Step 8: Initialize AdaBoost Classifier with Decision Tree as base estimator and 50 estimators
Step 9: Train the AdaBoost Classifier on the training set
Step 10: Make predictions on the testing set
Step 11: Calculate evaluation metrics (accuracy, precision, recall, F1 score)
Step 12: Print accuracy, precision, recall, and F1 score
Step 13: Print confusion matrix
Step 14: Stop
Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])
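The data-loading, split, and training lines are not reproduced in the record; a minimal sketch follows (`data` is assumed to have been loaded earlier with pd.read_csv, as in the neighbouring experiments):

# Hedged sketch of the omitted lines
data.drop('status', axis=1, inplace=True)
X = data.drop('status_encoded', axis=1)
y = data['status_encoded']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# (the classifier initialization appears further below in the listing, followed by:)
adaboost_classifier.fit(X_train, y_train)
y_pred = adaboost_classifier.predict(X_test)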
Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[2 0]
[0 1]]
# Initialize AdaBoost Classifier with Decision Tree as base estimator
adaboost_classifier = AdaBoostClassifier(n_estimators=50, random_state=42)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
Result:
Thus adaptive boosting has been implemented and executed successfully.
Experiment no:7-f ENSEMBLE LEARNING
Date: BOOSTING – GRADIENT BOOSTING
Aim:
The Objective of this experiment is to implement Gradient boosting using inbuilt libraries in
python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Load dataset
Step 4: Encode categorical variable 'status'
Step 5: Drop the original 'status' column
Step 6: Separate features and target variable
Step 7: Split the dataset into training and testing sets
Step 8: Initialize Gradient Boosting Classifier with 100 estimators
Step 9: Train the Gradient Boosting Classifier on the training set
Step 10: Make predictions on the testing set
Step 11: Calculate evaluation metrics (accuracy, precision, recall, F1 score)
Step 12: Print accuracy, precision, recall, and F1 score
Step 13: Print confusion matrix
Step 14: Stop
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])
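The remaining body of the program is not reproduced; a minimal sketch follows (`data` is assumed loaded earlier, and all names beyond those shown are assumptions):

# Hedged sketch of the omitted lines
data.drop('status', axis=1, inplace=True)
X = data.drop('status_encoded', axis=1)
y = data['status_encoded']
# (after the train/test split shown further below:)
gb_classifier = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb_classifier.fit(X_train, y_train)
y_pred = gb_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)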
Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[2 0]
[0 1]]
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
Result:
Thus gradient boosting has been implemented and executed successfully.
Experiment no:7-g ENSEMBLE LEARNING
Date: BOOSTING – CATEGORICAL BOOSTING
Aim:
The Objective of this experiment is to implement Categorical boosting using inbuilt libraries in
python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries, including catboost
Step 3: Load the dataset and encode the categorical 'status' column
Step 4: Drop the original 'status' column and separate features and target variable
Step 5: Split the dataset into training and testing sets
Step 6: Initialize and train a CatBoostClassifier
Step 7: Make predictions on the testing set
Step 8: Calculate and print accuracy, precision, recall, F1 score, and the confusion matrix
Step 9: Stop
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
import catboost as ctb
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
data['status_encoded'] = label_encoder.fit_transform(data['status'])
SAMPLE DATASET:
Output:
Learning rate set to 0.001502
0: learn: 0.6922122 total: 1.08ms remaining: 1.08s
1: learn: 0.6908270 total: 6.1ms remaining: 3.04s
2: learn: 0.6898993 total: 7.48ms remaining: 2.48s
3: learn: 0.6888169 total: 9.22ms remaining: 2.29s
4: learn: 0.6869733 total: 12.2ms remaining: 2.43s
5: learn: 0.6856654 total: 14.6ms remaining: 2.42s
6: learn: 0.6844488 total: 16.2ms remaining: 2.29s
7: learn: 0.6834330 total: 16.9ms remaining: 2.09s
8: learn: 0.6817790 total: 19.4ms remaining: 2.14s
9: learn: 0.6802639 total: 19.9ms remaining: 1.97s
10: learn: 0.6789032 total: 23.8ms remaining: 2.14s
11: learn: 0.6772619 total: 24.6ms remaining: 2.02s
12: learn: 0.6762064 total: 25.8ms remaining: 1.96s
13: learn: 0.6751169 total: 27.3ms remaining: 1.92s
14: learn: 0.6741816 total: 28.9ms remaining: 1.9s
15: learn: 0.6732487 total: 31.2ms remaining: 1.92s
16: learn: 0.6719068 total: 32.5ms remaining: 1.88s
17: learn: 0.6702825 total: 33.6ms remaining: 1.83s
18: learn: 0.6689514 total: 34.6ms remaining: 1.79s
19: learn: 0.6677421 total: 42.3ms remaining: 2.07s
20: learn: 0.6665820 total: 44ms remaining: 2.05s
21: learn: 0.6657759 total: 45.4ms remaining: 2.02s
And so on…
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[1 0]
[0 3]]
# Drop the original 'status' column
data.drop('status', axis=1, inplace=True)
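The split, training, and metric lines are not reproduced; a minimal sketch follows (variable names follow the print statements below; default CatBoost settings are assumed):

# Hedged sketch of the omitted lines
X = data.drop('status_encoded', axis=1)
Y = data['status_encoded']
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
model = ctb.CatBoostClassifier()
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)
accuracy = accuracy_score(Y_test, Y_pred)
precision = precision_score(Y_test, Y_pred)
recall = recall_score(Y_test, Y_pred)
f1 = f1_score(Y_test, Y_pred)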
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("\nConfusion Matrix:\n", confusion_matrix(Y_test, Y_pred))
Result:
Thus categorical boosting has been implemented and executed successfully.
Experiment no:8 REINFORCEMENT LEARNING
Date: Q-LEARNING ALGORITHM (Maze Game)
Aim:
The Objective of this experiment is to implement Q-learning algorithm of reinforcement
learning in python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Define the `QLearningMaze` class with methods to initialize Q-learning parameters, choose
actions, update Q-values, train, take actions, and solve the maze
Step 4: Define the `MazeGame` class with methods to initialize the maze game, draw the maze, draw
the player, handle player movements, and check for game completion
Step 5: Create a custom maze function to get user input for maze dimensions and values
Step 6: Get custom maze input from the user
Step 7: Specify maze parameters such as width, height, and cell size
Step 8: Specify starting and ending positions in the maze
Step 9: Mark starting and ending positions in the maze
Step 10: Create the maze game window and run the game loop
Step 11: Stop
Program:
import numpy as np
import tkinter as tk
from tkinter import messagebox, simpledialog
class QLearningMaze:
    def __init__(self, maze, start, end, learning_rate=0.1, discount_factor=0.9,
                 exploration_rate=0.1, epochs=1000):
        self.maze = maze
        self.start = start
        self.end = end
        self.learning_rate = learning_rate
        self.discount_factor = discount_factor
        self.exploration_rate = exploration_rate
        self.epochs = epochs
        self.q_values = np.zeros((len(maze), len(maze[0]), 4))  # Q-values for each action (up, down, left, right)
    def train(self):
        for _ in range(self.epochs):
            current_state = self.start
            while current_state != self.end:
                action = self.choose_action(current_state)
                next_state, reward = self.take_action(current_state, action)
                # If the agent is at the destination but can't move towards it, return to the start
                if next_state == current_state and current_state == self.end:
                    next_state = self.start
                    reward = -100  # Penalize for being stuck at the destination
                self.update_q_values(current_state, action, reward, next_state)
                current_state = next_state
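The `choose_action`, `take_action`, and `update_q_values` methods used above are not reproduced in the record; a minimal sketch consistent with the rest of the class follows (the reward values and the stay-put rule for blocked moves are assumptions):

    def choose_action(self, state):
        # Epsilon-greedy: explore with probability exploration_rate, else exploit
        if np.random.rand() < self.exploration_rate:
            return np.random.randint(4)
        return int(np.argmax(self.q_values[state[0], state[1]]))

    def take_action(self, state, action):
        moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
        r = state[0] + moves[action][0]
        c = state[1] + moves[action][1]
        # Stay put on out-of-bounds or wall cells, with a penalty (assumption)
        if not (0 <= r < len(self.maze) and 0 <= c < len(self.maze[0])) or self.maze[r][c] == 1:
            return state, -10
        if (r, c) == self.end:
            return (r, c), 100  # reward for reaching the goal (assumption)
        return (r, c), -1  # small cost per step (assumption)

    def update_q_values(self, state, action, reward, next_state):
        # Standard Q-learning update: Q <- Q + lr * (reward + gamma * max Q' - Q)
        best_next = np.max(self.q_values[next_state[0], next_state[1]])
        td_target = reward + self.discount_factor * best_next
        td_error = td_target - self.q_values[state[0], state[1], action]
        self.q_values[state[0], state[1], action] += self.learning_rate * td_error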
    def solve(self):
        current_state = self.start
        path = [current_state]
        while current_state != self.end:
            action = np.argmax(self.q_values[current_state[0], current_state[1]])
            next_state, _ = self.take_action(current_state, action)
            path.append(next_state)
            current_state = next_state
        return path
class MazeGame:
    def __init__(self, master, maze):
        self.master = master
        self.master.title("Maze Game")
        self.canvas = tk.Canvas(master, width=maze_width, height=maze_height)
        self.canvas.pack()
        self.maze = maze
        self.player_pos = (0, 0)  # Starting position of the player
        self.draw_maze()
        self.player = None  # To store player's shape ID
        self.game_over = False
        self.master.bind('<Up>', self.move_up)
        self.master.bind('<Down>', self.move_down)
        self.master.bind('<Left>', self.move_left)
        self.master.bind('<Right>', self.move_right)
    def draw_maze(self):
        for y in range(len(self.maze)):
            for x in range(len(self.maze[y])):
                if self.maze[y][x] == 1:
                    self.canvas.create_rectangle(x * cell_size, y * cell_size,
                                                 (x + 1) * cell_size, (y + 1) * cell_size,
                                                 fill="black")
                elif self.maze[y][x] == 0:
                    self.canvas.create_rectangle(x * cell_size, y * cell_size,
                                                 (x + 1) * cell_size, (y + 1) * cell_size,
                                                 fill="white")
                elif self.maze[y][x] == 2:  # Starting position
                    self.canvas.create_rectangle(x * cell_size, y * cell_size,
                                                 (x + 1) * cell_size, (y + 1) * cell_size,
                                                 fill="green")
                elif self.maze[y][x] == 3:  # Ending position
                    self.canvas.create_rectangle(x * cell_size, y * cell_size,
                                                 (x + 1) * cell_size, (y + 1) * cell_size,
                                                 fill="blue")
    def draw_player(self):
        if self.player:
            self.canvas.delete(self.player)
        player_x = self.player_pos[1] * cell_size
        player_y = self.player_pos[0] * cell_size
        self.player = self.canvas.create_oval(player_x, player_y,
                                              player_x + cell_size, player_y + cell_size,
                                              fill="red")
    def update_player(self):
        self.draw_player()

    def check_win(self):
        if self.player_pos == ending_position:
            self.game_over = True
            messagebox.showinfo("Congratulations!", "You won!")
            self.master.quit()
def create_custom_maze():
    rows = simpledialog.askinteger("Input", "Enter the number of rows:")
    cols = simpledialog.askinteger("Input", "Enter the number of columns:")
    maze = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            maze[i][j] = simpledialog.askinteger("Input", f"Enter value for cell ({i}, {j}): 0 (empty) or 1 (blocked)")
    return maze
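The movement handlers bound in MazeGame.__init__ and the main driver (Steps 6-10 of the algorithm) are not reproduced in the record. A minimal sketch follows; the handler methods belong inside the MazeGame class, `_try_move` is a hypothetical helper, and the maze parameters and start/end coordinates are assumptions:

# Inside class MazeGame (hedged sketch):
    def move_up(self, event):
        self._try_move(-1, 0)

    def move_down(self, event):
        self._try_move(1, 0)

    def move_left(self, event):
        self._try_move(0, -1)

    def move_right(self, event):
        self._try_move(0, 1)

    def _try_move(self, dr, dc):
        # Hypothetical helper: move if the target cell is in bounds and not a wall
        if self.game_over:
            return
        r, c = self.player_pos[0] + dr, self.player_pos[1] + dc
        if 0 <= r < len(self.maze) and 0 <= c < len(self.maze[0]) and self.maze[r][c] != 1:
            self.player_pos = (r, c)
            self.update_player()
            self.check_win()

# Main driver (all values are assumptions):
root = tk.Tk()
maze = create_custom_maze()
cell_size = 40
maze_width = len(maze[0]) * cell_size
maze_height = len(maze) * cell_size
starting_position = (0, 0)
ending_position = (len(maze) - 1, len(maze[0]) - 1)
maze[starting_position[0]][starting_position[1]] = 2  # mark start (drawn green)
maze[ending_position[0]][ending_position[1]] = 3      # mark end (drawn blue)
agent = QLearningMaze(maze, starting_position, ending_position)
# agent.train() and agent.solve() can compute a path (see the method sketch above)
game = MazeGame(root, maze)
game.draw_player()
root.mainloop()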
Result:
Thus Q-learning of reinforcement learning has been implemented and executed successfully.
Experiment no:9 MULTI LAYER PERCEPTRON
Date:
Aim:
The Objective of this experiment is to implement Multilayer perceptron using inbuilt libraries in
python.
Algorithm:
Step 1: Start
Step 2: Import necessary libraries
Step 3: Define the `MLP` class with methods for forward pass, backward pass, training, evaluation,
and auxiliary functions for sigmoid activation and its derivative
Step 4: Get user inputs for the number of patterns and input features per pattern
Step 5: Generate XOR dataset based on user inputs
Step 6: Define an MLP object with specified input, hidden, and output sizes
Step 7: Train the MLP using the generated XOR dataset
Step 8: Evaluate the trained MLP using the same dataset
Step 9: Print evaluation metrics such as accuracy, precision, recall, F1 score, and confusion matrix
Step 10: Plot the training loss over epochs and the confusion matrix
Step 11: Stop
Program:
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
import matplotlib.pyplot as plt
class MLP:
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
    # ... (the record skips to the end of the forward pass)
        return self.output
    # ... (and into the training loop)
        # Backward pass
        self.backward(inputs, targets, learning_rate)
        if epoch % 1000 == 0:
            print(f"Epoch {epoch}, Loss: {loss}")
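Only fragments of the class survive above. A minimal sketch of a complete class consistent with those fragments and with the calls below is given here; the weight initialization, the mean-squared-error loss, the early-stopping rule, and the 0.5 decision threshold are all assumptions:

class MLP:  # hedged sketch, not the original listing
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        # Assumed initialization: small random weights, zero biases
        self.W1 = np.random.randn(input_size, hidden_size) * 0.5
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.5
        self.b2 = np.zeros((1, output_size))

    def sigmoid(self, x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_derivative(self, s):
        return s * (1.0 - s)  # derivative expressed in terms of the sigmoid output

    def forward(self, inputs):
        self.hidden = self.sigmoid(inputs @ self.W1 + self.b1)
        self.output = self.sigmoid(self.hidden @ self.W2 + self.b2)
        return self.output

    def backward(self, inputs, targets, learning_rate):
        # Gradient of the mean-squared-error loss through both sigmoid layers
        d_output = (targets - self.output) * self.sigmoid_derivative(self.output)
        d_hidden = (d_output @ self.W2.T) * self.sigmoid_derivative(self.hidden)
        self.W2 += learning_rate * self.hidden.T @ d_output
        self.b2 += learning_rate * d_output.sum(axis=0, keepdims=True)
        self.W1 += learning_rate * inputs.T @ d_hidden
        self.b1 += learning_rate * d_hidden.sum(axis=0, keepdims=True)

    def train(self, inputs, targets, learning_rate, epochs, validation_data=None, patience=None):
        best_val_loss, wait = float('inf'), 0
        self.losses = []
        for epoch in range(epochs):
            self.forward(inputs)
            loss = np.mean((targets - self.output) ** 2)
            self.losses.append(loss)
            # Backward pass
            self.backward(inputs, targets, learning_rate)
            if epoch % 1000 == 0:
                print(f"Epoch {epoch}, Loss: {loss}")
            if validation_data is not None and patience is not None:
                X_val, y_val = validation_data
                val_loss = np.mean((y_val - self.forward(X_val)) ** 2)
                if val_loss < best_val_loss:
                    best_val_loss, wait = val_loss, 0
                else:
                    wait += 1
                    if wait >= patience:  # assumed early-stopping rule
                        break

    def evaluate(self, inputs, targets):
        predictions = (self.forward(inputs) >= 0.5).astype(int)
        return (accuracy_score(targets, predictions),
                precision_score(targets, predictions),
                recall_score(targets, predictions),
                f1_score(targets, predictions),
                confusion_matrix(targets, predictions))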
# User Inputs
num_patterns = int(input("Enter the number of patterns in your dataset: "))
input_size = int(input("Enter the number of input features per pattern: "))
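Step 5's dataset generation is not reproduced in the record; a minimal sketch follows, treating XOR of n inputs as the parity of the bits and using a 50/50 train/test split (both assumptions):

# Hedged sketch: generate the XOR (parity) dataset
X = np.random.randint(0, 2, size=(num_patterns, input_size)).astype(float)
y = (X.sum(axis=1) % 2).reshape(-1, 1)  # XOR generalized to n inputs: parity of the bits
split = num_patterns // 2
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]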
# Define MLP
mlp = MLP(input_size=input_size, hidden_size=4, output_size=1)
# Train MLP
mlp.train(X_train, y_train, learning_rate=0.01, epochs=10000, validation_data=(X_test, y_test), patience=500)
# Evaluate MLP
accuracy, precision, recall, f1, confusion_mat = mlp.evaluate(X_test, y_test)
Output:
Evaluation Metrics:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[50 0]
[ 0 50]]
print("\nConfusion Matrix:")
print(confusion_mat)
plt.tight_layout()
plt.show()
Result:
Thus the multilayer perceptron has been implemented and executed successfully.
FEATURE EXTRACTION:
import re
import pandas as pd
from urllib.parse import urlparse, unquote  # needed by the helpers below
def extract_url_length(url):
    return len(url)

def has_https(url):
    return url.startswith("https")

def has_suspicious_keywords(url):
    # Assumed keyword list; the original list is not reproduced in the record
    suspicious_keywords = ['login', 'admin', 'secure', 'verify', 'update']
    for keyword in suspicious_keywords:
        if keyword in url:
            return 1
    return 0

def extract_url_syntax(url):
    return len(url)
def extract_host_name(url):
    parsed_url = urlparse(url)
    return parsed_url.hostname

def extract_path_length(url):
    parsed_url = urlparse(url)
    return len(parsed_url.path)

def has_favicon(url):
    return 0

def has_ssl_certificate(url):
    return 1

def extract_page_rank(url):
    return 0
def has_double_slash_redirection(url):
    # Assumed check: a '//' appearing after the scheme separator
    return 1 if url.rfind('//') > 7 else 0

def extract_folder_name(url):
    parsed_url = urlparse(url)
    return parsed_url.path.split('/')[-1]

def has_special_characters(url):
    special_characters = set("!@#$%^&*()+=-[]{}|\\<>?,.")
    return 1 if any(ch in special_characters for ch in url) else 0

def extract_ccTLD(url):
    parsed_url = urlparse(url)
    return parsed_url.netloc.split('.')[-1]

def has_subdomains(url):
    # Assumed check: more than one dot in the host name
    parsed_url = urlparse(url)
    return 1 if parsed_url.netloc.count('.') > 1 else 0
def extract_features(url):
    features = {
        'url_length': extract_url_length(url),
        'path_length': extract_path_length(url),
        'ip_present': 0,
        'number_of_parameters': 0,
        'number_of_fragments': 0,
        'is_url_encoded': 0,
        'number_encoded_char': 0,
        'host_length': 0,
        'number_of_periods': 0,
        'has_client': 0,
        'has_admin': 0,
        'has_server': 0,
        'has_login': 0,
        'number_of_dashes': 0,
        'have_at_sign': 0,
        'is_tiny_url': 0,
        'number_of_spaces': 0,
        'number_of_zeros': 0,
        'number_of_uppercase': 0,
        'number_of_lowercase': 0,
        'number_of_double_slashes': 0,
    }
    parsed_url = urlparse(url)
    if re.match(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', parsed_url.netloc):
        features['ip_present'] = 1
    if parsed_url.fragment:
        features['number_of_fragments'] = len(parsed_url.fragment)
    if url != unquote(url):
        features['is_url_encoded'] = 1
    features['host_length'] = len(parsed_url.netloc)
    features['number_of_periods'] = parsed_url.netloc.count('.')
    features['number_of_dashes'] = url.count('-')
    features['have_at_sign'] = 1 if '@' in url else 0
    features['number_of_zeros'] = url.count('0')
    features['number_of_double_slashes'] = url.count('//')
    return features
def extract_features_for_dataset(dataset):
    extracted_features = []
    for url in dataset['URL']:  # assumption: the URL column is named 'URL', as in the printed output
        features = extract_features(url)
        extracted_features.append(features)
    return pd.DataFrame(extracted_features)
# 'data' is assumed to be the phishing-URL dataset loaded earlier with pd.read_csv
extracted_features_df = extract_features_for_dataset(data)
data_with_features = pd.concat([data, extracted_features_df], axis=1)  # assumption: combine original columns with the extracted features
data_with_features.to_csv('/content/drive/MyDrive/Colab Notebooks/datasets/feature_extracted_dataset.csv', index=False)
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/datasets/feature_extracted_dataset.csv')
print("extracted features dataset:\n")
print(df.head())
Output:
extracted features dataset:
URL status url_length \
0 http://www.crestonwood.com/router.php legitimate
37
1 http://shadetreetechnology.com/V4/validation/a... phishing
77
2 https://support-appleld.com.secureupdate.duila... phishing
126
3 http://rgipt.ac.in legitimate
18
4 http://www.iracing.com/tracks/gateway-motorspo... legitimate
55
[5 rows x 35 columns]
OPTIMIZATION USING ANT COLONY ALGORITHM:
import numpy as np
import pandas as pd
# Define parameters
num_ants = 10
num_iterations = 100
pheromone_decay = 0.5
pheromone_increase = 1.0
# Evaluate solution
score = objective_function(solution)
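Only the parameter block and a single evaluation line survive for this section. A minimal sketch of an ant-colony-style feature-selection loop consistent with those parameters is given below; the objective function, the number of candidate features, and the inclusion-probability rule are all assumptions:

# Hedged sketch of an ACO-style feature-selection loop
num_features = 10  # assumption: number of candidate features
pheromones = np.ones(num_features)

def objective_function(solution):
    # Placeholder objective (assumption): reward small feature subsets;
    # in practice this would score a model trained on the selected features
    return 1.0 / (1.0 + solution.sum())

best_solution, best_score = None, float('-inf')
for iteration in range(num_iterations):
    for ant in range(num_ants):
        # Each ant includes a feature with probability increasing in its pheromone
        include_prob = pheromones / (pheromones + 1.0)
        solution = (np.random.rand(num_features) < include_prob).astype(int)
        # Evaluate solution
        score = objective_function(solution)
        if score > best_score:
            best_solution, best_score = solution.copy(), score
        # Reinforce pheromone on the features used by this solution
        pheromones[solution == 1] += pheromone_increase * score
    pheromones *= pheromone_decay  # evaporation

print("Best feature subset (mask):", best_solution)
print("Best score:", best_score)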
SAMPLE DATASET: