Complete ML File Word File

Index Page (Amit Chhoker [103])

Sr. No. Name of Program Remarks

1. Write a Machine Learning Program for Email Spam Detection.
2. Write a Machine Learning Program for Iris Data - KNN Classification.
3. Write a Machine Learning Program for OpenCV - Digit Detection.
4. Write a Machine Learning Program for OpenCV - Face/Object Detection.
5. Write a Machine Learning Program for Linear Regression.
6. Write a Machine Learning Program for Logistic Regression.
7. Write a Machine Learning Program for implementing the Naive Bayes algorithm.
8. Write a Machine Learning Program for implementing Decision Trees - CART, CHAID.
9. Write a Machine Learning Program to implement SVM - both classification and regression - using all kernels.
10. Write a Machine Learning Program for multi-class classification - using more than 4 classes.
11. Write a Machine Learning Program to implement Clustering - K-means / Kernel K-means.
12. Write a Machine Learning Program for DBSCAN algorithm implementation.
13. Write a Machine Learning Program to implement Association Rules using data from domains such as Market Basket Analysis, Healthcare, or Data Mining - Apriori and ECLAT algorithms.
1. Write a Machine Learning Program for Email Spam Detection.
Description:- This Python script implements a spam detection model using a Naïve Bayes classifier
with TF-IDF vectorization. It trains on a predefined dataset of emails, evaluates performance using
accuracy and classification reports, and allows real-time email classification.

Program Code:-
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Labeled training data: 1 = spam, 0 = not spam
emails = [
    ("Win a brand new car! Click now to claim your prize.", 1),
    ("Urgent: Your account has been compromised. Reset your password immediately!", 1),
    ("Hey, how have you been? Long time no see!", 0),
    ("Don't miss our exclusive deal on laptops. Limited stock available!", 1),
    ("Meeting at 3 PM today in conference room B.", 0),
    ("Congratulations! You've won a free vacation. Claim now!", 1),
    ("Reminder: Your doctor's appointment is scheduled for tomorrow.", 0),
    ("Latest updates on your project are available. Check your email.", 0),
    ("Huge discounts on mobile phones. Limited offer!", 1),
    ("Let's catch up over coffee this weekend!", 0),
]

texts, labels = zip(*emails)
X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.3, random_state=42)

# TF-IDF vectorization feeding a multinomial Naive Bayes classifier
model = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words='english', max_df=0.95, min_df=1, ngram_range=(1, 2))),
    ("nb", MultinomialNB(alpha=0.1)),
])
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

def predict_email(email_text):
    prediction = model.predict([email_text])[0]
    return "Spam" if prediction == 1 else "Not Spam"

while True:
    email_text = input("Enter an email (or type 'exit' to quit): ")
    if email_text.lower() == 'exit':
        break
    print("Prediction:", predict_email(email_text))

Output:-
2. Write a Machine Learning Program for iris data - knn classification.
Description:- This machine learning program implements the K-Nearest Neighbors (KNN)
algorithm for classifying the Iris dataset. It loads the dataset, splits it into training and testing sets, and
applies KNN to predict the species of iris flowers based on sepal and petal measurements.

Program Code:-
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

iris = load_iris()
X, y = iris.data, iris.target

print("Sample data from Iris dataset (first 3 samples):")
for i in range(3):
    print(f"Features: {X[i]}, Label: {iris.target_names[y[i]]}")

# Standardize features so all measurements contribute equally to distances
scaler = StandardScaler()
X = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

print("\nPredictions on test data:", y_pred)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

Output:-
3. Write a Machine Learning Program for OpenCV - Digit Detection.
Description:- This program is a handwritten digit detection application using OpenCV, Tkinter, and
a pre-trained MNIST model. The user can draw a digit on a black canvas, and the model will predict
the digit using deep learning (Keras & TensorFlow). The program preprocesses the image, normalizes
it, and feeds it into a Convolutional Neural Network (CNN) trained on the MNIST dataset to classify
the digit.

Program Code:-
import cv2
import numpy as np
import tkinter as tk
from tkinter import Canvas, Button
from PIL import Image, ImageDraw
from tensorflow.keras.models import load_model

# Pre-trained MNIST CNN saved to disk beforehand
model = load_model("mnist_digit_model.h5")

def preprocess_image(img):
    img = img.convert('L')           # Convert to grayscale
    img = img.resize((28, 28))       # Resize to the 28x28 MNIST input size
    img = np.array(img) / 255.0      # Normalize pixel values to [0, 1]
    img = img.reshape(1, 28, 28, 1)  # Reshape for model input
    return img

def predict_digit():
    img.save("temp_digit.png")             # Save drawn digit
    processed_img = preprocess_image(img)  # Preprocess the drawn image
    prediction = model.predict(processed_img)
    digit = np.argmax(prediction)
    result_label.config(text=f"Detected Digit: {digit}")

def clear_canvas():
    global img, draw
    canvas.delete("all")
    img = Image.new("L", (300, 300), "black")  # Reset image to black
    draw = ImageDraw.Draw(img)

def draw_on_canvas(event):
    x, y = event.x, event.y
    canvas.create_oval(x, y, x + 10, y + 10, fill='white', outline='white')
    draw.ellipse([x, y, x + 10, y + 10], fill='white')  # Mirror the stroke on the PIL image

root = tk.Tk()
root.title("Digit Detection")

canvas = Canvas(root, width=300, height=300, bg='black')
canvas.pack()

img = Image.new("L", (300, 300), "black")  # Off-screen image the model reads from
draw = ImageDraw.Draw(img)

canvas.bind("<B1-Motion>", draw_on_canvas)

btn_predict = Button(root, text="Predict Digit", command=predict_digit)
btn_predict.pack()
btn_clear = Button(root, text="Clear", command=clear_canvas)
btn_clear.pack()

result_label = tk.Label(root, text="Draw a digit and press 'Predict'")
result_label.pack()

root.mainloop()
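The GUI above loads a saved model file, mnist_digit_model.h5, which the listing does not show being created. A minimal training sketch that could produce such a file is shown below; the architecture, epoch count, and batch size are illustrative assumptions, not taken from the original file.

```python
# Hypothetical training sketch for producing "mnist_digit_model.h5".
# Layer sizes, epochs, and batch size are illustrative assumptions.
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1) / 255.0  # match the GUI's preprocessing
x_test = x_test.reshape(-1, 28, 28, 1) / 255.0

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),  # one output per digit class
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1, batch_size=128, validation_split=0.1)
model.save("mnist_digit_model.h5")
```

More epochs would improve accuracy; one epoch is enough to produce a working model file for the demo.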

Output:-
4. Write a Machine Learning Program for OpenCV - Face/Object Detection.
Description:- This program uses OpenCV and a pre-trained Haar Cascade model to detect faces and
objects in real-time from a webcam or an image. It loads the Haar cascade classifier, processes the
image frame-by-frame, and draws bounding boxes around detected faces or objects.

Program Code:-
import cv2

# Load the pre-trained Haar cascade for frontal faces bundled with OpenCV
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

cap = cv2.VideoCapture(0)  # Open the default webcam

while True:
    ret, frame = cap.read()  # Read a video frame
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # Detection runs on grayscale
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)  # Box each detected face
    cv2.imshow('Face Detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # Press 'q' to exit
        break

cap.release()
cv2.destroyAllWindows()

Output:-
5. Write a Machine Learning Program for Linear Regression.
Description:- This program implements Linear Regression using Scikit-Learn to predict a
continuous output based on input features. It trains the model on a dataset, fits a best-fit line, and
evaluates the performance using metrics like Mean Squared Error (MSE) and R² score. Linear
Regression is widely used in predictive modeling, trend analysis, and financial forecasting.

Program Code:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

np.random.seed(42)
x = 2 * np.random.rand(100, 1)
y = 4 + 3 * x + np.random.randn(100, 1)  # y = 4 + 3x + noise

print("Sample data (first 5 points):")
for i in range(5):
    print(f"x: {x[i][0]:.2f}, y: {y[i][0]:.2f}")

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

plt.scatter(x_test, y_test, color='blue', label='Actual Data')
plt.plot(x_test, y_pred, color='red', linewidth=2, label='Regression Line')
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.title("Linear Regression Example")
plt.show()

print(f"\nIntercept: {model.intercept_[0]:.2f}")
print(f"Slope: {model.coef_[0][0]:.2f}")

# Evaluation metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R² Score: {r2:.2f}")

Output:-
6. Write a Machine Learning Program for Logistic regression.
Description:- This program implements Logistic Regression using Scikit-Learn to perform binary or
multi-class classification. It trains the model on labeled data, applies the sigmoid function to estimate
probabilities, and evaluates performance using accuracy, confusion matrix, and classification report.
Logistic Regression is widely used in spam detection, medical diagnosis, and fraud detection.

Program Code:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=2, n_informative=2, n_redundant=0,
                           n_repeated=0, n_classes=2, random_state=42)

print("Sample data (first 5 points):")
for i in range(5):
    print(f"Features: {X[i]}, Label: {y[i]}")

scaler = StandardScaler()
X = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"\nAccuracy: {accuracy_score(y_test, y_pred):.2f}")
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

# Plot the decision boundary over a grid of feature values
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, edgecolors='k', marker='o', cmap=plt.cm.coolwarm)
plt.title("Logistic Regression Decision Boundary")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

Output:-
7. Write a Machine Learning Program for implementing Naive Bayes algorithm.
Description:- This program applies the Naïve Bayes classification algorithm, which is based on Bayes' Theorem and the assumption of feature independence.

Key Features:
- Loads and preprocesses data
- Trains a Naïve Bayes classifier (Gaussian, Multinomial, or Bernoulli)
- Predicts class labels and evaluates performance using metrics like accuracy, precision, and recall

Applications:
- Spam detection (classifying emails as spam or not)
- Sentiment analysis (determining positive or negative sentiment in text)
- Medical diagnosis (predicting diseases based on symptoms)

Program Code:-
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

iris = load_iris()
X, y = iris.data, iris.target

print("Sample Data (first 5 rows):")
for i in range(5):
    print(f"Features: {X[i]}, Label: {y[i]}")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Gaussian Naive Bayes suits continuous features like sepal/petal measurements
nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)

print("Predictions:", y_pred)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

Output:-
8. Write a Machine Learning Program for implementing Decision Trees - CART,
CHAID.
Description:- This program implements Decision Tree classification with Scikit-Learn. CART (Classification and Regression Trees) splits nodes using the Gini index, while CHAID (Chi-Square Automatic Interaction Detection) splits using chi-square tests; since Scikit-Learn does not ship a CHAID implementation, an entropy-criterion tree is used here as an approximation. The program builds each tree, makes predictions, and evaluates performance using accuracy and a confusion matrix. Decision Trees are widely used in customer segmentation, credit risk analysis, and medical diagnosis.

Program code:-
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

iris = load_iris()
X, y = iris.data, iris.target

print("Sample Data (first 5 rows):")
for i in range(5):
    print(f"Features: {X[i]}, Label: {y[i]}")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# CART: binary splits chosen by the Gini index
cart = DecisionTreeClassifier(criterion='gini')
cart.fit(X_train, y_train)
y_pred_cart = cart.predict(X_test)

print("CART Predictions:", y_pred_cart)
print("CART Accuracy:", accuracy_score(y_test, y_pred_cart))
print("CART Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_cart))

# CHAID stand-in: scikit-learn has no chi-square splitting criterion,
# so an entropy-based tree is used as an approximation
chaid = DecisionTreeClassifier(criterion='entropy')
chaid.fit(X_train, y_train)
y_pred_chaid = chaid.predict(X_test)

print("CHAID Predictions:", y_pred_chaid)
print("CHAID Accuracy:", accuracy_score(y_test, y_pred_chaid))
print("CHAID Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_chaid))

Output:-

9. Write a Machine Learning Program to implement SVM - both classification and regression - using all kernels.
Description:- This program implements Support Vector Machines (SVM) using Scikit-Learn for
both classification and regression tasks. It utilizes different kernel functions (linear, polynomial, radial
basis function (RBF), and sigmoid) to find the optimal decision boundary. The model is trained on a
dataset, makes predictions, and is evaluated using accuracy, precision, and R² score. SVM is widely
used in image classification, text categorization, and stock price prediction.

Program Code:-
import numpy as np
import matplotlib.pyplot as plt
import streamlit as st
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, SVR
from sklearn.metrics import accuracy_score, mean_squared_error

st.title("SVM - Classification & Regression")

# Classification on the first two Iris features
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

kernels = ['linear', 'poly', 'rbf', 'sigmoid']

for kernel in kernels:
    model = SVC(kernel=kernel)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    st.write(f"SVM Classification - Kernel: {kernel}")
    st.write("Accuracy:", accuracy_score(y_test, y_pred))
    fig, ax = plt.subplots()
    ax.scatter(X_test[:, 0], X_test[:, 1], c=y_pred, cmap='coolwarm', edgecolors='k')
    st.pyplot(fig)

# Regression on the diabetes dataset
X_reg, y_reg = datasets.load_diabetes(return_X_y=True)
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, test_size=0.2,
                                                                    random_state=42)

for kernel in kernels:
    svr = SVR(kernel=kernel)
    svr.fit(X_train_reg, y_train_reg)
    y_pred_reg = svr.predict(X_test_reg)
    st.write(f"SVM Regression - Kernel: {kernel}")
    st.write("MSE:", mean_squared_error(y_test_reg, y_pred_reg))
    fig, ax = plt.subplots()
    ax.scatter(y_test_reg, y_pred_reg, color='blue', edgecolors='k')
    ax.plot([y_test_reg.min(), y_test_reg.max()], [y_test_reg.min(), y_test_reg.max()], 'r--')
    st.pyplot(fig)

Output:-

10. Write a Machine Learning Program For Multi-class classification - using more than
4 classes.
Description:- This program implements multi-class classification using Scikit-Learn, where the model is trained to classify data into more than four categories. It applies Logistic Regression, a Decision Tree, and a Random Forest to the same six-class dataset, evaluating each with accuracy and a classification report. Multi-class classification is commonly used in handwritten digit recognition, medical diagnosis, and object detection.

Program Code:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Synthetic dataset with 6 classes (more than 4, as required)
X, y = make_classification(n_samples=500, n_features=6, n_classes=6, n_informative=4,
                           random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=500),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(name)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred))

fig, ax = plt.subplots()
ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='rainbow', edgecolors='k')
ax.set_title("Test Data Distribution")
plt.show()

Output:-

11. Write a Machine Learning Program to implement Clustering - K-means / Kernel K-means.
Program Code:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

X, y = make_blobs(n_samples=300, centers=3, random_state=42)

print("Sample Data (first 5 rows):")
for i in range(5):
    print(f"Features: {X[i]}, Label: {y[i]}")

kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans_labels = kmeans.fit_predict(X)

# Note: an RBF-kernel SVC is a supervised model; it is included here only to
# contrast a kernel-based decision boundary with the K-means partition
svm = SVC(kernel='rbf')
svm.fit(X, y)
svm_labels = svm.predict(X)

# Caution: K-means cluster IDs are arbitrary, so this matrix is only
# meaningful up to a permutation of the cluster labels
print("Confusion Matrix:")
print(confusion_matrix(y, kmeans_labels))

plt.scatter(X[:, 0], X[:, 1], c=kmeans_labels, cmap='rainbow', edgecolors='k')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=200, c='black', marker='X')
plt.title("K-Means Clustering")
plt.show()
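scikit-learn does not provide a KernelKMeans class. As a hedged sketch of kernel-based clustering, SpectralClustering with a nearest-neighbors affinity is a close relative: it recovers clusters that plain K-means cannot separate, such as concentric circles. The dataset and parameters below are illustrative, not part of the original listing.

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Concentric circles: not linearly separable, so plain K-means fails here
X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=42)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

# Spectral clustering builds a similarity graph and clusters in its
# eigenspace — a close relative of kernel K-means
sc_labels = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                               n_neighbors=10, random_state=42).fit_predict(X)

print("K-means ARI: ", adjusted_rand_score(y, km_labels))
print("Spectral ARI:", adjusted_rand_score(y, sc_labels))
```

Adjusted Rand Index (ARI) is 1.0 for a perfect match with the true ring labels; the kernel-based method scores far higher than plain K-means on this data.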

Output:-

12. Write a Machine Learning Program for DBSCAN Algorithm Implementation.
Program Code:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN
from sklearn.metrics import confusion_matrix

X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

print("Sample Data (first 5 rows):")
for i in range(5):
    print(f"Features: {X[i]}, Label: {y_true[i]}")

# eps: neighborhood radius; min_samples: points needed to form a dense region
dbscan = DBSCAN(eps=0.5, min_samples=5)
y_pred = dbscan.fit_predict(X)  # label -1 marks noise points

# Caution: DBSCAN cluster IDs are arbitrary, so this matrix is only
# meaningful up to a permutation of the cluster labels
print("Confusion Matrix:")
print(confusion_matrix(y_true, y_pred))

plt.scatter(X[:, 0], X[:, 1], c=y_pred, cmap='rainbow', edgecolors='k')
plt.title("DBSCAN Clustering")
plt.show()

Output:-
13. Write a Machine Learning Program to implement Association Rules using data from
domains such as Market Basket Analysis/ Healthcare/ Data Mining etc. - Apriori and
ECLAT Algorithms.
Description:- This program implements the Apriori and ECLAT algorithms for association rule mining, extracting patterns from datasets such as market baskets or healthcare records. It identifies itemset relationships using support, confidence, and lift to generate rules. Apriori uses a breadth-first search over candidate itemsets, while ECLAT applies a depth-first search over vertical (item-to-transaction) lists.

Program Code:-
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encoded transactions: 1 means the item appears in that basket
data = {
    'Milk':   [1, 0, 1, 1, 0, 1],
    'Bread':  [1, 1, 1, 0, 1, 1],
    'Butter': [0, 1, 1, 1, 1, 0],
    'Cheese': [1, 0, 0, 1, 1, 1],
    'Eggs':   [0, 1, 1, 0, 1, 1],
}

df = pd.DataFrame(data)

frequent_itemsets_apriori = apriori(df, min_support=0.5, use_colnames=True)
rules_apriori = association_rules(frequent_itemsets_apriori, metric="lift", min_threshold=1.0)

print("Apriori Rules")
print(rules_apriori)
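mlxtend does not ship an ECLAT implementation, so the depth-first half of the assignment can be sketched by hand: convert the same six baskets to vertical (item to transaction-id) lists and mine frequent itemsets by intersecting tid-sets. The transactions below mirror the one-hot table above; the helper names are illustrative.

```python
# ECLAT sketch: depth-first mining over vertical tid-lists.
# Transactions correspond to the rows of the one-hot table above.
transactions = [
    {'Milk', 'Bread', 'Cheese'},
    {'Bread', 'Butter', 'Eggs'},
    {'Milk', 'Bread', 'Butter', 'Eggs'},
    {'Milk', 'Butter', 'Cheese'},
    {'Bread', 'Butter', 'Cheese', 'Eggs'},
    {'Milk', 'Bread', 'Cheese', 'Eggs'},
]
min_count = 3  # min_support = 0.5 over 6 transactions

# Vertical representation: item -> set of transaction ids containing it
vertical = {}
for tid, items in enumerate(transactions):
    for item in items:
        vertical.setdefault(item, set()).add(tid)

frequent = {}

def eclat(prefix, candidates):
    # Extend the current prefix depth-first; the support of an itemset is
    # the size of the intersection of its members' tid-sets
    for i, (item, tids) in enumerate(candidates):
        if len(tids) >= min_count:
            itemset = frozenset(prefix | {item})
            frequent[itemset] = len(tids)
            suffix = [(other, tids & otids) for other, otids in candidates[i + 1:]]
            eclat(set(itemset), suffix)

eclat(set(), sorted(vertical.items()))

for itemset, count in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), f"support={count}/{len(transactions)}")
```

The frequent itemsets found this way match what Apriori reports at the same 0.5 support threshold; only the search order differs.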

Output:-
14. Write a Machine Learning Program to implement a Generative Model.
Description:- Key components of the program:

1. Importing required libraries:
   - streamlit: for building the web UI.
   - pandas: for handling the dataset.
   - sklearn.naive_bayes.GaussianNB: the generative model used for classification.
   - train_test_split, accuracy_score: for splitting data and evaluating the model.
2. Data preprocessing:
   - Gender labels are encoded as numeric values: Female = 0, Male = 1.
   - Features and labels are separated and split into training and testing datasets.
Program Code:-
import streamlit as st
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = {
    'Height': [5.1, 5.5, 5.7, 5.0, 6.0, 6.1, 5.9, 6.2, 5.4, 5.8],
    'Weight': [48, 60, 65, 45, 78, 85, 72, 90, 59, 66],
    'Foot_Size': [6, 8, 9, 6, 12, 13, 11, 14, 7, 9],
    'Gender': ['F', 'F', 'F', 'F', 'M', 'M', 'M', 'M', 'F', 'M'],
}

df = pd.DataFrame(data)
df['Gender'] = df['Gender'].map({'F': 0, 'M': 1})

X = df[['Height', 'Weight', 'Foot_Size']]
y = df['Gender']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

st.set_page_config(page_title="Gender Predictor", layout="centered")
st.title("Generative Model: Gender Predictor")
st.markdown("### Enter Physical Attributes")

height = st.number_input("Height (in feet)", min_value=4.0, max_value=7.0, value=5.5)
weight = st.number_input("Weight (in kg)", min_value=30, max_value=150, value=60)
foot_size = st.number_input("Foot Size (US)", min_value=4, max_value=18, value=9)

if st.button("Predict Gender"):
    input_data = pd.DataFrame([[height, weight, foot_size]], columns=['Height', 'Weight', 'Foot_Size'])
    prediction = model.predict(input_data)[0]
    gender = "Male" if prediction == 1 else "Female"
    st.success(f"Predicted Gender: {gender}")

st.markdown("---")
st.subheader("Model Evaluation")
st.write("Accuracy on Test Data:", accuracy_score(y_test, y_pred))

Output:-
15. Write a Machine Learning Code for implementing Matrix factorisation and
completion.
Description:- Matrix Factorization and Completion is a technique often used for recommendation
systems, where you have a matrix (typically sparse) and want to predict missing values. Matrix
factorization is based on decomposing the matrix into two lower-dimensional matrices that, when
multiplied, approximate the original matrix.

Here’s a simple implementation using Python with the following libraries:

1. numpy for matrix operations.
2. pandas for data handling.
3. tkinter for the UI to allow for matrix input and display.
4. scikit-learn's NMF for the factorization itself.

Program Code:-

import numpy as np
import pandas as pd
from tkinter import *
from tkinter import messagebox
from sklearn.decomposition import NMF

class MatrixFactorizationApp:
    def __init__(self, root):
        self.root = root
        self.root.title("Matrix Factorization and Completion")
        self.root.geometry("700x500")
        self.root.config(bg="#f0f0f0")

        self.title_label = Label(root, text="Matrix Factorization and Completion",
                                 font=("Arial", 16, "bold"), bg="#f0f0f0")
        self.title_label.pack(pady=10)

        self.instruction_label = Label(root, text="Enter Matrix Data (rows separated by ';' and values by ','):",
                                       font=("Arial", 10), bg="#f0f0f0")
        self.instruction_label.pack(pady=5)

        # Frame holding the matrix input box and its scrollbar
        input_frame = Frame(root, bg="#f0f0f0")
        input_frame.pack(pady=10)

        self.entry = Text(input_frame, height=6, width=50, font=("Courier", 12), wrap=WORD,
                          padx=10, pady=10)
        self.entry.pack(side=LEFT, padx=5)

        scrollbar = Scrollbar(input_frame, command=self.entry.yview)
        scrollbar.pack(side=RIGHT, fill=Y)
        self.entry.config(yscrollcommand=scrollbar.set)

        self.submit_button = Button(root, text="Submit Matrix", command=self.submit_matrix,
                                    font=("Arial", 12), bg="#4CAF50", fg="white", padx=20, pady=10)
        self.submit_button.pack(pady=20)

        self.result_frame = Frame(root, bg="#f0f0f0")
        self.result_frame.pack(pady=10)

        self.result_label = Label(self.result_frame, text="", font=("Courier", 12), bg="#f0f0f0")
        self.result_label.pack()

    def submit_matrix(self):
        try:
            matrix_data = self.entry.get("1.0", "end-1c").strip().split(';')
            matrix = np.array([list(map(float, row.split(','))) for row in matrix_data])
            if matrix.size == 0:
                raise ValueError("Matrix cannot be empty.")

            # Factorize into non-negative W and H, then multiply back to
            # approximate (complete) the original matrix
            num_features = 3
            model = NMF(n_components=num_features, init='random', random_state=42)
            W = model.fit_transform(matrix)
            H = model.components_
            completed_matrix = np.dot(W, H)

            result_df = pd.DataFrame(completed_matrix)
            self.display_result(result_df)
        except Exception as e:
            messagebox.showerror("Error", f"An error occurred: {str(e)}")

    def display_result(self, result_df):
        # Clear previous results
        for widget in self.result_frame.winfo_children():
            widget.destroy()

        result_text = Text(self.result_frame, height=10, width=60, font=("Courier", 12), wrap=WORD,
                           padx=10, pady=10)
        result_text.insert(END, result_df.to_string(index=False))
        result_text.config(state=DISABLED)
        result_text.pack()

if __name__ == "__main__":
    root = Tk()
    app = MatrixFactorizationApp(root)
    root.mainloop()
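NMF as used above assumes a fully observed matrix. For genuine completion — predicting entries that are missing — a common approach is gradient descent on the observed entries only. The sketch below is a minimal illustration with made-up ratings and hyperparameters, not part of the original program.

```python
import numpy as np

# Hypothetical ratings matrix; np.nan marks unobserved entries
R = np.array([
    [5, 3, np.nan, 1],
    [4, np.nan, np.nan, 1],
    [1, 1, np.nan, 5],
    [1, np.nan, np.nan, 4],
    [np.nan, 1, 5, 4],
], dtype=float)

mask = ~np.isnan(R)  # True where a rating was observed
k = 2                # latent-factor dimension
rng = np.random.default_rng(42)
W = rng.random((R.shape[0], k))
H = rng.random((k, R.shape[1]))

lr, reg = 0.01, 0.02
for _ in range(5000):
    # Error on observed entries only; missing cells contribute no gradient
    E = np.where(mask, R - W @ H, 0.0)
    W += lr * (E @ H.T - reg * W)
    H += lr * (W.T @ E - reg * H)

completed = W @ H  # dense approximation; unobserved cells are predictions
rmse = np.sqrt(np.mean((completed[mask] - R[mask]) ** 2))
print("RMSE on observed entries:", round(rmse, 3))
```

The small L2 penalty (reg) keeps the factors from overfitting the few observed cells, which is what makes the predictions for missing entries plausible.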

Output:-
16. Write a Machine Learning Program to implement Bagging.
Description:- Bagging (Bootstrap Aggregation) improves accuracy by combining predictions from multiple models trained on different random subsets of the data. Random Forest is an ensemble of decision trees trained using bagging.

This code uses scikit-learn's RandomForestClassifier to classify sample data and predict on a test set.

Program Code:-
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Two-feature toy dataset with two well-separated classes
X_train = [
    [5, 2], [6, 3], [7, 3], [8, 4], [1, 7], [2, 8], [1, 6], [2, 9]
]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

X_test = [
    [3, 7],
    [6, 2],
    [2, 7]
]
y_test = [1, 0, 1]

print("Sample Training Data (features and labels):")
for i in range(len(X_train)):
    print(f"Features: {X_train[i]}, Label: {y_train[i]}")

print("\nSample Testing Data (features and true labels):")
for i in range(len(X_test)):
    print(f"Features: {X_test[i]}, True Label: {y_test[i]}")

# Each of the 10 trees is trained on a bootstrap sample (bagging)
model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print("\nPredictions:", predictions)
print("Accuracy:", accuracy_score(y_test, predictions))
print("Confusion Matrix:")
print(confusion_matrix(y_test, predictions))
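Random Forest is bagging plus random feature selection at each split. scikit-learn also exposes the general technique directly as BaggingClassifier, which bootstraps any base estimator; a minimal sketch on a synthetic dataset follows (dataset and parameters are illustrative):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=300, n_features=5, n_informative=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 25 trees, each fit on a bootstrap sample; predictions combined by majority vote
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25, random_state=42)
bag.fit(X_train, y_train)
acc = accuracy_score(y_test, bag.predict(X_test))
print(f"Bagging accuracy: {acc:.2f}")
```

Any estimator can replace the decision tree here; trees benefit most from bagging because individually they have high variance.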

Output:-
17. Write a Machine Learning Program to implement Boosting.
Description:
- The program uses the RandomForestClassifier from scikit-learn; note that Random Forest is a bagging-style ensemble, so a boosting method such as AdaBoost or Gradient Boosting would be the more faithful choice for this assignment.
- The train_model function splits the dataset into training and testing sets, trains the model, and calculates the accuracy.
- The accuracy is displayed in a message box once the model is trained.
- The GUI uses Tkinter to create a simple window with a button to start the training.

Program Code:-
import tkinter as tk
from tkinter import messagebox
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import pandas as pd
from sklearn.datasets import make_classification

def generate_data():
    X, y = make_classification(n_samples=1000, n_features=10, n_informative=7, random_state=42)
    return pd.DataFrame(X), pd.Series(y)

def train_model():
    X, y = generate_data()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    accuracy = accuracy_score(y_test, y_pred)
    conf_mat = confusion_matrix(y_test, y_pred)
    class_report = classification_report(y_test, y_pred)
    sample_preview = X.head(3).to_string(index=False)

    msg = f"Sample Input Data (first 3 rows):\n{sample_preview}\n\n"
    msg += f"Accuracy: {accuracy * 100:.2f}%\n\n"
    msg += f"Confusion Matrix:\n{conf_mat}\n\n"
    msg += f"Classification Report:\n{class_report}"
    messagebox.showinfo("Training Complete", msg)

def create_ui():
    window = tk.Tk()
    window.title("Random Forest Model Trainer")
    window.geometry("400x300")
    train_button = tk.Button(window, text="Train Model", command=train_model, font=("Arial", 14))
    train_button.pack(pady=20)
    window.mainloop()

if __name__ == "__main__":
    create_ui()
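Since the listing above actually trains a Random Forest, here is a hedged sketch of true boosting using scikit-learn's GradientBoostingClassifier, where trees are fit sequentially and each new tree corrects the errors of the ensemble so far (the dataset and hyperparameters are illustrative):

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=10, n_informative=7, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Shallow trees added one at a time; each new tree fits the gradient of the
# loss with respect to the current ensemble's predictions
boost = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                   max_depth=3, random_state=42)
boost.fit(X_train, y_train)
acc = accuracy_score(y_test, boost.predict(X_test))
print(f"Gradient Boosting accuracy: {acc:.2f}")
```

The learning_rate shrinks each tree's contribution; lower values typically need more estimators but generalize better.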

Output:-
18. Write a Machine Learning Program for the Random Forest Algorithm.
Description:- This program demonstrates a Random Forest Classification model using a synthetic
dataset. Here's what it does:

1. Data Generation:
Creates a synthetic dataset with 5 features and binary targets using make_classification.

2. Model Training:
Splits the data into training and test sets, trains a RandomForestClassifier, and makes
predictions.

3. Evaluation:
Prints predictions, accuracy, and a classification report.

Program code:-
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
import pandas as pd

X, y = make_classification(n_samples=100, n_features=5, n_informative=3, n_redundant=0,
                           random_state=42)

df = pd.DataFrame(X, columns=[f"Feature_{i}" for i in range(1, 6)])
df["Target"] = y

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Sample Input Data:\n", df.head(), "\n")
print("Predictions on Test Set:", y_pred)
print("\nAccuracy Score:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Output:-
19. Write a machine Learning Program to implement & Compare all classification
models using a standard dataset.
Description:-  Loads the Iris dataset with 150 samples and 3 classes.
 Splits data into training (70%) and testing (30%) sets.

 Scales features using StandardScaler for improved model performance.

 Implements six classification algorithms:

 Logistic Regression

 K-Nearest Neighbors (KNN)

 Support Vector Machine (SVM)

 Decision Tree

 Random Forest

 Naive Bayes

 Trains each model on the training data and predicts on the test set.

 Calculates accuracy and generates detailed classification reports for each model.

 Visualizes performance using confusion matrix heatmaps for all models.

 Displays a bar chart comparing accuracy scores of all classifiers.
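As a complement to the single train/test split used in the program below, the same dictionary-of-models pattern can be scored with 5-fold cross-validation; this sketch uses two illustrative models rather than all six:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Same compare-models-in-a-dictionary pattern, scored with 5-fold CV
X, y = load_iris(return_X_y=True)
models = {"LogReg": LogisticRegression(max_iter=200), "NB": GaussianNB()}
mean_scores = {}
for name, model in models.items():
    mean_scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean accuracy {mean_scores[name]:.3f}")
```

Cross-validation averages over several splits, so the comparison is less sensitive to one lucky or unlucky partition.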

Program Code:-
import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

from sklearn.linear_model import LogisticRegression

from sklearn.neighbors import KNeighborsClassifier

from sklearn.svm import SVC

from sklearn.tree import DecisionTreeClassifier

from sklearn.ensemble import RandomForestClassifier

from sklearn.naive_bayes import GaussianNB


data = load_iris()

X = data.data

y = data.target

labels = data.target_names

df = pd.DataFrame(X, columns=data.feature_names)

df["Target"] = y

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)

X_test = scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(probability=True),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Naive Bayes": GaussianNB()
}

results = []
plt.figure(figsize=(15, 10))
for i, (name, model) in enumerate(models.items(), 1):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    results.append((name, acc))
    cm = confusion_matrix(y_test, y_pred)
    plt.subplot(2, 3, i)
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=labels, yticklabels=labels)
    plt.title(name)
    plt.xlabel("Predicted")
    plt.ylabel("Actual")
plt.tight_layout()
plt.show()

results_df = pd.DataFrame(results, columns=["Model", "Accuracy"])
plt.figure(figsize=(10, 5))
sns.barplot(x="Model", y="Accuracy", data=results_df, palette="viridis")
plt.ylim(0.8, 1.0)
plt.title("Model Comparison (Accuracy)")
plt.ylabel("Accuracy Score")
plt.xticks(rotation=45)
plt.show()

print("Sample Input Data:\n", df.head(), "\n")
for name, model in models.items():
    y_pred = model.predict(X_test)
    print(f"{name} Accuracy: {accuracy_score(y_test, y_pred):.4f}")
    print(f"Classification Report for {name}:\n",
          classification_report(y_test, y_pred, target_names=labels))

Output:-
20. Write a Machine Learning Program to implement and compare all regression models
using a standard dataset.
Description:-  Loads the California Housing dataset with 8 features and a continuous target
(median house value).

 Splits the data into training (70%) and testing (30%) sets.

 Scales features using StandardScaler for better model performance.

 Implements eight regression models:

 Linear Regression

 Ridge Regression

 Lasso Regression

 ElasticNet

 Decision Tree Regressor

 Random Forest Regressor

 Gradient Boosting Regressor

 Support Vector Regressor (SVR)

 Trains each model on the training data and predicts on the test data.

 Calculates and prints R² score and mean squared error (MSE) for each model to evaluate
performance.

 Visualizes residuals (prediction errors) for all models using scatter plots, helping to identify bias or
variance issues.

 Displays a bar chart comparing R² scores of all models, making it easy to see which performs best.

 Prints a preview of the dataset’s first few rows along with numerical evaluation results.
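The two metrics reported below can be checked by hand on tiny, made-up values:

```python
from sklearn.metrics import mean_squared_error, r2_score

# Hand-checkable example of MSE and R^2 (the values are made up)
y_true = [3.0, 2.0, 4.0, 5.0]
y_pred = [2.5, 2.0, 4.5, 5.0]
mse = mean_squared_error(y_true, y_pred)  # mean of squared errors: 0.125
r2 = r2_score(y_true, y_pred)             # 1 - SS_res/SS_tot = 0.9
print(mse, r2)
```

R² near 1 means the model explains most of the variance in the target; MSE is in squared target units, so lower is better.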

Program Code:-
import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.datasets import fetch_california_housing

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

from sklearn.metrics import mean_squared_error, r2_score

from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet


from sklearn.tree import DecisionTreeRegressor

from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

from sklearn.svm import SVR

data = fetch_california_housing()

X = data.data

y = data.target

feature_names = data.feature_names

df = pd.DataFrame(X, columns=feature_names)

df["Target"] = y

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)

X_test = scaler.transform(X_test)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(),
    "Lasso Regression": Lasso(),
    "ElasticNet": ElasticNet(),
    "Decision Tree": DecisionTreeRegressor(),
    "Random Forest": RandomForestRegressor(),
    "Gradient Boosting": GradientBoostingRegressor(),
    "Support Vector Regressor": SVR()
}

results = []
plt.figure(figsize=(15, 10))
for i, (name, model) in enumerate(models.items(), 1):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    results.append((name, r2, mse))
    plt.subplot(2, 4, i)
    residuals = y_test - y_pred
    sns.scatterplot(x=y_pred, y=residuals)
    plt.axhline(0, color='red', linestyle='--')
    plt.xlabel("Predicted")
    plt.ylabel("Residuals")
    plt.title(f"{name}\nR2: {r2:.3f}, MSE: {mse:.3f}")
plt.tight_layout()
plt.show()

results_df = pd.DataFrame(results, columns=["Model", "R2 Score", "MSE"]).sort_values(by="R2 Score", ascending=False)
plt.figure(figsize=(10, 5))
sns.barplot(x="Model", y="R2 Score", data=results_df, palette="magma")
plt.title("Regression Model Comparison (R2 Score)")
plt.xticks(rotation=45)
plt.ylim(0, 1)
plt.show()

print("Sample Input Data:\n", df.head(), "\n")
for name, model in models.items():
    y_pred = model.predict(X_test)
    print(f"{name}:")
    print(f" R2 Score: {r2_score(y_test, y_pred):.4f}")
    print(f" Mean Squared Error: {mean_squared_error(y_test, y_pred):.4f}\n")

Output:-
21. Write a Machine Learning Program to implement Sparse Modeling and Estimation
- use a sparse matrix and then use any medical data to evaluate your model.
Description:- Dataset
 Uses the Breast Cancer Wisconsin medical dataset for binary classification (benign vs
malignant).

Sparse Matrix Conversion

 Converts the feature data into a sparse matrix format (csr_matrix) for memory-efficient
handling.
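Why the CSR conversion saves memory can be seen on a toy mostly-zero matrix (sizes chosen purely for illustration):

```python
import numpy as np
from scipy import sparse

# A mostly-zero matrix: CSR stores only the non-zero entries
dense = np.zeros((1000, 1000))
dense[0, 0] = 1.0
dense[500, 250] = 2.5
X_sparse = sparse.csr_matrix(dense)

print(X_sparse.nnz)          # 2 stored non-zeros
print(dense.nbytes)          # 8000000 bytes for the dense float64 array
print(X_sparse.data.nbytes)  # 16 bytes of stored values
```

The Breast Cancer features are actually dense, so here CSR is used to demonstrate the workflow rather than to save memory.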

Data Splitting

 Splits the dataset into training (70%) and testing (30%) subsets.

Scaling

 Applies scaling to sparse data using StandardScaler with with_mean=False to preserve


sparsity.

Model Training

 Trains a Logistic Regression model that supports sparse input data.

Prediction

 Predicts labels on the test dataset using the trained model.

Accuracy Evaluation

 Calculates accuracy score to measure overall correctness of the model.

Program Code:-
import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

from scipy import sparse

from sklearn.datasets import load_breast_cancer

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

data = load_breast_cancer()

X = data.data
y = data.target

feature_names = data.feature_names

labels = data.target_names

df = pd.DataFrame(X, columns=feature_names)

df['Target'] = y

# Convert to sparse matrix format (CSR)

X_sparse = sparse.csr_matrix(X)

# Split data

X_train, X_test, y_train, y_test = train_test_split(X_sparse, y, test_size=0.3, random_state=42)

# Scaling sparse data: with_mean=False because sparse matrices cannot be centered without densifying
scaler = StandardScaler(with_mean=False)
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Logistic Regression supports sparse input

model = LogisticRegression(max_iter=500)

model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)

acc = accuracy_score(y_test, y_pred)

cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(6,5))

sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=labels, yticklabels=labels)

plt.xlabel("Predicted")

plt.ylabel("Actual")

plt.title(f"Confusion Matrix\nAccuracy: {acc:.4f}")

plt.show()

print("Sample Input Data:\n", df.head(), "\n")

print(f"Accuracy: {acc:.4f}")

print("Classification Report:\n", classification_report(y_test, y_pred, target_names=labels))

Output:-
22. Write a Machine Learning Program to implement Modeling Sequence / Time-Series
Data - ARIMA model.
Description:- Dataset: Monthly airline passenger numbers from 1949 to 1960.
 Data Split: 80% training, 20% testing for model validation.

 ARIMA Model:

 ARIMA(p,d,q) where

o p = number of autoregressive terms (lags), here 5

o d = degree of differencing, here 1 (to make the series stationary)

o q = number of moving average terms, here 0

 Training: Fits the ARIMA model to training data.

 Forecasting: Predicts passenger numbers for the test period.

 Evaluation: Calculates mean squared error (MSE) between actual and predicted values.

 Visualization: Plots training data, test data, and ARIMA forecast for visual comparison.
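What d=1 means in practice: the model works on first differences of the series. On a few toy passenger values (not the real dataset), pandas' diff() shows the transformation:

```python
import pandas as pd

# d=1 in ARIMA(5,1,0): model the change between consecutive points,
# which removes the level/trend and helps make the series stationary
s = pd.Series([112.0, 118.0, 132.0, 129.0, 121.0])
print(s.diff().dropna().tolist())  # [6.0, 14.0, -3.0, -8.0]
```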

Program Code:-
import pandas as pd

import matplotlib.pyplot as plt

from statsmodels.tsa.arima.model import ARIMA

from sklearn.metrics import mean_squared_error

# Load dataset

url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv'

data = pd.read_csv(url, parse_dates=['Month'], index_col='Month')

# Plot original time series

plt.figure(figsize=(10,4))

plt.plot(data, label='Original Data')

plt.title('Original Time Series Data')

plt.xlabel('Date')

plt.ylabel('Passengers')

plt.legend()

plt.show()
# Train-test split (80-20)

train_size = int(len(data)*0.8)

train, test = data[:train_size], data[train_size:]

plt.figure(figsize=(10,4))

plt.plot(train, label='Training Data')

plt.plot(test, label='Test Data')

plt.title('Train and Test Split')

plt.xlabel('Date')

plt.ylabel('Passengers')

plt.legend()

plt.show()

# Fit ARIMA model (order p=5, d=1, q=0)

model = ARIMA(train, order=(5,1,0))

model_fit = model.fit()

# Forecast

forecast = model_fit.forecast(steps=len(test))

# Plot forecast vs actual test data

plt.figure(figsize=(10,4))

plt.plot(test, label='Actual Test Data')

plt.plot(forecast, label='ARIMA Forecast')

plt.title('ARIMA Model Forecast vs Actual')

plt.xlabel('Date')

plt.ylabel('Passengers')

plt.legend()

plt.show()
# Calculate and print Mean Squared Error

mse = mean_squared_error(test, forecast)

print(f'Mean Squared Error: {mse:.3f}')

residuals = test.values.flatten() - forecast.values.flatten()

plt.figure(figsize=(10,4))

plt.plot(test.index, residuals)

plt.axhline(0, linestyle='--', color='red')

plt.title('Residuals (Actual - Forecast)')

plt.xlabel('Date')

plt.ylabel('Residuals')

plt.show()

Output:-
23. Write a Machine Learning Program to implement Deep Learning Models -
implement CNN, RNN.
Description:-  Uses the MNIST handwritten digit dataset.
 Displays sample images and their labels.

 Preprocesses data by normalizing and one-hot encoding.

 Builds and trains a CNN for spatial feature extraction.

 Builds and trains an RNN treating image rows as sequences.

 Evaluates and prints accuracy for both models.
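The one-hot encoding step used below (keras' to_categorical) has a simple NumPy equivalent, sketched here so it can be inspected without TensorFlow:

```python
import numpy as np

# NumPy equivalent of keras.utils.to_categorical for 10 classes
labels = np.array([3, 0, 7])
one_hot = np.eye(10)[labels]
print(one_hot[0])  # 1.0 at index 3, zeros elsewhere
```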

Program Code:-
import numpy as np

import matplotlib.pyplot as plt

from tensorflow.keras.datasets import mnist

from tensorflow.keras.utils import to_categorical

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, SimpleRNN

# Load MNIST data

(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Show sample labels and images

print("Sample labels:", y_train[:10])

plt.figure(figsize=(10,2))

for i in range(10):
    plt.subplot(1, 10, i+1)
    plt.imshow(X_train[i], cmap='gray')
    plt.axis('off')
    plt.title(y_train[i])
plt.show()

# Normalize data

X_train = X_train.astype('float32') / 255

X_test = X_test.astype('float32') / 255


# One-hot encode labels

y_train_cat = to_categorical(y_train, 10)

y_test_cat = to_categorical(y_test, 10)

### CNN Model ###

X_train_cnn = X_train.reshape(-1, 28, 28, 1)

X_test_cnn = X_test.reshape(-1, 28, 28, 1)

cnn = Sequential([

Conv2D(32, kernel_size=(3,3), activation='relu', input_shape=(28,28,1)),

MaxPooling2D(pool_size=(2,2)),

Conv2D(64, kernel_size=(3,3), activation='relu'),

MaxPooling2D(pool_size=(2,2)),

Flatten(),

Dense(128, activation='relu'),

Dense(10, activation='softmax')

])

cnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

cnn.fit(X_train_cnn, y_train_cat, epochs=5, batch_size=128, validation_split=0.2)

cnn_loss, cnn_acc = cnn.evaluate(X_test_cnn, y_test_cat)

print(f'CNN Test Accuracy: {cnn_acc:.4f}')

### RNN Model ###

X_train_rnn = X_train

X_test_rnn = X_test

rnn = Sequential([

SimpleRNN(128, activation='relu', input_shape=(28, 28)),

Dense(10, activation='softmax')

])

rnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

rnn.fit(X_train_rnn, y_train_cat, epochs=5, batch_size=128, validation_split=0.2)


rnn_loss, rnn_acc = rnn.evaluate(X_test_rnn, y_test_cat)

print(f'RNN Test Accuracy: {rnn_acc:.4f}')

Output:-
24. Write a Machine learning Program to implement Feature Representation Learning -
use any feature learning model for implementation.
Description:-
📘 Dataset

 Uses the MNIST dataset of handwritten digits.

 Images are 28x28 pixels, flattened to 784-dimensional vectors for input to the model.

⚙️ Preprocessing

 Pixel values are normalized to range [0, 1].

 Input images are reshaped from (28,28) to (784,) for fully connected layers.

🧠 Autoencoder Model

 A type of unsupervised neural network used for feature extraction.

 Encoder: Compresses input to a 32-dimensional feature representation using fully connected


layers.

 Decoder: Reconstructs the original input from the compressed representation.

Program Code:-
import numpy as np

import matplotlib.pyplot as plt

from tensorflow.keras.datasets import mnist

from tensorflow.keras.models import Model

from tensorflow.keras.layers import Input, Dense

from tensorflow.keras.utils import plot_model

# Load dataset

(X_train, _), (X_test, _) = mnist.load_data()

X_train = X_train.astype('float32') / 255.

X_test = X_test.astype('float32') / 255.

# Flatten the data

X_train = X_train.reshape((len(X_train), 28*28))

X_test = X_test.reshape((len(X_test), 28*28))


# Define Autoencoder architecture

input_img = Input(shape=(784,))

encoded = Dense(128, activation='relu')(input_img)

encoded = Dense(64, activation='relu')(encoded)

encoded = Dense(32, activation='relu')(encoded)

decoded = Dense(64, activation='relu')(encoded)

decoded = Dense(128, activation='relu')(decoded)

decoded = Dense(784, activation='sigmoid')(decoded)

# Build Model

autoencoder = Model(input_img, decoded)

encoder = Model(input_img, encoded)

# Compile and Train

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

autoencoder.fit(X_train, X_train,

epochs=10,

batch_size=256,

shuffle=True,

validation_data=(X_test, X_test))

# Encode test data to get learned features

encoded_imgs = encoder.predict(X_test)

# Visualize some encoded features

plt.figure(figsize=(8, 4))

for i in range(10):
    plt.subplot(2, 10, i+1)
    plt.imshow(X_test[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
    plt.subplot(2, 10, i+11)
    plt.imshow(encoded_imgs[i].reshape(4, 8), cmap='viridis')
    plt.axis('off')
plt.suptitle('Top: Original Images | Bottom: Feature Representations')
plt.tight_layout()
plt.show()

Output:-
25. Write a Machine Learning Program to implement Semi-Supervised Learning.
Description:-
📊 Dataset

 Uses the Digits dataset from sklearn, which contains 8x8 grayscale images of handwritten
digits (0–9).

 Each sample is a 64-dimensional feature vector.

🧪 Semi-Supervised Setup

 Simulates a real-world scenario by labeling only 5% of the training data.

 The remaining 95% are treated as unlabeled using -1 to denote unknown labels.
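The -1 masking convention used by sklearn's semi-supervised estimators can be sketched on a tiny label array (toy values, seed fixed for reproducibility):

```python
import numpy as np

# sklearn's semi-supervised estimators interpret -1 as "unlabeled"
rng = np.random.RandomState(0)
y = np.array([0, 1, 2, 1, 0, 2, 1, 0])
y_semi = y.copy()
y_semi[rng.rand(len(y)) < 0.5] = -1  # hide a random subset of labels
print(y_semi)
```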

🔄 Model: Label Spreading

 Applies Label Spreading, a graph-based semi-supervised algorithm that propagates labels


from the few labeled samples to the unlabeled ones using feature similarity.

 Uses K-Nearest Neighbors (KNN) as the kernel for label spreading.

🧮 Evaluation

 Predicts labels on the test set using the trained model.

 Measures performance with:

o Accuracy Score

o Classification Report (precision, recall, F1-score)

Program Code:-
import numpy as np

import matplotlib.pyplot as plt

from sklearn.datasets import load_digits

from sklearn.model_selection import train_test_split

from sklearn.semi_supervised import LabelSpreading

from sklearn.metrics import classification_report, accuracy_score

from sklearn.decomposition import PCA

digits = load_digits()

X = digits.data

y = digits.target
print("Sample features:\n", X[:3])

print("Sample labels:\n", y[:3])

plt.figure(figsize=(6,2))

for i in range(6):
    plt.subplot(1, 6, i+1)
    plt.imshow(digits.images[i], cmap='gray')
    plt.title(f"Label: {y[i]}")
    plt.axis('off')
plt.suptitle("Sample Digit Images")
plt.tight_layout()
plt.show()

X_train, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

rng = np.random.RandomState(42)

y_train = np.copy(y_train_full)

mask = rng.rand(len(y_train)) < 0.95

y_train[mask] = -1

model = LabelSpreading(kernel='knn', n_neighbors=7)

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)

print("\nAccuracy on Test Data: {:.4f}".format(acc))

print("\nClassification Report:\n", classification_report(y_test, y_pred))

pca = PCA(n_components=2)

X_2D = pca.fit_transform(X_test)

plt.figure(figsize=(6,5))
scatter = plt.scatter(X_2D[:,0], X_2D[:,1], c=y_pred, cmap='tab10', s=25)

plt.legend(*scatter.legend_elements(), title="Digits")

plt.title("Label Spreading Predictions (PCA View)")

plt.xlabel("PCA 1")

plt.ylabel("PCA 2")

plt.grid(True)

plt.show()

Output:-
26. Write a Machine Learning Program to implement Active Learning.
Description:- Implements pool-based active learning on the Digits dataset: a Logistic Regression model starts from 10 labeled samples, repeatedly queries the 10 most uncertain unlabeled samples (those with the lowest maximum predicted probability), retrains, and tracks test accuracy as the labeled pool grows.
Program Code:-
import numpy as np

from sklearn.datasets import load_digits

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import accuracy_score, classification_report

from sklearn.model_selection import train_test_split

import matplotlib.pyplot as plt

digits = load_digits()

X = digits.data

y = digits.target

X_train, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

initial_labels = 10

query_batch = 10

max_queries = 100

rng = np.random.RandomState(42)

n_samples = len(X_train)

indices = np.arange(n_samples)

rng.shuffle(indices)

X_pool = X_train[indices]

y_pool = y_train_full[indices]

X_labeled = X_pool[:initial_labels]

y_labeled = y_pool[:initial_labels]

X_unlabeled = X_pool[initial_labels:]

y_unlabeled = y_pool[initial_labels:]

model = LogisticRegression(max_iter=1000)
accuracies = []
for i in range(0, max_queries, query_batch):
    model.fit(X_labeled, y_labeled)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    accuracies.append(acc)
    if len(X_unlabeled) == 0:
        break
    # Uncertainty sampling: query the least confident unlabeled samples
    probs = model.predict_proba(X_unlabeled)
    uncertainty = 1 - np.max(probs, axis=1)
    query_idx = np.argsort(uncertainty)[-query_batch:]
    X_labeled = np.concatenate((X_labeled, X_unlabeled[query_idx]))
    y_labeled = np.concatenate((y_labeled, y_unlabeled[query_idx]))
    X_unlabeled = np.delete(X_unlabeled, query_idx, axis=0)
    y_unlabeled = np.delete(y_unlabeled, query_idx, axis=0)

print("\nFinal Classification Report:\n", classification_report(y_test, y_pred))

print("\nFinal Accuracy: {:.4f}".format(accuracies[-1]))

plt.plot(range(initial_labels, initial_labels + len(accuracies) * query_batch, query_batch),
         accuracies, marker='o')

plt.title("Active Learning Accuracy Over Queries")

plt.xlabel("Number of Labeled Samples")

plt.ylabel("Accuracy")

plt.grid(True)

plt.show()

Output:-
27. Write a Machine Learning Program to implement Reinforcement Learning.
Description:- 🔁 Environment:
 Uses FrozenLake-v1, a classic RL environment with discrete states and actions.

 The agent learns to reach a goal while avoiding holes.

🧠 Algorithm:

 Implements Q-Learning, a value-based reinforcement learning algorithm.

 Uses an epsilon-greedy strategy for exploration vs. exploitation.

🧪 Training:

 Runs for 1000 episodes.
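A single tabular Q-learning update, with toy numbers (not taken from the actual FrozenLake run), shows the rule the program applies at every step:

```python
import numpy as np

# One Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
q = np.zeros((2, 2))
alpha, gamma = 0.8, 0.95
state, action, reward, new_state = 0, 1, 1.0, 1
q[state, action] += alpha * (reward + gamma * np.max(q[new_state]) - q[state, action])
print(q[state, action])  # 0.8 on the first update from an all-zero table
```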

Program Code:-
import numpy as np

import gym

import random

import matplotlib.pyplot as plt

env = gym.make("FrozenLake-v1", is_slippery=False)

state_size = env.observation_space.n

action_size = env.action_space.n

q_table = np.zeros((state_size, action_size))

total_episodes = 1000

learning_rate = 0.8

gamma = 0.95

epsilon = 1.0

max_epsilon = 1.0

min_epsilon = 0.01

decay_rate = 0.005

rewards = []

for episode in range(total_episodes):
    state = env.reset()[0]
    total_rewards = 0
    done = False
    while not done:
        # Epsilon-greedy action selection
        exp_exp_tradeoff = random.uniform(0, 1)
        if exp_exp_tradeoff > epsilon:
            action = np.argmax(q_table[state, :])
        else:
            action = env.action_space.sample()
        new_state, reward, done, _, _ = env.step(action)
        # Q-learning update rule
        q_table[state, action] = q_table[state, action] + learning_rate * (
            reward + gamma * np.max(q_table[new_state, :]) - q_table[state, action]
        )
        total_rewards += reward
        state = new_state
    # Decay exploration rate after each episode
    epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
    rewards.append(total_rewards)

print("Sample of Learned Q-table (first 5 states):\n")

print(np.round(q_table[:5], 3))

print("\nAverage Reward over 1000 episodes:", np.mean(rewards))

plt.plot(range(len(rewards)), rewards, alpha=0.5, label='Reward per Episode')

plt.title("Reward per Episode")

plt.xlabel("Episode")

plt.ylabel("Reward")

plt.grid(True)

plt.show()

Output:-

28. Write a Machine Learning Program to implement Online Learning.
Description:- Trains an SGDClassifier incrementally on the Digits dataset with partial_fit on successive mini-batches of 100 samples, evaluating test accuracy after each batch to show how performance improves as data arrives.
Program Code:-
import numpy as np

import matplotlib.pyplot as plt

from sklearn.linear_model import SGDClassifier

from sklearn.datasets import load_digits

from sklearn.metrics import accuracy_score

from sklearn.model_selection import train_test_split

digits = load_digits()

X, y = digits.data, digits.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

classes = np.unique(y)

clf = SGDClassifier(loss="log_loss", max_iter=1, learning_rate='optimal', tol=None)

batch_size = 100

train_size = X_train.shape[0]

num_batches = train_size // batch_size

accuracies = []

for i in range(num_batches):
    start = i * batch_size
    end = start + batch_size
    X_batch = X_train[start:end]
    y_batch = y_train[start:end]
    clf.partial_fit(X_batch, y_batch, classes=classes)
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    accuracies.append(acc)
print("Sample batch input data (first batch):\n", X_train[:3])

print("Sample batch labels (first batch):\n", y_train[:3])

print("\nFinal Accuracy after Online Learning: {:.4f}".format(accuracies[-1]))

plt.plot(range(1, num_batches + 1), accuracies, marker='o')

plt.title("Online Learning: Accuracy Over Batches")

plt.xlabel("Batch Number")

plt.ylabel("Accuracy")

plt.grid(True)

plt.show()

Output:-
29. Write a Machine Learning Program to implement Distributed Learning.
Description:- Simulates distributed training on the Digits dataset: the training set is split into 4 chunks, a Random Forest is trained on each chunk in parallel with joblib, and the chunk models' predicted probabilities are averaged to form ensemble predictions, evaluated with accuracy and a confusion matrix.
Program code:-
import numpy as np

from sklearn.datasets import load_digits

from sklearn.ensemble import RandomForestClassifier

from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, ConfusionMatrixDisplay

from joblib import Parallel, delayed

import matplotlib.pyplot as plt

digits = load_digits()

X, y = digits.data, digits.target

print("Sample input data:\n", X[:3])

print("Sample labels:\n", y[:3])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def train_model(X_part, y_part):
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_part, y_part)
    return clf

n_chunks = 4

X_splits = np.array_split(X_train, n_chunks)

y_splits = np.array_split(y_train, n_chunks)

models = Parallel(n_jobs=n_chunks)(delayed(train_model)(X_splits[i], y_splits[i]) for i in range(n_chunks))

probs = [model.predict_proba(X_test) for model in models]

avg_probs = np.mean(probs, axis=0)

y_pred = np.argmax(avg_probs, axis=1)

acc = accuracy_score(y_test, y_pred)

print("\nFinal Accuracy: {:.4f}".format(acc))

print("\nClassification Report:\n", classification_report(y_test, y_pred))


cm = confusion_matrix(y_test, y_pred)

disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=np.unique(y))

fig, ax = plt.subplots(figsize=(8, 8))

disp.plot(ax=ax, cmap='Blues')

plt.title("Confusion Matrix - Distributed Random Forest")

plt.grid(False)

plt.show()

Output:-
