Index Page (Amit Chhoker [103])
Sr. No. Name of Program Remarks
1. Write a Machine Learning Program for Email Spam Detection.
2. Write a Machine Learning Program for Iris data - KNN classification.
3. Write a Machine Learning Program for OpenCV - digit detection.
4. Write a Machine Learning Program for OpenCV - face/object detection.
5. Write a Machine Learning Program for Linear Regression.
6. Write a Machine Learning Program for Logistic Regression.
7. Write a Machine Learning Program for implementing the Naive Bayes algorithm.
8. Write a Machine Learning Program for implementing Decision Trees - CART, CHAID.
9. Write a Machine Learning Program to implement SVM - both classification and regression - using all kernels.
10. Write a Machine Learning Program for multi-class classification - using more than 4 classes.
11. Write a Machine Learning Program to implement Clustering - K-means / Kernel K-means.
12. Write a Machine Learning Program for DBSCAN algorithm implementation.
13. Write a Machine Learning Program to implement Association Rules using data from domains such as Market Basket Analysis / Healthcare / Data Mining, etc. - Apriori and ECLAT algorithms.
14. Write a Machine Learning Program to implement a Generative model.
15. Write a Machine Learning Code for implementing Matrix Factorisation and Completion.
16. Write a Machine Learning Program to implement Bagging.
17. Write a Machine Learning Program to implement Boosting.
18. Write a Machine Learning Program to implement the Random Forest algorithm.
19. Write a Machine Learning Program to implement and compare all classification models using a standard dataset.
20. Write a Machine Learning Program to implement and compare all regression models using a standard dataset.
21. Write a Machine Learning Program to implement Sparse Modeling and Estimation - use a sparse matrix and then use any medical data to evaluate your model.
22. Write a Machine Learning Program to implement Modeling Sequence / Time-Series Data - ARIMA model.
23. Write a Machine Learning Program to implement Deep Learning Models - implement CNN, RNN.
24. Write a Machine Learning Program to implement Feature Representation Learning - use any feature learning model for implementation.
25. Write a Machine Learning Program to implement Semi-supervised Learning.
26. Write a Machine Learning Program to implement Active Learning.
27. Write a Machine Learning Program to implement Reinforcement Learning.
28. Write a Machine Learning Program to implement Online Learning.
29. Write a Machine Learning Program to implement Distributed Learning.
1. Write a Machine Learning Program for Email Spam Detection.
Description:- This Python script implements a spam detection model using a Naïve Bayes classifier
with TF-IDF vectorization. It trains on a predefined dataset of emails, evaluates performance using
accuracy and classification reports, and allows real-time email classification.
Program Code:-
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
emails = [
("Win a brand new car! Click now to claim your prize.", 1),
("Urgent: Your account has been compromised. Reset your password immediately!", 1),
("Hey, how have you been? Long time no see!", 0),
("Don't miss our exclusive deal on laptops. Limited stock available!", 1),
("Meeting at 3 PM today in conference room B.", 0),
("Congratulations! You've won a free vacation. Claim now!", 1),
("Reminder: Your doctor's appointment is scheduled for tomorrow.", 0),
("Latest updates on your project are available. Check your email.", 0),
("Huge discounts on mobile phones. Limited offer!", 1),
("Let's catch up over coffee this weekend!", 0)
texts, labels = zip(*emails)
X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.3, random_state=42)
model = Pipeline([
("tfidf", TfidfVectorizer(stop_words='english', max_df=0.95, min_df=1, ngram_range=(1,2))),
("nb", MultinomialNB(alpha=0.1))
])
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)
def predict_email(email_text):
    prediction = model.predict([email_text])[0]
    return "Spam" if prediction == 1 else "Not Spam"

while True:
    email_text = input("Enter an email (or type 'exit' to quit): ")
    if email_text.lower() == 'exit':
        break
    print("Prediction:", predict_email(email_text))
Output:-
2. Write a Machine Learning Program for Iris data - KNN classification.
Description:- This machine learning program implements the K-Nearest Neighbors (KNN)
algorithm for classifying the Iris dataset. It loads the dataset, splits it into training and testing sets, and
applies KNN to predict the species of iris flowers based on sepal and petal measurements.
Program Code:-
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
import numpy as np
iris = load_iris()
X, y = iris.data, iris.target
print("Sample data from Iris dataset (first 3 samples):")
for i in range(3):
    print(f"Features: {X[i]}, Label: {iris.target_names[y[i]]}")
scaler = StandardScaler()
X = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print("\nPredictions on test data:", y_pred)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
Output:-
3. Write a Machine Learning Program for OpenCV - digit detection.
Description:- This program is a handwritten digit detection application using OpenCV, Tkinter, and
a pre-trained MNIST model. The user can draw a digit on a black canvas, and the model will predict
the digit using deep learning (Keras & TensorFlow). The program preprocesses the image, normalizes
it, and feeds it into a Convolutional Neural Network (CNN) trained on the MNIST dataset to classify
the digit.
Program Code:-
import cv2
import numpy as np
import tkinter as tk
from tkinter import Canvas, Button
from PIL import Image, ImageDraw
from tensorflow.keras.models import load_model
model = load_model("mnist_digit_model.h5")
def preprocess_image(img):
    img = img.convert('L')            # Convert to grayscale
    img = img.resize((28, 28))        # Resize to 28x28
    img = np.array(img) / 255.0       # Normalize
    img = img.reshape(1, 28, 28, 1)   # Reshape for model input
    return img

def predict_digit():
    img.save("temp_digit.png")                # Save drawn digit
    processed_img = preprocess_image(img)     # Process saved image
    prediction = model.predict(processed_img)
    digit = np.argmax(prediction)
    result_label.config(text=f"Detected Digit: {digit}")

def clear_canvas():
    canvas.delete("all")
    global img, draw
    img = Image.new("L", (300, 300), "black")  # Reset image to black
    draw = ImageDraw.Draw(img)

def draw_on_canvas(event):
    x, y = event.x, event.y
    canvas.create_oval(x, y, x+10, y+10, fill='white', outline='white')
    draw.ellipse([x, y, x+10, y+10], fill='white')  # Draw on image
root = tk.Tk()
root.title("Digit Detection")
canvas = Canvas(root, width=300, height=300, bg='black')
canvas.pack()
img = Image.new("L", (300, 300), "black") # Create black image
draw = ImageDraw.Draw(img)
canvas.bind("<B1-Motion>", draw_on_canvas)
btn_predict = Button(root, text="Predict Digit", command=predict_digit)
btn_predict.pack()
btn_clear = Button(root, text="Clear", command=clear_canvas)
btn_clear.pack()
result_label = tk.Label(root, text="Draw a digit and press 'Predict'")
result_label.pack()
root.mainloop()
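Note: the GUI above loads a saved model file "mnist_digit_model.h5", which is not created by this program. The following is a minimal training sketch showing one assumed way such a file could be produced with a small Keras CNN on MNIST; the architecture and epoch count are illustrative choices, not part of the original program.
import numpy as np
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical

# Load and preprocess MNIST (reshape to 28x28x1 and scale pixels to [0, 1])
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
X_test = X_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

# A small illustrative CNN (assumed architecture)
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=3, batch_size=128, validation_split=0.1)
model.save("mnist_digit_model.h5")   # file name expected by the GUI program above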
Output:-
4. Write a Machine Learning Program for OpenCV - face/object detection.
Description:- This program uses OpenCV and a pre-trained Haar Cascade model to detect faces and
objects in real-time from a webcam or an image. It loads the Haar cascade classifier, processes the
image frame-by-frame, and draws bounding boxes around detected faces or objects.
Program Code:-
import cv2
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()  # Read video frame
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # Convert to grayscale
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)  # Draw rectangle around face
    cv2.imshow('Face Detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # Press 'q' to exit
        break
cap.release()
cv2.destroyAllWindows()
Output:-
5. Write a Machine Learning Program for Linear Regression.
Description:- This program implements Linear Regression using Scikit-Learn to predict a
continuous output based on input features. It trains the model on a dataset, fits a best-fit line, and
evaluates the performance using metrics like Mean Squared Error (MSE) and R² score. Linear
Regression is widely used in predictive modeling, trend analysis, and financial forecasting.
Program Code:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
np.random.seed(42)
x = 2 * np.random.rand(100, 1)
y = 4 + 3 * x + np.random.randn(100, 1) # y = 4 + 3x + noise
print("Sample data (first 5 points):")
for i in range(5):
    print(f"x: {x[i][0]:.2f}, y: {y[i][0]:.2f}")
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
plt.scatter(x_test, y_test, color='blue', label='Actual Data')
plt.plot(x_test, y_pred, color='red', linewidth=2, label='Regression Line')
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.title("Linear Regression Example")
plt.show()
print(f"\nIntercept: {model.intercept_[0]:.2f}")
print(f"Slope: {model.coef_[0][0]:.2f}")
# Error checking
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R² Score: {r2:.2f}")
Output:-
6. Write a Machine Learning Program for Logistic regression.
Description:- This program implements Logistic Regression using Scikit-Learn to perform binary or
multi-class classification. It trains the model on labeled data, applies the sigmoid function to estimate
probabilities, and evaluates performance using accuracy, confusion matrix, and classification report.
Logistic Regression is widely used in spam detection, medical diagnosis, and fraud detection.
Program Code:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.preprocessing import StandardScaler
X, y = make_classification(n_samples=200, n_features=2, n_informative=2, n_redundant=0,
n_repeated=0, n_classes=2, random_state=42)
print("Sample data (first 5 points):")
for i in range(5):
    print(f"Features: {X[i]}, Label: {y[i]}")
scaler = StandardScaler()
X = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"\nAccuracy: {accuracy_score(y_test, y_pred):.2f}")
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, edgecolors='k', marker='o', cmap=plt.cm.coolwarm)
plt.title("Logistic Regression Decision Boundary")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
Output:-
7. Write a Machine Learning Program for implementing Naive Bayes algorithm.
Description:-This program applies the Naïve Bayes classification algorithm, which is based on
Bayes' Theorem and the assumption of feature independence.
Key Features:
Loads and preprocesses data
Trains a Naïve Bayes classifier (Gaussian, Multinomial, or Bernoulli)
Predicts class labels and evaluates performance using metrics like accuracy, precision, and
recall
Applications:
Spam detection (classifying emails as spam or not)
Sentiment analysis (determining positive or negative sentiment in text)
Medical diagnosis (predicting diseases based on symptoms)
Program Code:-
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix
iris = load_iris()
X, y = iris.data, iris.target
print("Sample Data (first 5 rows):")
for i in range(5):
    print(f"Features: {X[i]}, Label: {y[i]}")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)
print("Predictions:", y_pred)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
Output:-
8. Write a Machine Learning Program for implementing Decision Trees - CART,
CHAID.
Description:- This program implements Decision Tree classification using Scikit-Learn for CART
(Classification and Regression Trees) and CHAID (Chi-Square Automatic Interaction Detection). It
builds a tree-based model by splitting the dataset into branches based on entropy, Gini index, or chi-
square tests, making predictions and evaluating performance using accuracy and confusion matrix.
Decision Trees are widely used in customer segmentation, credit risk analysis, and medical diagnosis.
Program code:-
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
iris = load_iris()
X, y = iris.data, iris.target
print("Sample Data (first 5 rows):")
for i in range(5):
    print(f"Features: {X[i]}, Label: {y[i]}")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
cart = DecisionTreeClassifier(criterion='gini')
cart.fit(X_train, y_train)
y_pred_cart = cart.predict(X_test)
print("CART Predictions:", y_pred_cart)
print("CART Accuracy:", accuracy_score(y_test, y_pred_cart))
print("CART Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_cart))
chaid = DecisionTreeClassifier(criterion='entropy')
chaid.fit(X_train, y_train)
y_pred_chaid = chaid.predict(X_test)
print("CHAID Predictions:", y_pred_chaid)
print("CHAID Accuracy:", accuracy_score(y_test, y_pred_chaid))
print("CHAID Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_chaid))
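Note: scikit-learn does not provide a CHAID implementation, so the entropy-based tree above serves only as a stand-in. The short sketch below is an illustration (assuming SciPy is available) of the chi-square test that CHAID applies to judge a candidate split, reusing the X and y loaded above; the median split on petal length is an arbitrary example split.
import numpy as np
from scipy.stats import chi2_contingency

# Candidate split: petal length (feature index 2) above vs. below its median
split = X[:, 2] > np.median(X[:, 2])
# Contingency table: split group (rows) vs. class label (columns)
table = np.array([[np.sum((split == g) & (y == c)) for c in np.unique(y)]
                  for g in [False, True]])
chi2, p, dof, _ = chi2_contingency(table)
# A low p-value means the split is statistically informative, so CHAID would keep it
print(f"Chi-square: {chi2:.2f}, p-value: {p:.4f}, dof: {dof}")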
Output:-
9. Write a Machine Learning Program to implement SVM - both classification and regression - using all kernels.
Description:- This program implements Support Vector Machines (SVM) using Scikit-Learn for
both classification and regression tasks. It utilizes different kernel functions (linear, polynomial, radial
basis function (RBF), and sigmoid) to find the optimal decision boundary. The model is trained on a
dataset, makes predictions, and is evaluated using accuracy, precision, and R² score. SVM is widely
used in image classification, text categorization, and stock price prediction.
Program Code:-
import numpy as np
import matplotlib.pyplot as plt
import streamlit as st
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, SVR
from sklearn.metrics import accuracy_score, mean_squared_error
st.title("SVM - Classification & Regression")
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
for kernel in kernels:
    model = SVC(kernel=kernel)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    st.write(f"SVM Classification - Kernel: {kernel}")
    st.write("Accuracy:", accuracy_score(y_test, y_pred))
    fig, ax = plt.subplots()
    ax.scatter(X_test[:, 0], X_test[:, 1], c=y_pred, cmap='coolwarm', edgecolors='k')
    st.pyplot(fig)
X_reg, y_reg = datasets.load_diabetes(return_X_y=True)
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, test_size=0.2,
random_state=42)
for kernel in kernels:
    svr = SVR(kernel=kernel)
    svr.fit(X_train_reg, y_train_reg)
    y_pred_reg = svr.predict(X_test_reg)
    st.write(f"SVM Regression - Kernel: {kernel}")
    st.write("MSE:", mean_squared_error(y_test_reg, y_pred_reg))
    fig, ax = plt.subplots()
    ax.scatter(y_test_reg, y_pred_reg, color='blue', edgecolors='k')
    ax.plot([y_test_reg.min(), y_test_reg.max()], [y_test_reg.min(), y_test_reg.max()], 'r--')
    st.pyplot(fig)
Output:-
10. Write a Machine Learning Program for multi-class classification - using more than 4 classes.
Description:- This program implements Multi-Class Classification using Scikit-Learn, where the
model is trained to classify data into more than four categories. It applies machine learning algorithms
like Logistic Regression, Decision Trees, Naïve Bayes, or SVM, evaluates performance using
accuracy, confusion matrix, and classification report, and is commonly used in handwritten digit
recognition, medical diagnosis, and object detection.
Program Code:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
X, y = make_classification(n_samples=500, n_features=6, n_classes=6, n_informative=4,
random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
models = {
    "Logistic Regression": LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=500),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier()
}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(name)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred))
fig, ax = plt.subplots()
ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='rainbow', edgecolors='k')
ax.set_title("Test Data Distribution")
plt.show()
Output:-
11. Write a Machine Learning Program to implement Clustering - K-means / Kernel K-means.
Program Code:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC
X, y = make_blobs(n_samples=300, centers=3, random_state=42)
print("Sample Data (first 5 rows):")
for i in range(5):
    print(f"Features: {X[i]}, Label: {y[i]}")
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans_labels = kmeans.fit_predict(X)
svm = SVC(kernel='rbf')
svm.fit(X, y)
svm_labels = svm.predict(X)
print("Confusion Matrix:")
print(confusion_matrix(y, kmeans_labels))
plt.scatter(X[:, 0], X[:, 1], c=kmeans_labels, cmap='rainbow', edgecolors='k')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=200, c='black', marker='X')
plt.title("K-Means Clustering")
plt.show()
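Note: scikit-learn has no built-in Kernel K-means estimator (the SVC used above is only a kernel-based stand-in). The sketch below is a minimal NumPy implementation of kernel K-means with an RBF kernel, reusing the X generated above; the gamma value and iteration count are illustrative assumptions.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def kernel_kmeans(X, n_clusters=3, gamma=0.5, n_iter=50, seed=42):
    K = rbf_kernel(X, gamma=gamma)                    # kernel (Gram) matrix
    rng = np.random.RandomState(seed)
    labels = rng.randint(n_clusters, size=len(X))     # random initial assignment
    for _ in range(n_iter):
        dist = np.full((len(X), n_clusters), np.inf)
        for c in range(n_clusters):
            members = labels == c
            n_c = members.sum()
            if n_c == 0:
                continue
            # Squared distance to cluster centre in feature space:
            # K_ii - (2/n_c) * sum_j K_ij + (1/n_c^2) * sum_{j,l} K_jl
            dist[:, c] = (np.diag(K)
                          - 2 * K[:, members].sum(axis=1) / n_c
                          + K[np.ix_(members, members)].sum() / n_c**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):        # stop when assignments are stable
            break
        labels = new_labels
    return labels

kk_labels = kernel_kmeans(X, n_clusters=3)
print("Kernel K-means labels (first 10 points):", kk_labels[:10])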
Output:-
12. Write a Machine Learning Program for DBSCAN algorithm implementation.
Program Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)
print("Sample Data (first 5 rows):")
for i in range(5):
    print(f"Features: {X[i]}, Label: {y_true[i]}")
dbscan = DBSCAN(eps=0.5, min_samples=5)
y_pred = dbscan.fit_predict(X)
le = LabelEncoder()
y_true_encoded = le.fit_transform(y_true)
y_pred_encoded = le.fit_transform(y_pred)
print("Confusion Matrix:")
print(confusion_matrix(y_true_encoded, y_pred_encoded))
plt.scatter(X[:, 0], X[:, 1], c=y_pred, cmap='rainbow', edgecolors='k')
plt.title("DBSCAN Clustering")
plt.show()
Output:-
13. Write a Machine Learning Program to implement Association Rules using data from domains such as Market Basket Analysis / Healthcare / Data Mining, etc. - Apriori and ECLAT algorithms.
Description:- This program implements Apriori and ECLAT algorithms for association rule mining,
extracting meaningful patterns from datasets like market basket analysis, healthcare, and data mining.
It identifies itemset relationships using support, confidence, and lift to generate rules. Apriori uses
breadth-first search, while ECLAT applies depth-first search for frequent itemset mining.
Program Code:-
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
data = {
    'Milk': [1, 0, 1, 1, 0, 1], 'Bread': [1, 1, 1, 0, 1, 1], 'Butter': [0, 1, 1, 1, 1, 0],
    'Cheese': [1, 0, 0, 1, 1, 1], 'Eggs': [0, 1, 1, 0, 1, 1]
}
df = pd.DataFrame(data)
frequent_itemsets_apriori = apriori(df, min_support=0.5, use_colnames=True)
rules_apriori = association_rules(frequent_itemsets_apriori, metric="lift", min_threshold=1.0)
print("Apriori Rules")
print(rules_apriori)
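Note: mlxtend does not ship an ECLAT implementation, so only Apriori is run above. The following is a minimal sketch of the ECLAT idea (depth-first mining over vertical tid-lists), applied to the same df and min_support used above.
min_support = 0.5
n = len(df)
# Vertical representation: item -> set of transaction ids that contain it
tidlists = {item: set(df.index[df[item] == 1]) for item in df.columns}

def eclat(prefix, items, results):
    # Depth-first search: extend the current prefix with each remaining item
    for i, (item, tids) in enumerate(items):
        support = len(tids) / n
        if support >= min_support:
            results.append((prefix + [item], support))
            # Intersect tid-lists with the remaining items to go one level deeper
            suffix = [(other, tids & other_tids) for other, other_tids in items[i+1:]]
            eclat(prefix + [item], suffix, results)

eclat_itemsets = []
eclat([], list(tidlists.items()), eclat_itemsets)
print("ECLAT frequent itemsets (itemset, support):")
for itemset, support in eclat_itemsets:
    print(itemset, round(support, 2))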
Output:-
14. Write a Machine Learning Program to implement a Generative model.
Description:- 🧠 Key Components of the Program
1. Importing Required Libraries
o streamlit: For building the web UI.
o pandas: For handling the dataset.
o sklearn.naive_bayes.GaussianNB: The generative model used for classification.
o train_test_split, accuracy_score: For splitting data and evaluating the model.
2. Data Preprocessing
o Gender labels are encoded as numeric values: Female = 0, Male = 1.
o Features and labels are separated and split into training and testing datasets.
Program Code:-
import streamlit as st
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
data = {
    'Height': [5.1, 5.5, 5.7, 5.0, 6.0, 6.1, 5.9, 6.2, 5.4, 5.8],
    'Weight': [48, 60, 65, 45, 78, 85, 72, 90, 59, 66],
    'Foot_Size': [6, 8, 9, 6, 12, 13, 11, 14, 7, 9],
    'Gender': ['F', 'F', 'F', 'F', 'M', 'M', 'M', 'M', 'F', 'M']
}
df = pd.DataFrame(data)
df['Gender'] = df['Gender'].map({'F': 0, 'M': 1})
X = df[['Height', 'Weight', 'Foot_Size']]
y = df['Gender']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
st.set_page_config(page_title="Gender Predictor", layout="centered")
st.title("Generative Model: Gender Predictor")
st.markdown("### Enter Physical Attributes")
height = st.number_input("Height (in feet)", min_value=4.0, max_value=7.0, value=5.5)
weight = st.number_input("Weight (in kg)", min_value=30, max_value=150, value=60)
foot_size = st.number_input("Foot Size (US)", min_value=4, max_value=18, value=9)
if st.button("Predict Gender"):
input_data = pd.DataFrame([[height, weight, foot_size]], columns=['Height', 'Weight', 'Foot_Size'])
prediction = model.predict(input_data)[0]
gender = "Male" if prediction == 1 else "Female"
st.success(f"Predicted Gender: {gender}")
st.markdown("---")
st.subheader("Model Evaluation")
st.write("Accuracy on Test Data:", accuracy_score(y_test, y_pred))
Output:-
15. Write a Machine Learning Code for implementing Matrix factorisation and
completion.
Description:- Matrix Factorization and Completion is a technique often used for recommendation
systems, where you have a matrix (typically sparse) and want to predict missing values. Matrix
factorization is based on decomposing the matrix into two lower-dimensional matrices that, when
multiplied, approximate the original matrix.
Here’s a simple implementation using Python with the following libraries:
1. numpy for matrix operations.
2. pandas for data handling.
3. tkinter for the UI to allow for matrix input and display.
Program Code:-
import numpy as np
import pandas as pd
from tkinter import *
from tkinter import messagebox
from sklearn.decomposition import NMF # Add this import statement
class MatrixFactorizationApp:
    def __init__(self, root):
        self.root = root
        self.root.title("Matrix Factorization and Completion")
        self.root.geometry("700x500")
        self.root.config(bg="#f0f0f0")
        self.title_label = Label(root, text="Matrix Factorization and Completion",
                                 font=("Arial", 16, "bold"), bg="#f0f0f0")
        self.title_label.pack(pady=10)
        self.instruction_label = Label(root, text="Enter Matrix Data (rows separated by ';' and values by ','):",
                                       font=("Arial", 10), bg="#f0f0f0")
        self.instruction_label.pack(pady=5)
        # Create a frame for input
        input_frame = Frame(root, bg="#f0f0f0")
        input_frame.pack(pady=10)
        self.entry = Text(input_frame, height=6, width=50, font=("Courier", 12), wrap=WORD,
                          padx=10, pady=10)
        self.entry.pack(side=LEFT, padx=5)
        # Scrollbar for the text widget
        scrollbar = Scrollbar(input_frame, command=self.entry.yview)
        scrollbar.pack(side=RIGHT, fill=Y)
        self.entry.config(yscrollcommand=scrollbar.set)
        self.submit_button = Button(root, text="Submit Matrix", command=self.submit_matrix,
                                    font=("Arial", 12), bg="#4CAF50", fg="white", padx=20, pady=10)
        self.submit_button.pack(pady=20)
        self.result_frame = Frame(root, bg="#f0f0f0")
        self.result_frame.pack(pady=10)
        self.result_label = Label(self.result_frame, text="", font=("Courier", 12), bg="#f0f0f0")
        self.result_label.pack()

    def submit_matrix(self):
        try:
            matrix_data = self.entry.get("1.0", "end-1c").strip().split(';')
            matrix = np.array([list(map(float, row.split(','))) for row in matrix_data])
            if matrix.size == 0:
                raise ValueError("Matrix cannot be empty.")
            num_features = 3
            model = NMF(n_components=num_features, init='random', random_state=42)
            W = model.fit_transform(matrix)
            H = model.components_
            completed_matrix = np.dot(W, H)
            # Show the completed matrix
            result_df = pd.DataFrame(completed_matrix)
            self.display_result(result_df)
        except Exception as e:
            messagebox.showerror("Error", f"An error occurred: {str(e)}")

    def display_result(self, result_df):
        # Clear previous results
        for widget in self.result_frame.winfo_children():
            widget.destroy()
        result_text = Text(self.result_frame, height=10, width=60, font=("Courier", 12), wrap=WORD,
                           padx=10, pady=10)
        result_text.insert(END, result_df.to_string(index=False))
        result_text.config(state=DISABLED)
        result_text.pack()

if __name__ == "__main__":
    root = Tk()
    app = MatrixFactorizationApp(root)
    root.mainloop()
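Note: the NMF step above factorizes a fully observed matrix; it does not yet handle missing entries. A minimal completion sketch is given below as an assumed extension: unknown cells are marked with np.nan, filled with column means, and then iteratively re-imputed from the NMF reconstruction. The small ratings matrix is purely illustrative.
import numpy as np
from sklearn.decomposition import NMF

# Illustrative ratings matrix with missing entries marked as np.nan
R = np.array([[5, 3, np.nan, 1],
              [4, np.nan, np.nan, 1],
              [1, 1, np.nan, 5],
              [1, np.nan, np.nan, 4],
              [np.nan, 1, 5, 4]], dtype=float)
mask = np.isnan(R)
filled = np.where(mask, np.nanmean(R, axis=0), R)   # start from column means
model = NMF(n_components=2, init='random', random_state=42, max_iter=500)
for _ in range(10):                                  # alternate: factorize, then re-impute
    W = model.fit_transform(filled)
    H = model.components_
    approx = W @ H
    filled[mask] = approx[mask]                      # only overwrite the missing cells
print("Completed matrix:\n", np.round(filled, 2))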
Output:-
16. Write a Machine Learning Program to implement Bagging.
✅ Description
Bagging (Bootstrap Aggregation) improves accuracy by combining predictions from multiple models
trained on different random subsets of the data. Random Forest is an ensemble of decision trees
trained using bagging.
This code uses scikit-learn’s RandomForestClassifier to classify sample data and predict on a test set.
Program Code:-
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
X_train = [
    [5, 2], [6, 3], [7, 3], [8, 4], [1, 7], [2, 8], [1, 6], [2, 9]
]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
X_test = [
    [3, 7],
    [6, 2],
    [2, 7]
]
y_test = [1, 0, 1]
print("Sample Training Data (features and labels):")
for i in range(len(X_train)):
    print(f"Features: {X_train[i]}, Label: {y_train[i]}")
print("\nSample Testing Data (features and true labels):")
for i in range(len(X_test)):
    print(f"Features: {X_test[i]}, True Label: {y_test[i]}")
model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print("\nPredictions:", predictions)
print("Accuracy:", accuracy_score(y_test, predictions))
print("Confusion Matrix:")
print(confusion_matrix(y_test, predictions))
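Note: Random Forest is one bagging ensemble; scikit-learn's BaggingClassifier can also demonstrate bagging explicitly. The short sketch below is an optional alternative, reusing the X_train, y_train, and X_test defined above; by default BaggingClassifier bootstraps the data and aggregates decision-tree base estimators.
from sklearn.ensemble import BaggingClassifier

# Default base estimator is a decision tree; 10 trees are trained on bootstrap samples
bag = BaggingClassifier(n_estimators=10, random_state=42)
bag.fit(X_train, y_train)
print("BaggingClassifier predictions:", bag.predict(X_test))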
Output:-
17. Write a Machine Learning Program to implement Boosting.
Description:
The program uses the RandomForestClassifier from scikit-learn to implement the Random
Forest algorithm.
The train_model function splits the dataset into training and testing sets, trains the model, and
calculates the accuracy.
The accuracy is displayed in a message box once the model is trained.
The GUI uses Tkinter to create a simple window with a button to start the training.
Program Code:-
import tkinter as tk
from tkinter import messagebox
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import pandas as pd
from sklearn.datasets import make_classification
def generate_data():
    X, y = make_classification(n_samples=1000, n_features=10, n_informative=7, random_state=42)
    return pd.DataFrame(X), pd.Series(y)

def train_model():
    X, y = generate_data()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    conf_mat = confusion_matrix(y_test, y_pred)
    class_report = classification_report(y_test, y_pred)
    sample_preview = X.head(3).to_string(index=False)
    msg = f"Sample Input Data (first 3 rows):\n{sample_preview}\n\n"
    msg += f"Accuracy: {accuracy * 100:.2f}%\n\n"
    msg += f"Confusion Matrix:\n{conf_mat}\n\n"
    msg += f"Classification Report:\n{class_report}"
    messagebox.showinfo("Training Complete", msg)

def create_ui():
    window = tk.Tk()
    window.title("Random Forest Model Trainer")
    window.geometry("400x300")
    train_button = tk.Button(window, text="Train Model", command=train_model, font=("Arial", 14))
    train_button.pack(pady=20)
    window.mainloop()

if __name__ == "__main__":
    create_ui()
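Note: the program above trains a RandomForestClassifier, which is a bagging ensemble rather than a boosting one. A minimal boosting sketch is given below as an assumed substitute, using scikit-learn's GradientBoostingClassifier on the same kind of synthetic data.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Same style of synthetic data as generate_data() above
X, y = make_classification(n_samples=1000, n_features=10, n_informative=7, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Boosting: trees are added sequentially, each correcting the errors of the previous ones
booster = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
booster.fit(X_train, y_train)
print("Boosting accuracy:", accuracy_score(y_test, booster.predict(X_test)))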
Output:-
18. Write a Machine Learning Program to implement the Random Forest algorithm.
Description:- This program demonstrates a Random Forest Classification model using a synthetic
dataset. Here's what it does:
1. Data Generation:
Creates a synthetic dataset with 5 features and binary targets using make_classification.
2. Model Training:
Splits the data into training and test sets, trains a RandomForestClassifier, and makes
predictions.
3. Evaluation:
Prints predictions, accuracy, and a classification report.
Program code:-
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
import pandas as pd
X, y = make_classification(n_samples=100, n_features=5, n_informative=3, n_redundant=0,
random_state=42)
df = pd.DataFrame(X, columns=[f"Feature_{i}" for i in range(1, 6)])
df["Target"] = y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Sample Input Data:\n", df.head(), "\n")
print("Predictions on Test Set:", y_pred)
print("\nAccuracy Score:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
Output:-
19. Write a Machine Learning Program to implement and compare all classification models using a standard dataset.
Description:- Loads the Iris dataset with 150 samples and 3 classes.
Splits data into training (70%) and testing (30%) sets.
Scales features using StandardScaler for improved model performance.
Implements six classification algorithms:
Logistic Regression
K-Nearest Neighbors (KNN)
Support Vector Machine (SVM)
Decision Tree
Random Forest
Naive Bayes
Trains each model on the training data and predicts on the test set.
Calculates accuracy and generates detailed classification reports for each model.
Visualizes performance using confusion matrix heatmaps for all models.
Displays a bar chart comparing accuracy scores of all classifiers.
Program Code:-
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
data = load_iris()
X = data.data
y = data.target
labels = data.target_names
df = pd.DataFrame(X, columns=data.feature_names)
df["Target"] = y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
models = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(probability=True),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Naive Bayes": GaussianNB()
}
results = []
plt.figure(figsize=(15, 10))
for i, (name, model) in enumerate(models.items(), 1):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    results.append((name, acc))
    cm = confusion_matrix(y_test, y_pred)
    plt.subplot(2, 3, i)
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=labels, yticklabels=labels)
    plt.title(name)
    plt.xlabel("Predicted")
    plt.ylabel("Actual")
plt.tight_layout()
plt.show()
results_df = pd.DataFrame(results, columns=["Model", "Accuracy"])
plt.figure(figsize=(10, 5))
sns.barplot(x="Model", y="Accuracy", data=results_df, palette="viridis")
plt.ylim(0.8, 1.0)
plt.title("Model Comparison (Accuracy)")
plt.ylabel("Accuracy Score")
plt.xticks(rotation=45)
plt.show()
print("Sample Input Data:\n", df.head(), "\n")
for name, model in models.items():
    y_pred = model.predict(X_test)
    print(f"{name} Accuracy: {accuracy_score(y_test, y_pred):.4f}")
    print(f"Classification Report for {name}:\n",
          classification_report(y_test, y_pred, target_names=labels))
Output:-
20. Write a Machine Learning Program to implement and compare all regression models using a standard dataset.
Description:- Loads the California Housing dataset with 8 features and a continuous target
(median house value).
Splits the data into training (70%) and testing (30%) sets.
Scales features using StandardScaler for better model performance.
Implements eight regression models:
Linear Regression
Ridge Regression
Lasso Regression
ElasticNet
Decision Tree Regressor
Random Forest Regressor
Gradient Boosting Regressor
Support Vector Regressor (SVR)
Trains each model on the training data and predicts on the test data.
Calculates and prints R² score and mean squared error (MSE) for each model to evaluate
performance.
Visualizes residuals (prediction errors) for all models using scatter plots, helping to identify bias or
variance issues.
Displays a bar chart comparing R² scores of all models, making it easy to see which performs best.
Prints a preview of the dataset’s first few rows along with numerical evaluation results.
Program Code:-
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
data = fetch_california_housing()
X = data.data
y = data.target
feature_names = data.feature_names
df = pd.DataFrame(X, columns=feature_names)
df["Target"] = y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(),
    "Lasso Regression": Lasso(),
    "ElasticNet": ElasticNet(),
    "Decision Tree": DecisionTreeRegressor(),
    "Random Forest": RandomForestRegressor(),
    "Gradient Boosting": GradientBoostingRegressor(),
    "Support Vector Regressor": SVR()
}
results = []
plt.figure(figsize=(15, 10))
for i, (name, model) in enumerate(models.items(), 1):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    results.append((name, r2, mse))
    plt.subplot(2, 4, i)
    residuals = y_test - y_pred
    sns.scatterplot(x=y_pred, y=residuals)
    plt.axhline(0, color='red', linestyle='--')
    plt.xlabel("Predicted")
    plt.ylabel("Residuals")
    plt.title(f"{name}\nR2: {r2:.3f}, MSE: {mse:.3f}")
plt.tight_layout()
plt.show()
results_df = pd.DataFrame(results, columns=["Model", "R2 Score", "MSE"]).sort_values(by="R2 Score", ascending=False)
plt.figure(figsize=(10, 5))
sns.barplot(x="Model", y="R2 Score", data=results_df, palette="magma")
plt.title("Regression Model Comparison (R2 Score)")
plt.xticks(rotation=45)
plt.ylim(0, 1)
plt.show()
print("Sample Input Data:\n", df.head(), "\n")
for name, model in models.items():
    y_pred = model.predict(X_test)
    print(f"{name}:")
    print(f" R2 Score: {r2_score(y_test, y_pred):.4f}")
    print(f" Mean Squared Error: {mean_squared_error(y_test, y_pred):.4f}\n")
Output:-
21. Write a Machine Learning Program to implement Sparse Modeling and Estimation - use a sparse matrix and then use any medical data to evaluate your model.
Description:- Dataset
Uses the Breast Cancer Wisconsin medical dataset for binary classification (benign vs
malignant).
Sparse Matrix Conversion
Converts the feature data into a sparse matrix format (csr_matrix) for memory-efficient
handling.
Data Splitting
Splits the dataset into training (70%) and testing (30%) subsets.
Scaling
Applies scaling to sparse data using StandardScaler with with_mean=False to preserve
sparsity.
Model Training
Trains a Logistic Regression model that supports sparse input data.
Prediction
Predicts labels on the test dataset using the trained model.
Accuracy Evaluation
Calculates accuracy score to measure overall correctness of the model.
Program Code:-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import sparse
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
data = load_breast_cancer()
X = data.data
y = data.target
feature_names = data.feature_names
labels = data.target_names
df = pd.DataFrame(X, columns=feature_names)
df['Target'] = y
# Convert to sparse matrix format (CSR)
X_sparse = sparse.csr_matrix(X)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X_sparse, y, test_size=0.3, random_state=42)
# Scaling sparse data - scale dense then convert back sparse
scaler = StandardScaler(with_mean=False)  # with_mean=False because sparse matrices can't be centered
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Logistic Regression supports sparse input
model = LogisticRegression(max_iter=500)
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6,5))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=labels, yticklabels=labels)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title(f"Confusion Matrix\nAccuracy: {acc:.4f}")
plt.show()
print("Sample Input Data:\n", df.head(), "\n")
print(f"Accuracy: {acc:.4f}")
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=labels))
Output:-
22. Write a Machine Learning Program to implement Modeling Sequence / Time-Series
Data - ARIMA model.
Description:- Dataset: Monthly airline passenger numbers from 1949 to 1960.
Data Split: 80% training, 20% testing for model validation.
ARIMA Model:
ARIMA(p,d,q) where
o p = number of autoregressive terms (lags), here 5
o d = degree of differencing, here 1 (to make the series stationary)
o q = number of moving average terms, here 0
Training: Fits the ARIMA model to training data.
Forecasting: Predicts passenger numbers for the test period.
Evaluation: Calculates mean squared error (MSE) between actual and predicted values.
Visualization: Plots training data, test data, and ARIMA forecast for visual comparison.
Program Code:-
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
# Load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv'
data = pd.read_csv(url, parse_dates=['Month'], index_col='Month')
# Plot original time series
plt.figure(figsize=(10,4))
plt.plot(data, label='Original Data')
plt.title('Original Time Series Data')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.legend()
plt.show()
# Train-test split (80-20)
train_size = int(len(data)*0.8)
train, test = data[:train_size], data[train_size:]
plt.figure(figsize=(10,4))
plt.plot(train, label='Training Data')
plt.plot(test, label='Test Data')
plt.title('Train and Test Split')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.legend()
plt.show()
# Fit ARIMA model (order p=5, d=1, q=0)
model = ARIMA(train, order=(5,1,0))
model_fit = model.fit()
# Forecast
forecast = model_fit.forecast(steps=len(test))
# Plot forecast vs actual test data
plt.figure(figsize=(10,4))
plt.plot(test, label='Actual Test Data')
plt.plot(forecast, label='ARIMA Forecast')
plt.title('ARIMA Model Forecast vs Actual')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.legend()
plt.show()
# Calculate and print Mean Squared Error
mse = mean_squared_error(test, forecast)
print(f'Mean Squared Error: {mse:.3f}')
residuals = test.values.flatten() - forecast.values.flatten()
plt.figure(figsize=(10,4))
plt.plot(test.index, residuals)
plt.axhline(0, linestyle='--', color='red')
plt.title('Residuals (Actual - Forecast)')
plt.xlabel('Date')
plt.ylabel('Residuals')
plt.show()
Output:-
23. Write a Machine Learning Program to implement Deep Learning Models -
implement CNN, RNN.
Description:- Uses the MNIST handwritten digit dataset.
Displays sample images and their labels.
Preprocesses data by normalizing and one-hot encoding.
Builds and trains a CNN for spatial feature extraction.
Builds and trains an RNN treating image rows as sequences.
Evaluates and prints accuracy for both models.
Program Code:-
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, SimpleRNN
# Load MNIST data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Show sample labels and images
print("Sample labels:", y_train[:10])
plt.figure(figsize=(10,2))
for i in range(10):
    plt.subplot(1, 10, i+1)
    plt.imshow(X_train[i], cmap='gray')
    plt.axis('off')
    plt.title(y_train[i])
plt.show()
# Normalize data
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
# One-hot encode labels
y_train_cat = to_categorical(y_train, 10)
y_test_cat = to_categorical(y_test, 10)
### CNN Model ###
X_train_cnn = X_train.reshape(-1, 28, 28, 1)
X_test_cnn = X_test.reshape(-1, 28, 28, 1)
cnn = Sequential([
Conv2D(32, kernel_size=(3,3), activation='relu', input_shape=(28,28,1)),
MaxPooling2D(pool_size=(2,2)),
Conv2D(64, kernel_size=(3,3), activation='relu'),
MaxPooling2D(pool_size=(2,2)),
Flatten(),
Dense(128, activation='relu'),
Dense(10, activation='softmax')
])
cnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
cnn.fit(X_train_cnn, y_train_cat, epochs=5, batch_size=128, validation_split=0.2)
cnn_loss, cnn_acc = cnn.evaluate(X_test_cnn, y_test_cat)
print(f'CNN Test Accuracy: {cnn_acc:.4f}')
### RNN Model ###
X_train_rnn = X_train
X_test_rnn = X_test
rnn = Sequential([
SimpleRNN(128, activation='relu', input_shape=(28, 28)),
Dense(10, activation='softmax')
])
rnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
rnn.fit(X_train_rnn, y_train_cat, epochs=5, batch_size=128, validation_split=0.2)
rnn_loss, rnn_acc = rnn.evaluate(X_test_rnn, y_test_cat)
print(f'RNN Test Accuracy: {rnn_acc:.4f}')
Output:-
24. Write a Machine Learning Program to implement Feature Representation Learning - use any feature learning model for implementation.
Description:-
📘 Dataset
Uses the MNIST dataset of handwritten digits.
Images are 28x28 pixels, flattened to 784-dimensional vectors for input to the model.
⚙️Preprocessing
Pixel values are normalized to range [0, 1].
Input images are reshaped from (28,28) to (784,) for fully connected layers.
🧠 Autoencoder Model
A type of unsupervised neural network used for feature extraction.
Encoder: Compresses input to a 32-dimensional feature representation using fully connected
layers.
Decoder: Reconstructs the original input from the compressed representation.
Program Code:-
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.utils import plot_model
# Load dataset
(X_train, _), (X_test, _) = mnist.load_data()
X_train = X_train.astype('float32') / 255.
X_test = X_test.astype('float32') / 255.
# Flatten the data
X_train = X_train.reshape((len(X_train), 28*28))
X_test = X_test.reshape((len(X_test), 28*28))
# Define Autoencoder architecture
input_img = Input(shape=(784,))
encoded = Dense(128, activation='relu')(input_img)
encoded = Dense(64, activation='relu')(encoded)
encoded = Dense(32, activation='relu')(encoded)
decoded = Dense(64, activation='relu')(encoded)
decoded = Dense(128, activation='relu')(decoded)
decoded = Dense(784, activation='sigmoid')(decoded)
# Build Model
autoencoder = Model(input_img, decoded)
encoder = Model(input_img, encoded)
# Compile and Train
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(X_train, X_train,
epochs=10,
batch_size=256,
shuffle=True,
validation_data=(X_test, X_test))
# Encode test data to get learned features
encoded_imgs = encoder.predict(X_test)
# Visualize some encoded features
plt.figure(figsize=(8, 4))
for i in range(10):
    plt.subplot(2, 10, i+1)
    plt.imshow(X_test[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
    plt.subplot(2, 10, i+11)
    plt.imshow(encoded_imgs[i].reshape(4, 8), cmap='viridis')
    plt.axis('off')
plt.suptitle('Top: Original Images | Bottom: Feature Representations')
plt.tight_layout()
plt.show()
Output:-
25. Write a Machine Learning program to implement Semi-supervised learning.
Description:-
📊 Dataset
Uses the Digits dataset from sklearn, which contains 8x8 grayscale images of handwritten
digits (0–9).
Each sample is a 64-dimensional feature vector.
🧪 Semi-Supervised Setup
Simulates a real-world scenario by labeling only 5% of the training data.
The remaining 95% are treated as unlabeled using -1 to denote unknown labels.
🔄 Model: Label Spreading
Applies Label Spreading, a graph-based semi-supervised algorithm that propagates labels
from the few labeled samples to the unlabeled ones using feature similarity.
Uses K-Nearest Neighbors (KNN) as the kernel for label spreading.
🧮 Evaluation
Predicts labels on the test set using the trained model.
Measures performance with:
o Accuracy Score
o Classification Report (precision, recall, F1-score)
Program Code:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import LabelSpreading
from sklearn.metrics import classification_report, accuracy_score
from sklearn.decomposition import PCA
digits = load_digits()
X = digits.data
y = digits.target
print("Sample features:\n", X[:3])
print("Sample labels:\n", y[:3])
plt.figure(figsize=(6,2))
for i in range(6):
    plt.subplot(1, 6, i+1)
    plt.imshow(digits.images[i], cmap='gray')
    plt.title(f"Label: {y[i]}")
    plt.axis('off')
plt.suptitle("Sample Digit Images")
plt.tight_layout()
plt.show()
X_train, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
rng = np.random.RandomState(42)
y_train = np.copy(y_train_full)
mask = rng.rand(len(y_train)) < 0.95
y_train[mask] = -1
model = LabelSpreading(kernel='knn', n_neighbors=7)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("\nAccuracy on Test Data: {:.4f}".format(acc))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
pca = PCA(n_components=2)
X_2D = pca.fit_transform(X_test)
plt.figure(figsize=(6,5))
scatter = plt.scatter(X_2D[:,0], X_2D[:,1], c=y_pred, cmap='tab10', s=25)
plt.legend(*scatter.legend_elements(), title="Digits")
plt.title("Label Spreading Predictions (PCA View)")
plt.xlabel("PCA 1")
plt.ylabel("PCA 2")
plt.grid(True)
plt.show()
Output:-
26. Write a Machine Learning Program to implement Active Learning.
Program Code:-
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
digits = load_digits()
X = digits.data
y = digits.target
X_train, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
initial_labels = 10
query_batch = 10
max_queries = 100
rng = np.random.RandomState(42)
n_samples = len(X_train)
indices = np.arange(n_samples)
rng.shuffle(indices)
X_pool = X_train[indices]
y_pool = y_train_full[indices]
X_labeled = X_pool[:initial_labels]
y_labeled = y_pool[:initial_labels]
X_unlabeled = X_pool[initial_labels:]
y_unlabeled = y_pool[initial_labels:]
model = LogisticRegression(max_iter=1000)
accuracies = []
for i in range(0, max_queries, query_batch):
    model.fit(X_labeled, y_labeled)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    accuracies.append(acc)
    if len(X_unlabeled) == 0:
        break
    # Uncertainty sampling: query the samples the model is least confident about
    probs = model.predict_proba(X_unlabeled)
    uncertainty = 1 - np.max(probs, axis=1)
    query_idx = np.argsort(uncertainty)[-query_batch:]
    X_labeled = np.concatenate((X_labeled, X_unlabeled[query_idx]))
    y_labeled = np.concatenate((y_labeled, y_unlabeled[query_idx]))
    X_unlabeled = np.delete(X_unlabeled, query_idx, axis=0)
    y_unlabeled = np.delete(y_unlabeled, query_idx, axis=0)
print("\nFinal Classification Report:\n", classification_report(y_test, y_pred))
print("\nFinal Accuracy: {:.4f}".format(accuracies[-1]))
plt.plot(range(initial_labels, initial_labels + len(accuracies) * query_batch, query_batch), accuracies,
marker='o')
plt.title("Active Learning Accuracy Over Queries")
plt.xlabel("Number of Labeled Samples")
plt.ylabel("Accuracy")
plt.grid(True)
plt.show()
Output:-
27. Write a Machine Learning Program to implement Reinforcement Learning.
Description:- 🔁 Environment:
Uses FrozenLake-v1, a classic RL environment with discrete states and actions.
The agent learns to reach a goal while avoiding holes.
🧠 Algorithm:
Implements Q-Learning, a value-based reinforcement learning algorithm.
Uses an epsilon-greedy strategy for exploration vs. exploitation.
🧪 Training:
Runs for 1000 episodes.
Program Code:-
import numpy as np
import gym
import random
import matplotlib.pyplot as plt
env = gym.make("FrozenLake-v1", is_slippery=False)
state_size = env.observation_space.n
action_size = env.action_space.n
q_table = np.zeros((state_size, action_size))
total_episodes = 1000
learning_rate = 0.8
gamma = 0.95
epsilon = 1.0
max_epsilon = 1.0
min_epsilon = 0.01
decay_rate = 0.005
rewards = []
for episode in range(total_episodes):
    state = env.reset()[0]
    total_rewards = 0
    done = False
    while not done:
        exp_exp_tradeoff = random.uniform(0, 1)
        if exp_exp_tradeoff > epsilon:
            action = np.argmax(q_table[state, :])   # exploit
        else:
            action = env.action_space.sample()      # explore
        new_state, reward, done, _, _ = env.step(action)
        # Q-learning update rule
        q_table[state, action] = q_table[state, action] + learning_rate * (
            reward + gamma * np.max(q_table[new_state, :]) - q_table[state, action]
        )
        total_rewards += reward
        state = new_state
    epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
    rewards.append(total_rewards)
print("Sample of Learned Q-table (first 5 states):\n")
print(np.round(q_table[:5], 3))
print("\nAverage Reward over 1000 episodes:", np.mean(rewards))
plt.plot(range(len(rewards)), rewards, alpha=0.5, label='Reward per Episode')
plt.title("Reward per Episode")
plt.xlabel("Episode")
plt.ylabel("Reward")
plt.grid(True)
plt.show()
Output:-
28. Write a Machine Learning Program to implement Online Learning.
Program Code:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
digits = load_digits()
X, y = digits.data, digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
classes = np.unique(y)
clf = SGDClassifier(loss="log_loss", max_iter=1, learning_rate='optimal', tol=None)
batch_size = 100
train_size = X_train.shape[0]
num_batches = train_size // batch_size
accuracies = []
for i in range(num_batches):
    start = i * batch_size
    end = start + batch_size
    X_batch = X_train[start:end]
    y_batch = y_train[start:end]
    clf.partial_fit(X_batch, y_batch, classes=classes)
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    accuracies.append(acc)
print("Sample batch input data (first batch):\n", X_train[:3])
print("Sample batch labels (first batch):\n", y_train[:3])
print("\nFinal Accuracy after Online Learning: {:.4f}".format(accuracies[-1]))
plt.plot(range(1, num_batches + 1), accuracies, marker='o')
plt.title("Online Learning: Accuracy Over Batches")
plt.xlabel("Batch Number")
plt.ylabel("Accuracy")
plt.grid(True)
plt.show()
Output:-
29. Write a Machine Learning Program to implement Distributed Learning
Program code:-
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             ConfusionMatrixDisplay)
from joblib import Parallel, delayed
import matplotlib.pyplot as plt
digits = load_digits()
X, y = digits.data, digits.target
print("Sample input data:\n", X[:3])
print("Sample labels:\n", y[:3])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
def train_model(X_part, y_part):
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_part, y_part)
    return clf
n_chunks = 4
X_splits = np.array_split(X_train, n_chunks)
y_splits = np.array_split(y_train, n_chunks)
models = Parallel(n_jobs=n_chunks)(delayed(train_model)(X_splits[i], y_splits[i]) for i in
range(n_chunks))
probs = [model.predict_proba(X_test) for model in models]
avg_probs = np.mean(probs, axis=0)
y_pred = np.argmax(avg_probs, axis=1)
acc = accuracy_score(y_test, y_pred)
print("\nFinal Accuracy: {:.4f}".format(acc))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=np.unique(y))
fig, ax = plt.subplots(figsize=(8, 8))
disp.plot(ax=ax, cmap='Blues')
plt.title("Confusion Matrix - Distributed Random Forest")
plt.grid(False)
plt.show()
Output:-