Nims Institute of Engineering & Technology
Nims University Rajasthan, Jaipur
LAB MANUAL
FOR
Machine Learning (CSC601B)
NIMS UNIVERSITY RAJASTHAN, JAIPUR
Jaipur-Delhi Highway
Jaipur - 303121, Rajasthan, India
Website: www.nimsuniversity.org
Contents
Syllabus
MACHINE LEARNING LAB
Practical 1: Predict housing prices based on features like area, number of bedrooms, and location using linear regression.
Practical 2: Classify a dataset using a k-Nearest Neighbors (kNN) classifier.
Practical 3: Implement a decision tree algorithm to classify email spam based on keywords and sender information.
Practical 4: Cluster customer data based on purchase history using k-means clustering.
Practical 5: Predict future stock prices using a time series forecasting model (e.g., ARIMA).
Practical 6: Develop a sentiment analysis model to classify movie reviews as positive, negative, or neutral.
Practical 7: Explore dimensionality reduction techniques like Principal Component Analysis (PCA) to visualize high-dimensional data.
Practical 8: Train a support vector machine (SVM) to classify data.
Syllabus
MACHINE LEARNING LAB
Course Objectives:
1. Understand the concept of learning in computer science.
2. Compare and contrast different paradigms for learning (supervised, unsupervised, etc.).
3. Design experiments to evaluate and compare different machine learning techniques on
real-world problems.
Experiments
1. Predict housing prices based on features like area, number of bedrooms, and
location using linear regression.
2. Classify a dataset using a k-Nearest Neighbors (kNN) classifier.
3. Implement a decision tree algorithm to classify email spam based on keywords
and sender information.
4. Cluster customer data based on purchase history using k-means clustering.
5. Predict future stock prices using a time series forecasting model (e.g., ARIMA).
6. Develop a sentiment analysis model to classify movie reviews as positive,
negative, or neutral.
7. Explore dimensionality reduction techniques like Principal Component Analysis
(PCA) to visualize high-dimensional data.
8. Train a support vector machine (SVM) to classify data.
Course Outcomes:
1. Implement and analyse existing learning algorithms, including well-studied methods for
classification, regression, and clustering.
2. Apply evaluation metrics for various algorithms.
3. Identify and implement solutions to real-world problems.
Practical 1
Aim: 1. Predict housing prices based on features like area, number of
bedrooms, and location using linear regression.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Create a small dataset
data = {
    "Area": [1500, 1800, 2400, 3000, 3500],
    "Bedrooms": [3, 4, 3, 5, 4],
    "Location": [1, 2, 1, 3, 2],  # Encoding for location (e.g., 1: City A, 2: City B, 3: City C)
    "Price": [300000, 400000, 350000, 500000, 450000]
}
# Convert to DataFrame
df = pd.DataFrame(data)
# Features and target
X = df[["Area", "Bedrooms", "Location"]]
y = df["Price"]
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Model evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Model Coefficients:", model.coef_)
print("Model Intercept:", model.intercept_)
print("Mean Squared Error:", mse)
print("R2 Score:", r2)
# Visualize actual vs predicted
plt.scatter(range(len(y_test)), y_test, color="blue", label="Actual Prices")
plt.scatter(range(len(y_pred)), y_pred, color="red", label="Predicted Prices")
plt.xlabel("Sample Index")
plt.ylabel("Price")
plt.legend()
plt.title("Actual vs Predicted Prices")
plt.show()
Output:
How the Code Works
Dataset: A small dataset with the features Area, Bedrooms, and Location, and the target Price.
Location is encoded as an integer.
Splitting Data: Divides the data into training (80%) and testing (20%) sets.
Linear Regression: Uses LinearRegression from scikit-learn to build the model.
Evaluation: Computes Mean Squared Error (MSE) and the R² score to evaluate the model's
performance (a short numerical sketch follows below).
Visualization: Compares actual vs. predicted prices.
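To make the evaluation step concrete, the short sketch below recomputes MSE and the R² score directly from their definitions, reusing the y_test and y_pred variables from the script above. Note that with the 5-row toy dataset the test split contains a single sample, so R² is not meaningful until a larger dataset is used.
import numpy as np

# Recompute the two metrics by hand (equivalent to sklearn's functions).
y_true = np.asarray(y_test, dtype=float)
y_hat = np.asarray(y_pred, dtype=float)

mse_manual = np.mean((y_true - y_hat) ** 2)        # average of squared errors
ss_res = np.sum((y_true - y_hat) ** 2)             # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares

print("Manual MSE:", mse_manual)
if ss_tot > 0:
    print("Manual R2:", 1 - ss_res / ss_tot)       # R^2 = 1 - SS_res / SS_tot
else:
    print("R2 is undefined for a single test sample.")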
Reading Data from an Excel File instead of an In-Code Dictionary
First, create the dataset and save it as 1_Housing.xlsx.
Upload this file to Google Drive so that the notebook can read it.
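If the notebook runs in Google Colab and the file is kept in Google Drive, the drive must be mounted before pandas can read the file. A minimal sketch; the path under /content/drive/MyDrive is an assumption about where 1_Housing.xlsx was uploaded.
from google.colab import drive

drive.mount('/content/drive')  # authorize access to Google Drive

# Adjust this path to wherever 1_Housing.xlsx sits in your Drive (assumed location).
data_path = "/content/drive/MyDrive/1_Housing.xlsx"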
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load data from an Excel file
data_path = "1_Housing.xlsx" # Replace with your Excel file path
df = pd.read_excel(data_path)
# Features and target
X = df[["Area", "Bedrooms", "Location"]]
y = df["Price"]
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Model evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Model Coefficients:", model.coef_)
print("Model Intercept:", model.intercept_)
print("Mean Squared Error:", mse)
print("R2 Score:", r2)
# Visualize actual vs predicted
plt.scatter(range(len(y_test)), y_test, color="blue", label="Actual Prices")
plt.scatter(range(len(y_pred)), y_pred, color="red", label="Predicted Prices")
plt.xlabel("Sample Index")
plt.ylabel("Price")
plt.legend()
plt.title("Actual vs Predicted Prices")
plt.show()
Output:
Reading the data from an Excel file and then predicting the price based on Area, Number of
Bedrooms, and Location.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load the dataset from an Excel file
df = pd.read_excel('1_Housing.xlsx') # Replace with your own file path if it differs
# Features and target
X = df[["Area", "Bedrooms", "Location"]]
y = df["Price"]
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Model evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Model Coefficients:", model.coef_)
print("Model Intercept:", model.intercept_)
print("Mean Squared Error:", mse)
print("R2 Score:", r2)
# Visualize actual vs predicted
plt.scatter(range(len(y_test)), y_test, color="blue", label="Actual Prices")
plt.scatter(range(len(y_pred)), y_pred, color="red", label="Predicted Prices")
plt.xlabel("Sample Index")
plt.ylabel("Price")
plt.legend()
plt.title("Actual vs Predicted Prices")
plt.show()
# Predict price for new data
new_data = pd.DataFrame({
    "Area": [float(input("Enter Area: "))],
    "Bedrooms": [int(input("Enter Bedrooms: "))],
    "Location": [int(input("Enter Location (e.g., 1 for City A, 2 for City B): "))]
})
predicted_price = model.predict(new_data)
print(f"Predicted Price: {predicted_price[0]}")
Practical 2
Aim: 2. Classify a dataset using a k-Nearest Neighbors (kNN) classifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Load dataset
data = load_iris()
X = data.data
y = data.target
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train kNN classifier
k=3
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)
# Make predictions
y_pred = knn.predict(X_test)
# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
Output:
Adding some visualization to the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Load dataset
data = load_iris()
X = data.data
y = data.target
feature_names = data.feature_names
class_names = data.target_names
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train kNN classifier
k=3
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)
# Make predictions
y_pred = knn.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
# Visualizations
# 1. Pairplot of features
df = pd.DataFrame(X, columns=feature_names)
df['target'] = y
sns.pairplot(df, hue='target', diag_kind='hist', palette='Set2')
plt.suptitle('Feature Distributions and Pairwise Scatter Plots', y=1.02)
plt.show()
# 2. Confusion Matrix Heatmap
conf_matrix = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6, 5))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=class_names,
yticklabels=class_names)
plt.title('Confusion Matrix')
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.show()
# 3. Decision Boundaries (for 2 features only, e.g., first two features)
if X.shape[1] == 2:  # Only possible for datasets with 2 features
    X_plot = X[:, :2]  # Use the first two features
    X_train_plot, X_test_plot, y_train_plot, y_test_plot = train_test_split(
        X_plot, y, test_size=0.2, random_state=42)
    X_train_plot = scaler.fit_transform(X_train_plot)
    X_test_plot = scaler.transform(X_test_plot)
    knn_2d = KNeighborsClassifier(n_neighbors=k)
    knn_2d.fit(X_train_plot, y_train_plot)
    # Create a mesh grid
    x_min, x_max = X_train_plot[:, 0].min() - 1, X_train_plot[:, 0].max() + 1
    y_min, y_max = X_train_plot[:, 1].min() - 1, X_train_plot[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
    Z = knn_2d.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot decision boundary
    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, alpha=0.8, cmap='Set3')
    scatter = plt.scatter(X_test_plot[:, 0], X_test_plot[:, 1], c=y_test_plot, edgecolor='k', cmap='viridis')
    plt.title('kNN Decision Boundary (2D)')
    plt.xlabel(feature_names[0])
    plt.ylabel(feature_names[1])
    legend = plt.legend(handles=scatter.legend_elements()[0], labels=list(class_names))
    plt.show()
else:
    print("Decision boundary visualization is only possible for datasets with 2 features.")
Output:
How does kNN work?
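kNN classifies a new sample by finding the k training samples closest to it (typically by Euclidean distance) and taking a majority vote of their labels. The following from-scratch sketch illustrates that voting logic; it is not part of the original manual and assumes the standardized X_train, y_train, and X_test arrays defined above.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=3):
    """Predict labels for X_test by majority vote among the k nearest training points."""
    predictions = []
    for x in X_test:
        distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every training point
        nearest = np.argsort(distances)[:k]              # indices of the k closest points
        votes = Counter(y_train[nearest])                # count class labels among the neighbours
        predictions.append(votes.most_common(1)[0][0])   # majority label wins
    return np.array(predictions)

# Should closely match knn.predict(X_test) from scikit-learn above.
manual_pred = knn_predict(X_train, y_train, X_test, k=3)
print("Manual kNN predictions:", manual_pred)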
Practical 3
Aim: 3. Implement a decision tree algorithm to classify email spam
based on keywords and sender information.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score, classification_report
# Sample dataset
data = {
    'contains_free': [1, 0, 1, 0, 1, 0, 1, 0],
    'contains_offer': [0, 1, 1, 0, 0, 1, 1, 0],
    'sender_known': [0, 1, 1, 1, 0, 1, 0, 0],
    'spam': [1, 0, 1, 0, 1, 0, 1, 0]
}

# Create a DataFrame
df = pd.DataFrame(data)
# Features and target
X = df[['contains_free', 'contains_offer', 'sender_known']]
y = df['spam']
# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
# Initialize the Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
# Train the model
clf.fit(X_train, y_train)
# Make predictions
y_pred = clf.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# Display the decision tree
tree_rules = export_text(clf, feature_names=list(X.columns))
print("\nDecision Tree Rules:")
print(tree_rules)
Output:
Explanation of the Code
1. Dataset:
The dataset contains three features:
o contains_free: Indicates if the email contains the keyword "free" (1 for yes,
0 for no).
o contains_offer: Indicates if the email contains the keyword "offer" (1 for
yes, 0 for no).
o sender_known: Indicates if the sender is known (1 for yes, 0 for no).
The spam column is the target (1 for spam, 0 for not spam).
2. Splitting the Dataset:
The dataset is split into training and testing sets using train_test_split.
3. Model Training:
A decision tree classifier is initialized and trained on the training set.
4. Model Evaluation:
Predictions are made on the testing set, and the accuracy and classification report are
printed.
5. Visualizing the Tree:
The export_text function generates human-readable decision rules for the tree; a graphical plot_tree sketch follows after the example output below.
Output Example
Accuracy: 100.00%
Classification Report:
precision recall f1-score support
0 1.00 1.00 1.00 2
1 1.00 1.00 1.00 2
accuracy 1.00 4
macro avg 1.00 1.00 1.00 4
weighted avg 1.00 1.00 1.00 4
Decision Tree Rules:
|--- contains_free <= 0.50
| |--- sender_known <= 0.50
| | |--- class: 0
| |--- sender_known > 0.50
| | |--- class: 0
|--- contains_free > 0.50
| |--- class: 1
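For a graphical view of the same tree, scikit-learn's plot_tree can be used instead of (or alongside) export_text; a minimal sketch, assuming the clf and X defined above and the class names "not spam"/"spam" used in this practical:
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(8, 6))
plot_tree(clf,
          feature_names=list(X.columns),
          class_names=["not spam", "spam"],  # 0 = not spam, 1 = spam
          filled=True)
plt.title("Decision Tree for Spam Classification")
plt.show()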
Notes
1. Customization: You can replace the sample dataset with a larger and more realistic
email dataset.
2. Feature Engineering: Add more features like the length of the email, frequency of
certain words, etc.
3. Model Tuning: Adjust the parameters of DecisionTreeClassifier (e.g., max_depth,
min_samples_split) for better performance; a small grid-search sketch follows below.
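A minimal tuning sketch with GridSearchCV, reusing the X and y defined above; the parameter grid and the small cv value are illustrative choices for this tiny 8-row dataset, not prescribed settings.
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Illustrative grid; widen or narrow it for a real dataset.
param_grid = {
    "max_depth": [2, 3, 4, None],
    "min_samples_split": [2, 3, 4],
}

grid = GridSearchCV(DecisionTreeClassifier(random_state=42),
                    param_grid, cv=2, scoring="accuracy")
grid.fit(X, y)

print("Best parameters:", grid.best_params_)
print("Best cross-validated accuracy:", grid.best_score_)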
Practical 4
Aim: 4. Cluster customer data based on purchase history using k-means
clustering.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
# Simulated customer data
data = {
    'customer_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'total_spent': [500, 1500, 300, 800, 2500, 200, 1000, 1800, 400, 700],
    'frequency': [5, 20, 2, 8, 30, 1, 12, 25, 3, 6],
    'average_purchase_value': [100, 75, 150, 100, 83, 200, 83, 72, 133, 117],
}
# Create a DataFrame
df = pd.DataFrame(data)
# Features for clustering
features = df[['total_spent', 'frequency', 'average_purchase_value']]
# Standardize the features
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)
# Apply KMeans clustering
kmeans = KMeans(n_clusters=3, random_state=42)
clusters = kmeans.fit_predict(scaled_features)
# Add cluster labels to the DataFrame
df['cluster'] = clusters
# Visualize the clusters using PCA (2D projection)
pca = PCA(n_components=2)
pca_features = pca.fit_transform(scaled_features)
plt.figure(figsize=(8, 6))
for cluster in np.unique(clusters):
    plt.scatter(
        pca_features[clusters == cluster, 0],
        pca_features[clusters == cluster, 1],
        label=f'Cluster {cluster}'
    )
# Add centroids to the plot
centroids = pca.transform(kmeans.cluster_centers_)
plt.scatter(centroids[:, 0], centroids[:, 1], s=200, c='black', marker='X', label='Centroids')
plt.title('Customer Clusters')
plt.xlabel('PCA Component 1')
plt.ylabel('PCA Component 2')
plt.legend()
plt.grid()
plt.show()
# Print clustered data
print("Clustered Customer Data:")
print(df)
Output:
Key Components:
1. Dataset:
o total_spent: Total amount spent by a customer.
o frequency: Number of purchases.
o average_purchase_value: Average value of each purchase.
2. Standardization:
o Used StandardScaler to normalize features for better clustering
performance.
3. K-Means:
o Specified n_clusters=3 (can be optimized using the elbow method or
silhouette score).
4. Visualization:
o Used PCA for 2D visualization of high-dimensional data.
o Plotted clusters with their centroids.
Notes:
1. Elbow Method: Use the elbow method to determine the optimal number of clusters
by plotting the inertia for different cluster counts (a sketch follows after this list).
2. Feature Engineering: Add more features like customer lifetime value, recency, etc.,
for better clustering.
3. Real Data: Replace the simulated data with real customer data.
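A minimal elbow-method sketch, reusing the scaled_features array from the script above; the range of cluster counts is an illustrative choice for this 10-customer dataset.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

inertias = []
cluster_range = range(1, 8)  # keep the range small: there are only 10 customers
for n in cluster_range:
    km = KMeans(n_clusters=n, random_state=42, n_init=10)
    km.fit(scaled_features)
    inertias.append(km.inertia_)  # within-cluster sum of squared distances

plt.plot(list(cluster_range), inertias, marker="o")
plt.xlabel("Number of clusters")
plt.ylabel("Inertia")
plt.title("Elbow Method for Choosing k")
plt.grid()
plt.show()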
Practical 5
Aim: 5. Predict future stock prices using a time series forecasting model
(e.g., ARIMA)
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
# Step 1: Create a small dataset for stock prices
data = {
    "Date": pd.date_range(start="2024-01-01", periods=10, freq="D"),
    "Stock_Price": [150 + i * 2 + np.random.uniform(-3, 3) for i in range(10)]  # Simulated stock prices with noise
}
stock_data = pd.DataFrame(data)
stock_data.set_index("Date", inplace=True)
# Step 2: Fit an ARIMA model
stock_prices = stock_data['Stock_Price']
model = ARIMA(stock_prices, order=(1, 1, 1))  # ARIMA parameters (p, d, q)
fitted_model = model.fit()
# Step 3: Forecast future stock prices
forecast_steps = 5
forecast = fitted_model.forecast(steps=forecast_steps)
# Step 4: Plot the actual and forecasted stock prices
plt.figure(figsize=(10, 6))
plt.plot(stock_prices, label="Actual Stock Prices", marker="o")
forecast_index = pd.date_range(start=stock_data.index[-1] + pd.Timedelta(days=1),
                               periods=forecast_steps, freq='D')
plt.plot(forecast_index, forecast, label="Forecasted Stock Prices",
         marker="x", linestyle="--", color="red")
plt.xlabel("Date")
plt.ylabel("Stock Price")
plt.title("Stock Price Forecast using ARIMA")
plt.legend()
plt.grid()
plt.show()
# Step 5: Display the forecasted values
print("Forecasted Stock Prices:")
forecast_df = pd.DataFrame({"Date": forecast_index,
                            "Forecasted Price": forecast.values})
print(forecast_df)
Output:
Here’s a step-by-step explanation of the code:
Step 1: Create a Small Dataset for Stock Prices
data = {
    "Date": pd.date_range(start="2024-01-01", periods=10, freq="D"),
    "Stock_Price": [150 + i * 2 + np.random.uniform(-3, 3) for i in range(10)]  # Simulated stock prices with noise
}
stock_data = pd.DataFrame(data)
stock_data.set_index("Date", inplace=True)
1. pd.date_range: Creates a range of 10 consecutive dates starting from "2024-01-01".
2. Stock_Price formula: Simulates prices starting at 150 and incrementing by 2 per day,
with random noise added using np.random.uniform(-3, 3).
3. pd.DataFrame: Stores the generated data in a pandas DataFrame.
4. set_index: Sets the Date column as the index, which is essential for time series
analysis.
Step 2: Fit an ARIMA Model
stock_prices = stock_data['Stock_Price']
model = ARIMA(stock_prices, order=(1, 1, 1)) # ARIMA parameters (p, d, q)
fitted_model = model.fit()
1. ARIMA: A popular time series forecasting model with three parameters:
o p: Autoregressive order (how past values influence current values).
o d: Degree of differencing (removes trends in the data).
o q: Moving average order (models residuals/errors).
2. Fit the model: The fit() method trains the ARIMA model on the dataset.
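Choosing (p, d, q) is usually guided by a stationarity test and by the ACF/PACF plots rather than fixed in advance; the order (1, 1, 1) above is a simple illustrative choice. A minimal sketch with statsmodels, reusing the stock_prices series (the small maxlag and lags values are assumptions made because the toy series has only 10 points):
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Augmented Dickey-Fuller test: a large p-value suggests the series is non-stationary,
# so differencing (d >= 1) is appropriate.
adf_stat, p_value, *_ = adfuller(stock_prices, maxlag=2)
print("ADF statistic:", adf_stat, "p-value:", p_value)

# ACF and PACF of the differenced series are commonly used to suggest q and p.
diff = stock_prices.diff().dropna()
plot_acf(diff, lags=3)
plot_pacf(diff, lags=3)
plt.show()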
Step 3: Forecast Future Stock Prices
forecast_steps = 5
forecast = fitted_model.forecast(steps=forecast_steps)
1. forecast_steps: Specifies the number of future days to predict (5 in this case).
2. fitted_model.forecast: Generates the predicted values for the specified number of
steps.
Step 4: Plot Actual and Forecasted Stock Prices
plt.figure(figsize=(10, 6))
plt.plot(stock_prices, label="Actual Stock Prices", marker="o")
forecast_index = pd.date_range(start=stock_data.index[-1] + pd.Timedelta(days=1),
                               periods=forecast_steps, freq='D')
plt.plot(forecast_index, forecast, label="Forecasted Stock Prices",
         marker="x", linestyle="--", color="red")
plt.xlabel("Date")
plt.ylabel("Stock Price")
plt.title("Stock Price Forecast using ARIMA")
plt.legend()
plt.grid()
plt.show()
1. Plot actual data: plt.plot displays the historical stock prices (stock_prices).
2. Create forecast dates: pd.date_range generates future dates starting from the day
after the last date in the dataset.
3. Plot forecast: Plots the predicted stock prices on the same graph.
4. Styling: Adds labels, title, legend, and grid for better visualization.
Step 5: Display Forecasted Values
forecast_df = pd.DataFrame({"Date": forecast_index,
                            "Forecasted Price": forecast.values})
print(forecast_df)
1. Create DataFrame: Combines the forecasted dates and predicted prices into a new
DataFrame.
2. Print results: Displays the forecasted values for better clarity.
Output
• Graph: Shows both the actual stock prices and the forecasted values with clear
markers.
• Table: Displays the forecasted stock prices in tabular form.
Practical 6
Aim: 6. Develop a sentiment analysis model to classify movie reviews
as positive, negative, or neutral.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report
# Step 1: Create a small dataset of movie reviews
data = {
    "Review": [
        "The movie was fantastic! I loved the characters and the plot.",
        "What a terrible movie. It was a complete waste of time.",
        "The movie was okay, not too good, not too bad.",
        "Absolutely loved it! One of the best movies I've seen this year.",
        "The plot was predictable, but the acting was decent.",
        "Horrible! I couldn't even finish it.",
        "It was just fine. Nothing special, nothing terrible.",
        "A masterpiece. Beautifully directed and acted.",
        "Worst movie ever. Do not watch this.",
        "Pretty average. Had some good moments but also some flaws."
    ],
    "Sentiment": [
        "Positive",
        "Negative",
        "Neutral",
        "Positive",
        "Neutral",
        "Negative",
        "Neutral",
        "Positive",
        "Negative",
        "Neutral"
    ]
}

# Convert to DataFrame
df = pd.DataFrame(data)
# Step 2: Split the data into training and test sets
X = df['Review']
y = df['Sentiment']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Step 3: Create a pipeline for vectorization and classification
pipeline = Pipeline([
    ('vectorizer', CountVectorizer()),  # Converts text into numerical features
    ('classifier', MultinomialNB())     # Naive Bayes classifier
])
# Step 4: Train the model
pipeline.fit(X_train, y_train)
# Step 5: Evaluate the model
y_pred = pipeline.predict(X_test)
print("Classification Report:")
print(classification_report(y_test, y_pred))
# Step 6: Test with new reviews
new_reviews = [
    "What an amazing film! I would watch it again.",
    "It was boring and predictable. Not worth my time.",
    "An average movie. Nothing stood out."
]
predictions = pipeline.predict(new_reviews)

# Display predictions
for review, sentiment in zip(new_reviews, predictions):
    print(f"Review: {review}\nPredicted Sentiment: {sentiment}\n")
Output:
Explanation
1. Dataset:
o A small dataset with 10 movie reviews labeled as Positive, Negative, or
Neutral.
2. Splitting Data:
o The dataset is split into training (70%) and testing (30%) sets to evaluate the
model's performance.
3. Pipeline:
o CountVectorizer: Converts text into a numerical format using word
frequency.
o MultinomialNB: A Naive Bayes classifier, effective for text classification
tasks. (A TF-IDF variant of the same pipeline is sketched after this list.)
4. Training:
o The model is trained on the training dataset using the pipeline.
5. Evaluation:
o The model is tested on unseen reviews (test set) using
classification_report.
6. Predictions:
o The trained model predicts sentiments for new reviews.
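To see the effect of the vectorization step, and as a common variant, the sketch below swaps CountVectorizer for TfidfVectorizer inside the same pipeline; this is an illustrative alternative rather than part of the original script, and it reuses the X_train, X_test, y_train, and y_test splits from above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Same structure as before, but words are weighted by TF-IDF instead of raw counts.
tfidf_pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('classifier', MultinomialNB())
])
tfidf_pipeline.fit(X_train, y_train)
print("Test accuracy with TF-IDF:", tfidf_pipeline.score(X_test, y_test))

# Inspect part of the learned vocabulary (word -> column index in the feature matrix).
vocab = tfidf_pipeline.named_steps['vectorizer'].vocabulary_
print(list(vocab.items())[:10])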
Output
• Classification Report: Displays precision, recall, and F1 scores.
• Predictions: Shows the predicted sentiment for new reviews.
Practical 7
Aim: 7. Explore dimensionality reduction techniques like Principal
Component Analysis (PCA) to visualize high-dimensional data.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
import seaborn as sns
# Step 1: Generate high-dimensional data
X, y = make_classification(
    n_samples=500,       # Number of samples
    n_features=10,       # Number of features
    n_informative=5,     # Number of informative features
    n_redundant=2,       # Number of redundant features
    n_classes=3,         # Number of classes
    random_state=42
)
# Step 2: Apply PCA
pca = PCA(n_components=2) # Reduce to 2 components for visualization
X_pca = pca.fit_transform(X)
# Step 3: Create a DataFrame for visualization
pca_df = pd.DataFrame(X_pca, columns=["PCA1", "PCA2"])
pca_df["Target"] = y
# Step 4: Visualize the PCA results
plt.figure(figsize=(10, 6))
sns.scatterplot(data=pca_df, x="PCA1", y="PCA2", hue="Target", palette="Set2", s=70)
plt.title("PCA Visualization of High-Dimensional Data", fontsize=14)
plt.xlabel("Principal Component 1", fontsize=12)
plt.ylabel("Principal Component 2", fontsize=12)
plt.legend(title="Class")
plt.grid()
plt.show()
# Step 5: Explain variance captured by PCA components
explained_variance = pca.explained_variance_ratio_
print("Explained Variance by Each Component:")
for i, variance in enumerate(explained_variance, 1):
    print(f"Component {i}: {variance:.2f}")
Explanation
1. Data Generation:
o make_classification creates a synthetic dataset with 10 features and 3
classes.
o n_informative and n_redundant specify the number of informative and
redundant features.
2. PCA Application:
o PCA(n_components=2) reduces the dimensionality to 2 components for easy
visualization.
o fit_transform applies PCA to the data.
3. Visualization:
o Seaborn scatterplot: Displays data points in a 2D space (PCA1 vs. PCA2)
colored by their class labels.
4. Variance Explained:
o explained_variance_ratio_: Provides the proportion of variance explained
by each principal component; a cumulative-variance sketch follows below.
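To decide how many components are worth keeping in general, one commonly plots the cumulative explained variance; a minimal sketch, reusing the X array generated above (fitting PCA with all 10 components here is an illustrative choice):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Fit PCA with every component to see how quickly variance accumulates.
pca_full = PCA(n_components=10)
pca_full.fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

plt.plot(range(1, 11), cumulative, marker="o")
plt.xlabel("Number of Components")
plt.ylabel("Cumulative Explained Variance")
plt.title("Choosing the Number of PCA Components")
plt.grid()
plt.show()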
Practical 8
Aim: 8. Train a support vector machine (SVM) to classify data.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

# Step 1: Generate a synthetic dataset with two classes and four features
X, y = make_classification(
    n_samples=500,       # Number of samples
    n_features=4,        # Number of features
    n_informative=2,     # Number of informative features
    n_redundant=0,       # Remaining features are noise
    n_classes=2,         # Number of classes
    random_state=42
)

# Step 2: Split the data into training (70%) and testing (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Train a Support Vector Machine with a linear kernel
svm_clf = SVC(kernel='linear', random_state=42)
svm_clf.fit(X_train, y_train)

# Step 4: Make predictions on the test set
y_pred = svm_clf.predict(X_test)

# Step 5: Evaluate the model
print("Classification Report:")
print(classification_report(y_test, y_pred))

conf_matrix = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6, 5))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.show()

# Step 6: Visualize the decision boundary using the first two features
svm_2d = SVC(kernel='linear', random_state=42)
svm_2d.fit(X_train[:, :2], y_train)

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
Z = svm_2d.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='coolwarm', edgecolor='k', label='Train')
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='coolwarm', marker='s', edgecolor='k', label='Test')
plt.title('SVM Decision Boundary (first two features)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
Output:
Explanation
1. Dataset Creation:
o make_classification: Generates a synthetic dataset with two classes and
four features.
o 2 features are informative, and the rest are noise.
2. Train-Test Split:
o Splits data into 70% training and 30% testing sets.
3. Training the SVM:
o SVC: Trains a Support Vector Machine with a linear kernel to classify data (a kernel and C tuning sketch follows at the end of this practical).
4. Predictions:
o Predictions are made on the testing data.
5. Evaluation:
o classification_report: Displays precision, recall, F1-score, and accuracy.
o Confusion matrix: Provides a visual representation of true and false
predictions.
6. Visualization:
o Decision boundaries are plotted for the first two features to demonstrate how
the SVM separates the classes.
Output
1. Classification Report:
o Precision, recall, F1-score, and accuracy for each class.
2. Confusion Matrix:
o A heatmap displaying the confusion matrix.
3. Decision Boundary Plot:
o Visualization of the decision boundaries and the training/testing points.
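Beyond the linear kernel used above, SVM performance often depends on the kernel and the regularization parameter C. A minimal tuning sketch with GridSearchCV, reusing the X_train, y_train, X_test, and y_test arrays from the script above; the grid values are illustrative, and gamma only matters for the RBF kernel.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative search space; adjust for your own data.
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
    "gamma": ["scale", "auto"],
}

grid = GridSearchCV(SVC(random_state=42), param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Best cross-validated accuracy:", grid.best_score_)
print("Test accuracy:", grid.score(X_test, y_test))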