ML

Uploaded by

Arjun Maheshwari


A

LAB FILE
ON
MACHINE LEARNING: THEORY & APPLICATIONS
(COURSE CODE: AIML-202)

Submitted By:
Name: Raghav Pandey
Enrollment No.: A2305224360
Class & Section: 3CSE5Y

Submitted To:
Dr. Dhruv Sharma
Assistant Professor
Amity Centre for Artificial Intelligence

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

AMITY SCHOOL OF ENGINEERING AND TECHNOLOGY
AMITY UNIVERSITY UTTAR PRADESH, NOIDA
SESSION: ODD SEM. (2025-26)
EXPERIMENT- 01
Importing Data and Data Visualization Task
Aim:- Construct various types of plots/charts like histogram, bar chart, pie chart, and scatter plot by
importing data from a CSV format file. Further label different axes and data in a plot.
Libraries Used:-
• pandas
• matplotlib.pyplot
• random
Theory:-
Pandas is used for reading and handling data from CSV files in a DataFrame format.
Matplotlib’s pyplot module is used to create different visualizations like histograms, bar charts, pie
charts, and scatter plots with customization.
The random module generates random numbers, though it is not used in this program.
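As a rough illustration of what a histogram computes, the sketch below counts values into equal-width bins using only the standard library. The function name histogram_counts and the sample ages are made up for this example and are not part of the lab program. The binning rule (equal-width bins between the minimum and maximum, with the last bin closed on the right) is the usual default, stated here as an assumption rather than taken from the lab file.

```python
def histogram_counts(values, bins=10):
    """Count values into `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Simplification for this sketch: identical values leave no range to split
        raise ValueError("all values are identical; cannot form bins")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # The last bin is closed on the right so the maximum value is counted
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


ages = [18, 19, 19, 20, 21, 21, 21, 22, 24, 26]  # made-up sample ages
edges, counts = histogram_counts(ages, bins=4)
print(edges)   # bin boundaries
print(counts)  # number of ages falling in each bin
```

Raising the number of bins narrows each interval, which is why the 20-bin histogram shows more detail than the default one.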
Code:-
import random
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('students.csv')
plt.hist(data['Age'])
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution')
plt.show()
# Same histogram drawn in green
plt.hist(data['Age'], color='green')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution')
plt.show()
# Green histogram with 20 bins for finer detail
plt.hist(data['Age'], color='green', bins=20)
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution')
plt.show()
# Take labels and counts from the same value_counts() result so they stay aligned:
# unique() returns order of appearance, while value_counts() sorts by frequency.
dept_counts = data['Department'].value_counts()
plt.bar(dept_counts.index, dept_counts.values)
plt.xlabel('Department')
plt.ylabel('Count')
plt.title('Department Comparison')
plt.show()
plt.figure(figsize=(1,2))  # figsize is (width, height) in inches
# Vertical bar chart with a bar width of 0.4
plt.bar(dept_counts.index, dept_counts.values, width=0.4)
plt.xlabel('Department')
plt.ylabel('Count')
plt.title('Department Comparison')
plt.show()
plt.figure(figsize=(2,1))
# Horizontal bar chart with a bar height of 0.4
plt.barh(dept_counts.index, dept_counts.values, height=0.4)
plt.xlabel('Count')
plt.ylabel('Department')
plt.title('Department Comparison')
plt.show()
plt.figure(figsize=(2,1))
# Horizontal bar chart with the count printed on each bar
bars = plt.barh(dept_counts.index, dept_counts.values, height=0.4)
plt.bar_label(bars)
plt.xlabel('Count')
plt.ylabel('Department')
plt.title('Department Comparison')
plt.show()
# Pie chart of department proportions
plt.pie(dept_counts.values, labels=dept_counts.index)
plt.title('Department Proportion')
plt.show()
# Pie chart with the percentage shown on each slice
plt.pie(dept_counts.values, labels=dept_counts.index, autopct='%1.1f%%')
plt.title('Department Proportion')
plt.show()
plt.scatter(data['GPA'], data['GraduationYear'])
plt.xlabel('GPA')
plt.ylabel('GraduationYear')
plt.title('GPA vs GraduationYear')
plt.show()
plt.scatter(data['GPA'], data['GraduationYear'],color = 'red')
plt.xlabel('GPA')
plt.ylabel('GraduationYear')
plt.title('GPA vs GraduationYear')

plt.show()
plt.scatter(data['GPA'], data['GraduationYear'],c = 'red',edgecolor = 'black')
plt.xlabel('GPA')
plt.ylabel('GraduationYear')
plt.title('GPA vs GraduationYear')
plt.show()
plt.scatter(data['GPA'], data['GraduationYear'],c = 'red',edgecolor = 'black',s = 100)
plt.xlabel('GPA')
plt.ylabel('GraduationYear')
plt.title('GPA vs GraduationYear')
plt.show()
# Radar plot of each student's GPA, labelled by age
import numpy as np  # needed for np.linspace and np.pi below
labels = data['Age'].tolist()
values = data['GPA'].tolist()
values += values[:1]  # repeat the first value to close the polygon
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
angles += angles[:1]
plt.figure(figsize=(6, 6))
ax = plt.subplot(111, polar=True)
plt.xticks(angles[:-1], labels)
ax.plot(angles, values, color='navy', linewidth=2)
ax.fill(angles, values, color='lightblue', alpha=0.5)
plt.title('GPA AS PER Age RADAR PLOT', size=14, pad=20)
plt.show()

OUTPUT:

Figure 1: Histogram showing the age distribution of students.

Figure 3: Green histogram of student ages with 20 bins for more detail.

Figure 4: Vertical bar chart comparing counts of each department.

Figure 5: Vertical bar chart with adjusted width for each department.

Figure 6: Horizontal bar chart comparing departments with reduced bar height.

Figure 7: Horizontal bar chart with labels on each bar.

Figure 8: Pie chart showing department proportions.

Figure 9: Pie chart showing department proportions with percentage values.

Figure 10: Scatter plot of GPA vs Graduation Year.

Figure 11: Red scatter plot of GPA vs Graduation Year.

Figure 12: Red scatter plot with black edges for each point.

Figure 13: Red scatter plot with black edges and larger marker size.

Figure 14: Radar Plot

RESULT:-
Various plots like histogram, bar chart, pie chart, and scatter plot were successfully created
using data from a CSV file. Each plot was properly labeled with titles, axis names, and
customized styles.
EXPERIMENT- 02
Data Cleaning and Pre-processing Task
Aim:- Fill missing values, remove/insert columns, label the output column, scale features, convert
categorical values to numerical values, etc., by importing data from a CSV format file.
Libraries Used:-
• pandas
• numpy
• sklearn.preprocessing (MinMaxScaler, StandardScaler, RobustScaler)
• sklearn.preprocessing (LabelEncoder)

Theory:-
The dataset is cleaned by handling missing values, dropping irrelevant columns, and adding derived
columns. Categorical values are converted to numbers using label encoding, and numerical features
are scaled for uniformity. One-hot encoding is applied to convert categorical data into binary dummy
variables, enhancing model compatibility.
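To make the transformations described above concrete, here is a minimal standard-library sketch of what min-max scaling, standard scaling, label encoding, and one-hot encoding compute. The helper names (min_max_scale, standard_scale, label_encode, one_hot) and the sample values are invented for illustration. They mimic the idea behind the scikit-learn and pandas utilities used in this experiment and are not their actual implementations.

```python
from statistics import mean, pstdev


def min_max_scale(xs):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def standard_scale(xs):
    """Centre values on mean 0 and divide by the population standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def label_encode(labels):
    """Map each distinct label to an integer code, in sorted label order."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[label] for label in labels], classes


def one_hot(labels):
    """Build one 0/1 indicator column per distinct label."""
    classes = sorted(set(labels))
    return [[1 if label == c else 0 for c in classes] for label in labels], classes


ages = [20, 40, 40, 40, 50, 50, 70, 90]        # made-up numeric column
income = ["<=50K", ">50K", "<=50K", ">50K"]    # made-up categorical column

print(min_max_scale(ages))
print(standard_scale(ages))
print(label_encode(income))
print(one_hot(income))
```

Label encoding produces a single integer column, which suits a target variable, while one-hot encoding avoids implying an order among categories at the cost of one extra column per category.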
Code:-
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler
from sklearn.preprocessing import LabelEncoder
# Load the data
df = pd.read_csv("adult.csv.xls")
# Identify missing values
print(df.columns)
print(df.isnull().sum())
print(df)
#drop rows with missing values
df.dropna(inplace=True)
print(df)
# Remove irrelevant column
df1 = df.drop('gender', axis=1)
print(df1)
#Insert new column with age squared
df1 = df1.assign(age_squared=lambda x: x['age']**2)
print(df1)
# Rename target variable column
df1 = df1.rename(columns={'income': 'annual_income'})
print(df1.columns)
# Convert 'annual_income' to numerical using Label Encoding
le = LabelEncoder()
df1['annual_income'] = le.fit_transform(df1['annual_income'])
print(df1)
# Min-max scale 'age' to the [0, 1] range
scaler = MinMaxScaler()
df1['age'] = scaler.fit_transform(df1[['age']]).ravel()
print(df1)
# Standard scaling
scaler = StandardScaler()
df1[['annual_income', 'age_squared']] = scaler.fit_transform(df1[['annual_income', 'age_squared']])
print(df1)
# Converting categorical values to numerical values (one-hot encoding)
df1 = pd.get_dummies(df1, columns=['age'])
print(df1)
OUTPUT:-

Figure 1: The original dataset loaded from adult.csv.xls is displayed.

Figure 2: Dataset after removing rows with missing values.

Figure 3: Dataset after removing the gender column.

Figure 4: Dataset with a new column age_squared added.

Figure 5: Dataset after renaming the column income to annual_income.

Figure 6: Dataset after converting annual_income values to numeric using label encoding.

Figure 7: Dataset after applying Standard scaling to annual_income and age_squared columns.

Figure 8: Dataset after converting categorical values of age into dummy variables.

RESULT:-
The data was cleaned and prepared by dropping rows with missing values, removing and adding
columns, converting categorical text values to numbers, and scaling the numeric values.

EXPERIMENT- 03
Linear Regression
Aim:- Given advertising data, build a model to predict sales based on the money spent on
different marketing platforms. Implement an ordinary least squares linear regression and test the
accuracy of the model.
Libraries Used:-
• import pandas as pd
• import seaborn as sns
• from sklearn import preprocessing
• from matplotlib import pyplot as plt
• import numpy as np
• from sklearn.model_selection import train_test_split
• from sklearn.linear_model import LinearRegression
• from sklearn.metrics import mean_squared_error
• from sklearn.metrics import r2_score
Theory:-
Linear regression is a technique for modeling the relationship between a dependent variable (Y) and
one independent variable (X). It assumes that there is a linear relationship between X and Y, and
tries to fit a straight line that best describes this relationship.
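For a single feature, the best-fit line has a closed-form least-squares solution: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch below works this out with the standard library, together with the RMSE and R-squared metrics used later in the experiment. The function names and the TV/Sales numbers are made up for illustration, and the metrics are computed on the same points used for fitting rather than on a held-out test set.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - y_bar) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


tv = [10, 20, 30, 40, 50]            # made-up advertising spend
sales = [3.1, 4.9, 7.2, 8.8, 11.0]   # made-up sales figures

slope, intercept = fit_line(tv, sales)
predicted = [intercept + slope * x for x in tv]
print(slope, intercept)
print(rmse(sales, predicted), r_squared(sales, predicted))
```

An R-squared close to 1 together with a small RMSE indicates that the straight line describes the data well.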
Code:-
import pandas as pd
import seaborn as sns
from sklearn import preprocessing
from matplotlib import pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
c1 = "advertising.csv"
# load the data into a pandas DataFrame
df = pd.read_csv(c1)
print(df)
print(df.shape)
df.info()
# percentage of missing values in each column
print(df.isnull().sum()*100/df.shape[0])
# There are no NULL values in the dataset, hence it is clean.
# Let's see the correlation between different variables.
sns.heatmap(df.corr(), annot=True)
plt.show()
# separate the target variable and the feature
X = df[['TV']]
y = df['Sales']
x_train,x_test,y_train,y_test = train_test_split(X,y,test_size = 0.3,random_state = 41)

# create a linear regression model
model = LinearRegression()
# fit the model to the training data
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
# mean_squared_error returns the MSE; take the square root to get the RMSE
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(rmse)
r_squared = r2_score(y_test, y_pred)
print(r_squared)
# plot the actual and predicted values
plt.scatter(y_test, y_pred)
# add labels and title
plt.xlabel('Actual Sales')
plt.ylabel('Predicted Sales')
plt.title('Linear Regression')
# add a diagonal line to show where predictions would be perfect
lims = [min(min(y_test), min(y_pred)), max(max(y_test), max(y_pred))]
plt.plot(lims, lims, 'k--')
# show the plot
plt.show()

Outputs:-

Figure 1: The original dataset loaded from advertising.csv is displayed.

Figure 2: Shape of the data is displayed.

Figure 3: Information about the data is displayed.

Figure 4: Percentage of missing values in each column is displayed.

Figure 5: Correlation between the columns is displayed.

Figure 6: The root mean squared error is displayed.

Figure 7: Checking of R-squared value on test set.

Figure 8: Visualizing the fit on test set.

Result:- With the given advertising data, a linear regression model has been built to predict
sales based on the money spent on different platforms for marketing.
