ML

Uploaded by

Arjun Maheshwari
Copyright © All Rights Reserved
A

LAB FILE
ON
MACHINE LEARNING: THEORY & APPLICATIONS
(COURSE CODE: AIML-202)

Submitted By:
Name: Raghav Pandey
Enrollment No.: A2305224360
Class & Section: 3CSE5Y

Submitted To:
Dr. Dhruv Sharma
Assistant Professor
Amity Centre for Artificial Intelligence

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


AMITY SCHOOL OF ENGINEERING AND TECHNOLOGY
AMITY UNIVERSITY UTTAR PRADESH, NOIDA
SESSION: ODD SEM. (2025-26)
EXPERIMENT- 01
Importing Data and Data Visualization Task
Aim:- Construct various types of plots/charts, such as histograms, bar charts, pie charts and scatter plots, from data imported from a CSV file. Label the axes and the data in each plot.
Libraries Used:-
• pandas
• numpy
• matplotlib.pyplot
• random
Theory:-
Pandas is used to read the data from a CSV file into a DataFrame and to handle it.
Matplotlib's pyplot module is used to create visualizations such as histograms, bar charts, pie charts, scatter plots and a radar (polar) plot, with customized colors, sizes and labels.
NumPy is used to compute the evenly spaced angles needed for the radar plot.
The random module is imported, but it is not used in this program.
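A histogram is built from two things: a set of bin edges and the number of values that fall in each bin. The minimal sketch below computes these with NumPy's `np.histogram`. As far as we know, this is the routine matplotlib's `hist` uses internally to get its bar heights. The array of ages is hypothetical and stands in for the `Age` column of students.csv. The `bins` argument plays the same role as `bins = 20` in this experiment.

```python
import numpy as np

# Hypothetical ages standing in for data['Age'] (not taken from students.csv)
ages = np.array([18, 19, 19, 20, 21, 21, 21, 22, 24, 25])

# Split the observed range [18, 25] into 5 equal-width bins and count ages per bin
counts, edges = np.histogram(ages, bins=5)

# Each bin is half-open [left, right); the last bin also includes its right edge
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:.1f}, {right:.1f}): {n}")
```

Using more bins gives narrower intervals and a more detailed, but noisier, picture of the distribution.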
Code:-
import random  # imported but not used in this program
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('students.csv')

# Histogram of ages with default settings
plt.hist(data['Age'])
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution')
plt.show()

# Same histogram in green
plt.hist(data['Age'], color='green')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution')
plt.show()

# Green histogram with 20 bins for finer detail
plt.hist(data['Age'], color='green', bins=20)
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution')
plt.show()

# Count students per department; the index holds the matching department names,
# so labels and counts stay aligned (unique() and value_counts() use different orders)
dept_counts = data['Department'].value_counts()

# Vertical bar chart
plt.bar(dept_counts.index, dept_counts.values)
plt.xlabel('Department')
plt.ylabel('Count')
plt.title('Department Comparison')
plt.show()

# Vertical bar chart with a bar width of 0.4
plt.figure(figsize=(1, 2))
plt.bar(dept_counts.index, dept_counts.values, width=0.4)
plt.xlabel('Department')
plt.ylabel('Count')
plt.title('Department Comparison')
plt.show()

# Horizontal bar chart with a bar height of 0.4 (counts on x, departments on y)
plt.figure(figsize=(2, 1))
plt.barh(dept_counts.index, dept_counts.values, height=0.4)
plt.xlabel('Count')
plt.ylabel('Department')
plt.title('Department Comparison')
plt.show()

# Horizontal bar chart with the count printed on each bar
plt.figure(figsize=(2, 1))
bars = plt.barh(dept_counts.index, dept_counts.values, height=0.4)
plt.bar_label(bars)
plt.xlabel('Count')
plt.ylabel('Department')
plt.title('Department Comparison')
plt.show()

# Pie chart of department proportions
plt.pie(dept_counts.values, labels=dept_counts.index)
plt.title('Department Proportion')
plt.show()

# Pie chart with percentage labels (one decimal place)
plt.pie(dept_counts.values, labels=dept_counts.index, autopct='%1.1f%%')
plt.title('Department Proportion')
plt.show()

# Scatter plot of GPA vs graduation year
plt.scatter(data['GPA'], data['GraduationYear'])
plt.xlabel('GPA')
plt.ylabel('GraduationYear')
plt.title('GPA vs GraduationYear')
plt.show()

# Red markers
plt.scatter(data['GPA'], data['GraduationYear'], color='red')
plt.xlabel('GPA')
plt.ylabel('GraduationYear')
plt.title('GPA vs GraduationYear')
plt.show()

# Red markers with black edges
plt.scatter(data['GPA'], data['GraduationYear'], c='red', edgecolors='black')
plt.xlabel('GPA')
plt.ylabel('GraduationYear')
plt.title('GPA vs GraduationYear')
plt.show()

# Red markers with black edges and a larger marker size
plt.scatter(data['GPA'], data['GraduationYear'], c='red', edgecolors='black', s=100)
plt.xlabel('GPA')
plt.ylabel('GraduationYear')
plt.title('GPA vs GraduationYear')
plt.show()

# Radar plot: one spoke per student, labelled by age, with GPA as the radius
labels = data['Age'].tolist()
values = data['GPA'].tolist()
values += values[:1]  # repeat the first value to close the polygon
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
angles += angles[:1]  # repeat the first angle to match
plt.figure(figsize=(6, 6))
ax = plt.subplot(111, polar=True)
plt.xticks(angles[:-1], labels)
ax.plot(angles, values, color='navy', linewidth=2)
ax.fill(angles, values, color='lightblue', alpha=0.5)
plt.title('GPA AS PER Age RADAR PLOT', size=14, pad=20)
plt.show()
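The bar and pie charts need each department name paired with its own count. `Series.unique()` returns names in order of first appearance, while `Series.value_counts()` sorts by count, highest first. Zipping one with the other can therefore mislabel bars and slices. The minimal sketch below uses a hypothetical Department column, not the real students.csv contents, to show the mismatch. It then takes labels and heights from the same `value_counts()` Series, which keeps them aligned. The last step computes the percentages that `autopct='%1.1f%%'` prints on the pie.

```python
import pandas as pd

# Hypothetical Department column (not the real students.csv contents)
data = pd.DataFrame({"Department": ["ME", "CSE", "ECE", "CSE", "ECE", "CSE"]})

print(data["Department"].unique())        # order of first appearance
print(data["Department"].value_counts())  # sorted by count, highest first

# Pairing the two orders attaches counts to the wrong departments
wrong = dict(zip(data["Department"].unique(), data["Department"].value_counts()))
print(wrong)

# Take labels and heights from the same Series so each bar gets its own count
counts = data["Department"].value_counts()
labels, heights = counts.index.tolist(), counts.tolist()

# Share of each department in percent, the quantity autopct='%1.1f%%' displays
shares = data["Department"].value_counts(normalize=True) * 100
print(shares.round(1))
```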

OUTPUT:

Figure 1: Histogram showing the age distribution of students.

Figure 3: Green histogram of student ages with 20 bins for more detail.

Figure 4: Vertical bar chart comparing counts of each department.

Figure 5: Vertical bar chart with adjusted width for each department.

Figure 6: Horizontal bar chart comparing departments with reduced bar height.

Figure 7: Horizontal bar chart with labels on each bar.

Figure 8: Pie chart showing department proportions.

Figure 9: Pie chart showing department proportions with percentage values.

Figure 10: Scatter plot of GPA vs Graduation Year.

Figure 11: Red scatter plot of GPA vs Graduation Year.

Figure 12: Red scatter plot with black edges for each point.

Figure 13: Red scatter plot with black edges and larger marker size.

Figure 14: Radar plot of GPA for each student, labelled by age.

RESULT:-
Various plots, including histograms, bar charts, pie charts, scatter plots and a radar plot, were successfully created from data in a CSV file. Each plot was labeled with a title and axis names and customized with colors, sizes and data labels.
EXPERIMENT- 02
Data Cleaning and Pre-processing Task
Aim:- Fill in missing values, remove/insert columns, label the output column, scale features, convert categorical values to numerical values, etc., on data imported from a CSV file.
Libraries Used:-
• pandas
• numpy
• sklearn.preprocessing (MinMaxScaler, StandardScaler, RobustScaler)
• sklearn.preprocessing (LabelEncoder)

Theory:-
The dataset is cleaned by handling missing values, dropping irrelevant columns, and adding derived
columns. Categorical values are converted to numbers using label encoding, and numerical features
are scaled for uniformity. One-hot encoding is applied to convert categorical data into binary dummy
variables, enhancing model compatibility.
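The aim mentions filling missing values, but the code in this experiment drops incomplete rows instead. The minimal sketch below shows the filling alternative. It uses a median for a numeric column and the most frequent value (mode) for a categorical one. It then applies the two encodings described above: label encoding for the target and one-hot encoding for a nominal feature. The toy DataFrame and its column names are hypothetical and only mimic the adult dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame; column names only mimic the adult dataset
df = pd.DataFrame({
    "age": [25, np.nan, 47, 33],
    "workclass": ["Private", "Self-emp", None, "Private"],
    "income": ["<=50K", ">50K", ">50K", "<=50K"],
})

# Fill instead of drop: median for the numeric column, mode for the categorical one
df["age"] = df["age"].fillna(df["age"].median())
df["workclass"] = df["workclass"].fillna(df["workclass"].mode()[0])

# Label-encode the target: classes are sorted, so '<=50K' -> 0 and '>50K' -> 1
le = LabelEncoder()
df["income"] = le.fit_transform(df["income"])

# One-hot encode the nominal feature into 0/1 indicator columns
df = pd.get_dummies(df, columns=["workclass"], dtype=int)
print(df)
```

Filling keeps every row, at the cost of inventing plausible values. Dropping, as the experiment does, keeps only observed values but shrinks the dataset.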
Code:-
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler
from sklearn.preprocessing import LabelEncoder

# Load the data
df = pd.read_csv("adult.csv.xls")

# Identify missing values
print(df.columns)
print(df.isnull().sum())
print(df)

# Drop rows with missing values
df.dropna(inplace=True)
print(df)

# Remove an irrelevant column
df1 = df.drop('gender', axis=1)
print(df1)

# Insert a new column with age squared
df1 = df1.assign(age_squared=lambda x: x['age'] ** 2)
print(df1)

# Rename the target variable column
df1 = df1.rename(columns={'income': 'annual_income'})
print(df1.columns)

# Convert 'annual_income' to numerical values using label encoding
le = LabelEncoder()
df1['annual_income'] = le.fit_transform(df1['annual_income'])
print(df1)

# Min-max scale the age column to the range [0, 1]
scaler = MinMaxScaler()
df1[['age']] = scaler.fit_transform(df1[['age']])
print(df1)

# Standard scaling (zero mean, unit variance)
scaler = StandardScaler()
df1[['annual_income', 'age_squared']] = scaler.fit_transform(df1[['annual_income', 'age_squared']])
print(df1)

# Convert the values of age into dummy (one-hot) columns
df1 = pd.get_dummies(df1, columns=['age'])
print(df1)
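The experiment applies `MinMaxScaler` and `StandardScaler` and also imports `RobustScaler`. Each scaler is a one-line formula:
- min-max: (x - min) / (max - min)
- standard: (x - mean) / std
- robust: (x - median) / IQR

The sketch below computes all three by hand on a hypothetical single-feature array that contains one outlier, then checks each against scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

# Hypothetical ages as a single-feature column vector; 100 is an outlier
x = np.array([[20.0], [30.0], [40.0], [50.0], [100.0]])

# Min-max: (x - min) / (max - min), maps values into [0, 1]
mm = (x - x.min()) / (x.max() - x.min())

# Standard: (x - mean) / std, with the population std (ddof=0) as StandardScaler uses
std = (x - x.mean()) / x.std()

# Robust: (x - median) / IQR, where IQR = 75th percentile - 25th percentile
q1, q3 = np.percentile(x, [25, 75])
rob = (x - np.median(x)) / (q3 - q1)

# Compare each hand-computed result with the scikit-learn scaler
print(np.allclose(mm, MinMaxScaler().fit_transform(x)))
print(np.allclose(std, StandardScaler().fit_transform(x)))
print(np.allclose(rob, RobustScaler().fit_transform(x)))
print(np.hstack([mm, std, rob]).round(3))
```

The outlier squeezes the other min-max values toward 0. The robust scaler is based on the median and quartiles, so it leaves the typical values evenly spread.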
OUTPUT:-

Figure 1: The original dataset loaded from adult.csv.xls is displayed.

Figure 2: Dataset after removing rows with missing values.

Figure 3: Dataset after removing the gender column.

Figure 4: Dataset with a new column age_squared added.

Figure 5: Dataset after renaming the column income to annual_income.

Figure 6: Dataset after converting annual_income values to numeric using label encoding.

Figure 7: Dataset after applying Standard scaling to annual_income and age_squared columns.

Figure 8: Dataset after converting categorical values of age into dummy variables.

RESULT:-
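The dataset was cleaned and prepared. Rows with missing values were dropped, the gender column was removed, an age_squared column was added, and the income target was renamed and label-encoded. Numeric columns were then scaled, and age was converted into dummy variables.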

EXPERIMENT- 03
Linear Regression
Aim:- Given advertising data, build a model that predicts sales from the money spent on different marketing platforms. Implement ordinary least squares linear regression and evaluate the accuracy of the model.
Libraries Used:-
• import pandas as pd
• import seaborn as sns
• from sklearn import preprocessing
• from matplotlib import pyplot as plt
• import numpy as np
• from sklearn.model_selection import train_test_split
• from sklearn.linear_model import LinearRegression
• from sklearn.metrics import mean_squared_error
• from sklearn.metrics import r2_score
Theory:-
Linear regression models the relationship between a dependent variable (Y) and an independent variable (X). Simple linear regression assumes that the relationship is linear, Y = β0 + β1X. Ordinary least squares (OLS) chooses the intercept β0 and slope β1 that minimize the sum of squared differences between the observed and predicted values of Y, giving the straight line that best fits the data.
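For one feature, OLS has a closed form:
- β1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
- β0 = ȳ - β1·x̄

The minimal sketch below evaluates these formulas on a few hypothetical (TV spend, sales) pairs, which are not the advertising.csv values. It then checks the result against scikit-learn's `LinearRegression`, the model used in the code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical (TV spend, sales) pairs; not the advertising.csv values
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([5.0, 7.0, 9.5, 11.0, 13.5])

# Closed-form OLS estimates for y = b0 + b1 * x
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

# scikit-learn expects a 2-D feature matrix, hence reshape(-1, 1)
model = LinearRegression().fit(x.reshape(-1, 1), y)

print(f"closed form: b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"sklearn:     b0 = {model.intercept_:.3f}, b1 = {model.coef_[0]:.3f}")
```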
Code:-
import pandas as pd
import seaborn as sns
from sklearn import preprocessing
from matplotlib import pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

c1 = "advertising.csv"
# Load the data into a pandas DataFrame
df = pd.read_csv(c1)
print(df)
print(df.shape)
df.info()

# Percentage of missing values in each column
print(df.isnull().sum() * 100 / df.shape[0])
# There are no NULL values in the dataset, hence it is clean.

# Correlation between the variables
sns.heatmap(df.corr(), annot=True)
plt.show()  # show the heatmap before starting the next plot

# Separate the feature (TV spend) and the target (Sales)
X = df[['TV']]
y = df['Sales']
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=41)

# Create a linear regression model and fit it to the training data
model = LinearRegression()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

# Root mean squared error: square root of the mean squared error
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(rmse)

# R-squared on the test set
r_squared = r2_score(y_test, y_pred)
print(r_squared)

# Plot the actual and predicted values
plt.scatter(y_test, y_pred)
plt.xlabel('Actual Sales')
plt.ylabel('Predicted Sales')
plt.title('Linear Regression')
# Add a diagonal line to show where predictions would be perfect
lims = [min(min(y_test), min(y_pred)), max(max(y_test), max(y_pred))]
plt.plot(lims, lims, 'k--')
plt.show()
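The two accuracy measures used above have simple definitions:
- RMSE = √(mean of squared residuals)
- R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the actual values from their mean.

The sketch below computes both by hand on hypothetical test values and compares them with the `sklearn.metrics` functions used in the experiment.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted sales on a test set
y_test = np.array([10.0, 12.0, 15.0, 20.0])
y_pred = np.array([11.0, 11.0, 16.0, 18.0])

residuals = y_test - y_pred

# RMSE: average prediction error in the units of the target
rmse = np.sqrt(np.mean(residuals ** 2))

# R^2: fraction of the target's variance explained by the model
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(rmse, np.sqrt(mean_squared_error(y_test, y_pred)))
print(r2, r2_score(y_test, y_pred))
```

An R² close to 1 means the predictions track the actual values closely. An R² of 0 means the model does no better than always predicting the mean.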

Outputs:-

Figure 1: The original dataset loaded from advertising.csv is displayed.

Figure 2: The shape of the data is displayed.

Figure 3: Information about the data is displayed.

Figure 4: The percentage of missing values in each column is displayed (data cleaning check).

Figure 5: Correlation between the columns is displayed.

Figure 6: The root mean squared error is displayed.

Figure 7: The R-squared value on the test set is displayed.

Figure 8: Actual vs predicted sales on the test set, visualizing the fit.

Result:- With the given advertising data, a linear regression model was built to predict sales from marketing spend, using TV advertising spend as the feature. It was evaluated on the test set using RMSE and R-squared.
