C. ABDUL HAKEEM COLLEGE OF ENGINEERING AND TECHNOLOGY, MELVISHARAM
Hakeem Nagar, Melvisharam - 632 509, Ranipet District, Tamil Nadu, India.
(Approved by AICTE, New Delhi and Affiliated to Anna University, Chennai)
(Regd. Under Sec 2(F) & 12(B) of the UGC Act 1956)
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
AD3411 – DATA SCIENCE AND ANALYTICS LAB RECORD
(REGULATION - 2021)
Name of the Student:
Register Number:
Degree / Branch:
Year / Semester:
Academic Year:
Name of the Candidate:
Year: II Semester: IV Degree/Branch: B-TECH - AI & DS
Subject Code: AD3411
Subject Name: DATA SCIENCE & ANALYTICS LAB
University Register Number:
CERTIFICATE
Certified that this is the bonafide record of work done by the above
student in AD3411 – DATA SCIENCE AND ANALYTICS LAB during 2024-2025.
Signature of Lab In-charge Signature of Head of the Department
Submitted for the University Practical Examination held on ________________
EXAMINERS
Date: Centre code: 5106
Internal: External:
INDEX PAGE
EXP. NO  DATE  EXPERIMENT                                                             PAGE NO  MARKS  SIGNATURE
1.A            WORKING WITH PANDAS DATA FRAME – WEEKS OF THE MONTH                    1
1.B            WORKING WITH PANDAS DATA FRAME – DAYS AND THE DATE                     2
2.A            FREQUENCY DISTRIBUTION, RANGE AND VARIABILITY                          3
2.B            FREQUENCY DISTRIBUTION, RANGE AND VARIANCE                             4
3              NORMAL CURVES, CORRELATION & SCATTER PLOTS, CORRELATION COEFFICIENT    6
4              BASIC PLOTS USING MATPLOTLIB                                           9
5              REGRESSION                                                             11
6              Z - TEST                                                               13
7              ANOVA                                                                  14
8              T - TEST                                                               15
9              BUILDING AND VALIDATING LINEAR MODELS                                  16
10             BUILDING AND VALIDATING LOGISTIC MODELS                                18
11             IMPLEMENTATION OF TIME SERIES ANALYSIS                                 20
DATE:
EXP:1A- WORKING WITH PANDAS DATAFRAME
(WEEKS OF THE MONTH)
Aim:
To work with pandas Data Frame, convert a column of date values to datetime format,
and determine the week number of the month for each date.
Procedure:
Step 1: Go to command prompt.
Step 2: Type pip install pandas.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.
Program:
import pandas as pd
# Build a data frame with one column of date strings
df = pd.DataFrame({'date': ['2025-02-26', '2024-05-24', '2003-09-19', '2001-09-03', '2004-06-20']})
# Convert the strings to datetime values
df['date'] = pd.to_datetime(df['date'])
# Days 1-7 fall in week 1, days 8-14 in week 2, and so on
df['week of the month'] = (df['date'].dt.day - 1) // 7 + 1
print(df)
Output:
Result:
Hence, the program for working with a pandas Data Frame and finding the week of the
month was executed successfully.
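The week-of-month formula used in the program can be checked on boundary days. The sketch below is illustrative only: the sample dates are hypothetical and were chosen so that days 1, 7, 8, 28 and 29 of a month are all covered.

```python
import pandas as pd

# Hypothetical dates that hit the week boundaries (days 1, 7, 8, 28, 29)
dates = pd.to_datetime(pd.Series(
    ['2024-02-01', '2024-02-07', '2024-02-08', '2024-02-28', '2024-02-29']))

# Same formula as in the program: days 1-7 -> week 1, days 8-14 -> week 2, ...
week_of_month = (dates.dt.day - 1) // 7 + 1
print(week_of_month.tolist())
```

Note that this definition counts blocks of seven days from the first of the month; it does not follow calendar weeks that start on Monday or Sunday.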
DATE:
EXP:1B- WORKING WITH PANDAS DATAFRAME
(DAYS AND THE DATE)
Aim:
To create a pandas data frame with a column of dates, convert them into datetime
format, and extract the corresponding day of the week for each date.
Procedure:
Step 1: Go to command prompt
Step 2: Type pip install pandas
Step 3: Go to Jupyter Notebook
Step 4: Execute the program
Program:
import pandas as pd
df = pd.DataFrame({'date': ['2025-02-27', '2025-03-01', '2025-03-05']})
df['date'] = pd.to_datetime(df['date'])
df['day'] = df['date'].dt.day_name()
print(df)
Output:
Result: Hence, the program for working with a pandas data frame and finding the day of
the week was executed successfully.
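Besides the day name, the numeric day of the week is often useful, for example to flag weekends. The following is a minimal sketch on the same three dates; the column names `weekday_number` and `is_weekend` are assumptions introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2025-02-27', '2025-03-01', '2025-03-05'])})
df['day'] = df['date'].dt.day_name()
# dayofweek is 0 for Monday through 6 for Sunday
df['weekday_number'] = df['date'].dt.dayofweek
# Saturday (5) and Sunday (6) are treated as the weekend in this sketch
df['is_weekend'] = df['weekday_number'] >= 5
print(df)
```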
DATE:
EXP:2A- FREQUENCY DISTRIBUTION, RANGE AND VARIABILITY
Aim:
To write a python program for frequency distribution, range & variability by using numpy.
Procedure:
Step 1: Go to Command prompt
Step 2: Type pip install numpy
Step 3: Go to jupyter notebook
Step 4: Execute the program
Program:
import numpy as np
x = [24, 35, 12, 91, 9, 34, 56]
print(np.average(x))
print(np.var(x))
print(np.std(x))
q1 = np.percentile(x, 25)
q3 = np.percentile(x, 75)
iqr = q3 - q1
print(iqr)
Quartile_deviation = (q3 - q1) / 2
print(Quartile_deviation)
Output:
Result:
Thus, the above program was executed successfully.
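The aim also names the frequency distribution and the range, which can be obtained from the same data. The sketch below is one possible way to do it; the class intervals in `bin_edges` are an assumption chosen for illustration, not part of the original program.

```python
import numpy as np

x = [24, 35, 12, 91, 9, 34, 56]

# Range: difference between the largest and the smallest value
data_range = max(x) - min(x)
print("Range:", data_range)

# Frequency distribution over hypothetical class intervals of width 20
bin_edges = [0, 20, 40, 60, 80, 100]
counts, edges = np.histogram(x, bins=bin_edges)
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{int(left)}-{int(right)}: {int(count)}")

# np.var divides by n by default; ddof=1 gives the sample variance (divides by n - 1)
population_var = np.var(x)
sample_var = np.var(x, ddof=1)
print("Population variance:", population_var)
print("Sample variance:", sample_var)
```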
EXP:2B- FREQUENCY DISTRIBUTION, RANGE AND VARIANCE
DATE:
Aim:
To write a Python program for frequency distribution, range and variance by using numpy.
Procedure:
Step 1: Go to Command prompt
Step 2: Type pip install numpy
Step 3: Go to jupyter notebook
Step 4: Execute the program
Program:
import numpy as np
x = [25, 83, 15, 30, 17, 72, 40, 32, 19]
x.sort()
print(x)
Range = max(x) - min(x)
print(Range)
print(np.median(x))
print(np.std(x))
print(np.average(x))
print(np.var(x))
q1 = np.percentile(x, 25)
print(q1)
q3 = np.percentile(x, 75)
print(q3)
iqr = q3 - q1
print(iqr)
quartile_deviation = (q3 - q1) / 2
print(quartile_deviation)
Output:
Result:
Hence the above program was executed successfully.
EXP:3- NORMAL CURVES, CORRELATION & SCATTER PLOTS,
CORRELATION COEFFICIENT
Aim:
To write a program on normal curves, correlation & scatter plots, correlation coefficient.
DATE:
Procedure:
Step 1: Go to command prompt.
Step 2: Type pip install numpy pandas matplotlib.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.
Program:
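# Normal Curves
import matplotlib.pyplot as plt
import numpy as np
mu, sigma = 0.5, 0.1
# Draw 1000 samples from a normal distribution with mean mu and standard deviation sigma
s = np.random.normal(mu, sigma, 1000)
# Plot a normalized histogram of the samples
count, bins, ignored = plt.hist(s, 20, density=True)
# Overlay the theoretical normal density on the histogram
plt.plot(bins, 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(bins - mu) ** 2 / (2 * sigma ** 2)), linewidth=2, color='r')
plt.show()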
Output:
# Correlation and Scatter plots
import matplotlib.pyplot as plt
import pandas as pd
y = pd.Series([1, 2, 3, 4, 3, 5, 4])
x = pd.Series([1, 2, 3, 4, 5, 6, 7])
# Scatter plot of y against x
plt.scatter(x, y)
plt.show()
# Pearson correlation between the two series
correlation = y.corr(x)
correlation
Output: np.float64(0.8603090020146067)
# Correlation Coefficient
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0
    i = 0
    while i < n:
        # Running sums of X, Y, X*Y, X^2 and Y^2
        sum_X = sum_X + X[i]
        sum_Y = sum_Y + Y[i]
        sum_XY = sum_XY + X[i] * Y[i]
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]
        i = i + 1
    # Pearson correlation coefficient from the running sums
    corr = (n * sum_XY - sum_X * sum_Y) / (
        ((n * squareSum_X - sum_X * sum_X) *
         (n * squareSum_Y - sum_Y * sum_Y)) ** 0.5
    )
    return corr

x = [15, 18, 21, 24, 27]
y = [25, 25, 27, 31, 32]
n = len(x)
print("Correlation coefficient is:", correlationCoefficient(x, y, n))
Output:
Correlation coefficient is: 0.9534625892455922
Result:
Hence the above program was executed successfully.
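The hand-computed correlation coefficient can be cross-checked against NumPy. This is a small sketch using the same x and y values as the program; `np.corrcoef` returns the full correlation matrix, so the off-diagonal entry is the coefficient of interest.

```python
import numpy as np

x = [15, 18, 21, 24, 27]
y = [25, 25, 27, 31, 32]

# The [0, 1] entry of the 2x2 correlation matrix is the Pearson r between x and y
r = np.corrcoef(x, y)[0, 1]
print("Correlation coefficient (NumPy):", r)
```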
EXP:4- BASIC PLOTS USING MATPLOTLIB
Aim:
To write a python program on basic plots using matplotlib.
DATE:
Procedure:
Step 1: Go to command prompt.
Step 2: Type pip install matplotlib.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.
Program:
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10]
c = [4, 2, 6, 8, 3]
plt.plot(a, label='1st rep')
plt.plot(b, "or", label='2nd rep')
plt.plot(list(range(0, 5)), label='3rd rep')
plt.plot(c, label='4th rep')
plt.xlabel('day-->')
plt.ylabel('temp-->')
ax = plt.gca()
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
plt.xticks(list(range(-3, 10)))
plt.yticks(list(range(-3, 21, 3)))
plt.ylim(-3, 20) # limit Y-axis to suitable range
ax.legend()
plt.annotate('temperature v/s days', xy=(1.01, -2.15))
plt.title('all features are discussed')
plt.show()
Output:
Result:
Thus, the program for basic matplotlib has been executed and verified successfully.
EXP:5- REGRESSION
DATE:
Aim:
To write a program for regression by using numpy.
Procedure:
Step 1: Go to command prompt.
Step 2: Type pip install numpy matplotlib.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.
Program:
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # Number of observations and the means of x and y
    n = np.size(x)
    m_x = np.mean(x)
    m_y = np.mean(y)
    # Cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    # Slope (b_1) and intercept (b_0) of the least-squares line
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # Scatter plot of the observations
    plt.scatter(x, y, color="m", marker="o", s=30)
    # Predicted values on the fitted line
    y_pred = b[0] + b[1] * x
    plt.plot(x, y_pred, color="g")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

def main():
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    b = estimate_coef(x, y)
    print("Estimated Coefficients:\nb_0={}\nb_1={}".format(b[0], b[1]))
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
Output:
Result: Thus, the program was executed and the output was verified successfully.
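The coefficients estimated by the program can be verified with a library routine. The sketch below uses `np.polyfit` with degree 1 on the same data; it is offered as a cross-check, not as the method used in the record.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# For degree 1, polyfit returns the coefficients as [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print("b_0 =", intercept)
print("b_1 =", slope)
```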
EXP:6- Z - TEST
Aim:
To write a program on Z-test.
DATE:
Procedure:
Step 1: Go to command prompt.
Step 2: Type pip install numpy statsmodels.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.
Program:
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

mean_iq = 110
sd_iq = 15 / math.sqrt(50)
alpha = 0.05
null_mean = 100
# Generate 50 random IQ values around mean_iq
data = sd_iq * randn(50) + mean_iq
print('Mean=%.2f Stdv=%.2f' % (np.mean(data), np.std(data)))
# One-sided z-test: is the sample mean larger than null_mean?
ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')
if p_value < alpha:
    print("Reject NULL Hypothesis")
else:
    print("Fail to Reject NULL Hypothesis")
Output:
Result: Thus, the program was executed and the output was verified successfully.
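The z statistic reported by the library can also be computed by hand, which makes the test easier to follow. The sketch below is a minimal version under stated assumptions: the function name `one_sample_ztest` and the sample values are hypothetical, and the sample standard deviation (ddof=1) is used for the standard error.

```python
import math
import numpy as np

def one_sample_ztest(data, null_mean):
    """Return (z, one-sided p-value) for the alternative: mean > null_mean."""
    data = np.asarray(data, dtype=float)
    n = data.size
    # Standard error of the mean, using the sample standard deviation
    std_err = data.std(ddof=1) / math.sqrt(n)
    z = (data.mean() - null_mean) / std_err
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical IQ scores, used only to illustrate the call
sample = [112, 108, 115, 109, 111, 107, 113, 110]
z, p = one_sample_ztest(sample, null_mean=100)
print("z = %.3f, p = %.6f" % (z, p))
```

A p-value below the chosen alpha (0.05 in the program) leads to rejecting the null hypothesis.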
EXP:7- ANOVA
Aim: To write a program using ANOVA.
DATE:
Procedure:
Step 1: Open RStudio (or the R console).
Step 2: Type install.packages("dplyr") to install the package.
Step 3: Load the package with library(dplyr).
Step 4: Execute the program.
Program:
install.packages("dplyr")
library(dplyr)
boxplot(mtcars$disp ~ factor(mtcars$gear), xlab = "gear", ylab = "disp")
mtcars_aov <- aov(mtcars$disp ~ factor(mtcars$gear))
summary(mtcars_aov)
Output:
                    Df Sum Sq Mean Sq F value   Pr(>F)
factor(mtcars$gear)  2 280221  140110   20.73 2.56e-06 ***
Residuals           29 195964    6757
Result:
Thus, the program was executed successfully.
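The F value in the ANOVA table is the ratio of the between-group mean square to the within-group mean square. The Python sketch below reproduces that calculation and compares it with `scipy.stats.f_oneway`; the three groups are hypothetical numbers chosen for illustration and are not the mtcars data.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three groups (illustrative only, not mtcars)
groups = [np.array([1.0, 2.0, 3.0]),
          np.array([2.0, 3.0, 4.0]),
          np.array([5.0, 6.0, 7.0])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group and within-group sums of squares
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Degrees of freedom: (number of groups - 1) and (observations - number of groups)
df_between = len(groups) - 1
df_within = all_values.size - len(groups)
f_manual = (ss_between / df_between) / (ss_within / df_within)

# Library result for comparison
f_scipy, p_value = stats.f_oneway(*groups)
print("F (manual) =", f_manual)
print("F (scipy)  =", f_scipy, " p =", p_value)
```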
EXP:8- T - TEST
Aim:
To write a program for T Test.
DATE:
Procedure:
Step 1: Go to command prompt.
Step 2: Type pip install numpy scipy.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.
Program:
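import numpy as np
from scipy import stats
N = 10
# Two random samples; x is shifted so that its mean is about 2
x = np.random.randn(N) + 2
y = np.random.randn(N)
# Unbiased sample variances and the pooled standard deviation
var_x = x.var(ddof=1)
var_y = y.var(ddof=1)
SD = np.sqrt((var_x + var_y) / 2)
print("Standard Deviation=", SD)
# t statistic and degrees of freedom for two samples of equal size
tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))
dof = 2 * N - 2
# One-tail probability; abs() keeps the two-sided p-value valid when tval is negative
pval = 1 - stats.t.cdf(abs(tval), df=dof)
print("t=" + str(tval))
print("p=" + str(2 * pval))
# Cross-check with the SciPy implementation
tval2, pval2 = stats.ttest_ind(x, y)
print("t=" + str(tval2))
print("p=" + str(pval2))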
Output:
Result:
Thus, the program was executed and the expected output was obtained successfully.
EXP:9- BUILDING AND VALIDATING LINEAR MODELS
Aim: To write a program for building and validating linear models.
DATE:
Procedure:
Step 1: Go to command prompt.
Step 2: Type pip install pandas numpy matplotlib seaborn scikit-learn.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes
data = load_diabetes()
sns.set(style="ticks", color_codes=True)
plt.rcParams["figure.figsize"] = (8, 5)
plt.rcParams["figure.dpi"] = 150
# Inspect the keys and the description of the dataset
print(data.keys())
print(data.DESCR)
# Load the features into a data frame
df = pd.DataFrame(data.data, columns=data.feature_names)
print(df.columns)
print(df.head())
# Correlation heatmap of the features
sns.heatmap(df.corr(), square=True, cmap='RdYlGn')
# Scatter plot of bmi against age with a fitted regression line
sns.lmplot(x="age", y="bmi", data=df)
Output:
dict_keys(['data', 'target', 'frame', 'DESCR', 'feature_names', 'data_filename', 'target_filename', 'data_module'])
.. _diabetes_dataset:
Index(['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6'], dtype='object')
        age       sex       bmi        bp        s1        s2        s3  \
0  0.038076  0.050680  0.061696  0.021872 -0.044223 -0.034821 -0.043401
1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163  0.074412
2  0.085299  0.050680  0.044451 -0.005670 -0.045599 -0.034194 -0.032356
3 -0.089063 -0.044642 -0.011595 -0.036656  0.012191  0.024991 -0.036038
4  0.005383 -0.044642 -0.036385  0.021872  0.003935  0.015596  0.008142
         s4        s5        s6
0 -0.002592  0.019907 -0.017646
1 -0.039493 -0.068332 -0.092204
2 -0.002592  0.002861 -0.025930
3  0.034309  0.022688 -0.009362
4 -0.002592 -0.031988 -0.046641
<seaborn.axisgrid.FacetGrid at 0x7c41318ecf50>
Result: Thus, the program was executed and the expected output was obtained successfully.
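Building and validating a linear model on the same dataset could look like the sketch below. It is one possible approach, not part of the recorded program: the 80/20 split, the `random_state` value and the choice of R-squared and mean squared error as validation metrics are all assumptions made for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Features and target of the diabetes dataset bundled with scikit-learn
X, y = load_diabetes(return_X_y=True)

# Hold out 20% of the rows for validation (split sizes are an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Build the model on the training rows only
model = LinearRegression().fit(X_train, y_train)

# Validate on rows the model has not seen
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print("R-squared on the test set:", r2)
print("Mean squared error on the test set:", mse)
```

Scoring on held-out rows gives a more honest picture of the model than scoring on the rows it was fitted to.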
EXP:10- BUILDING AND VALIDATING LOGISTIC MODELS
Aim:
To write a program for building and validating logistic models.
DATE:
Procedure:
Step 1: Go to command prompt.
Step 2: Type pip install pandas statsmodels scikit-learn.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.
Program:
import statsmodels.api as lm
import pandas as pd
df = pd.read_csv("logits_train1.csv", index_col=0)
xtrain = df[["gmat", "gpa", "work_experience"]]
ytrain = df["admitted"]
log_reg = lm.Logit(ytrain, xtrain).fit()
print(log_reg.summary())
xtest = df[["gmat", "gpa", "work_experience"]]
ytest = df["admitted"]
yhat = log_reg.predict(xtest)
prediction = list(map(round, yhat))
print("Actual values:", list(ytest.values))
print("Predictions:", prediction)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(ytest, prediction)
print("Confusion matrix:\n", cm)
Output:
Optimization terminated successfully.
Current function value: 0.684915
Iterations 4
                           Logit Regression Results
==============================================================================
Dep. Variable:               admitted   No. Observations:                   29
Model:                          Logit   Df Residuals:                       26
Method:                           MLE   Df Model:                            2
Date:                Tue, 13 May 2025   Pseudo R-squ.:                0.004175
Time:                        19:15:19   Log-Likelihood:                -19.863
converged:                       True   LL-Null:                       -19.946
Covariance Type:            nonrobust   LLR p-value:                    0.9201
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
gmat          -0.0021      0.004     -0.523      0.601      -0.010       0.006
gpa            0.4006      0.784      0.511      0.609      -1.136       1.937
work_exp      -0.0556      0.267     -0.208      0.835      -0.580       0.468
==============================================================================
Actual values: [np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(1),
np.int64(1), np.int64(0), np.int64(1), np.int64(1), np.int64(0), np.int64(0), np.int64(0),
np.int64(1), np.int64(0), np.int64(1), np.int64(0), np.int64(1), np.int64(1), np.int64(1),
np.int64(0), np.int64(0), np.int64(1), np.int64(0), np.int64(1), np.int64(1), np.int64(0),
np.int64(0), np.int64(1)]
Predictions: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1]
Result:
Thus, the above program was executed successfully.
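The actual values and predictions printed in the output are enough to validate the model by hand. The sketch below copies those two lists from the recorded output and counts the four cells of the confusion matrix with NumPy; the variable names are assumptions introduced here.

```python
import numpy as np

# Actual and predicted labels copied from the recorded output
actual = np.array([0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0,
                   1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1])
predicted = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
                      0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1])

# Count each cell of the confusion matrix
tp = int(np.sum((actual == 1) & (predicted == 1)))
tn = int(np.sum((actual == 0) & (predicted == 0)))
fp = int(np.sum((actual == 0) & (predicted == 1)))
fn = int(np.sum((actual == 1) & (predicted == 0)))

# Rows are actual classes, columns are predicted classes
cm = np.array([[tn, fp], [fn, tp]])
accuracy = (tp + tn) / actual.size
print("Confusion matrix:\n", cm)
print("Accuracy: %.3f" % accuracy)
```

Because the program predicts on the same rows it was trained on, this accuracy describes the fit to the training data rather than performance on unseen data.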
EXP:11- IMPLEMENT TIME SERIES ANALYSIS
Aim:
To implement a program in time series analysis.
DATE:
Procedure:
Step 1: Go to command prompt.
Step 2: Type pip install pandas matplotlib statsmodels.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.
Program:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose, STL
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
# Read the monthly series and use the Month column as a datetime index
df = pd.read_csv(url, parse_dates=['Month'], index_col='Month')
# Plot the raw series
plt.figure(figsize=(10, 4))
plt.plot(df, label='Monthly passengers')
plt.title('Airline Passengers Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Passengers')
plt.legend()
plt.grid(True)
plt.show()
# Classical multiplicative decomposition into trend, seasonal and residual parts
decomposition = seasonal_decompose(df['Passengers'], model='multiplicative')
decomposition.plot()
plt.suptitle("Multiplicative decomposition", fontsize=14)
plt.tight_layout()
plt.show()
# STL decomposition with a seasonal smoother of length 13
stl = STL(df["Passengers"], seasonal=13)
stl_result = stl.fit()
stl_result.plot()
plt.suptitle("STL Decomposition (Seasonal-Trend-Loess)", fontsize=14)
plt.tight_layout()
plt.show()
print("Trend, seasonality, and residual components are now extracted.")
Output:
Trend, seasonality, and residual components are now extracted.
Result:
Thus, the program was executed successfully.
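The idea behind the decomposition can be sketched without the library or the downloaded file. The example below is a simplified additive version on a hypothetical series (linear trend plus a 12-month sine wave); the rolling mean used here is an approximation and is not the exact filter that `seasonal_decompose` applies.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend plus a 12-month seasonal wave
index = pd.date_range("2020-01-01", periods=48, freq="MS")
t = np.arange(48)
series = pd.Series(100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12), index=index)

# A centered 12-month rolling mean averages out one full season, leaving the trend
trend = series.rolling(window=12, center=True).mean()

# Averaging the detrended values by calendar month estimates the seasonal pattern
detrended = series - trend
seasonal = detrended.groupby(detrended.index.month).transform("mean")

# Whatever is left after removing trend and season is the residual
residual = series - trend - seasonal
print(trend.dropna().head())
print(residual.dropna().abs().max())
```

The first and last few months have no trend estimate because the 12-month window does not fit there, which is also why the library output shows gaps at both ends of the trend panel.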