
Fdsa Record Ai&Ds

The document is a lab record for the Data Science and Analytics course at C. Abdul Hakeem College of Engineering and Technology, detailing various experiments conducted in the lab. It includes aims, procedures, and results for multiple experiments involving data analysis using Python libraries such as Pandas, NumPy, and Matplotlib. The record is intended for submission for university practical examinations.

Uploaded by

mdnafeed29

C. ABDUL HAKEEM COLLEGE OF ENGINEERING AND TECHNOLOGY, MELVISHARAM.

Hakeem Nagar, Melvisharam - 632 509, Ranipet District, Tamil Nadu, India.
(Approved by AICTE, New Delhi and Affiliated to Anna University, Chennai)

(Regd. Under Sec 2(F) & 12(B) of the UGC Act 1956)

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

AD3411 – DATA SCIENCE AND ANALYTICS LAB RECORD

(REGULATION - 2021)

Name of the Student:

Register Number:

Degree / Branch:

Year / Semester:
Academic Year:

C. ABDUL HAKEEM COLLEGE OF ENGINEERING AND TECHNOLOGY
Hakeem Nagar, Melvisharam - 632 509, Ranipet District, Tamil Nadu, India.
(Approved by AICTE, New Delhi and Affiliated to Anna University, Chennai)
(Regd. Under Sec 2(F) & 12(B) of the UGC Act 1956)

Name of the Candidate:

Year: II Semester: IV Degree/Branch: B-TECH - AI & DS

Subject Code: AD3411


Subject Name: DATA SCIENCE & ANALYTICS LAB

University Register Number:

CERTIFICATE

Certified that this is the bonafide record of work done by the above
student in AD3411 – DATA SCIENCE AND ANALYTICS LAB during 2024-2025.

Signature of Lab In-charge Signature of Head of the Department

Submitted for the University Practical Examination held on ________________

EXAMINERS

Date: Centre code: 5106


Internal: External:
INDEX PAGE

EXP. NO   DATE   EXPERIMENTS                                                            MARKS   SIGNATURE
1.A              WORKING WITH PANDAS DATA FRAME – WEEKS OF THE MONTH
1.B              WORKING WITH PANDAS DATA FRAME – DAYS AND THE DATE
2.A              FREQUENCY DISTRIBUTION, RANGE AND VARIABILITY
2.B              FREQUENCY DISTRIBUTION, RANGE AND VARIANCE
3                NORMAL CURVES, CORRELATION & SCATTER PLOTS, CORRELATION COEFFICIENT
4                BASIC PLOTS USING MATPLOTLIB
5                REGRESSION
6                Z - TEST
7                ANOVA
8                T - TEST
9                BUILDING AND VALIDATING LINEAR MODELS
10               BUILDING AND VALIDATING LOGISTIC MODELS
11               IMPLEMENTATION OF TIME SERIES ANALYSIS

DATE:

EXP:1A- WORKING WITH PANDAS DATAFRAME (WEEKS OF THE MONTH)

Aim:
To work with pandas Data Frame, convert a column of date values to datetime format,
and determine the week number of the month for each date.

Procedure:
Step 1: Go to command prompt.
Step 2: Type pip install pandas.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.

Program:

import pandas as pd

# Sample dates as strings; to_datetime converts them to datetime values.
df = pd.DataFrame({'date': ['2025-02-26', '2024-05-24', '2003-09-19',
                            '2001-09-03', '2004-06-20']})
df['date'] = pd.to_datetime(df['date'])
# Days 1-7 fall in week 1, days 8-14 in week 2, and so on.
df['week of the month'] = (df['date'].dt.day - 1) // 7 + 1
print(df)

Output:

Result:
Hence, working with a pandas DataFrame to find the week of the month
was executed successfully.
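
The week-of-the-month rule in the program can be checked without pandas. The following is a minimal sketch using only the standard library; the helper name week_of_month is an illustrative assumption and is not part of the original record. It applies the same integer arithmetic as the pandas expression to the same five dates.

```python
from datetime import date


def week_of_month(d: date) -> int:
    """Return 1 for days 1-7, 2 for days 8-14, ..., 5 for days 29-31."""
    return (d.day - 1) // 7 + 1


# Same dates as in the program above.
dates = [date(2025, 2, 26), date(2024, 5, 24), date(2003, 9, 19),
         date(2001, 9, 3), date(2004, 6, 20)]
weeks = [week_of_month(d) for d in dates]
print(weeks)
```

Note that this rule counts fixed blocks of seven days starting from the 1st; it does not align weeks to calendar weekdays.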

DATE:

EXP:1B- WORKING WITH PANDAS DATAFRAME (DAYS AND THE DATE)

Aim:
To create a pandas data frame with a column of dates, convert them into datetime
format, and extract the corresponding day of the week for each date.

Procedure:
Step 1: Go to command prompt
Step 2: Type pip install pandas
Step 3: Go to Jupyter Notebook
Step 4: Execute the program

Program:

import pandas as pd
df = pd.DataFrame({'date': ['2025-02-27', '2025-03-01', '2025-03-05']})
df['date'] = pd.to_datetime(df['date'])
df['day'] = df['date'].dt.day_name()
print(df)

Output:

Result: Hence, working with a pandas DataFrame to find the day of the week was executed
successfully.

DATE:

EXP:2A- FREQUENCY DISTRIBUTION, RANGE AND VARIABILITY

Aim:
To write a Python program for frequency distribution, range and variability using NumPy.

Procedure:
Step 1: Go to Command prompt
Step 2: Type pip install numpy
Step 3: Go to jupyter notebook
Step 4: Execute the program

Program:
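import numpy as np

x = [24, 35, 12, 91, 9, 34, 56]
print(np.average(x))  # arithmetic mean
print(np.var(x))      # population variance (divides by n)
print(np.std(x))      # population standard deviation
q1 = np.percentile(x, 25)  # first quartile
q3 = np.percentile(x, 75)  # third quartile
iqr = q3 - q1              # interquartile range
print(iqr)
quartile_deviation = iqr / 2  # half of the interquartile range
print(quartile_deviation)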

Output:

Result:
Thus, the above program was executed successfully.
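
The aim mentions a frequency distribution and the range, which the program above does not print. The following is a minimal sketch of both using only the standard library; the class width of 20 is an assumption chosen for illustration. It also contrasts the population variance (what np.var returns by default) with the sample variance.

```python
from collections import Counter
import statistics

x = [24, 35, 12, 91, 9, 34, 56]  # same data as the program above
width = 20                       # assumed class width, for illustration only

# Frequency distribution: count values per class interval [lower, lower + width).
freq = Counter((value // width) * width for value in x)
for lower in sorted(freq):
    print(f"{lower:>2}-{lower + width - 1:<2}: {freq[lower]}")

# Range: difference between the largest and the smallest observation.
data_range = max(x) - min(x)
print("Range:", data_range)

# Population variance divides by n; sample variance divides by n - 1.
pop_var = statistics.pvariance(x)
sample_var = statistics.variance(x)
print("Population variance:", pop_var)
print("Sample variance:", sample_var)
```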

EXP:2B- FREQUENCY DISTRIBUTION, RANGE AND VARIANCE


Aim:
To write a program for frequency distribution, range and variance.

Procedure:
Step 1: Go to Command prompt
Step 2: Type pip install numpy
Step 3: Go to jupyter notebook
Step 4: Execute the program

Program:

import numpy as np
x = [25, 83, 15, 30, 17, 72, 40, 32, 19]
x.sort()
print(x)
Range = max(x) - min(x)
print(Range)
print(np.median(x))
print(np.std(x))
print(np.average(x))
print(np.var(x))
q1 = np.percentile(x, 25)
print(q1)
q3 = np.percentile(x, 75)
print(q3)
iqr = q3 - q1
print(iqr)
quartile_deviation = (q3 - q1) / 2
print(quartile_deviation)


Output:

Result:
Hence the above program was executed successfully.
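
To see where the quartile values come from, the sketch below computes a percentile by linear interpolation between sorted values, which is the default method of np.percentile. The function name percentile_linear is a hypothetical helper introduced here for illustration, not part of the original record.

```python
def percentile_linear(values, q):
    """Percentile by linear interpolation between order statistics."""
    data = sorted(values)
    position = (q / 100) * (len(data) - 1)  # fractional index into sorted data
    lower = int(position)                    # index at or just below the position
    upper = min(lower + 1, len(data) - 1)    # next index, clamped at the end
    fraction = position - lower
    return data[lower] + fraction * (data[upper] - data[lower])


x = [25, 83, 15, 30, 17, 72, 40, 32, 19]  # same data as the program above
q1 = percentile_linear(x, 25)
q3 = percentile_linear(x, 75)
print(q1, q3, q3 - q1, (q3 - q1) / 2)
```

With nine values, the 25th and 75th percentiles land exactly on the third and seventh sorted values, so no interpolation between neighbours is needed for this data set.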

EXP:3- NORMAL CURVES, CORRELATION & SCATTER PLOTS, CORRELATION COEFFICIENT

Aim:
To write a program on normal curves, correlation & scatter plots, correlation coefficient.


Procedure:
Step 1: Go to command prompt.
Step 2: Type pip install numpy pandas matplotlib.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.

Program:
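# Normal Curves
import matplotlib.pyplot as plt
import numpy as np

mu, sigma = 0.5, 0.1
s = np.random.normal(mu, sigma, 1000)
count, bins, ignored = plt.hist(s, 20, density=True)
# Overlay the theoretical normal density so the curve is visible on the histogram.
plt.plot(bins, 1 / (sigma * np.sqrt(2 * np.pi)) *
         np.exp(-(bins - mu) ** 2 / (2 * sigma ** 2)), linewidth=2, color='r')
plt.show()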

Output:

# Correlation and Scatter plots
import matplotlib.pyplot as plt
import pandas as pd

y = pd.Series([1, 2, 3, 4, 3, 5, 4])
x = pd.Series([1, 2, 3, 4, 5, 6, 7])
plt.scatter(x, y)  # scatter plot of the paired values
plt.show()
correlation = y.corr(x)  # Pearson correlation coefficient
correlation

Output: np.float64(0.8603090020146067)

# Correlation Coefficient
import math

def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0
    i = 0
    while i < n:
        sum_X = sum_X + X[i]
        sum_Y = sum_Y + Y[i]
        sum_XY = sum_XY + X[i] * Y[i]
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]
        i = i + 1
    corr = (n * sum_XY - sum_X * sum_Y) / (
        ((n * squareSum_X - sum_X * sum_X) *
         (n * squareSum_Y - sum_Y * sum_Y)) ** 0.5
    )
    return corr

x = [15, 18, 21, 24, 27]
y = [25, 25, 27, 31, 32]
n = len(x)
print("Correlation coefficient is:", correlationCoefficient(x, y, n))

Output:

Correlation coefficient is: 0.9534625892455922


Result:
Hence the above program was executed successfully.

EXP:4- BASIC PLOTS USING MATPLOTLIB

Aim:
To write a python program on basic plots using matplotlib.


Procedure:

Step 1: Go to command prompt.
Step 2: Type pip install matplotlib.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.

Program:

import matplotlib.pyplot as plt


a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10]
c = [4, 2, 6, 8, 3]
plt.plot(a, label='1st rep')
plt.plot(b, "or", label='2nd rep')
plt.plot(list(range(0, 5)), label='3rd rep')
plt.plot(c, label='4th rep')
plt.xlabel('day-->')
plt.ylabel('temp-->')
ax = plt.gca()
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
plt.xticks(list(range(-3, 10)))
plt.yticks(list(range(-3, 21, 3)))
plt.ylim(-3, 20) # limit Y-axis to suitable range
ax.legend()
plt.annotate('temperature v/s days', xy=(1.01, -2.15))
plt.title('all features are discussed')
plt.show()

Output:


Result:
Thus, the program for basic plots using matplotlib has been executed and verified successfully.

EXP:5- REGRESSION

Aim:


To write a program for regression by using numpy.

Procedure:

Step 1: Go to command prompt.
Step 2: Type pip install numpy matplotlib.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.

Program:
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    n = np.size(x)
    m_x = np.mean(x)
    m_y = np.mean(y)
    # Cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    # Slope and intercept of the least-squares line
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    plt.scatter(x, y, color="m", marker="o", s=30)
    y_pred = b[0] + b[1] * x
    plt.plot(x, y_pred, color="g")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

def main():
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    b = estimate_coef(x, y)
    print("Estimated Coefficients:\nb_0={}\nb_1={}".format(b[0], b[1]))
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()


Output:

Result: Thus, the program was executed and the output was verified successfully.
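
One way to judge how well the fitted line describes the data is the coefficient of determination (R squared). The sketch below is an illustrative addition, not part of the original program: it recomputes b_0 and b_1 from the same data with plain Python lists and then derives R squared from the residuals.

```python
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12]
n = len(x)
m_x = sum(x) / n
m_y = sum(y) / n

# Same sums of squares as estimate_coef, written with plain Python lists.
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * m_x * m_y
ss_xx = sum(xi * xi for xi in x) - n * m_x * m_x
b_1 = ss_xy / ss_xx
b_0 = m_y - b_1 * m_x

# R squared: 1 - (residual sum of squares / total sum of squares).
ss_res = sum((yi - (b_0 + b_1 * xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - m_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot
print(f"b_0={b_0:.4f}, b_1={b_1:.4f}, R^2={r_squared:.4f}")
```

A value of R squared close to 1 indicates that most of the variation in y is explained by the straight line.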

EXP:6- Z - TEST
Aim:
To write a program on Z-test.


Procedure:
Step 1: Go to command prompt.
Step 2: Type pip install numpy statsmodels.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.

Program:
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

mean_iq = 110
sd_iq = 15 / math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq * randn(50) + mean_iq
print('Mean=%.2f Stdv=%.2f' % (np.mean(data), np.std(data)))
ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')
if p_value < alpha:
    print("Reject NULL Hypothesis")
else:
    print("Fail to Reject NULL Hypothesis")

Output:

Result: Thus, the program was executed and the output was verified successfully.
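
Because the program draws random data, its printed values change on every run. The sketch below works a one-sample Z-test by hand on a small fixed sample so the arithmetic can be followed step by step. The sample values are hypothetical and made up for illustration, and the use of the sample standard deviation (n - 1 denominator) is an assumption of this sketch.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical sample values, made up for illustration only.
sample = [108, 112, 109, 111, 110, 113, 107, 110]
null_mean = 100
alpha = 0.05

# Standard error of the mean; stdev divides by n - 1.
standard_error = stdev(sample) / math.sqrt(len(sample))
# Z statistic: how many standard errors the sample mean lies above the null mean.
z = (mean(sample) - null_mean) / standard_error
# One-sided p-value for the alternative "mean is larger than null_mean".
p_value = 1 - NormalDist().cdf(z)

print(f"z={z:.3f}, p={p_value:.4g}")
print("Reject NULL Hypothesis" if p_value < alpha else "Fail to Reject NULL Hypothesis")
```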

EXP:7- ANOVA

Aim: To write a program using ANOVA.


Procedure:
Step 1: Open R (or RStudio), since the program below is written in R.
Step 2: Type install.packages("dplyr") if the package is not already installed.
Step 3: Execute the program.

Program:
install.packages("dplyr")
library(dplyr)
boxplot(mtcars$disp ~ factor(mtcars$gear), xlab = "gear", ylab = "disp")
mtcars_aov <- aov(mtcars$disp ~ factor(mtcars$gear))
summary(mtcars_aov)

Output:
                    Df Sum Sq Mean Sq F value   Pr(>F)
factor(mtcars$gear)  2 280221  140110   20.73 2.56e-06 ***
Residuals           29 195964    6757

Result:
Thus, the program was executed successfully.
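
The columns of the ANOVA table (Df, Sum Sq, Mean Sq, F value) can be reproduced by hand. The sketch below is written in Python to match the other experiments and uses three small hypothetical groups, made up for illustration; it does not use the mtcars data. It computes the between-group and within-group sums of squares and their ratio of mean squares.

```python
# Hypothetical measurements for three groups, made up for illustration only.
groups = {
    "A": [1, 2, 3],
    "B": [2, 3, 4],
    "C": [5, 6, 7],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = sum(all_values) / len(all_values)

# Between-group sum of squares: group size times the squared distance of
# each group mean from the grand mean.
ss_between = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in groups.values()
)
# Within-group (residual) sum of squares: squared distance of each value
# from its own group mean.
ss_within = sum(
    (v - sum(values) / len(values)) ** 2
    for values in groups.values()
    for v in values
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
# F value: mean square between groups divided by mean square within groups.
f_value = (ss_between / df_between) / (ss_within / df_within)
print(df_between, df_within, round(ss_between, 3), round(ss_within, 3), round(f_value, 3))
```

The same relation holds in the R output above: dividing the two Mean Sq entries (140110 by 6757) gives the reported F value of about 20.73.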

EXP:8- T - TEST

Aim:
To write a program for T Test.


Procedure:

Step 1: Go to command prompt.
Step 2: Type pip install numpy scipy.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.

Program:

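import numpy as np
from scipy import stats

N = 10
x = np.random.randn(N) + 2  # sample with mean shifted to 2
y = np.random.randn(N)      # sample with mean 0
var_x = x.var(ddof=1)
var_y = y.var(ddof=1)
# Pooled standard deviation for two samples of equal size
SD = np.sqrt((var_x + var_y) / 2)
print("Standard Deviation=", SD)
tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))
dof = 2 * N - 2
# One-tail probability; abs() keeps the two-sided p-value valid if tval is negative
pval = 1 - stats.t.cdf(abs(tval), df=dof)
print("t=" + str(tval))
print("p=" + str(2 * pval))
# Cross-check with SciPy's built-in two-sample t-test
tval2, pval2 = stats.ttest_ind(x, y)
print("t=" + str(tval2))
print("p=" + str(pval2))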

Output:

Result:

Thus, we get the expected output successfully.
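
Since the program uses random samples, the sketch below repeats the t statistic calculation on two small fixed samples so that each step can be verified by hand. The sample values are hypothetical and made up for illustration; only the test statistic and degrees of freedom are computed, because the standard library has no t distribution for the p-value.

```python
import math
from statistics import mean, variance

# Hypothetical samples of equal size, made up for illustration only.
x = [2, 4, 6, 8]
y = [1, 2, 3, 4]
n = len(x)

# Pooled standard deviation for equal group sizes (variance divides by n - 1).
pooled_sd = math.sqrt((variance(x) + variance(y)) / 2)
# t statistic: difference of means divided by its standard error.
t_value = (mean(x) - mean(y)) / (pooled_sd * math.sqrt(2 / n))
dof = 2 * n - 2
print(f"t={t_value:.4f}, dof={dof}")
```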


EXP:9- BUILDING AND VALIDATING LINEAR MODELS

Aim: To write a program for building and validating linear models.

Procedure:

Step 1: Go to command prompt.
Step 2: Type pip install pandas numpy matplotlib seaborn scikit-learn.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.

Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes
data = load_diabetes()
sns.set(style="ticks", color_codes=True)
plt.rcParams["figure.figsize"] = (8, 5)
plt.rcParams["figure.dpi"] = 150
print(data.keys())
print(data.DESCR)
df = pd.DataFrame(data.data, columns=data.feature_names)
print(df.columns)
print(df.head())
# Correlation heatmap of all features
sns.heatmap(df.corr(), square=True, cmap='RdYlGn')
# Scatter plot of bmi against age with a fitted regression line
sns.lmplot(x="age", y="bmi", data=df)

Output:

dict_keys(['data', 'target', 'frame', 'DESCR', 'feature_names', 'data_filename',
'target_filename', 'data_module'])
.. _diabetes_dataset:
Index(['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6'],
dtype='object')
        age       sex       bmi        bp        s1        s2        s3  \
0  0.038076  0.050680  0.061696  0.021872 -0.044223 -0.034821 -0.043401
1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163  0.074412
2  0.085299  0.050680  0.044451 -0.005670 -0.045599 -0.034194 -0.032356
3 -0.089063 -0.044642 -0.011595 -0.036656  0.012191  0.024991 -0.036038
4  0.005383 -0.044642 -0.036385  0.021872  0.003935  0.015596  0.008142

         s4        s5        s6
0 -0.002592  0.019907 -0.017646
1 -0.039493 -0.068332 -0.092204
2 -0.002592  0.002861 -0.025930
3  0.034309  0.022688 -0.009362
4 -0.002592 -0.031988 -0.046641
<seaborn.axisgrid.FacetGrid at 0x7c41318ecf50>

Result: Thus, we get the expected output successfully.
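
The program above explores the data but does not fit or validate a model. As a hedged illustration of the validation step, the sketch below fits a straight line on part of a small data set and measures the error on the held-out part. It reuses the x and y values from EXP:5 rather than the diabetes data, and the split (first seven points for fitting, last three for validation) is a simple non-random assumption made for brevity.

```python
import math

# Data reused from the regression experiment (EXP:5), for illustration only.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12]

# Hold out the last three points for validation; fit on the first seven.
x_train, y_train = x[:7], y[:7]
x_test, y_test = x[7:], y[7:]

n = len(x_train)
m_x = sum(x_train) / n
m_y = sum(y_train) / n
# Least-squares slope and intercept from the training points only.
slope = (sum((xi - m_x) * (yi - m_y) for xi, yi in zip(x_train, y_train))
         / sum((xi - m_x) ** 2 for xi in x_train))
intercept = m_y - slope * m_x

# Validate: root mean squared error on the points the model has not seen.
predictions = [intercept + slope * xi for xi in x_test]
rmse = math.sqrt(sum((p - a) ** 2 for p, a in zip(predictions, y_test)) / len(y_test))
print(f"slope={slope:.4f}, intercept={intercept:.4f}, test RMSE={rmse:.4f}")
```

In practice a random split (or cross-validation) on the real data set would be preferable; the fixed split here only keeps the arithmetic easy to follow.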

EXP:10- BUILDING AND VALIDATING LOGISTIC MODELS

Aim:
To write a program for building and validating logistic models.


Procedure:
Step 1: Go to command prompt.
Step 2: Type pip install pandas statsmodels scikit-learn.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.

Program:
import statsmodels.api as lm
import pandas as pd
df = pd.read_csv("logits_train1.csv", index_col=0)
xtrain = df[["gmat", "gpa", "work_experience"]]
ytrain = df["admitted"]
log_reg = lm.Logit(ytrain, xtrain).fit()
print(log_reg.summary())
xtest = df[["gmat", "gpa", "work_experience"]]
ytest = df["admitted"]
yhat = log_reg.predict(xtest)
prediction = list(map(round, yhat))
print("Actual values:", list(ytest.values))
print("Predictions:", prediction)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(ytest, prediction)
print("Confusion matrix:\n", cm)


Output:
Optimization terminated successfully.
         Current function value: 0.684915
         Iterations 4
                           Logit Regression Results
==============================================================================
Dep. Variable:               admitted   No. Observations:                   29
Model:                          Logit   Df Residuals:                       26
Method:                           MLE   Df Model:                            2
Date:                Tue, 13 May 2025   Pseudo R-squ.:                0.004175
Time:                        19:15:19   Log-Likelihood:                -19.863
converged:                       True   LL-Null:                       -19.946
Covariance Type:            nonrobust   LLR p-value:                    0.9201
==============================================================================
               coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
gmat        -0.0021      0.004     -0.523      0.601      -0.010       0.006
gpa          0.4006      0.784      0.511      0.609      -1.136       1.937
work_exp    -0.0556      0.267     -0.208      0.835      -0.580       0.468
==============================================================================
Actual values: [np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(1),
np.int64(1), np.int64(0), np.int64(1), np.int64(1), np.int64(0), np.int64(0), np.int64(0),
np.int64(1), np.int64(0), np.int64(1), np.int64(0), np.int64(1), np.int64(1), np.int64(1),
np.int64(0), np.int64(0), np.int64(1), np.int64(0), np.int64(1), np.int64(1), np.int64(0),
np.int64(0), np.int64(1)]
Predictions: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1]

Result:
Thus, the above program was executed successfully.
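
The validation step can be followed by hand from the printed lists. The sketch below copies the actual values and predictions from the Output section and tallies a 2 x 2 confusion matrix and the accuracy in plain Python; it is an illustrative cross-check rather than part of the original program. Because the program predicts on the same rows it was fitted on, the figure is a training-set accuracy.

```python
# Values copied from the Output section above.
actual = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1,
          0, 0, 1, 0, 1, 1, 0, 0, 1]
predicted = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
             0, 1, 1, 1, 1, 0, 0, 0, 1]

# cm[i][j] counts rows whose actual class is i and predicted class is j,
# the same layout that sklearn.metrics.confusion_matrix uses.
cm = [[0, 0], [0, 0]]
for a, p in zip(actual, predicted):
    cm[a][p] += 1

# Accuracy: correctly classified rows (the diagonal) over all rows.
accuracy = (cm[0][0] + cm[1][1]) / len(actual)
print("Confusion matrix:", cm)
print(f"Accuracy: {accuracy:.3f}")
```

The low accuracy is consistent with the summary table, where none of the three coefficients is statistically significant.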

EXP:11- IMPLEMENT TIME SERIES ANALYSIS

Aim:

To implement a program in time series analysis.


Procedure:
Step 1: Go to command prompt.
Step 2: Type pip install pandas matplotlib statsmodels.
Step 3: Go to Jupyter Notebook.
Step 4: Execute the program.

Program:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose, STL

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
df = pd.read_csv(url, parse_dates=['Month'], index_col='Month')

# Plot the raw monthly series
plt.figure(figsize=(10, 4))
plt.plot(df, label='Monthly passengers')
plt.title('Airline Passengers Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Passengers')
plt.legend()
plt.grid(True)
plt.show()

# Classical decomposition into trend, seasonal and residual components
decomposition = seasonal_decompose(df['Passengers'], model='multiplicative')
decomposition.plot()
plt.suptitle("Multiplicative decomposition", fontsize=14)
plt.tight_layout()
plt.show()

# STL decomposition (Seasonal-Trend decomposition using Loess)
stl = STL(df["Passengers"], seasonal=13)
stl_result = stl.fit()
stl_result.plot()
plt.suptitle("STL Decomposition (Seasonal-Trend-Loess)", fontsize=14)
plt.tight_layout()
plt.show()
print("Trend, seasonality, and residual components are now extracted.")

Output:


Trend, seasonality, and residual components are now extracted.


Result:

Thus, the program was executed successfully.
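
The idea behind a multiplicative decomposition can be sketched on a tiny synthetic series. The example below is an illustration of the classical approach (a centered moving average for the trend, then averaged ratios for the seasonal factors); it is not a reproduction of what statsmodels does internally. The period, seasonal factors and trend are all assumed values chosen for this sketch.

```python
period = 4
seasonal_true = [0.8, 1.0, 1.2, 1.0]           # assumed factors, averaging to 1
trend_true = [100 + 2 * i for i in range(16)]  # assumed linear trend
series = [t * seasonal_true[i % period] for i, t in enumerate(trend_true)]

# Centered moving average over one full period (half weights at both ends
# because the period is even). This smooths out the seasonal swing.
half = period // 2
trend_est = {}
for i in range(half, len(series) - half):
    window = series[i - half:i + half + 1]
    total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
    trend_est[i] = total / period

# Seasonal factor per position in the cycle: average of observed / trend.
ratios = {phase: [] for phase in range(period)}
for i, t in trend_est.items():
    ratios[i % period].append(series[i] / t)
seasonal_est = [sum(ratios[p]) / len(ratios[p]) for p in range(period)]
print([round(s, 3) for s in seasonal_est])
```

The estimated factors come out close to the assumed ones, which shows how dividing the series by a smoothed trend isolates the repeating seasonal pattern.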
