0% found this document useful (0 votes)

20 views27 pages

Sowmi DS

The document outlines a series of experiments focused on data analysis techniques using Excel and Python libraries such as NumPy and Pandas. It covers descriptive statistics, histogram analysis, basic operations on NumPy arrays, DataFrame manipulations, handling missing data, and preprocessing techniques for machine learning datasets. Each experiment includes step-by-step instructions and sample code for performing various data analysis tasks.

Uploaded by

23b01a1253

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

20 views27 pages

Sowmi DS

Uploaded by

23b01a1253

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 27

Experiment 1:

Aim: -
Perform Descriptive statistics of given dataset using Data Analysis Toolbox of Excel
Performing Descriptive Statistics in Excel: Short Note
1.Open Dataset: Launch Excel and open your data file.
2.Enable Toolpak: Go to File > Options > Add-ins, select Excel Add-ins, click Go, check Analysis ToolPak, and click
OK.
3.Access Toolpak: Go to the Data tab and click Data Analysis.
4.Select Tool: Choose Descriptive Statistics and click OK.
5.Input Range: Enter your data range, specify if data is in Columns or Rows, and check Labels in First Row if
applicable.
6.Output Options: Select Output Range or New Worksheet Ply, and check Summary statistics.
7.Run Analysis: Click OK to generate the descriptive statistics.

Sample Input:

Output:

1|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

Experiment 2:
AIM: -
Perform the Histogram Analysis of given dataset using Data Analysis Toolbox of Excel.
Steps to Perform Histogram Analysis in Excel:
1. Enter the Data:
 Manually input the above data into an Excel worksheet.
2. Enable Data Analysis ToolPak:
 Go to File > Options > Add-ins.
 In the Manage box, select Excel Add-ins and click Go.
 Check the Analysis ToolPak box, and click OK.
3. Perform Histogram Analysis:
 Go to the Data tab in Excel.
 Click on Data Analysis in the Analysis group.
 Choose Histogram from the list and click OK.
 For the Input Range
 Select the Output Range where you want the results to appear.
 Check the box for Chart Output to create a histogram chart.
Click OK to generate the histogram.

OUTPUT: -

2|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

Experiment 3: -
AIM: - Perform the following operations
a) Basic Operations on NumPy

b) Computations on numpy’s Arrays

c) DataFrames in Pandas

d) Hierarchical Indexing

e) Vectorized String Operations

Program:
Basic Operations on NumPy

Step1:
Import necessary libraries
import numpy as np

Step2:
Create an array
a=np.array([[1207,2,3],[4,5,6]])
a

OUTPUT:

Step3:
Performing operations on numpy array

Finding the data type of elements

dtype_a = a.dtype
print(dtype_a)

OUTPUT:

Finding the shape of the numpy array

np.shape(a)
OUTPUT:

Finding the size of the numpy array

np.size(a)

3|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

OUTPUT:

b=np.asarray(a)
b

OUTPUT: -

c=np.copy(b)
c

OUTPUT: -

a.ravel()

OUTPUT: -

Arithmetic Operations on nd array

Addition of two nd arrays
a+a

OUTPUT:

Multiplication of two arrays:

a*a

OUTPUT:

Subtraction of Two arrays:

a-a
OUTPUT:

Division of Two arrays:

a/a
OUTPUT:

4|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

Modulus of Two arrays:
a%a

OUTPUT:

Aggregate Functions
import numpy as np
a=np.array([[1207,2,3], [4,5,6]])
print(a)

OUTPUT:

Minimum value from the given array

a.min()

OUTPUT: -

Maximum value from the given array

a.max()
OUTPUT: -

Sum of the given array elements

a.sum()
OUTPUT: -

Mean of the given array

a.mean()
OUTPUT: -

b) Computations on numpy’s Arrays

Universal Functions
Creating array
a=np. array([[1207,2,3], [4,5,6]])
a

5|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

OUTPUT: -

np.sqrt(a)

OUTPUT: -

b=np.round(2.6)
b

OUTPUT: -

b=np.abs(2.6)
b

OUTPUT: -

b=np.square(2)
b

OUTPUT: -

c=np.isfinite(-1)
c

OUTPUT: -

c=np.isnan(0)
c

OUTPUT: -

c=np.ceil(5.8)
c

OUTPUT: -

c=np.log(4)
c

6|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

OUTPUT: -

c=np.sum([0.5, 1.5])
c

OUTPUT: -

d=np.cos(1)
d

OUTPUT: -

e=np.floor(6.8)
e

OUTPUT: -

f=np.sin(8)
f

OUTPUT: -

c) Basic operations on Dataframes using pandas

Creating a Series
import pandas as pd
data_dict = {
'Name': ['Sowji', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'Score': [85, 88, 90, 92]
}
df = pd.DataFrame(data_dict)
print(df)

OUTPUT: -

print(df.head())

7|Page 22B01A1251 SVECW IT_A-III YEAR-I SEM Data Science Lab

OUTPUT: -

print(df.info())
OUTPUT: -

print(df['Name'])

OUTPUT: -

filtered_df = df[df['Age'] > 30]

print(filtered_df)

OUTPUT: -

df['Grade'] = ['A', 'B', 'A', 'A']

print(df)

OUTPUT: -

df = df.drop('Grade', axis=1)
print(df)
OUTPUT: -

sorted_df = df.sort_values(by='Age', ascending=False)

print(sorted_df)

8|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

OUTPUT: -

Hierarchical Indexing
import pandas as pd
arrays = [
['Sowji', 'Sowji', 'B', 'B'],
['one', 'two', 'one', 'two']
]
index = pd.MultiIndex.from_arrays(arrays, names=('Upper', 'Lower'))
df = pd.DataFrame({'Values': [10, 20, 30, 40]}, index=index)
print(df)

OUTPUT: -

print(df.loc['Sowji', 'one'])

OUTPUT: -

df_sorted = df.sort_index()
print(df_sorted)

OUTPUT: -

df_swapped = df.swaplevel()
print(df_swapped)

OUTPUT: -

df_reset = df.reset_index()
print(df_reset)
OUTPUT: -

9|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

e) Vectorized String Operations
import pandas as pd
data = pd.DataFrame({'Names': ['Sowji', 'Bob', 'Charlie', 'David']})
data['Names_Lower'] = data['Names'].str.lower()
data['Names_Upper'] = data['Names'].str.upper()
print(data)
OUTPUT: -

data['Contains_al'] = data['Names'].str.contains('al', case=False)

print(data)

OUTPUT: -

data['Names_Replaced'] = data['Names'].str.replace('a', 'X', case=False)

print(data)

OUTPUT: -

data['Name_Length'] = data['Names'].str.len()
print(data)
OUTPUT: -

10 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

Experiment 4: -
AIM: - Write an application to find and Removing Duplicate Records in the given dataset
PROGRAM: -
Step1: - Create a dataset
import pandas as pd
import numpy as np
data = {
'ID': [1207, 2, 3, 1, 1207],
'Name': ['SOWJI', 'Alice', 'Bob', 'John', 'SOWJI'],
'Age': [25, 30, 35,25, 25]
}
df = pd.DataFrame(data)
print(df)

OUTPUT: -

Step2: - Identifying duplicates

duplicate_mask = df. duplicated(data)
duplicates = df[duplicate_mask]
print(duplicates)

OUTPUT: -

Step3: - Removing duplicates from dataset Using drop_duplicates METHOD

data = df. drop_duplicates ()

print(data)

OUTPUT: -

Removing duplicates Specific record

df_no_duplicates_last = df. drop_duplicates(keep='last')

print(df_no_duplicates_last)

OUTPUT: -

11 | P a g e 22B01A1251 SVECW IT_A-III YEAR-I SEM Data Science Lab

12 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab
Experiment 5: -
AIM: - Write an application to handle the Missing Data in the given dataset
Step1: -
Create a dataset and place missing value with np.nan
import pandas as pd
import numpy as np
data = {
'ID': [1207, 2, 3, 4, np.nan],
'Name': ['Sowji', np.nan, 'Sowji', 'Charlie', 'Alice'],
'Age': [25, 30, np.nan, 35, 28],
'Salary': [50000, 54000, 58000, np.nan, 52000]
}
df = pd.DataFrame(data)
print(df)

OUTPUT: -

Step2: -
Identifying missing data
data = df.isnull()
print(data)

OUTPUT: -

Step3: -
Drop rows with any missing data
df_dropna = df.dropna()
print(df_dropna)

OUTPUT: -

13 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

Step4: -
Fill missing data with specific values
df_filled = df.fillna({
'ID': 0,
'Name': 'Unknown',
'Age': df['Age'].mean(),
'Salary': df['Salary'].median()
})
print(df_filled)

OUTPUT: -

Fill missing data with forward fill method

df_ffill = df.fillna(method='ffill')
print(df_ffill)

OUTPUT: -

Fill missing data with backward fill method

df_bffill = df.fillna(method='bfill')
print(df_bffill)

OUTPUT: -

14 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

Experiment 6: -
AIM: -
Write an application to work with pivot table and cross tabulation in Data Frame .

Step1: -
Create a dataset
import pandas as pd
import numpy as np
data = {
'Employee': ['SOWJI', 'Alice', 'Bob', 'Charlie',],
'Department': ['HR', 'Finance', 'IT', 'HR'],
'Age': [25, 30, 35, 40],
'Salary': [50000, 60000, 70000, 80000],
'Bonus': [5000, 7000, 6000, 8000,]
}
df = pd.DataFrame(data)
print(df)

OUTPUT: -

Step2: - Create a pivot table

pivot_table = pd.pivot_table(df,
values= ['Salary', 'Bonus'],
index=['Department'],
aggfunc= {'Salary': np.mean, 'Bonus': np.sum})
print(pivot_table)

OUTPUT: -

Cross tabulation: -
cross_tab = pd.crosstab(df['Department'], df['Employee'])

15 | P a g e 22B01A1251 SVECW IT_A-III YEAR-I SEM Data Science Lab

print("\nCross Tabulation:")
print(cross_tab)

OUTPUT: -

16 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

Experiment-7
AIM: - Perform following preprocessing techniques on loan prediction dataset a. Feature Scaling b.
Feature Standardization c. Label Encoding d. One Hot Encoding
# Import necessary libraries
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler, StandardScaler, LabelEncoder, OneHotEncoder
# Load the Iris dataset from seaborn
iris = sns.load_dataset('iris')
# Display the first few rows of the dataset
print(iris.head())
# Feature Scaling (Min-Max Scaling)
# Scale all the numerical columns (sepal_length, sepal_width, petal_length, petal_width)
scaler = MinMaxScaler()
iris[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']] = scaler.fit_transform(
iris[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
)
# Display the scaled dataset
print("\nAfter Feature Scaling (Min-Max Scaling):")
print(iris.head())

# A. Feature Standardization
# Standardize all the numerical columns
std_scaler = StandardScaler()
iris[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']] = std_scaler.fit_transform(
iris[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
)
# Display the standardized dataset
print("\nAfter Feature Standardization:")
print(iris.head())

# B. Label Encoding (for 'species' column)

label_encoder = LabelEncoder()
iris['species'] = label_encoder.fit_transform(iris['species'])
# Display the label encoded dataset
print("\nAfter Label Encoding (Species):")
print(iris.head())
# One-Hot Encoding (for 'species' column in the original dataset before Label Encoding)
17 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab
# Reload the original iris dataset for One-Hot Encoding
iris_original = sns.load_dataset('iris')
# C. Apply One-Hot Encoding to the 'species' column
iris_onehot = pd.get_dummies(iris_original, columns=['species'], drop_first=True)
# Display the one-hot encoded dataset
print("\nAfter One-Hot Encoding (Species):")
print(iris_onehot.head())
OUTPUT:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa

After Feature Scaling (Min-Max Scaling):

sepal_length sepal_width petal_length petal_width species
0 0.222222 0.625000 0.067797 0.041667 setosa
1 0.166667 0.416667 0.067797 0.041667 setosa
2 0.111111 0.500000 0.050847 0.041667 setosa
3 0.083333 0.458333 0.084746 0.041667 setosa
4 0.194444 0.666667 0.067797 0.041667 setosa

After Feature Standardization:

sepal_length sepal_width petal_length petal_width species
0 -0.900681 1.019004 -1.340227 -1.315444 setosa
1 -1.143017 -0.131979 -1.340227 -1.315444 setosa
2 -1.385353 0.328414 -1.397064 -1.315444 setosa
3 -1.506521 0.098217 -1.283389 -1.315444 setosa
4 -1.021849 1.249201 -1.340227 -1.315444 setosa

After Label Encoding (Species):

sepal_length sepal_width petal_length petal_width species
0 -0.900681 1.019004 -1.340227 -1.315444 0
1 -1.143017 -0.131979 -1.340227 -1.315444 0
2 -1.385353 0.328414 -1.397064 -1.315444 0
3 -1.506521 0.098217 -1.283389 -1.315444 0

18 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

4 -1.021849 1.249201 -1.340227 -1.315444 0

After One-Hot Encoding (Species):

sepal_length sepal_width petal_length petal_width species_versicolor \
0 5.1 3.5 1.4 0.2 0
1 4.9 3.0 1.4 0.2 0
2 4.7 3.2 1.3 0.2 0
3 4.6 3.1 1.5 0.2 0
4 5.0 3.6 1.4 0.2 0

species_virginica
0 0
1 0
2 0
3 0
4 0

19 | P a g e 22B01A1251 SVECW IT_A-III YEAR-I SEM Data Science Lab

Experiment-8
AIM: - Write an application to perform Data Visualization with Python using matplotlib
PROGRAM: -
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('/iris.data.csv',header=None)
print(data)

OUTPUT: -

Scatter Plot
data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
plt.scatter(data['sepal_length'], data['sepal_width'], c='blue')
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.title('Sepal Length vs Sepal Width')
plt.show()

OUTPUT: -

LINE CHART
20 | P a g e 22B01A125 1SVECW IT_A-III YEAR-I SEM Data Science Lab
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('/iris.data.csv', header=None)
data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
plt.plot(data.index, data['sepal_length'], label='Sepal Length', color='blue')
plt.plot(data.index, data['sepal_width'], label='Sepal Width', color='green')
plt.xlabel('Index')
plt.ylabel('Value')
plt.title('Sepal Length and Width over Index')
plt.legend()
plt.show()

OUTPUT: -

Histogram: -
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('/iris.data.csv', header=None)
data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
plt.hist(data['sepal_length'], bins=10, color='skyblue', edgecolor='black')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Frequency')
plt.title('Histogram of Sepal Length')
plt.show()
plt.hist(data['sepal_width'], bins=10, color='salmon', edgecolor='black')
plt.xlabel('Sepal Width (cm)')
plt.ylabel('Frequency')
plt.title('Histogram of Sepal Width')
plt.show()
21 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab
OUTPUT: -

22 | P a g e 22B01A1251 SVECW IT_A-III YEAR-I SEM Data Science Lab

Experiment 9: -
AIM: -Apply and explore various plotting functions from seaborn library
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.read_csv('/Housing.csv')
print(data)
OUTPUT: -

Scatterplot
sns.scatterplot(x='area', y='price', data=data)
plt.title('Price vs. Area')
plt.show()

OUTPUT: -

Histplot for Price

sns.histplot(data['price'], kde=True)
plt.title('Distribution of Price')
plt.show()

OUTPUT: -

23 | P a g e 22B01A125 1SVECW IT_A-III YEAR-I SEM Data Science Lab

Histplot for Area
sns.histplot(data['area'], kde=True)
plt.title('Distribution of area')
plt.show()

OUTPUT: -

24 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

Experiment-10
AIM: - Write an application to apply regression model on the given data set
# Importing necessary libraries

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
# Load the dataset
# Assume 'house_data.csv' contains columns like 'area', 'bedrooms', 'bathrooms', 'location', 'price'
df = pd.read_csv('Housing.csv')
# Display the first few rows
print(df.head())
# Preprocessing
# Check for missing values
print(df.isnull().sum())
# Handling missing values (filling missing values with the median for simplicity)
df.fillna(df.median(), inplace=True)
# Converting categorical data into numeric using One-Hot Encoding
df = pd.get_dummies(df, drop_first=True)
# Split data into features (X) and target variable (y)
X = df.drop('price', axis=1) # Features
y = df['price'] # Target
# Feature scaling
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
# Train the regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print(f"Root Mean Squared Error: {rmse}")
print(f"R-squared: {r2}")
# Visualizing the relationship between actual and predicted prices
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, alpha=0.7)
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Actual vs Predicted House Prices')
plt.show()
OUTPUT: -

25 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

10 b. Write an application to apply regression model for iris data set
# Importing necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
# Load the Iris dataset from seaborn
iris = sns.load_dataset('iris')
# Display the first few rows of the dataset
print(iris.head())
# We'll predict 'petal_width' based on the other features (excluding 'species')
X = iris.drop(['petal_width', 'species'], axis=1) # Features: sepal_length, sepal_width, petal_length
y = iris['petal_width'] # Target: petal_width
# Feature scaling
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
# Train the regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model

26 | P a g e 22B01A1251 SVECW IT_A-III YEAR-I SEM Data Science Lab

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print(f"Root Mean Squared Error: {rmse}")
print(f"R-squared: {r2}")
# Visualizing the relationship between actual and predicted petal width
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, alpha=0.7, color='b')
plt.xlabel('Actual Petal Width')
plt.ylabel('Predicted Petal Width')
plt.title('Actual vs Predicted Petal Width')
plt.show()
# Plotting feature importance (coefficients) for linear regression
plt.figure(figsize=(8, 6))
coefficients = model.coef_
features = ['sepal_length', 'sepal_width', 'petal_length']
sns.barplot(x=features, y=coefficients)
plt.title('Feature Importance (Linear Regression Coefficients)')
plt.ylabel('Coefficient Value')
plt.show()

OUTPUT: -

27 | P a g e 22B01A1251 SVECW IT_A-III YEAR-I SEM Data Science Lab

Python Data Analytics Exam
No ratings yet
Python Data Analytics Exam
6 pages
Ge Sem II Dav Upc 2344001201 Sl. No. Qp. 2012 July 2023
No ratings yet
Ge Sem II Dav Upc 2344001201 Sl. No. Qp. 2012 July 2023
16 pages
FDS Record-1-4
No ratings yet
FDS Record-1-4
18 pages
Ilovepdf Merged (2) Merged
No ratings yet
Ilovepdf Merged (2) Merged
65 pages
GE Python Visualization 2023
No ratings yet
GE Python Visualization 2023
16 pages
Data Analysis Exam for CS Majors
No ratings yet
Data Analysis Exam for CS Majors
12 pages
DSC Lab Programs
No ratings yet
DSC Lab Programs
24 pages
2023 Data Analysis and Visualization Using Python
100% (2)
2023 Data Analysis and Visualization Using Python
9 pages
Foundation of Data Science Lab Manual Full
No ratings yet
Foundation of Data Science Lab Manual Full
8 pages
NumPy and Pandas Basics Guide
No ratings yet
NumPy and Pandas Basics Guide
8 pages
Data Science & Analytics Lab Manual
No ratings yet
Data Science & Analytics Lab Manual
39 pages
PP DWDM 4 5
No ratings yet
PP DWDM 4 5
26 pages
Ge - Computer Science Data Analysis
No ratings yet
Ge - Computer Science Data Analysis
16 pages
Data Analysis Tools
No ratings yet
Data Analysis Tools
26 pages
Practical File Python
No ratings yet
Practical File Python
25 pages
Data Science
No ratings yet
Data Science
42 pages
Python Unit-5
No ratings yet
Python Unit-5
14 pages
Python 1
No ratings yet
Python 1
16 pages
DAV Previous Year
No ratings yet
DAV Previous Year
7 pages
Easiest Lab Programs
No ratings yet
Easiest Lab Programs
5 pages
ML Programs
No ratings yet
ML Programs
34 pages
Exp - 1 - Introduction To Data Analytics and Python Fundamentals - SDK - Ok
No ratings yet
Exp - 1 - Introduction To Data Analytics and Python Fundamentals - SDK - Ok
9 pages
Ds Lab-1
No ratings yet
Ds Lab-1
40 pages
Assignment 7
No ratings yet
Assignment 7
1 page
DV Lab Manual Modified
No ratings yet
DV Lab Manual Modified
31 pages
NumPy and Pandas Step
No ratings yet
NumPy and Pandas Step
9 pages
Ip Project Work 2
No ratings yet
Ip Project Work 2
52 pages
Fundamentals of Data Science Students
No ratings yet
Fundamentals of Data Science Students
52 pages
ML Lab Manual Final
No ratings yet
ML Lab Manual Final
36 pages
Assignment 2
No ratings yet
Assignment 2
3 pages
Assignment 2
No ratings yet
Assignment 2
3 pages
DP Prog
No ratings yet
DP Prog
10 pages
AL Notes
No ratings yet
AL Notes
61 pages
DAV Practicle File
No ratings yet
DAV Practicle File
28 pages
Staff Manula 01
No ratings yet
Staff Manula 01
7 pages
Vanshika Goyal Gec Practicals
No ratings yet
Vanshika Goyal Gec Practicals
31 pages
AI & Data Science Lab Record
No ratings yet
AI & Data Science Lab Record
28 pages
Khadeeja - DS - PRACTICAL 4
No ratings yet
Khadeeja - DS - PRACTICAL 4
24 pages
GE - Computer Scien 4ogygeb
No ratings yet
GE - Computer Scien 4ogygeb
8 pages
L and T Projects - Colabs
No ratings yet
L and T Projects - Colabs
7 pages
Data Analytics Using Python Lab Manual
50% (2)
Data Analytics Using Python Lab Manual
8 pages
AI Practical 2025
No ratings yet
AI Practical 2025
14 pages
Dav 2024 Pyq
No ratings yet
Dav 2024 Pyq
7 pages
Pandas Series & DataFrame Guide
No ratings yet
Pandas Series & DataFrame Guide
60 pages
Pds Record Document Ds II
No ratings yet
Pds Record Document Ds II
36 pages
Cs Sem III Dav Upc 2343012002 Sl. No. Qp. 1673 Dec '23
No ratings yet
Cs Sem III Dav Upc 2343012002 Sl. No. Qp. 1673 Dec '23
12 pages
AD3411
No ratings yet
AD3411
28 pages
Python Unit IV
No ratings yet
Python Unit IV
12 pages
DAV Practical
No ratings yet
DAV Practical
12 pages
Big Data Analysis
No ratings yet
Big Data Analysis
38 pages
L - AND - T - Project - Naveen 24cs002895
No ratings yet
L - AND - T - Project - Naveen 24cs002895
7 pages
Python Cheat Sheet: Pandas - Numpy - Sklearn Matplotlib - Seaborn BS4 - Selenium - Scrapy
100% (3)
Python Cheat Sheet: Pandas - Numpy - Sklearn Matplotlib - Seaborn BS4 - Selenium - Scrapy
9 pages
Gec Practicals
No ratings yet
Gec Practicals
31 pages
Unit3 - 3) Pandas - Ipynb - Colab
No ratings yet
Unit3 - 3) Pandas - Ipynb - Colab
11 pages
DSA LAB Manual - Good Content
No ratings yet
DSA LAB Manual - Good Content
70 pages
FDS Lab
No ratings yet
FDS Lab
43 pages
Fds Merged
No ratings yet
Fds Merged
102 pages
UNIT-4 Important Q-A
No ratings yet
UNIT-4 Important Q-A
28 pages
Dsa Record-1
No ratings yet
Dsa Record-1
153 pages
Experiment-07 Import JSOn CSV
No ratings yet
Experiment-07 Import JSOn CSV
8 pages
Project 1
No ratings yet
Project 1
1 page
AI Prompts
No ratings yet
AI Prompts
2 pages
Files - Replit1
No ratings yet
Files - Replit1
1 page
F
No ratings yet
F
122 pages
Python 3 2
No ratings yet
Python 3 2
6 pages
Python Programming Lab Index
No ratings yet
Python Programming Lab Index
4 pages
Unit 1 - CN
No ratings yet
Unit 1 - CN
55 pages
Expt 4 - CRC
No ratings yet
Expt 4 - CRC
4 pages
Exp 6
No ratings yet
Exp 6
1 page
1.5.1.2 Drawing Forces Activity
No ratings yet
1.5.1.2 Drawing Forces Activity
3 pages
Corrections and Minor Revisions of Mathematical Methods in The Physical Sciences, Third Edition, by Mary L. Boas (Deceased)
No ratings yet
Corrections and Minor Revisions of Mathematical Methods in The Physical Sciences, Third Edition, by Mary L. Boas (Deceased)
6 pages
Lectures On Fire Protection Engineering
No ratings yet
Lectures On Fire Protection Engineering
72 pages
Management Theory Jungle
No ratings yet
Management Theory Jungle
4 pages
Circle Equations and Properties
No ratings yet
Circle Equations and Properties
21 pages
Sensors 17 02963
No ratings yet
Sensors 17 02963
23 pages
Area and Perimeter Homework Year 4 Tes
100% (1)
Area and Perimeter Homework Year 4 Tes
5 pages
Supply Chain Management Presentation
No ratings yet
Supply Chain Management Presentation
30 pages
TestDaily分享-Physics 1 Session 6-FRQ
No ratings yet
TestDaily分享-Physics 1 Session 6-FRQ
4 pages
Arun Sir 1500 Ebook
No ratings yet
Arun Sir 1500 Ebook
451 pages
BVA Test Case Template For Commission Problem
No ratings yet
BVA Test Case Template For Commission Problem
10 pages
Market Analyzer Tutorial Strategies
100% (3)
Market Analyzer Tutorial Strategies
18 pages
David Tong Large N
No ratings yet
David Tong Large N
32 pages
History of Probability
No ratings yet
History of Probability
4 pages
Comparison Trees Notes
No ratings yet
Comparison Trees Notes
7 pages
Iso 5458 2018
No ratings yet
Iso 5458 2018
15 pages
Cochrans Q Test
No ratings yet
Cochrans Q Test
8 pages
Unil Ecch 26 Magnetism
No ratings yet
Unil Ecch 26 Magnetism
72 pages
Pruning Attention Heads of Transformer Models Using A Search
No ratings yet
Pruning Attention Heads of Transformer Models Using A Search
22 pages
5 Roll's Theorem.
No ratings yet
5 Roll's Theorem.
5 pages
Class IX Maths: Angle Constructions
No ratings yet
Class IX Maths: Angle Constructions
16 pages
Freesat Huffman Table (Compression Type 1)
No ratings yet
Freesat Huffman Table (Compression Type 1)
35 pages
Industrial Project Management A Handbook of Planning Scheduling Evaluation Techniques
No ratings yet
Industrial Project Management A Handbook of Planning Scheduling Evaluation Techniques
103 pages
Marketing Research Quiz 005: Check
No ratings yet
Marketing Research Quiz 005: Check
4 pages
A. B. C. D.: 1. For The System in The Given Figure The Characteristic Equation Is
No ratings yet
A. B. C. D.: 1. For The System in The Given Figure The Characteristic Equation Is
10 pages
Core Maths Super Mock 2025 For All Candidates
100% (2)
Core Maths Super Mock 2025 For All Candidates
6 pages
Parametric Cost Estimating of Sterile Building Using Artificial Neural Network & Genetic Algorithm Model
No ratings yet
Parametric Cost Estimating of Sterile Building Using Artificial Neural Network & Genetic Algorithm Model
8 pages
Year 10 5.2 At#1 Solutions
No ratings yet
Year 10 5.2 At#1 Solutions
11 pages
Sacred Geometry: Analyzing Geometry in Indian Temple Architecture
No ratings yet
Sacred Geometry: Analyzing Geometry in Indian Temple Architecture
12 pages
Parental Involvement and Math Performance
No ratings yet
Parental Involvement and Math Performance
5 pages

Sowmi DS

Uploaded by

Sowmi DS

Uploaded by

Experiment 1:

1|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

2|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

b) Computations on numpy’s Arrays

e) Vectorized String Operations

Finding the data type of elements

Finding the shape of the numpy array

Finding the size of the numpy array

3|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

Arithmetic Operations on nd array

Multiplication of two arrays:

Subtraction of Two arrays:

Division of Two arrays:

4|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

Minimum value from the given array

Maximum value from the given array

Sum of the given array elements

Mean of the given array

b) Computations on numpy’s Arrays

5|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

6|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

c) Basic operations on Dataframes using pandas

7|Page 22B01A1251 SVECW IT_A-III YEAR-I SEM Data Science Lab

filtered_df = df[df['Age'] > 30]

df['Grade'] = ['A', 'B', 'A', 'A']

sorted_df = df.sort_values(by='Age', ascending=False)

8|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

9|Page 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

data['Contains_al'] = data['Names'].str.contains('al', case=False)

data['Names_Replaced'] = data['Names'].str.replace('a', 'X', case=False)

10 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

Step2: - Identifying duplicates

Step3: - Removing duplicates from dataset Using drop_duplicates METHOD

Removing duplicates Specific record

11 | P a g e 22B01A1251 SVECW IT_A-III YEAR-I SEM Data Science Lab

13 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

Fill missing data with forward fill method

Fill missing data with backward fill method

14 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

Step2: - Create a pivot table

15 | P a g e 22B01A1251 SVECW IT_A-III YEAR-I SEM Data Science Lab

16 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

# B. Label Encoding (for 'species' column)

After Feature Scaling (Min-Max Scaling):

After Feature Standardization:

After Label Encoding (Species):

18 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

After One-Hot Encoding (Species):

19 | P a g e 22B01A1251 SVECW IT_A-III YEAR-I SEM Data Science Lab

22 | P a g e 22B01A1251 SVECW IT_A-III YEAR-I SEM Data Science Lab

Histplot for Price

23 | P a g e 22B01A125 1SVECW IT_A-III YEAR-I SEM Data Science Lab

24 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

import matplotlib.pyplot as plt

import seaborn as sns

25 | P a g e 22B01A1251SVECW IT_A-III YEAR-I SEM Data Science Lab

26 | P a g e 22B01A1251 SVECW IT_A-III YEAR-I SEM Data Science Lab

27 | P a g e 22B01A1251 SVECW IT_A-III YEAR-I SEM Data Science Lab

You might also like