Ds Pract 2 Vedanti

The document outlines an assignment for data wrangling involving the creation of an 'Academic performance' dataset for students. It includes tasks such as handling missing values, identifying and dealing with outliers, and applying data transformations. The document also provides code snippets demonstrating the use of Python libraries like pandas and seaborn for data manipulation and visualization.

Uploaded by

pranjal.shinde.aids.2022


In [1]: '''Assignment-2

Name : Shedage Vedanti Deepak

Class: TE(AI&DS)
Roll No:18
2) Data Wrangling II
Create an “Academic performance” dataset of students and perform the following operat
Python.
1. Scan all variables for missing values and inconsistencies. If there are missing va
inconsistencies, use any of the suitable techniques to deal with them.
2. Scan all numeric variables for outliers. If there are outliers, use any of the sui
to deal with them.
3. Apply data transformations on at least one of the variables. The purpose of this
transformation should be one of the following reasons: to change the scale for better
understanding of the variable, to convert a non-linear relation into a linear one, or
the skewness and convert the distribution into a normal distribution.'''
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [3]: dic = {
'Roll No': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'k', 'L'],
'Marathi': [99.0, 95.0, 85.0, 60.0, 98.0, np.nan, np.nan, 90.0, 81.0, 63.0, np.nan, 52.0],
'English': [np.nan, 84.0, 85.0, 65.0, 79.0, np.nan, 95.0, 91.0, np.nan, 93.0, np.nan, np.nan]
}

In [5]: df=pd.DataFrame(dic)

In [7]: df

Out[7]: Roll No Name Marathi English

0 1 A 99.0 NaN

1 2 B 95.0 84.0

2 3 C 85.0 85.0

3 4 D 60.0 65.0

4 5 E 98.0 79.0

5 6 F NaN NaN

6 7 G NaN 95.0

7 8 H 90.0 91.0

8 9 I 81.0 NaN

9 10 J 63.0 93.0

10 11 k NaN NaN

11 12 L 52.0 NaN

In [9]: df.head()
Out[9]: Roll No Name Marathi English

0 1 A 99.0 NaN

1 2 B 95.0 84.0

2 3 C 85.0 85.0

3 4 D 60.0 65.0

4 5 E 98.0 79.0

In [11]: df.tail()

Out[11]: Roll No Name Marathi English

7 8 H 90.0 91.0

8 9 I 81.0 NaN

9 10 J 63.0 93.0

10 11 k NaN NaN

11 12 L 52.0 NaN

In [13]: df.describe()

Out[13]: Roll No Marathi English

count 12.000000 9.000000 7.000000

mean 6.500000 80.333333 84.571429

std 3.605551 17.705931 10.293317

min 1.000000 52.000000 65.000000

25% 3.750000 63.000000 81.500000

50% 6.500000 85.000000 85.000000

75% 9.250000 95.000000 92.000000

max 12.000000 99.000000 95.000000

In [15]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Roll No 12 non-null int64
1 Name 12 non-null object
2 Marathi 9 non-null float64
3 English 7 non-null float64
dtypes: float64(2), int64(1), object(1)
memory usage: 512.0+ bytes
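The non-null counts reported by `df.info()` can also be turned into an explicit per-column count of missing values, which addresses the first task of the assignment directly. The following is a minimal sketch, assuming the same marks as shown in Out[7]; the name `df_check` is used only for this illustration and is not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same marks as the DataFrame built above (values taken from Out[7]).
df_check = pd.DataFrame({
    'Marathi': [99.0, 95.0, 85.0, 60.0, 98.0, np.nan, np.nan, 90.0, 81.0, 63.0, np.nan, 52.0],
    'English': [np.nan, 84.0, 85.0, 65.0, 79.0, np.nan, 95.0, 91.0, np.nan, 93.0, np.nan, np.nan],
})

# Number of missing values in each column.
missing_counts = df_check.isna().sum()
print(missing_counts)

# Index labels of rows in which at least one mark is missing.
rows_with_missing = df_check[df_check.isna().any(axis=1)].index.tolist()
print(rows_with_missing)
```

With this data, 'Marathi' has 3 missing values and 'English' has 5, consistent with the 9 and 7 non-null counts above.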

In [17]: df['Marathi'].fillna(0,inplace=False)
Out[17]: 0 99.0
1 95.0
2 85.0
3 60.0
4 98.0
5 0.0
6 0.0
7 90.0
8 81.0
9 63.0
10 0.0
11 52.0
Name: Marathi, dtype: float64

In [19]: df

Out[19]: Roll No Name Marathi English

0 1 A 99.0 NaN

1 2 B 95.0 84.0

2 3 C 85.0 85.0

3 4 D 60.0 65.0

4 5 E 98.0 79.0

5 6 F NaN NaN

6 7 G NaN 95.0

7 8 H 90.0 91.0

8 9 I 81.0 NaN

9 10 J 63.0 93.0

10 11 k NaN NaN

11 12 L 52.0 NaN

In [21]: df["English"] = df["English"].ffill()  # forward fill; fillna(method='ffill') is deprecated in pandas 2.x

In [23]: df

Out[23]: Roll No Name Marathi English

0 1 A 99.0 NaN

1 2 B 95.0 84.0

2 3 C 85.0 85.0

3 4 D 60.0 65.0

4 5 E 98.0 79.0

5 6 F NaN 79.0

6 7 G NaN 95.0

7 8 H 90.0 91.0

8 9 I 81.0 91.0

9 10 J 63.0 93.0

10 11 k NaN 93.0

11 12 L 52.0 93.0
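Row 0 still shows NaN for 'English' because a forward fill has no earlier value to copy from. One possible way to cover a leading gap, sketched here as an assumption rather than as the notebook's method, is to follow the forward fill with a backward fill. The series name `english` is illustrative.

```python
import numpy as np
import pandas as pd

# 'English' marks as they appear before any filling (from Out[7]).
english = pd.Series([np.nan, 84.0, 85.0, 65.0, 79.0, np.nan,
                     95.0, 91.0, np.nan, 93.0, np.nan, np.nan])

# Forward fill copies the previous valid mark downward; index 0 stays NaN.
forward_filled = english.ffill()

# A backward fill afterwards copies the next valid mark into the leading gap.
both_filled = english.ffill().bfill()

print(forward_filled.isna().sum())  # one value still missing
print(both_filled.tolist())
```

Whether copying a neighbouring student's mark is appropriate depends on the data; here the rows are independent students, so this is shown only to illustrate the mechanics.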

In [25]: df.dropna(inplace=True)
In [27]: df.isnull()

Out[27]: Roll No Name Marathi English

1 False False False False

2 False False False False

3 False False False False

4 False False False False

7 False False False False

8 False False False False

9 False False False False

11 False False False False

In [29]: df['Marathi'].mean()

Out[29]: 78.0

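In [31]: # No-op at this point: dropna() above already removed every row with a missing mark
df['Marathi'] = df['Marathi'].fillna(df['Marathi'].mean())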

In [33]: df

Out[33]: Roll No Name Marathi English

1 2 B 95.0 84.0

2 3 C 85.0 85.0

3 4 D 60.0 65.0

4 5 E 98.0 79.0

7 8 H 90.0 91.0

8 9 I 81.0 91.0

9 10 J 63.0 93.0

11 12 L 52.0 93.0

In [35]: df.isnull()

Out[35]: Roll No Name Marathi English

1 False False False False

2 False False False False

3 False False False False

4 False False False False

7 False False False False

8 False False False False

9 False False False False

11 False False False False
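Because `dropna` ran before the mean fill, four of the twelve students were discarded and the fill had nothing left to replace. An alternative ordering, sketched below as an assumption and not as the notebook's approach, imputes the column mean first so that all twelve rows are kept. The mean is then taken over the nine observed 'Marathi' marks rather than the eight rows that survived `dropna`.

```python
import numpy as np
import pandas as pd

# 'Marathi' marks before any cleaning (from Out[7]).
marathi = pd.Series([99.0, 95.0, 85.0, 60.0, 98.0, np.nan,
                     np.nan, 90.0, 81.0, 63.0, np.nan, 52.0])

# mean() skips NaN by default, so this averages the nine observed marks.
marathi_mean = marathi.mean()

# Replace each missing mark with that mean; no rows are dropped.
marathi_imputed = marathi.fillna(marathi_mean)

print(round(marathi_mean, 2))
print(marathi_imputed.isna().sum())
```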

In [37]: import seaborn as sns

In [39]: sns.boxplot(y=df['Marathi'])
Out[39]: <Axes: ylabel='Marathi'>

In [41]: sns.boxplot(y=df['English'])

Out[41]: <Axes: ylabel='English'>

In [43]: import numpy as np

Q1 = df['English'].quantile(0.25)
Q3 = df['English'].quantile(0.75)
IQR = Q3 - Q1

In [45]: lower_limit = Q1 - 1.5 * IQR

upper_limit = Q3 + 1.5 * IQR

In [47]: IQR

Out[47]: 8.75
In [49]: lower_limit

Out[49]: 69.625

In [51]: upper_limit

Out[51]: 104.625

In [53]: df.shape

Out[53]: (8, 4)

In [55]: # Remove outliers (keeping only valid values)

df_cleaned_English= df[
((df['English'] >= lower_limit) & (df['English'] <= upper_limit)) |
(df['English'].isna()) # Keep NaN values
]

In [57]: print("\nCleaned DataFrame:")

print(df_cleaned_English)

Cleaned DataFrame:
Roll No Name Marathi English
1 2 B 95.0 84.0
2 3 C 85.0 85.0
4 5 E 98.0 79.0
7 8 H 90.0 91.0
8 9 I 81.0 91.0
9 10 J 63.0 93.0
11 12 L 52.0 93.0
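The quartile, IQR, and limit steps above are repeated for each subject, so they can be collected into one helper. This is a minimal sketch; the function name `iqr_limits` and its parameter `k` are hypothetical and do not appear in the original notebook.

```python
import pandas as pd

def iqr_limits(series, k=1.5):
    """Return (lower, upper) outlier limits using the k * IQR rule."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# 'English' marks of the eight rows left after dropna() (from Out[33]).
english = pd.Series([84.0, 85.0, 65.0, 79.0, 91.0, 91.0, 93.0, 93.0])

low, high = iqr_limits(english)

# Values falling outside the limits are flagged as outliers.
outliers = english[(english < low) | (english > high)]

print(low, high)
print(outliers.tolist())
```

For these marks the limits match Out[49] and Out[51], and the only flagged value is 65.0, the row that is missing from the cleaned DataFrame.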

In [59]: import numpy as np

Q1 = df['Marathi'].quantile(0.25)
Q3 = df['Marathi'].quantile(0.75)
IQR = Q3 - Q1

In [61]: lower_limit_Marathi = Q1 - 1.5 * IQR

upper_limit_Marathi = Q3 + 1.5 * IQR

In [63]: IQR

Out[63]: 29.0

In [65]: lower_limit_Marathi

Out[65]: 18.75

In [67]: upper_limit_Marathi

Out[67]: 134.75

In [69]: df.shape

Out[69]: (8, 4)

In [71]: # Remove outliers (keeping only valid values)

df_cleaned_Marathi = df[
((df['Marathi'] >= lower_limit_Marathi) & (df['Marathi'] <= upper_limit_Marathi)) |
(df['Marathi'].isna()) # Keep NaN values
]

In [73]: print("\nCleaned DataFrame:")

print(df_cleaned_Marathi)

Cleaned DataFrame:
Roll No Name Marathi English
1 2 B 95.0 84.0
2 3 C 85.0 85.0
3 4 D 60.0 65.0
4 5 E 98.0 79.0
7 8 H 90.0 91.0
8 9 I 81.0 91.0
9 10 J 63.0 93.0
11 12 L 52.0 93.0
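Removing rows is only one of the "suitable techniques" the assignment allows. Another option, shown here as an assumption rather than as the author's choice, is to cap values at the IQR limits so that every student stays in the dataset. The sketch uses the 'English' marks, since 'Marathi' has no values outside its limits.

```python
import pandas as pd

# 'English' marks of the eight rows left after dropna() (from Out[33]).
english = pd.Series([84.0, 85.0, 65.0, 79.0, 91.0, 91.0, 93.0, 93.0])

# Same 1.5 * IQR limits as computed in the notebook.
q1 = english.quantile(0.25)
q3 = english.quantile(0.75)
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# clip() pulls any value outside the limits back to the nearest limit.
english_capped = english.clip(lower=lower_limit, upper=upper_limit)

print(english_capped.tolist())
```

Here the mark 65.0 is raised to the lower limit of 69.625 and all eight rows are kept.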

In [75]: import seaborn as sns

sns.boxplot(data=df)  # pass the DataFrame by keyword; draws one box per numeric column

Out[75]: <Axes: >
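The third task in the assignment text, applying a data transformation to at least one variable, does not appear in the cells captured here. One way it could be approached, sketched purely as an assumption, is min-max scaling of 'Marathi' to the range 0 to 1, which serves the stated purpose of changing the scale for easier interpretation. The column name `Marathi_scaled` is hypothetical.

```python
import pandas as pd

# 'Marathi' marks of the eight rows in the cleaned DataFrame (from Out[33]).
df_t = pd.DataFrame({'Marathi': [95.0, 85.0, 60.0, 98.0, 90.0, 81.0, 63.0, 52.0]})

# Min-max scaling: (x - min) / (max - min) maps the lowest mark to 0 and the highest to 1.
col_min = df_t['Marathi'].min()
col_max = df_t['Marathi'].max()
df_t['Marathi_scaled'] = (df_t['Marathi'] - col_min) / (col_max - col_min)

print(df_t)
```

If the goal were instead to reduce skewness, a log transform such as `np.log1p` would be a more natural choice than rescaling.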

In [ ]:
