41 DS PL MF

The document is a lab journal for a Data Science using Python course at Don Bosco Institute of Technology, detailing experiments conducted by students on various data manipulation and visualization techniques using libraries like NumPy, Pandas, Matplotlib, and Seaborn. It includes an index of experiments, problem statements, platforms used, and conclusions drawn from each experiment. The journal emphasizes the importance of data preparation, accessing data frames, and visualizing data to derive insights.


Department of Information Technology

Laboratory Manual for

I.T./ T.E./ Sem V/
ITL503: - Data Science using Python Lab Journal

Submitted by -
Sanffred Cheruvathur 11
Saniya Sandip Padwal 41
Arnav Sawant 52
Niranjan Kumar Yadav 60

Don Bosco Institute of Technology,

Mumbai 400070.
(Affiliated to the University of Mumbai)
Don Bosco Institute of Technology, Kurla(W), Mumbai
Department of Information Technology

Data Science using Python lab TE_IT 2025(C scheme)

A.Y. 2024-25 SEM VI

INDEX

S.N. | Name of the experiment                     | Date
1    | Basic commands of NumPy and Pandas         | 15/01/2025
2    | Accessing data frame using iloc, loc       | 12/02/2025
3    | Plotting graphs using matplotlib and Seaborn | 05/03/2025
4    | Mini Project Report                        | 23/04/2025


Title:
Exp 1 : Data preparation using NumPy and Pandas
Problem Statement:
a. Derive an index field and add it to the data set.
b. Find out the missing values.
c. Obtain a listing of all records that are outliers according to any one field. Print out a listing of
the 10 largest values for that field.
d. Do the following for any one field. i. Standardize the variable. ii. Identify how many outliers
there are and identify the most extreme outlier.

Platform Used:
Google Colab
Name of Dataset:
Superstore marketing campaign dataset (.csv file)
Theory:
Attach screenshot wherever necessary
What are DS, NumPy, and pandas? What is a DataFrame? What is EDA?
DS (Data Science): An interdisciplinary field that uses methods, processes, algorithms, and systems
to extract knowledge from structured and unstructured data.
NumPy: A library for numerical computation in Python; it supports multi-dimensional arrays and
matrices.
Pandas: A library for data manipulation and analysis; its core data structure is the DataFrame.
DataFrame: A 2D labelled data structure with columns of potentially different types (like an
Excel sheet).
EDA (Exploratory Data Analysis): The process of analysing datasets to summarize their main
characteristics.


How to read a CSV file with read_csv (answer this for the platform you are using)
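A minimal sketch of reading a CSV file with pandas. An in-memory string stands in for the uploaded file so that the example is self-contained; the column names and values are hypothetical and are not taken from the Superstore dataset.

```python
import io

import pandas as pd

# Stand-in for an uploaded .csv file; columns and values are hypothetical.
csv_text = "Id,Income,Recency\n1,58138,58\n2,46344,38\n3,71613,26\n"

# pd.read_csv accepts a file path, a URL, or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```

In Google Colab, the `io.StringIO(csv_text)` argument would typically be replaced by the path of the file after it has been uploaded to the session.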

How to create a DataFrame.
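One common way to create a DataFrame is from a dictionary of lists. The sketch below uses hypothetical sample data; each key becomes a column name.

```python
import pandas as pd

# Hypothetical sample data: each dictionary key becomes a column.
data = {"Name": ["A", "B", "C"], "Marks": [72, 85, 90]}

df = pd.DataFrame(data)
print(df)
```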

How to select one or more columns from the dataset.

Single column:

Multiple Columns:
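A short sketch of both cases, assuming a small hypothetical DataFrame. A single column name returns a Series, while a list of column names returns a DataFrame.

```python
import pandas as pd

# Hypothetical data used only for illustration.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Marks": [72, 85, 90],
    "City": ["Mumbai", "Pune", "Nashik"],
})

# Single column: one label inside the brackets gives a Series.
single = df["Marks"]

# Multiple columns: a list of labels gives a DataFrame.
multiple = df[["Name", "Marks"]]

print(single)
print(multiple)
```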


Use of df.columns, head(), tail(), df.dtypes, value_counts(), isnull(), sum(), df.size, len(df),
df.count(), describe(), max(), df.shape
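The attributes and methods listed above can be tried on a small DataFrame. This is a sketch on hypothetical data with one missing value, not the output from the lab dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in "Amount".
df = pd.DataFrame({
    "Category": ["A", "B", "A", "C"],
    "Amount": [10.0, np.nan, 30.0, 40.0],
})

print(df.columns)                      # column labels
print(df.head(2))                      # first 2 rows
print(df.tail(2))                      # last 2 rows
print(df.dtypes)                       # data type of each column
print(df["Category"].value_counts())   # frequency of each category
print(df.isnull().sum())               # number of missing values per column
print(df.size)                         # total cells: rows * columns
print(len(df))                         # number of rows
print(df.count())                      # non-null values per column
print(df.describe())                   # summary statistics of numeric columns
print(df["Amount"].max())              # largest value, ignoring NaN
print(df.shape)                        # (rows, columns)
```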


Create Array and perform operations and display properties.
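A minimal NumPy sketch of creating an array, displaying its properties, and performing a few operations. The array values are arbitrary.

```python
import numpy as np

# Create a 2 x 3 array from a nested list.
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Properties: number of dimensions, shape, element count, element type.
print(arr.ndim, arr.shape, arr.size, arr.dtype)

# Operations.
total = arr.sum()              # sum of all elements
doubled = arr * 2              # element-wise multiplication
col_means = arr.mean(axis=0)   # mean of each column
transposed = arr.T             # swap rows and columns

print(total, doubled, col_means, transposed, sep="\n")
```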


Performing indexing operations on array
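The sketch below shows a few indexing operations on an arbitrary 2D array: positional, negative, row, boolean, and integer-list indexing.

```python
import numpy as np

arr = np.array([[10, 20, 30], [40, 50, 60]])

first = arr[0, 0]             # element at row 0, column 0
last = arr[-1, -1]            # negative indices count from the end
row = arr[1]                  # the whole second row
large = arr[arr > 25]         # boolean indexing: elements greater than 25
picked = arr[[0, 1], [2, 0]]  # integer-list indexing: arr[0, 2] and arr[1, 0]

print(first, last, row, large, picked, sep="\n")
```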


Performing basic slicing on array
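Basic slicing uses the form start:stop:step, where stop is exclusive. A short sketch on arbitrary arrays:

```python
import numpy as np

a = np.arange(10)                 # 0, 1, ..., 9

middle = a[2:5]                   # positions 2, 3, 4
head = a[:3]                      # first three elements
evens = a[::2]                    # every second element
reversed_a = a[::-1]              # reversed order

m = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns
block = m[:2, 1:3]                # first two rows, columns 1 and 2

print(middle, head, evens, reversed_a, block, sep="\n")
```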

Link of Execution:
https://colab.research.google.com/drive/1PKQe9IOo_pJ2PqaEtT8lG0UfLdXYiECo?usp=sharing
Conclusion:
Successfully added an index field to the dataset.
Detected and handled missing values.
Identified outliers and printed 10 largest values in a specific field.
Applied standardization using z-score.
Most extreme outlier and count of total outliers were successfully determined.
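The steps summarized above can be sketched as follows. This is an illustration on hypothetical data, not the notebook's actual code: the column name "Income" stands in for whichever field is analysed, and the |z| > 3 rule for outliers is a common convention assumed here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value and one very large value.
df = pd.DataFrame({"Income": [30.0, 32.0, 31.0, 29.0, np.nan, 33.0, 30.0,
                              31.0, 32.0, 28.0, 30.0, 31.0, 200.0]})

# a. Derive an index field and add it to the data set.
df["Index"] = np.arange(1, len(df) + 1)

# b. Count the missing values in each column.
missing = df.isnull().sum()

# c. List the 10 largest values of the chosen field.
top10 = df["Income"].nlargest(10)

# d-i. Standardize the field (z-score with the sample standard deviation).
z = (df["Income"] - df["Income"].mean()) / df["Income"].std()
df["Income_z"] = z

# d-ii. Outliers under the assumed |z| > 3 rule, and the most extreme one.
outliers = df[z.abs() > 3]
most_extreme = df.loc[z.abs().idxmax()]

print(missing, top10, len(outliers), most_extreme, sep="\n")
```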


Title:

Exp 2 : Accessing data frame using iloc and loc.

Problem Statement:
How to access data using iloc.
How to access data using loc.
How to access specific rows and columns.

Platform Used:
Google Colab
Name of Dataset:
MamalsData.csv
Theory:
iloc is the pandas indexer for integer-location based selection in DataFrames and Series. It selects
data by the integer positions of rows and columns. Positions are 0-based, so the first row or column
is at position 0. iloc accepts single integers, lists of integers, or slices; the endpoint of a slice
is exclusive.
loc is the pandas indexer for label-based selection. It selects data by row and column labels
(names) and also accepts boolean conditions, which makes it flexible for retrieving and modifying
subsets of a DataFrame. Unlike iloc, a loc slice includes its endpoint.
Attach screenshot wherever necessary
Basic iloc usage
We use iloc when selecting rows/columns by position (numbers).


Accessing the first row (position 0)
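A minimal sketch, assuming a small hypothetical DataFrame. Only the column name 'State' appears in this journal; the other columns and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the lab dataset.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John"],
    "State": ["Goa", "Kerala", "Punjab", "Assam"],
    "Age": [21, 22, 20, 23],
})

# Position 0 is the first row; the result is a Series.
first_row = df.iloc[0]
print(first_row)
```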


Accessing rows from index 1 to 2

Accessing second column

Accessing row of index 2
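The three selections above can be sketched with iloc as follows, again on a hypothetical DataFrame. Note that the slice 1:3 is needed to get positions 1 and 2, because the endpoint is exclusive.

```python
import pandas as pd

# Hypothetical stand-in for the lab dataset.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John"],
    "State": ["Goa", "Kerala", "Punjab", "Assam"],
    "Age": [21, 22, 20, 23],
})

rows_1_to_2 = df.iloc[1:3]     # rows at positions 1 and 2
second_col = df.iloc[:, 1]     # all rows, column at position 1
row_2 = df.iloc[2]             # row at position 2

print(rows_1_to_2, second_col, row_2, sep="\n")
```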


Basic loc usage

We use loc when selecting by label or applying conditions.

Accessing rows from index 1 to 2

Accessing column with label ‘State’
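A sketch of these label-based selections on a hypothetical DataFrame with the default integer row labels. With loc, the slice 1:2 includes both endpoints. The condition on 'Age' is an invented example of selecting by condition.

```python
import pandas as pd

# Hypothetical stand-in for the lab dataset.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John"],
    "State": ["Goa", "Kerala", "Punjab", "Assam"],
    "Age": [21, 22, 20, 23],
})

rows_1_to_2 = df.loc[1:2]            # row labels 1 and 2, endpoint included
state_col = df.loc[:, "State"]       # all rows, column labelled 'State'
older = df.loc[df["Age"] > 21]       # rows that satisfy a condition

print(rows_1_to_2, state_col, older, sep="\n")
```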


Accessing the value in row with label '2' and column 'State' using loc
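A single value can be read by passing a row label and a column label together. The sketch assumes a hypothetical DataFrame with the default integer index, so the row label is the integer 2 rather than the string '2'.

```python
import pandas as pd

# Hypothetical stand-in for the lab dataset.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John"],
    "State": ["Goa", "Kerala", "Punjab", "Assam"],
})

# Row label 2, column label 'State'.
value = df.loc[2, "State"]
print(value)
```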

Link of Execution:
https://colab.research.google.com/drive/1xFK4wMxX7VxvRHR21GlmPVqnB14_Wa3-?usp=sharing
Conclusion:
In this experiment, we learned how to access data in a DataFrame using both the iloc and loc
indexers. iloc retrieves data by integer position, while loc retrieves data by row labels and
column names. Understanding the difference between the two is essential for efficient data
manipulation and analysis in pandas. This experiment reinforced the concepts of indexing and
slicing, which are foundational in data preprocessing tasks.


Title:
Exp 3 : Data Visualization / Exploratory Data Analysis for the selected data set using Matplotlib
and Seaborn
Problem Statement:
a. Create a bar graph and a contingency table using any two variables.
b. Create a normalized histogram.
c. Describe what these graphs and tables indicate.

Platform Used:
Name of Dataset:
Theory:
1. What is visualization?
2. What are Seaborn and Matplotlib, and what are they used for?

Attach screenshot wherever necessary

1. What is visualization?
Ans- Data visualization is the graphical representation of information and data using visual
elements like charts, graphs, and maps. It helps in understanding trends, patterns, and outliers
in data.

2. Explain what Seaborn and Matplotlib are and what they are used for.

Ans- Matplotlib is a low-level graph plotting library in Python that offers complete control
over plot elements.
Seaborn is built on top of Matplotlib and provides a high-level interface for drawing
attractive and informative statistical graphics.
Uses:
• Visualize the distribution of data.
• Show relationships between multiple variables.
• Summarize complex data using colour patterns and statistical estimations.

3. List the data visualization plots.

Ans-
• Bar Graph
• Histogram
• Line Chart
• Box Plot
• Violin Plot
• Scatter Plot
• Heatmap
• Pair Plot
• Pie Chart

4. Explain what a bar graph, histogram, contingency table, and heatmap are. Mention when to use
each plot and whether it is univariate, bivariate, or multivariate.
Ans-
Plot Type         | Description                                                   | When to use                       | Variant type
Bar Graph         | Displays categorical data with rectangular bars.              | Comparing categories              | Univariate or Bivariate
Histogram         | Shows frequency distribution of numerical data.               | Distribution of one variable      | Univariate
Contingency Table | Shows frequency distribution for two categorical variables.   | Relationship between categories   | Bivariate
Heatmap           | Graphical representation of data in matrix form with colors.  | Highlight patterns or correlation | Bivariate or Multivariate
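As an illustration of two entries in the table above, the sketch below builds a contingency table with pd.crosstab and computes a normalized histogram with np.histogram. The data and column names are hypothetical. In a normalized (density) histogram the bar areas sum to 1.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two categorical variables and one numeric variable.
df = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "Response": ["Yes", "No", "No", "No", "Yes", "Yes"],
    "Age": [23, 35, 31, 46, 27, 52],
})

# Contingency table: counts for each Gender / Response combination.
table = pd.crosstab(df["Gender"], df["Response"])

# Normalized histogram: density=True scales bar heights so total area is 1.
densities, edges = np.histogram(df["Age"], bins=3, density=True)
area = (densities * np.diff(edges)).sum()

print(table)
print(densities, edges, area)
```

The same normalized histogram can be drawn with Matplotlib by passing density=True to its hist function.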


Screenshots of Code with Output:

Scatter plot
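A minimal Matplotlib sketch of a scatter plot. The variable names and values are hypothetical and do not come from the lab dataset.

```python
import matplotlib.pyplot as plt

# Hypothetical values for two numeric variables.
income = [20, 35, 50, 65, 80]
spend = [5, 9, 14, 18, 25]

fig, ax = plt.subplots()
ax.scatter(income, spend)          # one point per (income, spend) pair
ax.set_xlabel("Income")
ax.set_ylabel("Spend")
ax.set_title("Income vs. Spend")
```

In a notebook the figure is rendered at the end of the cell; in a script, plt.show() displays it.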

Bar Graph
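A bar graph of category counts can be sketched as below. The 'Education' column and its values are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"Education": ["Graduate", "PhD", "Graduate",
                                 "Master", "PhD", "Graduate"]})

# Count how often each category occurs, then draw one bar per category.
counts = df["Education"].value_counts()

fig, ax = plt.subplots()
ax.bar(counts.index, counts.values)
ax.set_xlabel("Education")
ax.set_ylabel("Count")
ax.set_title("Records per education level")
```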


HeatMap
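One common use of a heatmap is to display a correlation matrix. The sketch below assumes Seaborn is installed and uses hypothetical numeric columns.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical numeric data.
df = pd.DataFrame({
    "Income": [20, 35, 50, 65, 80],
    "Spend": [5, 9, 14, 18, 25],
    "Recency": [60, 45, 50, 20, 10],
})

# Pairwise correlation between the numeric columns.
corr = df.corr()

fig, ax = plt.subplots()
# annot=True writes each correlation value inside its cell.
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation heatmap")
```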


Conclusion:
In this experiment, we explored how to visualize and analyze data using Matplotlib and Seaborn.
We successfully created bar graphs, contingency tables, and normalized histograms. These
visualizations provided insights into category-wise distributions and continuous variable
patterns, helping us better understand our dataset. Mastering such visual tools is essential for any
data analysis or machine learning task.
