Department of Information Technology
Laboratory Manual for
I.T./ T.E./ Sem V/
ITL503: - Data Science using Python Lab Journal
Submitted by -
Sanffred Cheruvathur 11
Saniya Sandip Padwal 41
Arnav Sawant 52
Niranjan Kumar Yadav 60
Don Bosco Institute of Technology,
Mumbai 400070.
(Affiliated to the University of Mumbai)
Don Bosco Institute of Technology, Kurla(W), Mumbai
Department of Information Technology
Data Science using Python lab TE_IT 2025(C scheme)
A.Y. 2024-25 SEM VI
INDEX
S.N. | Name of the experiment | Date
1 | Basic commands of NumPy and Pandas | 15/01/2025
2 | Accessing data frame using iloc, loc | 12/02/2025
3 | Plotting graphs using Matplotlib and Seaborn | 05/03/2025
4 | Mini Project Report | 23/04/2025
Title:
Exp 1 : Data preparation using NumPy and Pandas
Problem Statement:
a. Derive an index field and add it to the data set.
b. Find out the missing values.
c. Obtain a listing of all records that are outliers according to any one field. Print out a listing of
the 10 largest values for that field.
d. Do the following for any one field. i. Standardize the variable. ii. Identify how many outliers
there are and identify the most extreme outlier.
Platform Used:
Google Colab
Name of Dataset:
Superstore marketing campaign dataset (.csv file)
Theory:
Attach screenshot wherever necessary
What are DS, NumPy, and Pandas? What is a DataFrame? What is EDA?
DS (Data Science): Interdisciplinary field that uses methods, processes, algorithms, and systems
to extract knowledge from structured/unstructured data.
NumPy: A library for numerical computations in Python; supports multi-dimensional arrays and
matrices.
Pandas: A library for data manipulation and analysis; its core data structure is the DataFrame.
DataFrame: A 2D labelled data structure with columns of potentially different types (like an
Excel sheet).
EDA (Exploratory Data Analysis): The process of analysing datasets to summarize their main
characteristics.
How to read a CSV file with read_csv (answer this for the platform you are using).
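In Google Colab, a CSV file is typically uploaded to the session (or read from a mounted Drive folder) and then passed to pd.read_csv. The sketch below shows one possible way to do this. The column names, values, and file name are made up for illustration, and an in-memory string stands in for the uploaded file so that the snippet also runs outside Colab.

```python
import io

import pandas as pd

# In Colab, the file is usually uploaded first, for example:
#   from google.colab import files
#   uploaded = files.upload()
# and then read by name, e.g. pd.read_csv("superstore_data.csv")
# (the file name here is hypothetical).

# Stand-in for the uploaded file; columns and values are illustrative only.
csv_text = """Id,Income,Recency
1,58138,58
2,46344,38
3,71613,26
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```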
How to create a DataFrame.
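A DataFrame can be built directly from Python objects. The following is a minimal sketch with made-up names and values, showing construction from a dictionary of columns and from a list of rows.

```python
import pandas as pd

# From a dictionary: keys become column names, lists become column values.
df_from_dict = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meera"],
    "Age": [21, 22, 20],
})

# From a list of rows: column names are passed separately.
rows = [["Asha", 21], ["Ravi", 22], ["Meera", 20]]
df_from_rows = pd.DataFrame(rows, columns=["Name", "Age"])

print(df_from_dict)
print(df_from_rows.equals(df_from_dict))
```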
How to select one or multiple columns from the dataset.
Single column:
Multiple Columns:
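A minimal sketch of both selections, using a small made-up DataFrame (the column names are illustrative assumptions, not the actual Superstore columns). Single brackets return a Series; a list of names inside the brackets returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meera"],
    "Age": [21, 22, 20],
    "City": ["Mumbai", "Pune", "Nashik"],
})

# Single column: one label in brackets gives a Series.
single = df["Name"]

# Multiple columns: a list of labels gives a DataFrame.
multiple = df[["Name", "Age"]]

print(type(single).__name__)
print(multiple)
```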
Use of df.columns, head(), tail(), df.dtypes, value_counts(), isnull(), sum(), df.size, len(df),
df.count(), describe(), max(), df.shape
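The inspection calls listed above can be tried on a small made-up DataFrame. This is only a sketch; the column names and values are assumptions chosen so that one missing value is present.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Education": ["Graduation", "PhD", "Graduation", "Master", "PhD"],
    "Income": [58138.0, 46344.0, np.nan, 26646.0, 71613.0],
})

print(df.columns)                      # column labels
print(df.head(2))                      # first 2 rows
print(df.tail(2))                      # last 2 rows
print(df.dtypes)                       # data type of each column
print(df["Education"].value_counts())  # frequency of each category

missing = df.isnull().sum()            # missing values per column
print(missing)

print(df.size)                         # total number of cells (rows * columns)
print(len(df))                         # number of rows
print(df.shape)                        # (rows, columns)
print(df.count())                      # non-null values per column
print(df.describe())                   # summary statistics of numeric columns
print(df["Income"].max())              # largest value in a column
```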
Create Array and perform operations and display properties.
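A minimal NumPy sketch of this step, with made-up values: it creates a 2D array, prints its main properties, and applies a few element-wise and aggregate operations.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Properties of the array.
print(arr.ndim)    # number of dimensions
print(arr.shape)   # (rows, columns)
print(arr.size)    # total number of elements
print(arr.dtype)   # element data type

# Operations.
doubled = arr * 2               # element-wise multiplication
total = arr.sum()               # sum of all elements
col_means = arr.mean(axis=0)    # mean of each column
transposed = arr.T              # rows and columns swapped

print(doubled)
print(total)
print(col_means)
print(transposed.shape)
```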
Performing indexing operations on array
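Indexing on NumPy arrays could look like the sketch below (values are made up). It covers positional, negative, 2D, integer-list, and boolean-mask indexing.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first = arr[0]            # first element (0-based)
last = arr[-1]            # last element via negative index
element = grid[1, 2]      # row 1, column 2 of the 2D array
first_row = grid[0]       # entire first row
picked = arr[[0, 2, 4]]   # several positions at once
above_25 = arr[arr > 25]  # boolean mask: elements greater than 25

print(first, last, element)
print(first_row, picked, above_25)
```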
Performing basic slicing on array
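Basic slicing uses the start:stop:step form, where the stop position is excluded. A short sketch on made-up arrays:

```python
import numpy as np

arr = np.arange(10)                 # 0, 1, ..., 9
grid = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns

middle = arr[2:5]         # positions 2, 3, 4 (5 is excluded)
first_three = arr[:3]     # from the start up to position 2
every_second = arr[::2]   # step of 2
reversed_arr = arr[::-1]  # negative step reverses the array
block = grid[0:2, 1:3]    # rows 0-1 and columns 1-2

print(middle, first_three, every_second)
print(reversed_arr)
print(block)
```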
Link of Execution:
https://colab.research.google.com/drive/1PKQe9IOo_pJ2PqaEtT8lG0UfLdXYiECo?usp=sharing
Conclusion:
Successfully added an index field to the dataset.
Detected and handled missing values.
Identified outliers and printed 10 largest values in a specific field.
Applied standardization using z-score.
Most extreme outlier and count of total outliers were successfully determined.
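The steps summarized in the conclusion can be sketched as follows. This is not the notebook's actual code: the data is made up, the column names Education and Income are assumptions, and the |z| > 3 rule for flagging outliers is a common rule of thumb assumed here.

```python
import pandas as pd

# Made-up data: 20 ordinary incomes, one extreme value, one missing Education entry.
incomes = [48000.0 + 500.0 * i for i in range(20)] + [250000.0]
education = ["Graduation", "PhD", "Master"] * 7
education[4] = None
df = pd.DataFrame({"Education": education, "Income": incomes})

# a. Derive an index field and add it to the data set.
df["Index"] = range(1, len(df) + 1)

# b. Count missing values in each column.
missing = df.isnull().sum()

# d-i. Standardize the field (z-score using the sample standard deviation).
df["Income_z"] = (df["Income"] - df["Income"].mean()) / df["Income"].std()

# c / d-ii. Flag records with |z| > 3 as outliers and find the most extreme one.
outliers = df[df["Income_z"].abs() > 3]
most_extreme = df.loc[df["Income_z"].abs().idxmax()]

# c. Ten largest values of the field.
top10 = df.nlargest(10, "Income")

print(missing)
print(len(outliers), most_extreme["Income"])
print(top10[["Index", "Income"]])
```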
Title:
Exp 2 : Accessing data frame using iloc and loc.
Problem Statement:
How to access data using iloc.
How to access data using loc.
How to access specific rows and columns.
Platform Used:
Google Colab
Name of Dataset:
MamalsData.csv
Theory:
iloc in Python, specifically within the pandas library, is used for integer-location based indexing
in DataFrames and Series. It allows the selection of data by specifying the integer positions of
rows and columns. iloc uses a 0-based indexing system, meaning the first row or column is at
position 0. It can accept single integers, lists of integers, or slices to specify the desired rows and
columns. The endpoint of the slice is exclusive.
loc is used in the pandas library in Python to access and manipulate data in a DataFrame using
labels (row and column names). It enables the selection of data based on the specified labels,
offering flexibility in retrieving and modifying subsets of the DataFrame.
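One practical difference between the two is how slices behave: an iloc slice excludes its endpoint, while a loc slice includes both labels. The sketch below illustrates this on a made-up DataFrame with string row labels (the labels and values are assumptions chosen for illustration).

```python
import pandas as pd

df = pd.DataFrame(
    {"State": ["Maharashtra", "Goa", "Kerala", "Gujarat"], "Count": [12, 45, 30, 8]},
    index=["a", "b", "c", "d"],
)

# iloc: integer positions, end position 2 is excluded -> rows "a" and "b".
by_position = df.iloc[0:2]

# loc: labels, both endpoints included -> rows "a", "b" and "c".
by_label = df.loc["a":"c"]

print(by_position)
print(by_label)
```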
Attach screenshot wherever necessary
Basic iloc function
Use iloc when selecting rows or columns by integer position.
Accessing the first row (index 0)
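A minimal sketch on a made-up DataFrame (the columns Animal, State, and Count are assumptions, not necessarily those of MamalsData.csv). Passing a single integer returns the row as a Series; passing a one-element list keeps it as a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Tiger", "Elephant", "Leopard", "Deer"],
    "State": ["Maharashtra", "Kerala", "Gujarat", "Goa"],
    "Count": [12, 30, 8, 45],
})

first_row = df.iloc[0]          # Series holding the first row
first_row_frame = df.iloc[[0]]  # one-row DataFrame

print(first_row)
print(first_row_frame)
```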
Accessing rows from index 1 to 2
Accessing second column
Accessing row of index 2
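The three iloc selections above might be written as in this sketch, again on a made-up DataFrame with assumed column names. Because the slice end is exclusive, rows 1 to 2 require the slice 1:3.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Tiger", "Elephant", "Leopard", "Deer"],
    "State": ["Maharashtra", "Kerala", "Gujarat", "Goa"],
    "Count": [12, 30, 8, 45],
})

rows_1_to_2 = df.iloc[1:3]     # positions 1 and 2 (3 is excluded)
second_column = df.iloc[:, 1]  # all rows, column at position 1
row_2 = df.iloc[2]             # row at position 2

print(rows_1_to_2)
print(second_column)
print(row_2)
```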
Basic loc function
Use loc when selecting by label or applying conditions.
Accessing rows from index 1 to 2
Accessing column with label ‘State’
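A sketch of the loc selections on a made-up DataFrame with the default integer row labels (only the State column name comes from the text; the other columns and all values are assumptions). With loc, the slice 1:2 includes both labels. A condition-based selection is added as an example of filtering.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Tiger", "Elephant", "Leopard", "Deer"],
    "State": ["Maharashtra", "Kerala", "Gujarat", "Goa"],
    "Count": [12, 30, 8, 45],
})

rows_1_to_2 = df.loc[1:2]          # labels 1 and 2, both included
state_column = df.loc[:, "State"]  # all rows, column labelled "State"

# Condition: rows where Count exceeds 10, keeping two columns.
large = df.loc[df["Count"] > 10, ["Animal", "Count"]]

print(rows_1_to_2)
print(state_column)
print(large)
```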
Accessing the value in row with label '2' and column 'State' using loc
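A single value can be read by giving loc a row label and a column label. In this sketch the DataFrame is made up and uses the default integer index, so the row label is the integer 2 rather than the string '2'. The equivalent iloc call is shown for comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Tiger", "Elephant", "Leopard", "Deer"],
    "State": ["Maharashtra", "Kerala", "Gujarat", "Goa"],
    "Count": [12, 30, 8, 45],
})

value = df.loc[2, "State"]   # row label 2, column label "State"
same_value = df.iloc[2, 1]   # same cell by position (row 2, column 1)

print(value, same_value)
```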
Link of Execution:
https://colab.research.google.com/drive/1xFK4wMxX7VxvRHR21GlmPVqnB14_Wa3-?usp=sharing
Conclusion:
In this experiment, we successfully learned how to access data in a DataFrame using both iloc
and loc functions. The iloc function is useful for retrieving data based on integer index positions,
while loc allows access based on labeled indices and column names. Understanding the
difference between these two methods is essential for efficient data manipulation and analysis in
pandas. This experiment helped reinforce the concepts of indexing and slicing, which are
foundational in data preprocessing tasks.
Title:
Exp 3 : Data Visualization / Exploratory Data Analysis for the selected data set using Matplotlib
and Seaborn
Problem Statement:
a. Create a bar graph and a contingency table using any 2 variables.
b. Create a normalized histogram.
c. Describe what these graphs and tables indicate.
Platform Used:
Name of Dataset:
Theory:
1. What is Visualization?
2. What are Seaborn and Matplotlib, and what are they used for?
Attach screenshot wherever necessary
1. What is visualization?
Ans- Data visualization is the graphical representation of information and data using visual
elements like charts, graphs, and maps. It helps in understanding trends, patterns, and outliers
in data.
2. Explain what Seaborn and Matplotlib are and how they are used.
Ans- Matplotlib is a low-level graph plotting library in Python that offers complete control
over plot elements.
Seaborn is built on top of Matplotlib and provides a high-level interface for drawing
attractive and informative statistical graphics.
Uses:
Visualize the distribution of data.
Show relationships between multiple variables.
Summarize complex data using colour patterns and statistical estimations.
3. List down Data Visualization Plots
Ans-
Bar Graph
Histogram
Line Chart
Box Plot
Violin Plot
Scatter Plot
Heatmap
Pair Plot
Pie Chart
4. Explain what a bar graph, histogram, contingency table, and heatmap are; mention when
to use each plot and whether it is univariate, bivariate, or multivariate.
Ans-
Plot Type | Description | When to use | Variant type
Bar Graph | Displays categorical data with rectangular bars. | Comparing categories | Univariate or Bivariate
Histogram | Shows frequency distribution of numerical data. | Distribution of one variable | Univariate
Contingency Table | Shows frequency distribution for two categorical variables. | Relationship between categories | Bivariate
Heatmap | Graphical representation of data in matrix form with colors. | Highlight patterns or correlation | Bivariate or Multivariate
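As an illustration of two of these, the sketch below builds a contingency table with pd.crosstab and a normalized histogram with Matplotlib's density=True option, which scales the bars so that their total area is 1. The columns Education, Response, and Income and all values are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs as a plain script;
                       # in Colab, remove this line and end the cell with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Education": ["Graduation", "PhD", "Graduation", "Master", "PhD", "Graduation"],
    "Response": ["Yes", "No", "No", "Yes", "Yes", "No"],
    "Income": [30.0, 45.0, 50.0, 55.0, 60.0, 80.0],
})

# Contingency table: counts for each Education / Response combination.
table = pd.crosstab(df["Education"], df["Response"])
print(table)

# Normalized histogram: bar heights are densities, not raw counts.
fig, ax = plt.subplots()
heights, edges, _ = ax.hist(df["Income"], bins=5, density=True)
ax.set_xlabel("Income")
ax.set_ylabel("Density")
ax.set_title("Normalized histogram of Income")

# The bar areas (height * bin width) add up to 1.
area = float(np.sum(heights * np.diff(edges)))
print(area)
```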
Screenshots of Code with Output:
Scatter plot
Bar Graph
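A scatter plot and a bar graph could be produced with Matplotlib roughly as follows. This is a sketch on made-up data with assumed column names, not the code from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs as a plain script;
                       # in Colab, remove this line and end the cell with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Education": ["Graduation", "PhD", "Graduation", "Master", "PhD", "Graduation"],
    "Income": [30.0, 45.0, 50.0, 55.0, 60.0, 80.0],
    "Spending": [5.0, 9.0, 12.0, 11.0, 15.0, 20.0],
})

fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: relationship between two numerical variables.
ax_scatter.scatter(df["Income"], df["Spending"])
ax_scatter.set_xlabel("Income")
ax_scatter.set_ylabel("Spending")
ax_scatter.set_title("Income vs Spending")

# Bar graph: number of records in each category.
counts = df["Education"].value_counts()
ax_bar.bar(counts.index, counts.values)
ax_bar.set_xlabel("Education")
ax_bar.set_ylabel("Number of records")
ax_bar.set_title("Records per Education level")

fig.tight_layout()
```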
HeatMap
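One common use of a heatmap is to display the correlation matrix of the numerical columns. The sketch below assumes Seaborn is installed (it is available by default in Colab) and uses made-up columns and values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs as a plain script;
                       # in Colab, remove this line and end the cell with plt.show()
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "Income": [30.0, 45.0, 50.0, 55.0, 60.0, 80.0],
    "Spending": [5.0, 9.0, 12.0, 11.0, 15.0, 20.0],
    "Recency": [58.0, 38.0, 26.0, 40.0, 12.0, 5.0],
})

# Pairwise correlation between the numerical columns.
corr = df.corr()

# Heatmap: each cell is coloured by its correlation value and annotated with it.
fig, ax = plt.subplots()
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=ax)
ax.set_title("Correlation heatmap")
```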
Conclusion:
In this experiment, we explored how to visualize and analyze data using Matplotlib and Seaborn.
We successfully created bar graphs, contingency tables, and normalized histograms. These
visualizations provided insights into category-wise distributions and continuous variable
patterns, helping us better understand our dataset. Mastering such visual tools is essential for any
data analysis or machine learning task.