It Journal

The document outlines various practical exercises related to data visualization and analysis using Python libraries such as Pandas, Matplotlib, and Seaborn. It covers topics including data collection, data frames, file handling, data cleaning, statistical analysis, and visualization techniques. Each practical section provides code examples and expected outputs for better understanding.

Uploaded by rambhiatwinkle

Data Visualization and Analysis

Contents

S. No.  Topic
1       Demonstrate data collection structure using series.
2       Create data frames and perform different operations on them.
3       Creating a Panel from a dictionary of DataFrame objects and performing operations on it.
4       Write a program to perform file handling operations.
5       Perform data cleaning and pre-processing with different files.
6       Write a program to handle missing and outlier values.
7       Demonstrate data exploration on series and data frames.
8       Write a program for statistical analysis.
9       Visualize data through Matplotlib.
10      Visualize data through Seaborn plotting.

Practical 1: Demonstrate data collection structure using series.

Code

import pandas as pd

data = [10,20,30,40]

index = ['A','B','C','D']

series = pd.Series(data, index=index)

print(series)

# Select an element by its index label

print(series['C'])

Output

Code using Dictionary

import pandas as pd

data_dict = {'A': 10, 'B': 20, 'C': 30}

series = pd.Series(data_dict)

print(series)
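The next sketch takes the same four-element Series a little further. It shows selection by label and by position, vectorised arithmetic, and boolean filtering. It also shows how arithmetic between two Series aligns on index labels. The second Series `other` is a hypothetical example added for illustration.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])

# Label-based and position-based selection of the same element
by_label = series['C']          # 30
by_position = series.iloc[2]    # 30
subset = series[['A', 'D']]     # a list of labels returns a Series

# Vectorised arithmetic and boolean filtering
doubled = series * 2
large = series[series > 15]     # keeps B, C and D

# Arithmetic aligns on index labels, not positions; labels found in only
# one of the two Series produce NaN in the result.
other = pd.Series({'A': 1, 'B': 2, 'E': 5})   # hypothetical second Series
total = series + other

print(subset)
print(large)
print(total)
```

Label alignment is why `total` has the five labels A to E, with NaN wherever one side has no value.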

Practical 2: Create data frames and perform different operations on them.

Code :

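import pandas as pd

data = {
    'Name': ['Ashu', 'Naman', 'Datta'],
    'Age': [20, 20, 21],
    'City': ['Mumbai', 'Dehradun', 'Noida']
}

df = pd.DataFrame(data)
print(df)

# Access single values
print(df.iloc[0, 0])      # by position: first row, first column -> Ashu
print(df.loc[1, 'Name'])  # by label: row 1, column 'Name' -> Naman
print(df.at[2, 'Name'])   # fast scalar access by label -> Datta
print(df.iat[2, 0])       # fast scalar access by position -> Datta

# set_index() and drop() return new DataFrames; assign the result to keep it
df_by_name = df.set_index('Name')
print(df_by_name)

df_no_city = df.drop('City', axis=1)
print(df_no_city)

# Access a specific column
print(df['Age'])
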
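Several other everyday DataFrame operations fit this practical: adding a derived column, filtering rows, sorting, appending a row and renaming a column. This minimal sketch rebuilds the same data so it runs on its own. The extra row ('Riya', 22, 'Pune') and the column name `Age_next_year` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ashu', 'Naman', 'Datta'],
                   'Age': [20, 20, 21],
                   'City': ['Mumbai', 'Dehradun', 'Noida']})

# Add a derived column (hypothetical name)
df['Age_next_year'] = df['Age'] + 1

# Filter rows with a boolean condition
age_20 = df[df['Age'] == 20]

# Sort rows by a column
by_name = df.sort_values('Name')

# Append a row by concatenating a one-row DataFrame (hypothetical values)
new_row = pd.DataFrame({'Name': ['Riya'], 'Age': [22], 'City': ['Pune'],
                        'Age_next_year': [23]})
df = pd.concat([df, new_row], ignore_index=True)

# Rename a column
df = df.rename(columns={'City': 'Home_City'})

print(age_20)
print(by_name)
print(df)
```

`pd.concat` is used here because `DataFrame.append` was removed in pandas 2.0.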
Code to Transpose

import pandas as pd

data = {'Name': ['John', 'Emily', 'Kate', 'Sam'],

'Age': [30, 35, 25, 32],

'City': ['New York', 'San Francisco', 'Chicago', 'Boston']}

df = pd.DataFrame(data)

print("Original DataFrame:")

print(df)

transposed_df = df.T

print("\nTransposed DataFrame:")

print(transposed_df)
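Transposing a DataFrame whose columns hold different types has a side effect that is easy to miss. Each new column mixes strings and numbers, so pandas stores it with the generic `object` dtype. Transposing back restores the layout but not the original dtypes. The sketch below shows this with the same data, and shows `infer_objects()` as one way to recover numeric types.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Emily', 'Kate', 'Sam'],
                   'Age': [30, 35, 25, 32],
                   'City': ['New York', 'San Francisco', 'Chicago', 'Boston']})

transposed_df = df.T
print(transposed_df.shape)    # rows and columns swapped: (3, 4)
print(transposed_df.dtypes)   # every column mixes str and int -> object

# Transposing twice restores the layout but leaves every column as object
round_trip = df.T.T
print(round_trip.dtypes)

# infer_objects() converts object columns back to better-fitting dtypes
restored = round_trip.infer_objects()
print(restored.dtypes)
```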

Practical 3: Creating a Panel from a dictionary of DataFrame objects and performing operations on it.

import pandas as pd
import numpy as np

# pd.Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so
# pd.Panel(data) (and the misspelt pd.panel) fails on current versions.
# A dictionary of DataFrames keeps the same "items", and pd.concat() shows
# them as one DataFrame whose outer row index is the item name.

# Create the "panel": a dictionary of DataFrame objects
data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
        'Item2': pd.DataFrame(np.random.randn(4, 3))}

# Display the original panel
print("Original Panel:")
print(pd.concat(data))

# Add a new item to the panel
data['Item3'] = pd.DataFrame(np.random.randn(4, 3))

# Display the panel after adding the item
print("\nPanel after adding the item:")
print(pd.concat(data))

# Delete an item from the panel
del data['Item2']

# Display the panel after deleting the item
print("\nPanel after deleting the item:")
print(pd.concat(data))
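`pd.Panel` no longer exists in current pandas. Panel-style operations can instead be sketched on a stacked DataFrame with a two-level row index, where the outer level is the item name. The operations shown are selecting one item, aggregating within each item, and aggregating across items.

This is a minimal sketch, not a drop-in Panel API. Fixed, hypothetical numbers replace `np.random.randn` so the results are reproducible. The `groupby` calls are one reasonable substitute for the old Panel methods.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data so the results are reproducible
items = {
    'Item1': pd.DataFrame(np.arange(12).reshape(4, 3), columns=['a', 'b', 'c']),
    'Item2': pd.DataFrame(np.arange(12, 24).reshape(4, 3), columns=['a', 'b', 'c']),
}

# Stack the items into one DataFrame; the dict keys become the outer row level
panel = pd.concat(items, names=['item', 'row'])

# Select one item (what panel['Item1'] returned on the old Panel)
item1 = panel.loc['Item1']

# Column means within each item
per_item_mean = panel.groupby(level='item').mean()

# Element-wise mean across items (roughly the old mean over the items axis)
across_items = panel.groupby(level='row').mean()

print(per_item_mean)
print(across_items)
```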

Practical 4: Write a program to perform file handling operations.

Step 1. Create an empty text file named hello.txt and save it (optional: opening a file in 'w' mode creates it if it does not exist).

Step 2. Open your IDE and enter the code.

Code:

# Open a file in write mode ('w', or 'wt' for text mode).
# 'w' creates the file if needed and overwrites any existing content.
# The 'with' block closes the file automatically when it ends.
with open('hello.txt', 'w') as file:
    file.write("hello this is a sample file \n")
    file.write("this program is to demonstrate \n")
    file.write("File handling in Python \n")

# Open the file in read mode
with open('hello.txt', 'r') as file:
    x = file.read()
print("The content of the file is: ")
print(x)

# Open the file in append mode: new text is added at the end
with open('hello.txt', 'a') as file:
    file.write("This is append")

# Open the file in read mode to see the updated content
with open('hello.txt', 'r') as file:
    y = file.read()
print("The updated content is : ")
print(y)

Output :
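Files can also be processed one line at a time rather than with a single `read()`. The sketch below does this and also shows `tell()` and `seek()` for reporting and moving the read position. It writes to a temporary directory so that an existing hello.txt is not overwritten; the temporary path is an assumption made for the sketch.

```python
import os
import tempfile

# Use a temporary directory so no existing 'hello.txt' is overwritten
path = os.path.join(tempfile.mkdtemp(), 'hello.txt')

with open(path, 'w') as f:
    f.write("hello this is a sample file\n")
    f.write("this program is to demonstrate\n")
    f.write("File handling in Python\n")

with open(path, 'a') as f:
    f.write("This is append\n")

# Iterating over the file object yields one line at a time
lines = []
with open(path, 'r') as f:
    for line in f:
        lines.append(line.rstrip('\n'))

# tell() reports the current position; seek(0) jumps back to the start
with open(path, 'r') as f:
    first = f.readline()
    position = f.tell()
    f.seek(0)
    again = f.readline()

print(len(lines), lines[-1])
```

In text mode the value returned by `tell()` is an opaque position, so it should only be passed back to `seek()` rather than treated as a character count.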

Practical 5: Perform data cleaning and pre-processing with different files.

Step 1. Open Kaggle and sign in.

Step 2. Download the Sample Superstore dataset for analysis.

Step 3. Open a notebook and, under Data, add the Superstore dataset.

Code
Output
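Common cleaning and pre-processing steps can be sketched on a small table read with `pd.read_csv`. The steps are removing duplicates, normalising text, parsing dates, filling missing numbers and tidying column names.

This is a hypothetical example, not the Superstore notebook code. The CSV text and its column names are illustrative only and are not taken from the Superstore dataset.

```python
import io
import pandas as pd

# Hypothetical raw CSV standing in for a downloaded file; column names are
# illustrative and NOT taken from the Superstore dataset.
raw_csv = io.StringIO(
    "Order ID,Order Date,Region,Sales\n"
    "A1,2021-01-05, West ,100.5\n"
    "A2,2021-01-06,east,\n"
    "A2,2021-01-06,east,\n"
    "A3,not a date,WEST,80\n"
)
df = pd.read_csv(raw_csv)

# 1. Remove exact duplicate rows
df = df.drop_duplicates()

# 2. Normalise text: strip stray spaces and use one capitalisation
df['Region'] = df['Region'].str.strip().str.title()

# 3. Parse dates; values that cannot be parsed become NaT instead of raising
df['Order Date'] = pd.to_datetime(df['Order Date'], errors='coerce')

# 4. Fill missing numeric values with the column median
df['Sales'] = df['Sales'].fillna(df['Sales'].median())

# 5. Normalise column names for easier access
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')

print(df)
```

The same steps apply to other file types. Only the reader changes, for example `pd.read_excel` or `pd.read_json`.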
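
Practical 6: Write a program to handle missing and outlier values.

import numpy as np
import pandas as pd
from scipy import stats

data = {'Feature1': [1, 2, 3, np.nan, 5, 6, 7, 8, 9, 10],
        'Feature2': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Handle missing values: replace NaN in Feature1 with the column mean
# (assign the result back instead of using chained inplace=True)
df['Feature1'] = df['Feature1'].fillna(df['Feature1'].mean())
print("\nDataFrame with missing values handled:")
print(df)

# Detect outliers with z-scores: |z| > threshold marks an outlier
z_scores = np.abs(stats.zscore(df))
threshold = 3
outliers = np.where(z_scores > threshold)
print("\nOutlier positions (row, column):", list(zip(*outliers)))

# Keep only rows where every column's z-score is below the threshold.
# Note: stats.zscore uses ddof=0, so with n = 10 rows |z| can never exceed
# sqrt(n - 1) = 3; a threshold of 3 cannot flag anything in this sample.
df_no_outliers = df[(z_scores < threshold).all(axis=1)]
print("\nDataFrame with outliers removed:")
print(df_no_outliers)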

Output
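On small samples the z-score rule is weak, as noted in the code comments. A commonly used alternative is the interquartile-range (IQR) rule, also called Tukey's fences. It treats values outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR] as outliers.

This is a swapped-in technique, not the method used above. The sketch applies it to a hypothetical variant of the data in which the last Feature2 value is 400, so there is an outlier to find. Outliers are then either dropped or capped.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 400 in Feature2 is an obvious outlier
df = pd.DataFrame({'Feature1': [1, 2, 3, np.nan, 5, 6, 7, 8, 9, 10],
                   'Feature2': [10, 20, 30, 40, 50, 60, 70, 80, 90, 400]})

# Missing values: fill with the median, which is itself robust to outliers
df = df.fillna(df.median()).astype(float)

# IQR rule: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] count as outliers
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Option 1: drop every row that contains an outlier
in_range = ((df >= lower) & (df <= upper)).all(axis=1)
df_dropped = df[in_range]

# Option 2: cap (winsorise) outliers at the fences instead of dropping rows;
# axis=1 aligns the per-column bounds with the DataFrame's columns
df_capped = df.clip(lower=lower, upper=upper, axis=1)

print(df_dropped)
print(df_capped)
```

Dropping rows loses the other columns' data for that row. Capping keeps every row but changes the extreme value, so the better choice depends on the analysis.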

Practical 7: Demonstrate data exploration on series and data frames.

Code:

import pandas as pd
import numpy as np

# Create a sample dataset
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Age': [25, 30, 22, 35, 28],
        'Salary': [50000, 60000, 45000, 70000, 55000],
        'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago',
                 'Boston']}

# Create a DataFrame from the dataset
df = pd.DataFrame(data)

# Display the DataFrame
print("Original DataFrame:")
print(df)
print("\n")

# Data exploration on Series
age_series = df['Age']

# Basic statistics on the Age series
print("Summary statistics for Age:")
print(age_series.describe())
print("\n")

# Check for missing values in the Age series
print("Missing values in Age:")
print(age_series.isnull().sum())
print("\n")

# Data exploration on DataFrame
# Summary statistics for the numeric columns of the DataFrame
print("Summary statistics for the DataFrame:")
print(df.describe())
print("\n")

# Select the numeric columns before calculating the correlation matrix
numeric_columns = df.select_dtypes(include=[np.number]).columns

if len(numeric_columns) > 1:
    # Correlation matrix for the numeric columns
    print("Correlation matrix for the DataFrame:")
    print(df[numeric_columns].corr())
    print("\n")
else:
    print("Fewer than two numeric columns; correlation needs at least two.")

# Count the occurrences of each unique value in the City column
print("Count of unique values in the City column:")
print(df['City'].value_counts())
print("\n")

# Group by City and calculate the mean salary
grouped_by_city = df.groupby('City')['Salary'].mean()
print("Mean salary grouped by City:")
print(grouped_by_city)

Output
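A first look at a new dataset usually also covers its shape, column types and first rows. Sorting, filtering and locating extreme values are common next steps. The sketch below reproduces the sample data so it runs on its own, and applies these calls to it.

```python
import pandas as pd

# Same sample data as above, reproduced so this block is self-contained
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
                   'Age': [25, 30, 22, 35, 28],
                   'Salary': [50000, 60000, 45000, 70000, 55000],
                   'City': ['New York', 'San Francisco', 'Los Angeles',
                            'Chicago', 'Boston']})

print(df.shape)      # (rows, columns)
print(df.dtypes)     # data type of each column
print(df.head(3))    # first three rows
df.info()            # column types, non-null counts and memory usage

# Sorting and boolean filtering
top_earners = df.sort_values('Salary', ascending=False).head(2)
over_28 = df[df['Age'] > 28]

# Series helpers: label of the maximum value and rank of each value
oldest = df.loc[df['Age'].idxmax(), 'Name']
salary_rank = df['Salary'].rank(ascending=False)

print(top_earners)
print(over_28)
print(oldest)
print(salary_rank)
```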

Practical 8: Write a program to perform statistical analysis.

Code
import numpy as np
from scipy import stats

data = [34, 45, 32, 48, 22, 55, 36, 38, 40, 28, 60]

mean = np.mean(data)
median = np.median(data)
# np.std and np.var use the population formula (ddof=0) by default;
# pass ddof=1 for the sample standard deviation and variance.
std_dev = np.std(data)
var = np.var(data)
min_value = min(data)
max_value = max(data)
range_value = max_value - min_value

# One-sample t-test: does the population mean differ from 40?
t_statistic, p_value = stats.ttest_1samp(data, popmean=40)

print("Data:", data)
print("Mean:", mean)
print("Median:", median)
print("Standard Deviation:", std_dev)
print("Variance:", var)
print("Minimum Value:", min_value)
print("Maximum Value:", max_value)
print("Range:", range_value)
print("T-Statistic:", t_statistic)
print("P-Value:", p_value)

# Shapiro-Wilk test: the null hypothesis is that the data are normally distributed
shapiro_stat, shapiro_p = stats.shapiro(data)

if shapiro_p > 0.05:
    print("No evidence against normality (Shapiro-Wilk test p-value =", shapiro_p, ")")
else:
    print("Data is not normally distributed (Shapiro-Wilk test p-value =", shapiro_p, ")")

Output
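The one-sample t-test in the program can be computed by hand. The statistic is t = (x̄ − μ0) / (s / √n), where s is the sample standard deviation (divisor n − 1). The two-sided p-value comes from a t distribution with n − 1 degrees of freedom. The sketch computes both quantities directly and compares them with the result of the `ttest_1samp` call above, using the same data and μ0 = 40.

```python
import math

import numpy as np
from scipy import stats

data = [34, 45, 32, 48, 22, 55, 36, 38, 40, 28, 60]
mu0 = 40  # hypothesised population mean, as in the program above

n = len(data)
mean = np.mean(data)
s = np.std(data, ddof=1)                  # sample standard deviation
t_manual = (mean - mu0) / (s / math.sqrt(n))

# Two-sided p-value: probability of |t| at least this large under H0
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(data, popmean=mu0)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

The two pairs of numbers should agree, which confirms that `ttest_1samp` uses the sample (ddof=1) standard deviation. The population value from `np.std(data)` in the program is not what the test uses.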
Practical 9: Visualize data through Matplotlib
Step 1: Install the necessary libraries using pip (pip install matplotlib numpy).

Code
# importing the required libraries
import matplotlib.pyplot as plt
import numpy as np

# define data values

x = np.array([1, 2, 3, 4]) # X-axis points
y = x*2 # Y-axis points

plt.plot(x, y) # Plot the chart

plt.show() # display

Output
Code for Bar Chart
from matplotlib import pyplot as plt

x=[5,2,9,4,7]
y=[10,5,8,4,2]
plt.bar(x,y)
plt.show()

Output
Code for Scatter Plot

import matplotlib.pyplot as plt

import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])

plt.scatter(x, y)
plt.show()
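The plots above have no axis labels, title or legend. The sketch below adds them using Matplotlib's object-oriented interface (`fig, ax = plt.subplots()`), draws two lines on one set of axes, and saves the figure to a PNG file.

It selects the non-interactive 'Agg' backend so it also runs where no display is available. Drop that line and call `plt.show()` to open a window instead. The output file name is hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4])

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, x * 2, marker='o', label='y = 2x')
ax.plot(x, x ** 2, linestyle='--', label='y = x^2')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Line plots with labels and a legend')
ax.legend()
ax.grid(True)

# Save the figure (hypothetical file name in a temporary directory)
out_path = os.path.join(tempfile.mkdtemp(), 'line_plot.png')
fig.savefig(out_path, dpi=100)
plt.close(fig)
```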
Practical 10: Visualize data through Seaborn
Code :
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# load the built-in "iris" example dataset (downloaded on first use,
# so an internet connection is needed)
data = sns.load_dataset("iris")

# draw lineplot
sns.lineplot(x="sepal_length", y="sepal_width", data=data)
plt.show()

Output:

Code:
