0% found this document useful (0 votes)
18 views30 pages

It Journal

The document outlines various practical exercises related to data visualization and analysis using Python libraries such as Pandas, Matplotlib, and Seaborn. It covers topics including data collection, data frames, file handling, data cleaning, statistical analysis, and visualization techniques. Each practical section provides code examples and expected outputs for better understanding.

Uploaded by

rambhiatwinkle
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
18 views30 pages

It Journal

The document outlines various practical exercises related to data visualization and analysis using Python libraries such as Pandas, Matplotlib, and Seaborn. It covers topics including data collection, data frames, file handling, data cleaning, statistical analysis, and visualization techniques. Each practical section provides code examples and expected outputs for better understanding.

Uploaded by

rambhiatwinkle
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 30

Data Visualization and Analysis

Contents

S. No. Topic
1 Demonstrate data collection structure using series.
2 Create data frames and perform different operations on it.
3 Creating a Panel from a dictionary of Data Frame Objects and
performing their operations.
4 Write a program to perform file handling operations.
5 Perform data cleaning and pre-processing with different files.
6 Write a program to handle missing and outlier values.
7 Demonstrate Data exploration on series and data frame.
8 Write a program for statistical analysis.
9 Visualize data through Matplotlib.
10 Visualize data through Seaborn plotting.
Practical 1: Demonstrate data collection structure using series.

Code

import pandas as pd

data = [10,20,30,40]

index = ['A','B','C','D']

series = pd.Series(data, index=index)

print(series)

#Selecting an element

print(series['C'])

Output

Code using Dictionary


import pandas as pd

data_dict = {'A': 10, 'B': 20, 'C': 30}

series = pd.Series(data_dict)

print(series)
Practical 2: Create data frames and perform different operations
on it.

Code :

import pandas as pd

data = {

'Name':['Ashu','Naman','Datta'],

'Age': [20,20,21],

'City': ['Mumbai', 'Dehradun', 'Noida']

df = pd.DataFrame(data)

print(df)

print(df.iloc[0][0])

print(df.loc[1]['Name'])

print(df.at[2,'Name'])

print(df.iat[2,0])

df.set_index('Name')

df.drop('City', axis = 1)
# Access a specific column

print(df['Age'])
Code to Transpose

import pandas as pd

data = {'Name': ['John', 'Emily', 'Kate', 'Sam'],

'Age': [30, 35, 25, 32],

'City': ['New York', 'San Francisco', 'Chicago', 'Boston']}

df = pd.DataFrame(data)

print("Original DataFrame:")

print(df)

transposed_df = df.T

print("\nTransposed DataFrame:")

print(transposed_df)
Practical 3: Creating a Panel from a dictionary of Data Frame
Objects and performing their operations.

import pandas as pd

import numpy as np

# Create a Panel

data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),

'Item2': pd.DataFrame(np.random.randn(4, 3))}

panel = pd.panel(data)

# Display the original Panel

print("Original Panel:")

print(panel)

# Add a new item to the Panel

panel['Item3'] = pd.DataFrame(np.random.randn(4, 3))

# Display the Panel after adding the item


print("\nPanel after adding the item:")

print(panel)

# Delete an item from the Panel

panel = panel.drop('Item2', axis=0)

# Display the Panel after deleting the item

print("\nPanel after deleting the item:")

print(panel)
Practical 4: Write a program to handle file handling operation

Step 1. Create an empty Text File and save it.

Step 2. Open your IDE, and and enter the code

Code:

# Open a file in Write mode ('w', or 'wt' for texrt mode)

file = open('hello.txt', 'w')

file.write("hello this is a sample file \n")

file.write("this program is to demonstrate \n")

file.write("File handling in Python \n")

file.close()

# Open the file in read mode

file = open('hello.txt', 'r')

x = file.read()
print("the content of the file is: ")

print(x)

file.close()

# Open the file in Append mode

file = open('hello.txt', 'a')

file.write("This is append")

file.close()

# Open the file in read mode to see the updated content

file = open('hello.txt','r')

y = file.read()

print("The updated content is : ")

print(y)

Output :
Practical 5:Perform data cleaning and pre-processing with
different files.

Step 1. Open Kaggle, and Sign in

Step 2. Download the sample Superstore dataset from there for


Analysis.

Step 3. Open the notebook, and in Data, add the Superstore data
set.

Code
Output
Practical 6 :Write a program to handle missing and outlier values.
import numpy as np
import pandas as pd

data={'Feature1':[1,2,3,np.nan,5,6,7,8,9,10],
'Feature2': [10,20,30,40,50,60,70,80,90,100]}

df=pd.DataFrame(data)

df['Feature1'].fillna(df['Feature1'].mean(), inplace=True)

from scipy import stats

z_scores=np.abs(stats.zscore(df))
threshold=3

outliers = np.where(z_scores > threshold)

df_no_outliers = df[(z_scores < threshold).all(axis=1)]

print("Original DataFrame:")
print(df)
print("\nDataFrame with missing values handles:")
print(df_no_outliers)

Output
Practical 7 : Demonstrate Data exploration on series and data
frame.

Code:

import pandas as pd

import numpy as np

# Create a sample dataset

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],

'Age': [25, 30, 22, 35, 28],

'Salary': [50000, 60000, 45000, 70000, 55000],

'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago',


'Boston']

# Create a DataFrame from the dataset

df = pd.DataFrame(data)

# Display the DataFrame

print("Original DataFrame:")

print(df)

print("\n")

# Data exploration on Series


age_series = df['Age']

# Basic statistics on the Age series

print("Summary statistics for Age:")

print(age_series.describe())

print("\n")

# Check for missing values in the Age series

print("Missing values in Age:")

print(age_series.isnull().sum())

print("\n")

# Data exploration on DataFrame

# Summary statistics for the entire DataFrame

print("Summary statistics for the DataFrame:")

print(df.describe())

print("\n")
# Check data types before calculating correlation matrix

numeric_columns = df.select_dtypes(include=[np.number]).columns

if len(numeric_columns) > 1:

# Correlation matrix for the DataFrame

print("Correlation matrix for the DataFrame:")

print(df[numeric_columns].corr())

print("\n")

else:

print("No numeric columns for correlation calculation.")

# Count the occurrences of each unique value in the City column

print("Count of unique values in the City column:")

print(df['City'].value_counts())

print("\n")

# Grouping by City and calculating mean salary

grouped_by_city = df.groupby('City')['Salary'].mean()

print("Mean salary grouped by City:")


print(grouped_by_city)

Output →
Practical 8 : Write a program to perform statistical analysis .

Code
import numpy as np
from scipy import stats

data = [34,45,32,48,22,55,36,38,40,28,60]

mean = np.mean(data)
median = np.median(data)
std_dev = np.std(data)
var = np.var(data)
min_value = min(data)
max_value = max(data)
range_value = max_value - min_value

t_statistic, p_value = stats.ttest_1samp(data,popmean=40)

print("Data:",data)
print("Mean",mean)
print("Median:",median)
print("Standard Deviation:",std_dev)
print("Variance:",var)
print("Minimun Vlaue:",min_value)
print("Maximum Value:",max_value)
print("Range:",range_value)
print("T-Statistic:",t_statistic)
print("P-Value:",p_value)

shapiro_stat, shapiro_p = stats.shapiro(data)


if shapiro_p>0.05:
print("Data is normally distributed (Shapiro-Wilk test p-value=
",shapiro_p,")")
else:
print("Data is not normally distributed (Shapiro-Wilk test p-value=
",shapiro_p,")")

Output
Practical 9: Visualize data through Matplotlib
Step 1: Install the necessary libraries, using pip

Code
# importing the required libraries
import matplotlib.pyplot as plt
import numpy as np

# define data values


x = np.array([1, 2, 3, 4]) # X-axis points
y = x*2 # Y-axis points

plt.plot(x, y) # Plot the chart


plt.show() # display

Output
Code
from matplotlib import pyplot as plt

x=[5,2,9,4,7]
y=[10,5,8,4,2]
plt.bar(x,y)
plt.show()

Output
Code for Scatter Plot

import matplotlib.pyplot as plt


import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])

plt.scatter(x, y)
plt.show()
Practical 10: Visualize data through Seaborn
Code :
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# loading dataset
data = sns.load_dataset("iris")

# draw lineplot
sns.lineplot(x="sepal_length", y="sepal_width", data=data)
plt.show()

Output:

Code:

You might also like