
DVA Practical

The document provides a comprehensive introduction to data visualization, emphasizing its importance in analytics and detailing common chart types such as bar, line, scatter, histogram, box plot, and pie chart. It also covers tools and libraries like Matplotlib, Seaborn, and Plotly for creating various visualizations, alongside techniques for dataset loading, exploration, cleaning, and preparation. Additionally, it discusses advanced visualization techniques, multivariate analysis, and time series analysis using real-world datasets.

Uploaded by laxmipriya1521
Copyright © All Rights Reserved

Q1. Introduction to Data Visualization


• Understand the importance of data visualization in analytics.
• Overview of common chart types: bar, line, scatter, histogram, box plot, pie chart.

Answer:

What is Data Visualization?

Data visualization is the graphical representation of information and data. It helps in:

• Understanding patterns and trends in the data

• Communicating insights clearly and effectively

• Making data-driven decisions

Why is it Important?

• Simplifies complex data

• Reveals patterns that aren't obvious in raw data

• Helps detect outliers and anomalies

• Facilitates storytelling with data
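A quick way to see why summary statistics alone can hide patterns is Anscombe's quartet, a classic set of four small datasets. The sketch below uses only the Python standard library (the helper `pearson` is our own) to show that all four share nearly the same means and correlation, even though their scatter plots look completely different.

```python
from statistics import mean

# Anscombe's quartet (F. J. Anscombe, 1973): four (x, y) datasets with near-identical statistics
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)


# (mean of x, mean of y, correlation) for each dataset
summary = {
    name: (round(mean(xs), 2), round(mean(ys), 2), round(pearson(xs, ys), 3))
    for name, (xs, ys) in quartet.items()
}
for name, stats in summary.items():
    print(name, stats)
```

Plotting the four pairs side by side (for example with sns.scatterplot) is what exposes the differences: one roughly linear cloud, one curve, and two cases dominated by a single unusual point.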

Common Chart Types

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
df = pd.read_csv('titanic.csv')

# Preview data
df.head()

1. Bar Chart

Use: To compare quantities across categories.

# Count of passengers by class
sns.countplot(data=df, x='Pclass')
plt.title('Passenger Count by Class')
plt.xlabel('Class')
plt.ylabel('Count')
plt.show()

2. Line Chart

Use: To track changes over time.

# Simulate a date column (the Titanic data has no real time variable) without overwriting PassengerId
df['Date'] = pd.to_datetime(df['PassengerId'], unit='D', origin='1900-01-01')

df.groupby(df['Date'].dt.year)['Fare'].mean().plot()
plt.title('Average Fare Over Time')
plt.xlabel('Year')
plt.ylabel('Average Fare')
plt.show()


3. Scatter Plot

Use: To show relationship between two numeric variables.

sns.scatterplot(data=df, x='Age', y='Fare')
plt.title('Age vs Fare')
plt.show()

4. Histogram
Use: To view the distribution of a single numeric variable.

sns.histplot(data=df, x='Age', bins=30, kde=True)
plt.title('Age Distribution')
plt.show()
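The bins argument controls how the range of values is split. As a rough sketch of equal-width binning with an integer bins value, numpy.histogram returns the count in each bin plus the bin edges; the ages below are made-up values, not taken from titanic.csv.

```python
import numpy as np

# Made-up ages standing in for df['Age'] (illustrative values only)
ages = np.array([2, 5, 14, 19, 22, 22, 24, 27, 29, 30, 31, 35, 38, 40, 45, 50, 54, 62, 71, 80])

# Split the observed range [min, max] into 5 equal-width bins and count values per bin
counts, edges = np.histogram(ages, bins=5)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {n}")
```

More bins show finer detail but noisier bars; fewer bins smooth the shape but can hide gaps.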

5. Box Plot

Use: To show distribution and detect outliers.

sns.boxplot(data=df, x='Pclass', y='Age')
plt.title('Age Distribution by Class')
plt.show()
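A box plot's whiskers commonly extend to the most extreme points within 1.5 × IQR of the quartiles (Seaborn's default whis=1.5), and anything beyond is drawn as an individual outlier. This sketch applies that rule by hand with pandas to made-up fare values, not data from titanic.csv.

```python
import pandas as pd

# Made-up fares (illustrative values only)
fares = pd.Series([7.25, 7.9, 8.05, 8.46, 13.0, 15.5, 21.0, 26.0, 30.0, 512.33])

# Quartiles and interquartile range
q1, q3 = fares.quantile(0.25), fares.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Points outside the 1.5 * IQR fences are the ones a box plot marks as outliers
outliers = fares[(fares < lower) | (fares > upper)]
print(f"Q1={q1:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")
print("Outliers:", outliers.tolist())
```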

6. Pie Chart

Use: To show proportions of a whole.

# Pie chart of survival; sort_index() keeps 0 (not survived) before 1 (survived) so the labels line up
survived_counts = df['Survived'].value_counts().sort_index()
labels = ['Not Survived', 'Survived']
plt.pie(survived_counts, labels=labels, autopct='%1.1f%%', startangle=140)
plt.title('Survival Rate')
plt.axis('equal')  # draw the pie as a circle
plt.show()
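The percentages printed by autopct are simply each category's share of the total. Assuming a 0/1 Survived column as in the Titanic data, value_counts(normalize=True) computes those shares directly; the values below are illustrative.

```python
import pandas as pd

# Made-up survival flags standing in for df['Survived'] (0 = not survived, 1 = survived)
survived = pd.Series([0, 1, 0, 0, 1, 0, 1, 0])

# normalize=True returns fractions instead of counts; sort_index keeps 0 before 1
shares = survived.value_counts(normalize=True).sort_index()
labels = ['Not Survived', 'Survived']
for label, share in zip(labels, shares):
    print(f"{label}: {share:.1%}")
```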
Q2. Tools and Libraries for Visualization

• Introduction to Python libraries: Matplotlib, Seaborn, and Plotly.


• Install necessary libraries and understand their use cases.

Answer:

Library | Use Case | Strengths
Matplotlib | Core plotting library (Seaborn is built on it) | Highly customizable, good for static charts
Seaborn | Statistical visualization | Clean, attractive default themes; simplifies complex plots
Plotly | Interactive plots | Great for dashboards and web apps

Installing the Libraries

Run the following command in a terminal (in a Jupyter Notebook cell, use %pip instead of pip):

pip install matplotlib seaborn plotly
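To confirm the installation worked, you can query the installed versions without importing the libraries. This is a small sketch using the standard library's importlib.metadata; the helper name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("matplotlib", "seaborn", "plotly")):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions())
```

A None entry means that library still needs to be installed.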

1. Matplotlib – The Foundation

Overview: It’s the foundational library for creating static, animated, and interactive plots in Python.

import matplotlib.pyplot as plt

# Simple line chart
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

plt.plot(x, y)
plt.title("Simple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)
plt.show()

2. Seaborn – Built on Matplotlib

Overview: Makes it easier to create beautiful and informative statistical plots.

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Load example dataset
df = sns.load_dataset('tips')

# Seaborn scatter plot
sns.scatterplot(data=df, x='total_bill', y='tip', hue='sex')
plt.title("Total Bill vs Tip by Gender")
plt.show()


3. Plotly – For Interactive Plots

Overview: Best for interactive, zoomable, and hoverable plots. Excellent for web apps and dashboards.

import plotly.express as px

# Load built-in dataset
df = px.data.iris()

# Interactive scatter plot
fig = px.scatter(df, x='sepal_width', y='sepal_length', color='species',
                 title="Iris Sepal Dimensions")
fig.show()

Note: Plotly figures are displayed with fig.show(), not plt.show(). In a Jupyter Notebook the figure renders inline; in a plain Python script it typically opens in a web browser.
Q3. Dataset Loading and Exploration
• Load real-world datasets using Pandas.
• Use .head(), .tail(), .info(), .describe() to explore data.

Answer:

Loading a Dataset

import pandas as pd

# Load Titanic dataset
df = pd.read_csv("titanic.csv")

# Show the first 5 rows
df.head()

Exploring the Dataset

.head() – View the first few rows
df.head(3)  # First 3 rows

.tail() – View the last few rows
df.tail(3)  # Last 3 rows

.info() – Overview of columns, data types, and non-null counts
df.info()

.describe() – Summary statistics for numeric columns
df.describe()
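When a DataFrame mixes types, describe() summarizes only the numeric columns by default. A small sketch on a made-up frame shaped like a few Titanic columns shows how include="all" adds count, unique, top and freq for text columns, and how value_counts() summarizes one categorical column.

```python
import pandas as pd

# Made-up rows mimicking a few Titanic columns (illustrative values only)
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, None],
})

print(df.describe())               # numeric columns only: Pclass, Age
print(df.describe(include="all"))  # adds unique/top/freq rows for the text column
print(df["Sex"].value_counts())    # frequency of each category
```

Note that count excludes missing values, so the missing Age shows up as a count of 4 rather than 5.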


Q4. Understanding Variable Types

• Differentiate between categorical, numerical, discrete, and continuous variables.


• Identify types of variables in a dataset.

Answer:

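Types of Variables

Type | Description | Examples
Categorical | Represent categories or groups | Gender (Sex), Class (Pclass), Embarked
Numerical | Represent measurable quantities | Age, Fare, SibSp
➤ Discrete | Countable whole-number values | Number of siblings/spouses aboard (SibSp), number of parents/children aboard (Parch)
➤ Continuous | Measurable values (fractions allowed) | Age, Fare

Note: Pclass is stored as an integer (1, 2, 3) but represents an ordered category, so it is treated as categorical rather than discrete.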

Let's Work with the Titanic Dataset

import pandas as pd

# Load dataset
df = pd.read_csv('titanic.csv')
df.head()

Identify Variable Types

# Check data types
df.dtypes
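df.dtypes is a starting point, but the dtype alone cannot distinguish a coded category (Pclass stored as 1/2/3) from a genuine count (SibSp). The sketch below is a hypothetical helper, classify_columns, that labels columns from their dtype and takes the coded-category columns as an explicit, assumed input; the sample rows are made up.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_float_dtype, is_integer_dtype


def classify_columns(df, coded_categories=()):
    """Label each column as categorical, discrete, or continuous from its dtype.

    Assumption: integer-coded categories (e.g. Pclass) must be listed in
    coded_categories, because dtype alone cannot tell them apart from counts.
    """
    kinds = {}
    for col in df.columns:
        s = df[col]
        if col in coded_categories or is_bool_dtype(s):
            kinds[col] = "categorical"
        elif is_integer_dtype(s):
            kinds[col] = "discrete"
        elif is_float_dtype(s):
            kinds[col] = "continuous"
        else:
            kinds[col] = "categorical"  # strings, category dtype, etc.
    return kinds


# Made-up rows mimicking a few Titanic columns
sample = pd.DataFrame({
    "Pclass": [3, 1, 2],
    "Sex": ["male", "female", "female"],
    "SibSp": [1, 0, 0],
    "Age": [22.0, 38.0, 26.0],
    "Fare": [7.25, 71.28, 13.0],
})
print(classify_columns(sample, coded_categories=("Pclass",)))
```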
Q5. Data Cleaning and Preparation for Visualization
• Handle missing values, remove duplicates, and convert data types.
• Prepare clean data for analysis and plotting.

Answer:

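Step 1: Handling Missing Values

Identify Missing Values

df.isnull().sum()

Drop or Fill Missing Values

Drop rows with missing values (when a row has too many nulls or isn't crucial to the analysis):

df_cleaned = df.dropna(subset=['Embarked'])

Fill missing values (with the mean, median, or mode):

# Assign the result back; fillna(..., inplace=True) on a single column is chained assignment,
# which recent pandas versions warn about and which stops working under Copy-on-Write
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])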

Step 2: Removing Duplicates

# Check and remove duplicates
print("Duplicates:", df.duplicated().sum())
df.drop_duplicates(inplace=True)

Step 3: Convert Data Types

Ensure columns are in the correct format:

# Convert Survived to category
df['Survived'] = df['Survived'].astype('category')

# Convert Embarked to category
df['Embarked'] = df['Embarked'].astype('category')

# Confirm changes
df.dtypes

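Clean Data Ready!

# Final check
df.info()  # info() prints its report itself and returns None, so it is not wrapped in print()
print(df.isnull().sum())  # columns not handled above (e.g. Cabin) may still show missing values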
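The three steps above can be combined into one reusable function. This is a minimal sketch assuming Titanic-style 'Age', 'Embarked' and 'Survived' columns; the function name clean_titanic_like and the small input frame are hypothetical.

```python
import numpy as np
import pandas as pd


def clean_titanic_like(df):
    """Fill Age/Embarked, drop duplicate rows, and cast Survived/Embarked to category.

    Assumes the frame has 'Age', 'Embarked' and 'Survived' columns.
    """
    out = df.copy()  # leave the raw data untouched
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    out = out.drop_duplicates()
    out["Survived"] = out["Survived"].astype("category")
    out["Embarked"] = out["Embarked"].astype("category")
    return out


# Made-up raw rows: one missing Age, one missing Embarked, one duplicate row
raw = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0],
    "Age": [22.0, np.nan, 26.0, 35.0, 35.0],
    "Embarked": ["S", "C", None, "S", "S"],
})
clean = clean_titanic_like(raw)
print(clean)
print(clean.dtypes)
```

Working on a copy makes it easy to compare plots of the raw and cleaned data side by side.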
Q6. Creating Basic Plots Using Matplotlib

• Plot line charts, bar charts, histograms using Matplotlib.

• Customize plots with titles, labels, legends, and colors.

Answer:

import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("titanic.csv")

1. Line Chart

# Average fare by class
fare_by_class = df.groupby('Pclass')['Fare'].mean()

# Plot line chart
plt.plot(fare_by_class.index, fare_by_class.values, color='green', marker='o', linestyle='--')
plt.title('Average Fare by Passenger Class')
plt.xlabel('Passenger Class')
plt.ylabel('Average Fare')
plt.grid(True)
plt.xticks([1, 2, 3])
plt.show()

2. Bar Chart

# Count of passengers per class
class_counts = df['Pclass'].value_counts().sort_index()

# Bar chart
plt.bar(class_counts.index, class_counts.values, color=['skyblue', 'salmon', 'lightgreen'])
plt.title('Passenger Count by Class')
plt.xlabel('Passenger Class')
plt.ylabel('Count')
plt.xticks([1, 2, 3])
plt.show()

3. Histogram

# Drop missing values in 'Age'
ages = df['Age'].dropna()

# Histogram
plt.hist(ages, bins=20, color='purple', edgecolor='black')
plt.title('Age Distribution of Passengers')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.5)
plt.show()
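The examples above set titles, labels and colors but not a legend. A legend appears once each plotted series carries a label= and legend() is called. The sketch below uses Matplotlib's object-oriented interface (fig, ax) with illustrative average fares split by sex; the numbers are made up, not computed from titanic.csv.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up average fares per class and sex (illustrative numbers only)
fares = pd.DataFrame(
    {"female": [100.0, 20.0, 15.0], "male": [65.0, 20.0, 12.0]},
    index=[1, 2, 3],
)

fig, ax = plt.subplots(figsize=(8, 5))
for sex, color in [("female", "tab:red"), ("male", "tab:blue")]:
    # label= sets the text the legend shows for this line
    ax.plot(fares.index, fares[sex], marker="o", color=color, label=sex.title())

ax.set_title("Average Fare by Class and Sex")
ax.set_xlabel("Passenger Class")
ax.set_ylabel("Average Fare")
ax.set_xticks([1, 2, 3])
ax.legend(title="Sex", loc="upper right")  # collects every labeled line
plt.show()
```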
Q7. Advanced Visualization Using Seaborn

• Create scatter plots, box plots, violin plots, and pair plots.

• Use hue, style, and palette for deeper analysis.

Answer:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Load Titanic dataset (built into Seaborn)
df = sns.load_dataset('titanic')

1. Scatter Plot

sns.scatterplot(data=df, x='age', y='fare', hue='sex', style='class', palette='Set2')
plt.title("Age vs Fare by Gender and Class")
plt.show()

2. Box Plot

sns.boxplot(data=df, x='class', y='age', hue='sex', palette='coolwarm')
plt.title("Age Distribution by Class and Gender")
plt.show()

3. Violin Plot

sns.violinplot(data=df, x='class', y='age', hue='sex', split=True, palette='muted')
plt.title("Age Distribution by Class and Gender (Violin Plot)")
plt.show()

4. Pair Plot

sns.pairplot(df[['age', 'fare', 'survived', 'sex']], hue='sex', palette='husl')
plt.suptitle("Pairwise Relationships", y=1.02)
plt.show()
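hue adds a grouping variable to a plot, and the same split can be checked numerically with groupby. This sketch, on a few made-up rows using the column names of Seaborn's titanic dataset, computes the per-group medians that a box plot with x='class' and hue='sex' draws as the center line of each box.

```python
import pandas as pd

# Made-up passengers using the column names of sns.load_dataset('titanic')
df = pd.DataFrame({
    "class": ["First", "First", "Third", "Third", "Third", "First"],
    "sex": ["female", "male", "female", "male", "male", "female"],
    "age": [38.0, 54.0, 26.0, 22.0, 30.0, 35.0],
})

# One row per class, one column per sex: the medians of each hue group
medians = df.groupby(["class", "sex"])["age"].median().unstack("sex")
print(medians)
```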


Q8. Multivariate Analysis with Seaborn

• Heatmaps and correlation matrices to analyze relationships between multiple variables.

• Apply sns.heatmap() and sns.pairplot().

Answer:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = sns.load_dataset('titanic')

1. Correlation Matrix

# Select numeric columns only
num_df = df.select_dtypes(include='number')

# Compute correlation matrix
corr_matrix = num_df.corr()

# Display correlation matrix
print(corr_matrix)

2. Heatmap Using sns.heatmap()

plt.figure(figsize=(10, 6))
sns.heatmap(corr_matrix, annot=True, fmt=".2f", cmap="coolwarm", linewidths=0.5)
plt.title("Correlation Heatmap - Titanic Numeric Features")
plt.show()
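With many columns, reading every heatmap cell gets tedious. The hypothetical helper top_correlations below ranks variable pairs by absolute correlation, visiting each pair once; it is a sketch run on made-up data where 'b' is an exact multiple of 'a'.

```python
import pandas as pd


def top_correlations(corr, n=3):
    """Return the n variable pairs with the largest absolute correlation.

    Only the upper triangle is visited, so (a, b) and (b, a) are not
    double-counted and the diagonal of 1.0 values is skipped.
    """
    cols = list(corr.columns)
    pairs = [
        (a, b, corr.loc[a, b])
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
    ]
    pairs.sort(key=lambda item: abs(item[2]), reverse=True)
    return pairs[:n]


# Made-up numeric data: 'b' is exactly 2 * 'a', 'c' tends to fall as 'a' rises
num_df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
})

for a, b, r in top_correlations(num_df.corr()):
    print(f"{a} vs {b}: {r:+.2f}")
```

On the Titanic data you would pass corr_matrix instead of the toy frame's correlation matrix.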

3. Pairplot (Again, But for Multivariate)

sns.pairplot(df[['age', 'fare', 'pclass', 'survived']], hue='survived', palette='Set1')
plt.suptitle("Pairwise Plot of Age, Fare, Pclass, and Survival", y=1.02)
plt.show()
Q9. Time Series and Trend Analysis

• Plot time-based data using Pandas and Matplotlib.

• Perform trend analysis and plot rolling averages.

• Select a real dataset (e.g., COVID-19, IPL stats, sales data).

Answer:

A. Plotting Time-Based Data

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load dataset
df = pd.read_csv("titanic.csv")

# Create a simulated 'Date' column: each passenger gets a random day within the
# 100 days before April 15, 1912 (the Titanic data has no real time variable)
rng = np.random.default_rng(42)
offsets = rng.integers(0, 100, size=len(df))
df['Date'] = pd.Timestamp("1912-04-15") - pd.to_timedelta(offsets, unit='D')

# Sort by date
df.sort_values('Date', inplace=True)

# Group by date and count passengers; asfreq fills any day without entries with 0
daily_passengers = df.groupby('Date').size().asfreq('D', fill_value=0)

# Plotting daily passenger entries
plt.figure(figsize=(12, 5))
daily_passengers.plot(kind='line', title='Simulated Passenger Entries Over Time')
plt.xlabel("Date")
plt.ylabel("Number of Passengers")
plt.grid(True)
plt.show()
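Daily counts are often noisy, and resampling to a coarser frequency is another way to expose a trend. A minimal sketch with pandas' resample on a made-up daily series (the 2024 start date is arbitrary, chosen because it is a Monday):

```python
import pandas as pd

# Made-up daily counts 1..14 over two weeks; 2024-01-01 is a Monday
daily = pd.Series(range(1, 15), index=pd.date_range("2024-01-01", periods=14, freq="D"))

# "W" groups days into weeks ending on Sunday and labels each bin by that Sunday
weekly = daily.resample("W").sum()
print(weekly)
```

On the simulated Titanic dates, daily_passengers.resample('W').sum() would produce weekly totals the same way.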

B. Rolling Averages (Trend Smoothing)

# 7-day rolling average
rolling_avg = daily_passengers.rolling(window=7).mean()

plt.figure(figsize=(12, 5))
plt.plot(daily_passengers, label='Daily Count')
plt.plot(rolling_avg, label='7-Day Rolling Average', color='red')
plt.title("Trend of Simulated Passenger Entries (with Smoothing)")
plt.xlabel("Date")
plt.ylabel("Passenger Count")
plt.legend()
plt.grid(True)
plt.show()
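A few details of rolling() affect how the smoothed line behaves at its edges. This sketch on a short made-up series compares the default trailing window, min_periods=1, and center=True.

```python
import pandas as pd

s = pd.Series([4, 8, 6, 10, 2, 12])

# Default: the first window-1 results are NaN because the window is not yet full
full = s.rolling(window=3).mean()
# min_periods=1 averages whatever values are available at the start instead
partial = s.rolling(window=3, min_periods=1).mean()
# center=True aligns each average with the middle of its window rather than its end
centered = s.rolling(window=3, center=True).mean()

print(pd.DataFrame({"value": s, "trailing": full, "min_periods=1": partial, "centered": centered}))
```

With window=7 and the default settings, a 7-day rolling-average line therefore starts on the seventh day.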
