Data Visualization Tableau
Data visualization is the graphical representation of information and data. By using visual elements
like charts, graphs, and maps, data visualization tools help transform complex data sets into
accessible and understandable insights. It allows viewers to quickly identify patterns, trends,
correlations, and anomalies in the data, making it easier to interpret, analyze, and communicate
information.
• Charts and Graphs: Such as bar charts, line charts, and pie charts that display individual
metrics.
• Heatmaps: Show data intensity through colors, useful for visualizing correlations and patterns.
• Maps: Geospatial data can be represented on maps to see trends based on geography.
• Infographics: Combine visuals and text to provide a comprehensive, easy-to-understand
presentation of the data.
• Dashboards: Interactive visual interfaces that display multiple data visualizations together, often
in real time.
1. Matplotlib (Python)
• Purpose: One of the most popular Python libraries for creating static, animated, and interactive
visualizations.
• Key Features: Line plots, bar charts, histograms, scatter plots, pie charts, etc.
• Use Case: Widely used for generating quick and customizable plots in Python.
Example usage:
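A minimal sketch of a Matplotlib bar chart. The category names and values are made up for illustration, and the chart is saved to a file (the file name is an assumption) so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() for a window
import matplotlib.pyplot as plt

# Made-up values, for illustration only
categories = ["A", "B", "C"]
values = [4, 7, 5]

fig, ax = plt.subplots()
bars = ax.bar(categories, values)   # one bar per category
ax.set_title("Values by Category")
ax.set_xlabel("Category")
ax.set_ylabel("Value")

fig.savefig("bar_chart.png")        # hypothetical output file name
```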
Example usage (Plotly Express):
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species")
fig.show()
4. Tableau
• Purpose: A leading data visualization tool used to create interactive and shareable dashboards.
• Key Features: Drag-and-drop interface, real-time data integration, interactive visualizations.
• Use Case: Commonly used for business intelligence, reporting, and analytics with an intuitive
GUI.
• Features:
• Create visualizations using simple drag-and-drop actions.
• Can connect to various data sources (e.g., SQL, Excel, Google Analytics).
5. Power BI
• Purpose: A business analytics tool by Microsoft that enables users to visualize and share insights
from their data.
• Key Features: Real-time dashboards, data modeling, interactive visualizations, integration with
Microsoft Excel and SQL Server.
• Use Case: Used for business analytics and decision-making in corporate environments.
6. ggplot2 (R)
• Purpose: A powerful data visualization library in R that uses a grammar of graphics to create
complex plots.
• Key Features: Layered approach to creating plots, customization of axis labels, scales, and
themes.
• Use Case: Particularly useful in the R ecosystem for statistical data visualization.
Example usage:
library(ggplot2)
data(mpg)
ggplot(mpg, aes(x=displ, y=hwy)) +
geom_point() +
labs(title="Displacement vs Highway MPG")
7. Excel
• Purpose: A widely used spreadsheet tool with basic graphing capabilities.
• Key Features: Line charts, bar charts, scatter plots, pie charts, etc.
• Use Case: Common for creating quick charts and visualizations when working with small to
medium datasets.
Example Usage: [Figure: Excel chart with the values 2000, 5000, and 3000]
These tools are essential for making data easily interpretable, whether for personal use, presentations,
or business reporting. Each tool has its strengths, and the choice of which to use depends on the
complexity of your data and the interactivity you need.
Practical-2
➢ Aim: What is data, where to find data, foundations for building data
visualizations, creating your first visualization.
Description:
✓ What is Data?
Data refers to raw facts, figures, or information that can be collected, analyzed, and interpreted. It
can take many forms:
• Numerical: Such as sales numbers, temperature readings, or prices.
• Categorical: Such as names, regions, or product types.
• Textual: Such as customer reviews, social media posts, or email contents.
• Temporal: Data collected over time, like daily stock prices, monthly website visits, or yearly
income growth.
In essence, data is the foundation for making decisions, discovering insights, and creating actionable
information that can help solve problems or improve processes.
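The four forms of data listed above can be sketched as a single record in Python. The class name, field names, and values are hypothetical and chosen only to show one field of each kind.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SaleRecord:
    amount: float   # numerical: a sales figure
    region: str     # categorical: one of a fixed set of regions
    review: str     # textual: free-form customer feedback
    sold_on: date   # temporal: when the sale happened


# Made-up values, for illustration only
record = SaleRecord(
    amount=199.99,
    region="North",
    review="Arrived quickly and works well.",
    sold_on=date(2024, 1, 15),
)

# Map each field name to the Python type that holds it
kinds = {name: type(value).__name__ for name, value in vars(record).items()}
print(kinds)
```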
Finding good-quality data is key to creating effective visualizations. Here are some places where we
can find useful datasets:
1. Public Data Repositories
o Kaggle: Offers free datasets for machine learning, data science, and visualization
practice. It’s a great resource for beginners and professionals alike.
o UCI Machine Learning Repository: A collection of datasets widely used in academic
and research settings.
o Google Dataset Search: A search engine for datasets across the web.
o Data.gov: The U.S. government's open data portal, offering datasets on various topics
like health, education, and finance.
o World Bank Open Data: Contains global data on a variety of development indicators,
economic factors, and more.
Before jumping into creating a data visualization, it's important to understand some basic principles
that guide effective visual communication.
6. Data Integrity
o Always ensure that the data is accurate and appropriately cleaned before visualizing it.
Incorrect or outlier values can lead to misleading visualizations.
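One possible way to check data before plotting is sketched below. It drops missing values and flags outliers with the 1.5 x IQR rule, which is a common convention chosen here as an assumption and is not prescribed by this manual. The readings are made up.

```python
import statistics

# Made-up temperature readings; None marks a missing value
readings = [21.5, 22.0, None, 21.8, 22.3, 95.0, 21.9, 22.1]

# Remove missing values before any calculation
clean = [r for r in readings if r is not None]
missing_count = len(readings) - len(clean)

# Quartiles of the cleaned data (default "exclusive" method)
q1, _median, q3 = statistics.quantiles(clean, n=4)
iqr = q3 - q1

# Values outside [q1 - 1.5*IQR, q3 + 1.5*IQR] are flagged for review
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [r for r in clean if r < low or r > high]

print("missing:", missing_count, "outliers:", outliers)
```

A flagged value is not automatically wrong; it is a prompt to check the source before deciding whether to keep it.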
Matplotlib is a great library for creating static visualizations. Let’s create a simple line chart to
visualize sales data using Python.
Step-by-Step Guide to Create a Visualization in Matplotlib:
1) Install Matplotlib: If we don’t already have Matplotlib installed, we can install it using pip:
pip install matplotlib
2) Prepare Your Data: Let’s create a simple dataset using Python. For this example, we’ll use
Month and Sales.
3) Create a Line Chart: Now, let’s create a basic line chart to visualize the sales over time.
4) Customize the Chart: You can enhance the chart by adding titles, axis labels, and a legend.
5) Output: Running the code produces a line chart with the months on the x-axis and sales
on the y-axis. The chart also includes a title, axis labels, and a legend.
Source Code:
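A minimal sketch of steps 2 to 5, assuming invented monthly sales figures. The chart is saved to a file (hypothetical name) so that the script also runs without a display; for an on-screen window, remove the Agg line and call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

# Step 2: prepare the data (sales figures are invented for illustration)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [2000, 2500, 3000, 2800, 3500, 5000]

# Step 3: create a basic line chart of sales over the months
fig, ax = plt.subplots(figsize=(8, 4))
line, = ax.plot(months, sales, marker="o", label="Sales")

# Step 4: customize with a title, axis labels, and a legend
ax.set_title("Monthly Sales")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.legend()
ax.grid(True)

# Step 5: write the chart to an image file
fig.savefig("monthly_sales.png")
```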
Output:
Practical-3
➢ Aim: Getting started with Tableau Software using data file formats, connecting
your data to Tableau, creating basic charts (line charts, bar charts, treemaps) using the
Show Me panel.
Description:
✓ Introduction to Tableau
Tableau is a leading data visualization tool that helps users transform raw data into insightful,
interactive visualizations and dashboards. Known for its intuitive drag-and-drop interface, Tableau
makes it easy for anyone—from beginners to experienced data analysts—to create beautiful and
informative charts, graphs, and maps without needing extensive technical skills.
The main strength of Tableau lies in its ability to connect to various data sources, from simple
spreadsheets to complex databases, and instantly create dynamic visualizations. Whether you’re
working with sales figures, financial data, or customer demographics, Tableau enables you to analyze
trends, identify patterns, and present findings in a visually compelling way.
• Wide Range of Data Connections: Tableau can connect to various data sources, including
Excel, CSV, SQL databases, Google Sheets, cloud platforms, and more. This makes it ideal for
consolidating data from multiple sources into a single, unified dashboard.
• Interactive Dashboards: With Tableau, you can create dashboards that allow users to interact
with data in real time. Filters, tooltips, and parameter controls provide an engaging experience,
allowing users to explore data on their own.
• Scalability and Collaboration: Tableau’s suite of products, such as Tableau Desktop, Tableau
Server, and Tableau Online, allows teams and organizations to share, collaborate, and scale data
insights across multiple departments.
Tableau can connect to a variety of data sources, both file-based and server-based. Some common
file formats you can use include:
• Excel (.xlsx, .xls)
• Comma Separated Values (CSV) (.csv)
• Text files (.txt)
• JSON (.json)
• PDF (.pdf)
• Tableau Data Extract (.hyper, .tde)
• Spatial Files (like .shp for geographical data)
• Statistical Files (.sav, .sas7bdat)
• Google Sheets
Tableau also supports connecting directly to databases like MySQL, Microsoft SQL Server, Oracle,
and cloud-based sources like Google BigQuery, AWS, and Microsoft Azure.
1. Open Tableau:
o When we open Tableau, we’ll see the Connect pane on the left. This pane allows us to
connect to different data sources.
2. Choose the Data Source:
o Click on the file type or database we want to connect to. For example, if we’re using an
Excel file, click on "Microsoft Excel."
3. Locate the File:
o After selecting our data type, browse our computer to find the file we want to use.
Tableau will load it and display a preview.
4. Data Source Setup:
o Once our data file is loaded, we’ll be taken to the Data Source tab.
o Here, we can drag tables or sheets (if using Excel) into the workspace, specify joins, and
view a sample of the data.
o We can also rename fields, change data types, and set relationships between tables if our
dataset has multiple sheets or tables.
5. Switch to the Worksheet:
o Once our data is connected and ready, click on the Sheet tab at the bottom to start creating
visualizations. This will take us to the main workspace in Tableau.
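Before connecting a file, it can help to know what the preview in the Data Source tab should look like. The sketch below is plain Python, not part of Tableau: it reads a small CSV held in a string and guesses a type for each column from the first row. The Ship Date column and all values are hypothetical.

```python
import csv
import io
from datetime import date

# A small made-up CSV; in practice this would be a file on disk
raw = """Country,Boxes Shipped,Ship Date
India,120,2024-01-05
Canada,85,2024-01-07
India,60,2024-02-11
"""


def guess_type(value):
    """Return a rough type label for one text value."""
    parsers = (("integer", int), ("decimal", float), ("date", date.fromisoformat))
    for label, parse in parsers:
        try:
            parse(value)
            return label
        except ValueError:
            continue
    return "text"


reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

# Guess each column's type from the first data row
field_types = {name: guess_type(rows[0][name]) for name in reader.fieldnames}
print(field_types)
```

If the guessed types differ from what the Data Source tab shows, that is usually the field whose data type needs changing.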
✓ Creating Basic Charts Using the Show Me Panel
The Show Me panel in Tableau is a quick tool for generating various types of visualizations based on
the data selected.
A. Creating a Line Chart
1. Select Data for the Line Chart:
o In our Data pane on the left, find the fields we want to plot. For example, if we have a
dataset with Country and Boxes Shipped, drag the Country field to the Columns shelf and
Boxes Shipped to the Rows shelf.
2. Open Show Me:
o Open the Show Me panel by clicking the "Show Me" button in the upper-right corner.
3. Choose Line Chart:
o With our data selected, click the Line Chart option in the Show Me panel. Tableau will
automatically generate a line chart showing how many boxes are shipped in each country.
4. Customize:
o We can further customize our chart by changing colors, adding labels, or adjusting the
date aggregation (e.g., monthly, quarterly).
Description:
Creating visualizations in Tableau allows users to transform raw data into clear, interactive insights.
In this example, a bar chart will be used to display various book categories alongside the number of
recommendations each one has received.
Bar charts are ideal for comparing quantities across categories, making it easy to see which genres
are most popular or highly recommended.
By following a few simple steps in Tableau, users can connect their data, select appropriate fields,
and generate a visual that highlights key insights. This bar chart will provide a straightforward
comparison of recommendation counts across book categories, enabling viewers to quickly identify
trends and preferences.
Steps to Create a Bar Chart for Book Categories and Recommendations in Tableau:
5. Saving or Exporting
After completing the bar chart:
• The user can save their work in Tableau by going to File > Save As.
• Alternatively, they can export the chart as an image or PDF by selecting File > Export >
Image/PDF.
Output:
Practical-5
➢ Aim: Tableau calculations, overview of SUM, AVG, and Aggregate features,
creating custom calculations and fields.
Description:
✓ Aggregate Functions:
Aggregate functions perform calculations on multiple values to produce a single summarized result.
They’re commonly used to summarize data, such as by calculating totals, averages, counts, or
finding the highest and lowest values. Some of the functions include:
1. SUM
• Purpose: This function adds up all values within a selected field.
• Use Case: Summing sales, profits, or any other metric to get a total value.
• Example: SUM(Sales) will calculate the total sales amount for all data points in the selected
scope.
• Result: A single value that represents the total.
2. AVG (Average)
• Purpose: This function calculates the average (mean) of values within a selected field.
• Use Case: Finding the average order value, average temperature, or any other metric where
an average insight is useful.
• Example: AVG(Sales) calculates the average sales amount by dividing the total sales by the
count of data points.
• Result: A single value that represents the average across all data points in the scope.
3. COUNT
• Purpose: This function counts the number of records or instances of a specific field.
• Use Case: Counting the number of customers, orders, or products in a dataset.
• Example: COUNT([Customer ID]) counts the number of non-null Customer ID values; a customer
who appears in several rows is counted once per row.
• Result: A single value representing the total count.
• COUNTD: Tableau also offers COUNTD, which counts distinct (unique) values in a field.
COUNTD(Product) would count each unique product only once.
4. MAX (Maximum)
• Purpose: Finds the highest (maximum) value within a selected field.
• Use Case: Identifying the maximum order amount, highest temperature, or maximum sales in
a category.
• Example: MAX(Sales) will return the highest individual sales amount from the data.
• Result: A single value representing the maximum.
5. MIN (Minimum)
• Purpose: Finds the lowest (minimum) value within a selected field.
• Use Case: Identifying the minimum price, lowest order amount, or smallest metric value in a
dataset.
• Example: MIN(Sales) will return the smallest individual sales amount.
• Result: A single value representing the minimum.
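What these five functions return can be sketched in Python on a tiny made-up order table. This is not Tableau code; the field names and values are hypothetical. Note the difference between COUNT and COUNTD when a customer appears more than once.

```python
import statistics

# Made-up orders; customer C1 appears twice
orders = [
    {"customer_id": "C1", "sales": 250.0},
    {"customer_id": "C2", "sales": 100.0},
    {"customer_id": "C1", "sales": 400.0},
    {"customer_id": "C3", "sales": 50.0},
]

sales = [o["sales"] for o in orders]
customer_ids = [o["customer_id"] for o in orders]

total = sum(sales)                        # like SUM(Sales)
average = statistics.mean(sales)          # like AVG(Sales)
count = len(customer_ids)                 # like COUNT([Customer ID]): every non-null row
count_distinct = len(set(customer_ids))   # like COUNTD([Customer ID]): unique values only
highest = max(sales)                      # like MAX(Sales)
lowest = min(sales)                       # like MIN(Sales)

print(total, average, count, count_distinct, highest, lowest)
```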
The new Cost field will now appear in the Data pane, and we can use it in our visualizations.
✓ Quick Table Calculations
In addition to custom-calculated fields, Tableau offers quick table calculations, which can be applied
by right-clicking on any measure in the view and selecting Quick Table Calculation. These include
options like:
• Running Total
• Percent Difference
• Moving Average
• Rank
• Percent of Total
Quick table calculations are easy to use and ideal for rapid insights without the need for complex
custom fields.
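The arithmetic behind these quick table calculations can be sketched in Python. The monthly values are made up, and the moving average window (the current value plus up to two previous values) is an assumption for this sketch; in Tableau the window is configurable.

```python
from itertools import accumulate

# Made-up monthly sales, in order
monthly_sales = [100, 150, 250, 500]

# Running Total: cumulative sum up to each month
running_total = list(accumulate(monthly_sales))

# Percent of Total: each month's share of the overall sum
total = sum(monthly_sales)
percent_of_total = [100 * v / total for v in monthly_sales]

# Percent Difference: change from the previous month (undefined for the first)
percent_difference = [None] + [
    100 * (cur - prev) / prev
    for prev, cur in zip(monthly_sales, monthly_sales[1:])
]

# Moving Average: mean of the current value and up to two previous values
moving_average = [
    sum(monthly_sales[max(0, i - 2): i + 1]) / len(monthly_sales[max(0, i - 2): i + 1])
    for i in range(len(monthly_sales))
]

# Rank: 1 for the largest value
ranks = [1 + sum(other > v for other in monthly_sales) for v in monthly_sales]

print(running_total, percent_of_total, ranks)
```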
Practical-6
➢ Aim: Connecting to Data and preparing data for visualization in Tableau.
Description:
Once the data is loaded, it’s important to prepare it for analysis and visualization:
• Inspect the Data Structure: Verify the imported data. Tableau will display tables and fields;
check if they are structured as expected.
• Join or Union Data: If working with multiple tables, you can join or union them to combine
data. Drag tables into the canvas and specify join conditions if required.
• Rename Fields and Adjust Data Types:
o Rename fields by double-clicking on the field name.
o Ensure each field has the correct data type (e.g., numeric, text, date) by checking the icon
next to it. Click the icon to change the type if necessary.
• Clean Data:
o Remove or hide unnecessary fields to keep your workspace organized.
o You can create calculated fields for more detailed insights (e.g., calculating "Cost" by
subtracting profit from sales).
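The preparation steps above can be sketched outside Tableau in Python: text values are converted to the right data types, and a Cost field is derived as Sales minus Profit. The rows and the Order Date field are hypothetical.

```python
from datetime import date

# Made-up rows as they might arrive from a file: everything is text
raw_rows = [
    {"Order Date": "2024-03-01", "Sales": "500", "Profit": "120"},
    {"Order Date": "2024-03-02", "Sales": "300", "Profit": "45"},
]

prepared = []
for row in raw_rows:
    sales = float(row["Sales"])     # text -> number
    profit = float(row["Profit"])   # text -> number
    prepared.append({
        "Order Date": date.fromisoformat(row["Order Date"]),  # text -> date
        "Sales": sales,
        "Profit": profit,
        "Cost": sales - profit,     # calculated field: Cost = Sales - Profit
    })

for row in prepared:
    print(row)
```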
Practical-7
➢ Aim: Editing and formatting axes, manipulating data in Tableau, pivoting
Tableau data.
Description:
In Tableau, editing and formatting axes, manipulating data, and pivoting data are key tasks for
creating polished, meaningful visualizations.
2. Filtering Data:
o Drag fields to the Filters shelf to filter out specific data.
o We can filter by categorical values, range of dates, or numeric range.
3. Grouping Data: Tableau allows grouping values that are similar into a single group.
o Right-click on a dimension and select Group to create custom groups.
5. Show Totals and Subtotals: You can enable Grand Totals or Subtotals to show aggregated
values at different levels in your data.
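Filtering, grouping, and totals can be sketched in Python to show what each step does to the data. The shipment rows, the filter threshold, and the region groups are all made up for illustration.

```python
from collections import defaultdict

# Made-up shipment rows
shipments = [
    {"country": "India", "boxes": 120},
    {"country": "Japan", "boxes": 30},
    {"country": "Canada", "boxes": 85},
    {"country": "USA", "boxes": 200},
]

# Filtering: keep only rows in a numeric range (here, at least 50 boxes)
filtered = [s for s in shipments if s["boxes"] >= 50]

# Grouping: map similar dimension values into a custom group
region_of = {
    "India": "Asia",
    "Japan": "Asia",
    "Canada": "North America",
    "USA": "North America",
}

# Subtotals per group, then a grand total over everything that passed the filter
subtotals = defaultdict(int)
for s in filtered:
    subtotals[region_of[s["country"]]] += s["boxes"]
grand_total = sum(subtotals.values())

print(dict(subtotals), grand_total)
```

Because the filter runs first, Japan's 30 boxes are excluded from both the Asia subtotal and the grand total.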
R is a powerful, open-source programming language and software environment primarily used for
statistical computing, data analysis, and data visualization. Developed by statisticians and widely
adopted by data scientists, analysts, and researchers, R has become one of the most popular
languages for data science.
✓ Key Features of R
• Statistical and Mathematical Functions: R offers built-in tools for statistical analysis, such as
hypothesis testing, regression, and time-series analysis, ideal for data-intensive research.
• Data Visualization: With base plotting functions and ggplot2, R allows for clean, customizable
visualizations.
• Extensive Package Ecosystem: CRAN provides thousands of packages for tasks like machine
learning, data wrangling, and bioinformatics.
• Data Manipulation: Packages like dplyr and tidyr streamline data cleaning and preparation.
• Reproducibility: R integrates with markdown tools, enabling reproducible reports that combine
code, results, and narrative.
✓ Basic Visualizations in R
A. Bar Charts
Purpose: Display categorical data as bars, with the height representing values.
Function used in Base R: barplot(height, names.arg)
Example:
library(ggplot2)
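Example:
library(ggplot2)
time_data <- data.frame(Time = 1:5, Sales = c(20, 15, 30, 25, 35))
ggplot(time_data, aes(x = Time, y = Sales)) + geom_line()  # assumed plot type: line chart of Sales over Time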
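Example:
library(ggplot2)
# Example data
data <- data.frame(Height = c(5.5, 6.0, 5.8, 5.9, 6.1),
Weight = c(150, 160, 155, 165, 170))
ggplot(data, aes(x = Height, y = Weight)) + geom_point()  # assumed plot type: scatter plot of Weight against Height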
Creating a basic dashboard in Tableau allows us to combine multiple visualizations into a single
view, making it easy to analyze and present key insights. Here’s a step-by-step guide for creating a
basic Tableau dashboard:
Output:
Practical-10
➢ Aim: Data Aggregation and Statistical functions in Tableau.
Description:
In Tableau, data aggregation and statistical functions are essential tools for summarizing and
analyzing data. These functions allow users to calculate totals, averages, medians, percentiles, and
other descriptive statistics that help make sense of large datasets.
Data aggregation is the process of summarizing detailed data into more general, interpretable
information. Tableau offers several built-in aggregation functions:
• SUM: Adds up all the values in a field. Commonly used for total sales, revenue, etc.
• AVG: Calculates the average of values, useful for finding mean values like average sales or
profit per item.
• COUNT: Counts the number of values or records in a field. COUNTD (count distinct) counts
unique entries only.
• MIN: Finds the minimum value in a field, often used to locate the smallest quantity or earliest
date.
• MAX: Finds the maximum value, useful for identifying the highest quantity, price, or latest date.
• MEDIAN: Computes the middle value, which is less affected by outliers than the average.
• PERCENTILE: Returns the value at a given percentile of a field, useful in identifying performance
thresholds, like the 90th percentile of scores.
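Two points from the list above can be checked with a short Python sketch: the median is barely moved by an outlier while the average is, and a percentile is a value in the data's range rather than a rank. The numbers are made up, and the interpolation method used here (Python's "inclusive" method) is an assumption that may differ from Tableau's.

```python
import statistics

# Made-up order values with one extreme outlier
order_values = [10, 12, 11, 13, 500]

mean_value = statistics.mean(order_values)      # pulled up sharply by the 500
median_value = statistics.median(order_values)  # stays at the middle value

# Made-up scores from 50 to 100 in steps of 5
scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# quantiles(n=10) returns the 9 cut points between deciles; index 8 is the 90th percentile
p90 = statistics.quantiles(scores, n=10, method="inclusive")[8]

print(mean_value, median_value, p90)
```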
1. Drag a measure (e.g. Boxes Shipped) onto the Rows or Columns shelf in the worksheet.
2. Tableau automatically aggregates the measure, usually as a SUM by default.
3. To change the aggregation, click on the field in the view, then select Measure and choose another
aggregation type, such as AVG, COUNT, MIN, or MAX.
✓ Statistical Functions in Tableau
In addition to basic aggregation, Tableau provides statistical functions that allow for deeper data
analysis. Some commonly used statistical functions are:
• STDEV and STDEVP: These calculate the standard deviation for a sample or population,
respectively, which shows how spread-out data points are from the mean.
• VAR and VARP: These compute the variance for a sample or population, indicating the degree
of dispersion.
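• Z-Score: Standardizes data by showing how far a data point is from the mean in terms of
standard deviations. Tableau has no single Z-score function; it is built as a calculated field from
the mean and the standard deviation.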
• CORR: Calculates the correlation between two fields, measuring the strength of their
relationship.
• COVAR: Calculates covariance, showing the direction of the linear relationship between two
variables.
• RANK and RANK_DENSE: Ranks values in a field, useful for ranking products, regions, or
employees based on sales or performance metrics.
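The formulas behind these functions can be sketched in Python on made-up Sales and Profit values. This is an illustration of the statistics, not Tableau code; covariance and correlation are written out by hand using the sample (n - 1) form, which is an assumption of this sketch.

```python
import statistics

# Made-up paired values
sales = [10.0, 20.0, 30.0, 40.0, 50.0]
profit = [3.0, 4.0, 8.0, 9.0, 11.0]

# Sample vs population spread (compare STDEV/VAR with STDEVP/VARP)
sample_var = statistics.variance(sales)        # divides by n - 1
population_var = statistics.pvariance(sales)   # divides by n
sample_sd = statistics.stdev(sales)
population_sd = statistics.pstdev(sales)

# Z-score: distance from the mean in units of standard deviation
mean_sales = statistics.mean(sales)
z_scores = [(x - mean_sales) / sample_sd for x in sales]

# Sample covariance and Pearson correlation, written out explicitly
mean_profit = statistics.mean(profit)
n = len(sales)
covariance = sum(
    (x - mean_sales) * (y - mean_profit) for x, y in zip(sales, profit)
) / (n - 1)
correlation = covariance / (sample_sd * statistics.stdev(profit))

print(sample_var, population_var, covariance, round(correlation, 3))
```

The sample variance is larger than the population variance for the same values because it divides by n - 1 instead of n.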
1. Create Calculated Fields: To use statistical functions, we can create calculated fields that apply
them to our data.
o Go to the Data pane, right-click, and select Create Calculated Field.
o Enter a formula using the desired statistical function, for example: STDEV([Sales]) or
CORR([Sales], [Profit]).
2. Quick Table Calculations: Some functions, like Moving Average or Percent of Total, can be
accessed quickly by right-clicking on a measure in the view and selecting Quick Table
Calculation.
STDEV Usage Example: