DM14 Visualisation

This document discusses various techniques for visualizing data, including graphs like bar charts, line charts, and scatter plots that are useful for showing trends, relationships, and comparing values, as well as strategies for choosing the appropriate visualization based on the type of data and insights sought. It also covers more advanced visualization methods for higher-dimensional data like parallel coordinates and scatterplot matrices.

Uploaded by

Ajaya Kumar

Data Visualization

or Graphical Data Presentation

Jerzy Stefanowski
Instytut Informatyki

Data mining for SE -- 2013


Acknowledgements

Inspiration comes from:

• G. Piatetsky-Shapiro, lectures on KDD
• J. Han on Data Mining
• Ken Brodlie, “Envisioning Information”
• Chris North, “Information Visualisation”
What is visualization and data mining?

• Visualize: “To form a mental vision, image, or picture of
(something not visible or present to the sight, or of an
abstraction); to make visible to the mind or imagination.”
• Visualization is the use of computer graphics to create
visual images which aid in the understanding of complex,
often massive representations of data.
• Visual Data Mining is the process of discovering implicit
but useful knowledge from large data sets using
visualization techniques.
Tables vs graphs

A table is best when:
• You need to look up specific values
• Users need precise values
• You need to precisely compare related values
• You have multiple data sets with different units of measure

A graph is best when:
• The message is contained in the shape of the values
• You want to reveal relationships among multiple values
(similarities and differences)
• Show general trends
• You have large data sets

• Graphs and tables serve different purposes. Choose the
appropriate data display to fit your purpose.
Exploratory Data Analysis

• Pioneer -> John Tukey
• New approach to data analysis, heavily based on
visualization, as an alternative to classical data analysis
• See his biography
• Two stage process:
– Exploratory: search for evidence using all tools available
– Confirmatory: evaluate strength of evidence using
classical data analysis
Box Plots

• In some situations we have not a single data value at a point,
but a number of data values, or even a probability distribution
• When might this occur?
• Tukey proposed the idea of a boxplot to visualize the
distribution of values
• For explanation and some history, see:

http://mathworld.wolfram.com/Box-and-WhiskerPlot.html
http://en.wikipedia.org/wiki/Box_plot

Legend: M – median; Q1, Q3 – quartiles; whiskers – 1.5 * interquartile range; dots – outliers

Example figure: Darwin’s plant study

http://www.upscale.utoronto.ca/GeneralInterest/Harrison/Visualisation/Visualisation.html
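The legend above (median, quartiles, whiskers at 1.5 * interquartile range, dots for outliers) can be sketched in a few lines. This is only an illustration: the sample numbers are made up, the function name `boxplot_stats` is hypothetical, and the "inclusive" quartile convention is an assumption (other conventions give slightly different quartiles).

```python
import statistics


def boxplot_stats(values):
    """Return the numbers a box plot draws: median, quartiles, whisker ends, outliers."""
    data = sorted(values)
    # Quartile convention is an assumption; "inclusive" interpolates within the sample.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data values still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up measurements
stats = boxplot_stats(sample)
print(stats)
```

With these numbers the value 30 lies beyond the upper fence, so it would be drawn as a dot rather than being covered by the whisker.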
Distribution visualisation – US Crime Story
Data Visualization – Common Display Types

Common Display Types


– Bar Charts
– Line Charts
– Pie Charts
– Bubble Charts
– Stacked Charts
– Scatterplots
When to use which type?
Line Graph
– x-axis requires quantitative variable
– Variables have contiguous values
– Familiar/conventional ordering among ordinals

Bar Graph
– Comparison of relative point values

Scatter Plot (example figure annotated with R2 = 0.87)
– Convey overall impression of relationship between two variables

Pie Chart
– Emphasizing differences in proportion among a few numbers
Line Graph – Trend visualization

• Fundamental technique of data presentation
• Used to compare two variables
– X-axis is often the control variable
– Y-axis is the response variable
• Good at:
– Showing specific values
– Trends
– Trends in groups (using multiple line graphs)

Example figures: Students participating in sporting activities; Mobile phone use

Note: graph labelling is fundamental


Time line graph – show dynamics of measurements
Stratified graphs

• Trends of values with respect to time and different qualitative
categories
Demo – Baby Names Voyager

http://www.babynamewizard.com/voyager
Scatter Plot – XY scatter charts

• Used to present
measurements of two
variables
• Effective if a relationship
exists between the two
variables

Car ownership by household income

Example taken from NIST Handbook –
Evidence of strong positive correlation
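A scatter plot gives a visual impression of a relationship; the Pearson correlation coefficient is one common way to put a number on a linear one. The sketch below is a minimal illustration with made-up income and car-ownership values (they are not the data behind the slide's figures), and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: household income (thousands) and cars per household.
income = [20, 30, 40, 50, 60]
cars = [0.5, 0.9, 1.1, 1.6, 1.9]

r = pearson_r(income, cars)
print(f"r = {r:.3f}")  # a value close to +1 indicates a strong positive correlation
```

A value of r near zero does not rule out a relationship; it only says there is no linear one, which is one reason to look at the plot as well as the number.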
Simple Representations – Bar Graph

• Bar graph
– Presents categorical variables
– Height of bar indicates value
– Double bar graph allows comparison
– Note spacing between bars
– Can be horizontal (when would you use this?)

Example figures: Number of police officers; Internet use at a school

Note more space for labels
Dot Graph

• Very simple but effective…
• Horizontal to give more space for labelling
Bad Visualization: Spreadsheet

Year   Sales
1999   2,110
2000   2,105
2001   2,120
2002   2,121
2003   2,124

[Chart: Sales by year, 1999 to 2003, with the y-axis running from 2095 to 2130]

What is wrong with this graph?
Bad Visualization:
Spreadsheet with misleading Y-axis

Year   Sales
1999   2,110
2000   2,105
2001   2,120
2002   2,121
2003   2,124

[Chart: Sales by year, 1999 to 2003, with the y-axis running from 2095 to 2130]

The Y-axis scale gives the WRONG impression of a big change
Better Visualization

Year   Sales
1999   2,110
2000   2,105
2001   2,120
2002   2,121
2003   2,124

[Chart: Sales by year, 1999 to 2003, with the y-axis running from 0 to 3000]

An axis that starts at 0 (here 0 to 3000) gives the
correct impression of small change, plus some small formatting tricks
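The effect of the truncated axis can be checked with simple arithmetic on the sales figures from the slides. The sketch below compares how tall the 2003 value is drawn relative to the 1999 value when the axis starts at 2095 versus at 0; the helper name `drawn_height_ratio` is hypothetical and only serves the illustration.

```python
# Sales figures from the slides above.
sales = {1999: 2110, 2000: 2105, 2001: 2120, 2002: 2121, 2003: 2124}


def drawn_height_ratio(first, last, axis_min):
    """Ratio of drawn heights (last / first) when the y-axis starts at axis_min."""
    return (last - axis_min) / (first - axis_min)


first, last = sales[1999], sales[2003]

actual_change = (last - first) / first                      # true relative growth
truncated = drawn_height_ratio(first, last, axis_min=2095)  # misleading axis
zero_based = drawn_height_ratio(first, last, axis_min=0)    # axis starting at 0

print(f"actual growth:       {actual_change:.2%}")
print(f"truncated axis:      2003 drawn {truncated:.2f}x as high as 1999")
print(f"zero-based axis:     2003 drawn {zero_based:.3f}x as high as 1999")
```

Sales grew by well under one percent, yet on the axis starting at 2095 the last point sits almost twice as high as the first. On the zero-based axis the drawn ratio matches the real one.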
Integrating various graphs
Pie Chart

• Pie chart summarises a set of categorical/nominal data
• But use with care…
• … too many segments are harder to compare than in a bar chart

Example figures: Should we have a long lecture?; Favourite movie genres


Visualizing in 4+ Dimensions

• Extensions of Scatterplots
• Parallel Coordinates
• Radar Figures
• Other tools
• …
Multiple Views

Give each variable its own display

A B C D E
1 4 1 8 3 5 2
2 6 3 4 2 1
3 5 7 2 4 3
4 2 6 3 1 5 3

A B C D E
Problem: does not show correlations
Tableau bar comparisons
Business Analytics Tools – Manager Dashboards
Scatterplot Matrix

Represent each possible pair of variables in their
own 2-D scatterplot (car data)

Q: Useful for what?
A: linear correlations (e.g. horsepower & weight)

Q: Misses what?
A: multivariate effects
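How many panels a scatterplot matrix needs follows directly from "each possible pair of variables": d variables give d(d-1)/2 distinct pairs. The sketch below enumerates them; the variable names are assumptions chosen to resemble a car data set and are not taken from the slide's figure.

```python
from itertools import combinations


def scatterplot_panels(variables):
    """List every unordered pair of variables, one 2-D scatterplot per pair."""
    return list(combinations(variables, 2))


# Assumed variable names for illustration only.
car_variables = ["mpg", "horsepower", "weight", "acceleration"]

panels = scatterplot_panels(car_variables)
for x_name, y_name in panels:
    print(f"{x_name} vs {y_name}")
print(len(panels), "distinct panels")
```

Because every panel shows only two variables at a time, an effect that appears only when three or more variables are considered jointly cannot be seen in any single panel.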
Parallel Coordinates

• Encode variables along a horizontal row
• Vertical line specifies values

Figures: dataset in Cartesian coordinates; same dataset in parallel coordinates

Invented by Alfred Inselberg while at IBM, 1985
Parallel Coordinates: 4D

[Figure: four parallel axes (sepal length, sepal width, petal length, petal width) with one observation drawn as a line through the values 5.1, 3.5, 1.4, 0.2]

sepal length   sepal width   petal length   petal width
5.1            3.5           1.4            0.2
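One way to turn a row such as the one above into a parallel-coordinates line is to rescale every variable to the range 0 to 1 on its own vertical axis and connect the resulting points. The sketch below assumes min-max scaling (the slides do not say which scaling is used); the first row is the observation shown on the slide, the other two rows are illustrative values added only so that each axis has a range.

```python
def to_polyline(row, minima, maxima):
    """Map one observation to (axis_index, scaled_height) vertices."""
    points = []
    for i, (value, lo, hi) in enumerate(zip(row, minima, maxima)):
        # Min-max scaling is an assumption; a constant column is placed mid-axis.
        height = (value - lo) / (hi - lo) if hi > lo else 0.5
        points.append((i, height))
    return points


axes = ["sepal length", "sepal width", "petal length", "petal width"]
rows = [
    [5.1, 3.5, 1.4, 0.2],  # observation shown on the slide
    [7.0, 3.2, 4.7, 1.4],  # illustrative values
    [6.3, 3.3, 6.0, 2.5],  # illustrative values
]

minima = [min(column) for column in zip(*rows)]
maxima = [max(column) for column in zip(*rows)]
polylines = [to_polyline(row, minima, maxima) for row in rows]

for row, line in zip(rows, polylines):
    print(row, "->", [(i, round(h, 2)) for i, h in line])
```

Each tuple is a vertex: the first element selects the axis (left to right), the second is the height on that axis. Drawing straight segments between consecutive vertices gives one line per observation.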
Parallel Coordinates Plots for Iris Data
Radar Figures

• Aggregate multidimensional observations
• Each observation gets a separate colour or graph symbol
• Variables correspond to angles

Figure labels: Selected domain; Radar chart – indicator scores within the domain; Level I assessment
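"Variables correspond to angles" can be made concrete: with n variables, variable k is placed at angle 2πk/n and its value becomes the distance from the centre. The sketch below computes the polygon vertices for one observation; the scores are made up and `radar_vertices` is a hypothetical helper name.

```python
import math


def radar_vertices(values):
    """Return (x, y) polygon vertices: variable k sits at angle 2*pi*k/n, radius = value."""
    n = len(values)
    vertices = []
    for k, radius in enumerate(values):
        angle = 2 * math.pi * k / n
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices


scores = [1.0, 0.5, 0.8, 0.6]  # made-up indicator scores, already scaled to 0..1
vertices = radar_vertices(scores)
for (x, y), score in zip(vertices, scores):
    print(f"score {score}: x={x:.2f}, y={y:.2f}")
```

Joining the vertices in order (and closing the polygon back to the first one) gives the radar shape for that observation; a second observation would be drawn the same way in another colour.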
F. Nightingale (1856) – abstract representation
Business Analytics Tools – Typical Reports

More traditional report; other forms


Business Analytics Tools – Manager Dashboards
Bars in business dashboards – Tableau Software
Data analytics – manager dashboards

• SAS Enterprise BI
Multidimensional Stacking
Multidimensional presentation of nominal attributes

• VL1 diagrams (Michalski 70) for machine learning

STAGGER and concept drift


Hierarchical visualizations – Treemaps

• Treemaps display hierarchical data using rectangles. Each branch of the tree
is assigned a rectangle. Then each sub-branch gets assigned to a rectangle
and this continues recursively until a leaf node is found.
• Depending on choice the rectangle representing the leaf node is colored,
sized or both according to chosen attributes.
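The recursive assignment of rectangles described above can be sketched with the simple "slice-and-dice" layout, which alternates between splitting horizontally and vertically at each level. The slides do not say which layout is meant, so this choice is an assumption, and the tree, the `size` attribute and the function names are all made up for illustration.

```python
def node_size(node):
    """Size of a leaf, or the total size of all leaves below a branch."""
    if "children" in node:
        return sum(node_size(child) for child in node["children"])
    return node["size"]


def treemap(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice layout: returns {leaf name: (x, y, width, height)}."""
    if out is None:
        out = {}
    if "children" not in node:
        out[node["name"]] = (x, y, w, h)  # leaf reached: recursion stops
        return out
    total = node_size(node)
    offset = 0.0
    for child in node["children"]:
        share = node_size(child) / total
        if depth % 2 == 0:  # even depth: cut the rectangle along the x direction
            treemap(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:               # odd depth: cut along the y direction
            treemap(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Hypothetical hierarchy; "size" is the attribute that controls rectangle area.
tree = {"name": "root", "children": [
    {"name": "a", "size": 6},
    {"name": "b", "children": [
        {"name": "b1", "size": 3},
        {"name": "b2", "size": 1},
    ]},
]}

layout = treemap(tree, 0, 0, 100, 100)
for name, rect in layout.items():
    print(name, rect)
```

Each leaf's area is proportional to its size attribute; a second attribute could be mapped to colour in the same pass.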
Gapminder – Motion Charts

http://www.gapminder.org/ Using Bubble presentations


Spotfire
Chernoff Faces

Encode different variables’ values in characteristics
of a human face

Cute applets:
http://www.cs.uchicago.edu/~wiseman/chernoff/
http://hesketh.com/schampeo/projects/Faces/chernoff.html
Hierarchical Techniques

Cone Trees [RMC91]

• animated 3D visualizations of hierarchical data
• file system structure visualized as a cone tree
Abstract → Hierarchical Information – Preview

Traditional Treemap Hyperbolic Tree

Botanical
ConeTree SunTree
Visualization of Search Results & Inter-Document Similarities
Abstract → Text – MetaSearch Previews

Grokker Kartoo

MSN

Lycos AltaVista

MetaCrystal → searchCrystal
Other business tools
Visualization of different conditions
Overview and Detail
Brushing and Linking
Census Data
Visualization of Association Rules in SGI/MineSet 3.0

IBM Miner – visualization of mining results
SGI – other tools
Graph-based Techniques

Narcissus
• Visualization of a large
number of web pages
• visualization of complex
highly interconnected
data

Visualization of knowledge discovery process

• A graphical tool for arranging components / steps of KDD
• Just a graph flow of actions
• Graphical objects – plug and place
• Parametrization
• Often → you may produce a kind of script representing a
graphical flow of the KD process
Statsoft – Data mining graphical panel
RapidMiner (YALE)
Tukey’s recommendations
Tufte’s Principles of Graphical Excellence

• Give the viewer
– the greatest number of ideas
– in the shortest time
– with the least ink in the smallest space.

• Tell the truth about the data!

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)


Look for other references
And play with different software tools
Excel is neither the only nor the best software
Thank you for coming to my lecture and asking questions!
