Networks
COMP7507
Visualization & Visual Analytics
Network Size
• Size of the WWW: ~60 billion pages
(http://www.worldwidewebsize.com/ )
[ Opte project]
2
Network size
• Facebook: 2.23 billion monthly active users
• Twitter: 330 million monthly active users
• Weibo: 360 million monthly active users
• QQ: 861 million monthly active users
(Data from:http://expandedramblings.com/index.php/resource-how-many-people-use-the-top-social-media/
updated August 2018)
3
[Paul Butler, http://paulbutler.org/archives/visualizing-facebook-friends/ ]
Graph Drawing
• Direct calculation based on graph structure
– Spanning tree
– Adjacency matrix layout
• Optimization-based
– Optimizing the graph aesthetic constraints
– Force-directed layout
4
Spanning Tree Layout
• Many graphs have tree-like structure or useful
spanning trees (i.e., trees that include all
vertices but only some edges of the original
graph)
– WWW, Social Networks
Graph Minimum Spanning Tree
[wikipedia] 5
Spanning Tree Layout
• To extract a spanning tree from a graph and
visualize the tree (which is efficient)
• Drawing of a graph is in general non-
deterministic, and a spanning tree layout offers
predictability
• Spanning Tree can be obtained by
– Breadth-First Search (BFS) / Depth-First Search (DFS)
– Min/max spanning tree
6
Adjacency Matrices
Les Misérables Co-occurrence
[ http://bost.ocks.org/mike/miserables/ ]
7
Node-Link Layout
• Severe edge crossings and cluttering
8
Optimization Techniques
• Formulate the layout problem as an optimization
problem
• Governed by an equation incorporating
– the costs of items (e.g., distance between nodes) to
be optimized; and
– some constraints (e.g., no edge crossings allowed)
• Non-deterministic and therefore resulting layout
is unpredictable
• Force-Directed Layout is commonly used for
undirected graph
9
Force Directed Layout
[ http://philogb.github.io/jit/static/v20/Jit/Examples/ForceDirected/example1.html ]
10
Force Directed Layout
• To model nodes and edges of a graph as physical
bodies tied with springs
• Two principles:
– Vertices connected by an edge should be drawn near
each other.
– Vertices should not be drawn too close to each other.
• Edges = springs (attractive force)
– Hooke’s Law: F = k * x x: length of spring
• Nodes = charged particles (repulsive force)
– Coulomb’s Law: F = − k * q1 * q2 / x2
k: constant; x: distance between two particles; q1,q2: charges
11
Force Directed Layout
• Initialize node positions (e.g., random placement)
• Do this iteratively until stabilized:
1. Calculate forces for nodes
2. Update node positions
• Different physical models lead to layouts of varying
quality 12
Fruchterman-Reingold Algorithm
• A force directed layout algorithm
• Calculate an optimal distance between nodes,
δ = C * sqrt( area / |V| ),
where C is a constant
• Attractive force
fa(x) = d2 / δ
• Repulsive force
fr(x) = − δ2 / d
d: distance between two nodes
Fruchterman, T. M. J., & Reingold, E. M. (1991). Graph Drawing by Force-
Directed Placement. Software: Practice and Experience, 21(11). 13
Filtering
• To reduce the number of visible elements in a
graph being viewed
– Improves clarity
– Increases performance of layout and rendering
• Removing some “redundant” edges to better
reveal network topology, and to facilitate the
clustering process
14
Filtering
• E.g., Edges linking you and your friends in your
Facebook network
• Ego network
[Hansen 2011] 15
Filtering
Results of Fruchterman-Reingold layout on
the 2007 US Senate voting data with and without edge filtering.
Without filtering Edges corresponding to
% agreement < 0.65 are filtered
Node color represents party affiliation of a senator
16
Clustering
• Structure-based clustering
– Use only structural information (e.g., edge
connections) of a graph
• Content-based clustering
– Use semantic data associated with graph elements
– Application specific
17
Clustering
• It is natural to choose the clustering with the least
number of edges between members
– or with the minimum total weight of the edges
connecting members for graphs with weighted edges
• Force directed layout algorithms
can also form clusters naturally
18
Edge Bundling
• Clustering of edges instead of nodes
• To reduce cluttering
• Examples:
– Geometry-Based Edge Clustering Structure-based
[Cui et al., "Geometry-Based Edge Clustering for Graph Visualization," TVCG, 2008]
– Hierarchical Edge Bundles
19
Hierarchical Edge Bundles
• Some data sets are with both non-hierarchical
(adjacency) and hierarchical relations.
– Source codes for a software:
• Adjacency: dependencies of classes
• Hierarchical: directories -> files -> classes
– Social networks:
• Adjacency : if people are acquainted and nature of relations
• Hierarchical: circles or groups of people -> individuals
– Citation networks:
• Adjacency: citations among publications
• Hierarchical: institutions -> departments -> publications
20
Hierarchical Edge Bundles
• Visualization examples without edge bundling
Colored edges representing adjacency relations on (left) balloon trees, and (right) tree maps.
[Holten, "Hierarchical Edge Bundles: Visualization of Adjacency Relations in Hierarchical Data," TVCG , 2006]
21
Hierarchical Edge Bundles
• Bundle adjacency edges along tree hierarchy
1. Tree drawing for hierarchical relations
2. Add adjacency edge (P0, P4) to graph
Find path that goes through least common ancestor (LCA) Draw curved adjacency edge that ”aligns" to tree
22
Hierarchical Edge Bundles
• Examples with hierarchical edge bundling
23
Hierarchical Edge Bundles
Tree structure is used only for
[http://mbostock.github.io/d3/talk/20111116/bundle.html] edge clustering and is not shown
24
Understanding a Network
• Visualization alone cannot provide full
understanding of a graph or network
• Network graph metrics are quantitative
measures for describing a network,
characterizing a subgroup or some specific nodes
within a network
– Influential people in a social network (e.g., celebrities
in Twitter or Weibo)
– Gatekeepers connecting communities (e.g., for
headhunting in LinkedIn)
25
Understanding a Network
• Studying network metrics over time helps
understand how a network evolves
• Network metrics help improve the visual display
of a network
– By assigning different visual attributes to the nodes
– By filtering and showing only the important nodes
26
Network Metrics
• Overall graph metrics
– Graph type; # of vertices; # of edges
– Self loops (e.g., a person replying his own emails)
– Connected components
– Isolated vertices
– Maximum geodesic distance (aka diameter)
• i.e., the distance between two nodes that are farthest apart
– Average geodesic distance
• i.e., average distance from one node to another through the
graph edges
– Graph density
• i.e., # edges ÷ # of possible edges
– etc.
For undirected graph, graph density = (2 * |E|) / (|V| * (|V|-1))
For directed graph, graph density = |E| / (|V| * (|V|-1))
27
Overall Graph Metrics
• Max. geodesic dist.
= 4
• Avg. geodesic dist.
= 1.98
• Graph density
= 18/45 = 0.4
28
Network Metrics
• Node metrics: structure-based metrics associated with a
node
– Degree (in-degree, out-degree)
– Betweenness centrality
– Closeness centrality
– Clustering coefficient
– Eigenvector centrality
– PageRank
• Use for identifying special or important nodes or
subgroups
• There are edge metrics as well (e.g., edge betweenness)
29
Degree Centrality
• A node is considered more important if it is of a
greater degree
• Centrality means
“Importance”
30
Betweenness Centrality
• Measures the importance of a person in passing
information within a network
Who is more important in terms
of betweenness, Ed or Heather?
Intuitively, Heather is of higher
betweenness centrality than Ed
because removing Heather results in
isolated nodes
31
Betweenness Centrality
• The betweenness centrality of a node:
# of shortest path
between s and t passing through v
Total # of shortest path
between s and t
• Equivalent to computing the all-pairs shortest
path of a graph ― Complexity: O(|V|3)
O(|V||E|) on unweighted sparse graph
Ulrik Brandes, “A Faster Algorithm for Betweenness Centrality ”,
Journal of Mathematical Sociology, 2001.
32
Betweenness Centrality
Vertex size representing betweenness
33
Closeness Centrality
• Measures how close a person is to the others
– how fast a message can reach all others from a
person
• What is the scenario for a person’s message to
reach all others in the fastest way?
• The closeness centrality of a node:
Farness of v:
Total distance between
v and all other nodes
34
Closeness Centrality
Fernando and Garth are of
the highest closeness measures
Vertex size representing closeness
35
Clustering Coefficient
• Measures how well a person’s friends are
connected to each other
Ed’s three friends know
each other
36
Clustering Coefficient
• Clustering coefficient of a node:
# edges connecting u’s neighbours
C(u) =
total # possible edges connecting u’s neighbours
If u has k neighbours, this equals # of edges
in a k-vertex clique (complete graph)
= k (k – 1) / 2 for an undirected graph
= k (k – 1) for a directed graph
• Ranged in [0,1]
– C(u) = 0: no edges among neighbours
– C(u) = 1: neighbours form a clique
37
Clustering Coefficient
Vertex size representing
clustering coefficient
38
Eigenvector Centrality
• Measures the influence of a node in a network
– having a connection to an important person is more
important
• deg(Healther)
= deg(Ed) = 3
• Ed connects with Diane
who is the most popular
(i.e., having the largest
degree)
• Heather connects to Ike,
who is among the least
popular
• Hence, Ed’s eigenvector
centrality is higher
39
Eigenvector Centrality
• The eigenvector centrality score of a node is:
: constant = 1 iff u, v are neighbours,
Sum of scores over all u's neighbours otherwise =0
• Written in matrix form gives:
: adjacency matrix
Since it is required that all scores (i.e., entries of x) be positive,
the eigenvector corresponding to the largest eigenvalue will give us the scores.
40
Eigenvector Centrality
Vertex size representing
eigenvector centrality
41
PageRank
• Used by Google search engine to rank websites
in their search results.
• Idea: More important websites are likely to
receive more links from other websites.
• A variant of eigenvector centrality
• Score of a page = the probability of being
brought to a page after many clicks.
42
PageRank
• The percentage shows
the likelihood that a
page can be reached
after many clicks.
• Assumption: A user
start on a random page,
and has 85% chance of
clicking randomly on
any link in the page,
and 15% chance of
http://en.wikipedia.org/wiki/PageRank jumping randomly to
any other page in the
WWW. 43
PageRank
• An assumption that there is possibility a surfer
will stop following links and jump instead to a
random page
: probability of following a link : total number of nodes
: probability of jumping to a random page : out degree of
• Can be used for ranking nodes in a general
network.
44
Visual Attributes Mapping
• Map ordered visual attributes to ordered data
and unordered visual attributes to categorical
data
• Ordered data: Node degree, centrality
– Ordered visual attributes: Size, line width, opacity
• Categorical data: Gender, affiliation
– Unordered visual attributes: Color, shape
45
Visual Attributes Mapping
[Hansen 2011]
shape gender size betweenness
color cluster opacity eigenvector centrality
46
Tools and Datasets
• Tools for Graph Visualization
– Gephi
https://gephi.org
– NodeXL
http://nodexl.codeplex.com
• Large Network Datasets
– A Citation Network Dataset
http://arnetminer.org/citation
– Stanford Network Analysis Project
http://snap.stanford.edu
47
Reference
• Ivan Herman, Guy Melançon, M. Scott Marshall, “Graph
Visualization and Navigation in Information
Visualization: A Survey”, IEEE Trans. Vis. Comput. Graph,
6 (1), 2000, pp. 24-43.
• Matthew Ward, Georges Grinstein and Daniel Keim,
"Interactive Data Visualization: Foundations, Techniques,
and Applications", 2010 [Chapter 8]
• Hansen, Shneiderman and Smith, “Analyzing Social
Media Networks with NodeXL: Insights from a
Connected World”, 2011.
• Isabel F. Cruz and Roberto Tamassia, “Graph Drawing
Tutorial” (http://cs.brown.edu/~rt/papers/gd-tutorial/gd-
constraints.pdf)
48