TM3 ch05 Link Analysis
TM3 ch05 Link Analysis
TM3 ch05 Link Analysis
Link Analysis
(Chapter 5 of MMDS book)
DAMA60 Group
School of Science and Technology
Hellenic Open University
Content
• Preliminaries
• Matrices, Vectors
• Linear Algebra
• Graphs
2
Matrices
3
Matrices
𝑎11 0
𝐴=
0 𝑎22
1 0
𝐴=
0 1
4
Vectors
• A vector v is a one-dimensional array of numbers (is an
𝑛 × 1 matrix – column vector)
• Example:
3
𝒗= 5
7
• The standard form of a vector is a column vector
Τ
3
𝑻
𝒗 = 5 = (3 5 7)
7
5
Operation on matrices
6
Examples
1 4
1 2 3 1∗1+2∗2+3∗3 1∗4+2∗5+3∗6 14 32
2 5 = =
4 5 6 4∗1+5∗2+6∗3 4∗4+5∗5+6∗6 32 77
3 6
1 1 1 −1 1 0
=
0 1 0 1 0 1
7
Graphs: Introduction
• ‘Everything is connected: Graph Neural Networks’ a
recent publication by Petar Veličković (Research Scientist
at Google DeepMind) outlines the importance of graph
structures.
• In many ways, graphs are the main modality of data we
receive from nature.
• Social networks are also of primal importance in todays’
applications:
• Who talks to whom?
• Who leads to what?
• Which pages do we like?
• What items do we order?
8
Graphs: Notation
• A graph (or network) G is a set of vertices 𝑉 = 1,2, … , 𝑛 joined
by edges(or links).
9
Node Degree in Undirected graphs
• 𝑑𝑖 is the degree (# of adjacent nodes or # of neighbors) for a target
vertex 𝑣𝑖.
• In the above graph, the degree for vertex 𝑣1 is 𝑑1 = 8 and for all
others is 𝑑𝑖 = 1, 𝑖 ≠ 1.
• The above graph is called star graph 𝑆𝑘 or k-star, with 𝑘 = 9.
• In general, those graphs are trees with k nodes, where one node
has degree equal to k-1, and the other k-1 nodes have degree
equal to 1.
10
Directed Edges and Directed Graphs
• Edges can have directions. A directed edge is sometimes
called an arc.
In- Out-
Node Degree Degree
A 1 3
B
C
D
E
F
G
12
Example for Node Degree in Directed Graphs
13
Example for Node Degree in Directed Graphs
14
Adjacency Matrix
Given a graph G with n nodes numbered from 1 to n, the adjacency
matrix 𝐴 = 𝑎𝑖𝑗 of G is a square 𝑛𝑥𝑛 matrix such that 𝑎𝑖𝑗 = 1, if
there exists an edge joining nodes i and j, and 𝑎𝑖𝑗 = 0, otherwise.
Here is an example:
A 1 2 3 4 5
1 0 1 1 1 1
2 1 0 1 0 1
3 1 1 0 0 1
4 1 0 0 0 0
5 1 1 1 0 0
16
Intuition of PageRank
• Web pages are important if people visit them a lot. But we can’t
watch everybody using the Web.
17
Ranking web pages
18
Simple recursive formulation
• Each link’s vote is proportional to the importance of its
source page.
• If page P with importance x has n out links, each link gets
x/n votes.
333 $ 333 $
333 $
1000 $
19
y, a, m represent the
Simple “flow” model importance of the 3 web
𝑦 pages, respectively.
• Suppose 𝒓 is a vector with one entry per web page. We can call this
as rank vector, for which holds that:
• 𝒓𝒊 is the importance score of page 𝑖
• σ𝒊 𝒓𝒊 = 1
21
Transition Probability Matrix
When M is a row stochastic matrix, it holds that:
If 𝑖 links to 𝑗 then
𝑀𝑖𝑗 = 1/𝑛
else
then 𝑀𝑖𝑗 = 0
22
Flow equations:
𝑦 𝑎
𝑦= +
Example 𝑦
2 2
𝑎= + 𝑚
𝑦 2
𝑎
2 𝑚=
2
Incoming weights
Outgoing weights
Yahoo
y 1/2
𝑦 1/2
0
2
Amazon Microsoft
a m
23
Flow equations:
𝑦 𝑎
𝑦= +
Example 𝑦
2 2
𝑎= + 𝑚
2
𝑎
𝑚=
2
Incoming weights
Outgoing weights
Yahoo
y 1/2
𝑎
0
2 1/2
Amazon Microsoft
𝑎
a 2 m
24
Flow equations:
𝑦 𝑎
𝑦= +
Example 𝑦
2 2
𝑎= + 𝑚
2
𝑎
𝑚=
2
Incoming weights
Outgoing weights
Yahoo
y 0
1
0
m
Amazon Microsoft
a m
25
Flow equations:
𝑦 𝑎
𝑦= +
Example 𝑦
2 2
𝑎= + 𝑚
𝑦 2
𝑎
2 𝑚=
2
Yahoo y a m
y ½ ½ 0
𝑎
y
a ½ 0 1 =M
𝑦 m 0 ½ 0
2
2
m
Amazon Microsoft
𝑎
a 2 m
26
Flow equations:
Example (row-stochastic) 𝑦 𝑎
𝑦= +
2 2
𝑦
𝑎 = +𝑚
2
𝑦 𝑎
𝑚=
Yahoo 2 2
y
y a m
𝑎 𝑦 y ½ ½ 0
2 2 a ½ 0 ½ =M
m 0 1 0
m
Amazon Microsoft
𝑎
a m
2
27
Solving the flow equations
28
Solving the flow equations
′
𝑦 𝑎
1/2 1/2 0 𝑦 = +
𝑦′ 𝑦 2 2
𝑎′ = 1/2 0 1 𝑎 ⇒ 𝑦
𝑎′ = + 𝑚
𝑚′ 0 1/2 0 𝑚 2
′
𝑚 = 𝑎/2
• Given y=2, a=2, and m=1, we find that y’=2, a’=2, and m=1.
• The constraint 𝑦 ′ + 𝑎′ + 𝑚′ = 1 help us to normalize the
network importance at each step (dividing by the sum of y’,
a’, m’).
29
Random Walks on the Web
30
Simulating a Random Walk
31
Power Iteration Solution
Yahoo y a m
y ½ ½ 0
a ½ 0 1 =M
m 0 ½ 0
Yahoo
Amazon M’soft
Yahoo
Amazon M’soft
Yahoo
Amazon M’soft
•35
y a m
The Surfers y ½ ½ 0
a ½ 0 1
m 0 ½ 0
Yahoo
Amazon M’soft
36
After convergence Y and A have equal relative
importance…
Yahoo
Amazon M’soft
Spider Traps
Taxation Policies
38
Dead Ends
• Some pages are dead ends (have no links out).
• Such a page causes importance to leak out (drain out),
or surfers to disappear.
39
Microsoft Becomes Dead End
Yahoo y a m
y ½ ½ 0
a ½ 0 0 =M
m 0 ½ 0
Amazon Microsoft
40
Example: Effect of Dead Ends
• Flow Equations v = Mv :
y a m
y = y /2 + a /2 y ½ ½ 0
a = y /2 a ½ 0 0 =M
m = a /2 m 0 ½ 0
41
Microsoft Becomes a Dead End
Yahoo
y a m
y ½ ½ 0
a ½ 0 0 =M
m 0 ½ 0
Amazon Microsoft
importance y 1
of the a = 1
3 nodes m 1
42
Microsoft Becomes a Dead End
Yahoo
y a m
y ½ ½ 0
a ½ 0 0 =M
m 0 ½ 0
Amazon Microsoft
importance y 1 1
of the a = 1 1/2
3 nodes m 1 1/2
43
Microsoft Becomes a Dead End
Yahoo
y a m
y ½ ½ 0
a ½ 0 0 =M
m 0 ½ 0
Amazon Microsoft
importance y 1 1 3/4
of the a = 1 1/2 1/2
3 nodes m 1 1/2 1/4
44
After convergence all relative
importance of nodes is lost…
Yahoo
y a m
y ½ ½ 0
a ½ 0 0 =M
m 0 ½ 0
Amazon Microsoft
2) Teleportation
• Follow random teleport links with probability 1.0 from dead-ends
• Adjust matrix accordingly
46
Spider Trap
47
Example: Effect of Spider Trap
Microsoft Becomes Spider Trap
• Flow Equations v = M v : y a m
y = y /2 + a /2 y ½ ½ 0
a = y /2 a
m
½
0
0
½
0
1
=M
m = a /2 + m
48
Microsoft Becomes a Spider Trap
Yahoo y a m
y ½ ½ 0
a
m
½
0
0
½
0
1
=M
Amazon M’soft
importance y 1
of the a = 1
3 nodes m 1
49
Microsoft Becomes a Spider Trap
Yahoo
y a m
y ½ ½ 0
a
m
½
0
0
½
0
1
=M
Amazon M’soft
importance y 1 1
of the a = 1 1/2
3 nodes m 1 3/2
50
Microsoft Becomes a Spider Trap
Yahoo
y a m
y ½ ½ 0
a
m
½
0
0
½
0
1
=M
Amazon M’soft
51
After convergence all importance
goes in one node…
Yahoo
y a m
y ½ ½ 0
a
m
½
0
0
½
0
1
=M
Amazon M’soft
𝑦 1 1 3/4 5/8 0
importance of the
𝑎 =1 1/2 1/2 3/8 … 0
3 nodes
𝑚 1 3/2 7/4 2 3
52
Dealing with spider traps
Teleportation
53
Vector formulation
• Suppose there are N pages
• Consider a page 𝑖 , with set of out links 𝑂 𝑖
• We have
1
• 𝑀𝑗𝑖 = 𝑂 𝑖
when 𝑖 links 𝑗, 𝑀𝑗𝑖 = 0 otherwise
54
Some Problems with Page Rank
55
Topic-Sensitive PageRank
𝟏−𝜷
𝜷∗𝑴∗𝒗+ , 𝑖𝑓 𝑖 ∈ 𝑆
• 𝑣TSPR ′ = ቐ 𝑺
𝜷 ∗ 𝑴 ∗ 𝒗, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
58
Finding newspapers
NYT: 10
• Hubs and Authorities
Each page has 2 scores: eBay: 3
• Quality as an expert (hub): Yahoo: 3
• Total sum of votes of authorities pointed to
• Quality as a content (authority): CNN: 8
• Total sum of votes coming from experts
WSJ: 9
59
Hubs and Authorities
60
Counting in-links: Authority
Sum of hub
scores of nodes
pointing to NYT.
62
Expert Quality: Hub
Sum of authority
scores of nodes
that the node
points to.
Hubs collect
authority scores
63
Reweighting
64
Mutually Recursive Definition
65
Hubs and Authorities
• Each page 𝑖 has 2 scores: j1 j2 j3 j4
• Authority score: 𝒂𝒊
• Hub score: 𝒉𝒊
i
HITS algorithm:
(0) 𝒂𝒊 = 𝒉𝒋
• Initialize hj =1 𝒋→𝒊
𝑡+1
• Normalize: max 𝑎𝑖 =1
j1 j2 j3 j4
(𝑡+1) (𝑡)
• ∀𝒊: Hub: ℎ𝑖 = σ𝒊→𝒋 𝑎𝑗
𝒉𝒊 = 𝒂𝒋
𝑡+1 𝒊→𝒋
• Normalize: max ℎ𝑖 =1
66
Hubs and Authorities
• Then 𝒉𝒊 = σ𝒊→𝒋 𝒂𝒋
can be rewritten as 𝒉𝒊 = σ𝒋 𝑨𝒊𝒋 ⋅ 𝒂𝒋
So: 𝒉 = 𝑨 ⋅ 𝒂
67
Example of HITS Yahoo
111 110
A= 101 AT = 1 0 1
010 110 Amazon M’soft
69