
Unsupervised Representation Learning Methods for Knowledge Graphs

Chandankumar G P
M Tech AI
Indian Institute of Science, Bangalore

Prof. Sundeep Prabhakar Chepuri

November 28, 2023

Outline of Discussion:

Contrastive Learning.

Contrastive Learning with Adaptive Graph Augmentation.

Canonical Correlation Analysis-based Contrastive Learning.

Scattering Transforms.

Graph representation learning:

The aim of graph representation learning is to learn effective representations of graphs.

The main goal is to map each node in a graph to a vector representation in a continuous vector space, commonly referred to as an embedding.

Some representation learning methods are Node2Vec, Graph Neural Networks (GNNs), and Graph Autoencoders.

Some applications are node classification, link prediction, graph clustering and community detection, etc.

Contrastive Learning:

Contrastive learning is a self-supervised representation learning method.

Contrastive learning in computer vision:

Figure 1

Contrastive learning in graphs:

Generate two different views from a single graph and learn better representations.

Figure 2

Encode the views using encoders and use a contrastive loss to learn better representations.

Contrastive Learning framework:

Let G = (V, E) denote a graph, where V = {v_1, v_2, ..., v_N} and E ⊆ V × V represent the node set and the edge set respectively, and let X ∈ R^{N×F} and A ∈ {0,1}^{N×N} denote the feature matrix and the adjacency matrix respectively.

Our objective is to learn a GNN encoder f(X, A) ∈ R^{N×F'} that produces node embeddings in a low-dimensional space, i.e., F' ≪ F.

Figure 3: Illustrative model
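
For concreteness, here is a minimal NumPy sketch of a single GCN-style propagation layer standing in for the encoder f(X, A); the symmetric normalization, the ReLU nonlinearity, and the toy sizes are illustrative assumptions rather than the exact encoder used in these methods.

```python
import numpy as np

def gcn_layer(X, A, W):
    """One GCN-style propagation layer: ReLU(D^{-1/2} (A + I) D^{-1/2} X W)."""
    A_hat = A + np.eye(A.shape[0])                                # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]    # symmetric normalization
    return np.maximum(A_norm @ X @ W, 0.0)                        # ReLU nonlinearity

# Toy usage: N = 4 nodes, F = 5 input features, F' = 2 embedding dimensions.
rng = np.random.default_rng(0)
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 1.],
              [0., 1., 0., 1.],
              [0., 1., 1., 0.]])
X = rng.normal(size=(4, 5))
W = rng.normal(size=(5, 2))      # learnable weights (random placeholder here)
Z = gcn_layer(X, A, W)           # (4, 2) node embeddings
```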

Loss function: for each positive pair (u_i, v_i),

\[
\ell(u_i, v_i) = \log \frac{e^{\theta(u_i, v_i)/\tau}}
{\underbrace{e^{\theta(u_i, v_i)/\tau}}_{\text{positive pair}}
 + \underbrace{\sum_{k \neq i} e^{\theta(u_i, v_k)/\tau}}_{\text{inter-view negative pairs}}
 + \underbrace{\sum_{k \neq i} e^{\theta(u_i, u_k)/\tau}}_{\text{intra-view negative pairs}}}
\]

where τ is a temperature parameter and θ(u, v) = s(g(u), g(v)), where s(·, ·) is the cosine similarity and g(·) is a nonlinear projection (implemented with a two-layer perceptron model).

The overall objective to be maximized is the average over all positive pairs:

\[
\mathcal{L} = \frac{1}{2N} \sum_{i=1}^{N} \big[ \ell(u_i, v_i) + \ell(v_i, u_i) \big]
\]
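
A minimal NumPy sketch of this objective is shown below, assuming U and V already hold the projected embeddings g(f(·)) of the two views for the same N nodes (the projection head g and the temperature value are placeholders, not the settings used in the referenced experiments).

```python
import numpy as np

def cosine_sim(Za, Zb):
    """Pairwise cosine similarities s(z_i, z_k) between two embedding matrices."""
    Za = Za / np.linalg.norm(Za, axis=1, keepdims=True)
    Zb = Zb / np.linalg.norm(Zb, axis=1, keepdims=True)
    return Za @ Zb.T

def contrastive_objective(U, V, tau=0.5):
    """Average of l(u_i, v_i) + l(v_i, u_i) over all N positive pairs."""
    N = U.shape[0]

    def l(Za, Zb):
        inter = np.exp(cosine_sim(Za, Zb) / tau)   # exp(theta(u_i, v_k)/tau), all pairs
        intra = np.exp(cosine_sim(Za, Za) / tau)   # exp(theta(u_i, u_k)/tau), all pairs
        pos = np.diag(inter)                       # positive pairs (u_i, v_i)
        # denominator = positive pair + inter-view negatives + intra-view negatives
        denom = inter.sum(axis=1) + intra.sum(axis=1) - np.diag(intra)
        return np.log(pos / denom)                 # vector of l(u_i, v_i), i = 1..N

    return (l(U, V) + l(V, U)).sum() / (2 * N)
```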

Adaptive Graph Augmentation:

This augmentation scheme tends to keep important structures and attributes unchanged, while perturbing possibly unimportant links and features.

Topology-level augmentation: we sample a modified edge subset Ẽ from the original edge set E with probability
\[
P\big((u, v) \in \tilde{E}\big) = 1 - p^e_{uv},
\]
where 1 − p^e_{uv} should reflect the importance of the edge (u, v).

We define the edge centrality as the average of the two adjacent nodes' centrality scores, i.e., w^e_{uv} = (φ_c(u) + φ_c(v))/2. On a directed graph, we simply use the centrality of the tail node, i.e., w^e_{uv} = φ_c(v).

The removal probabilities are then obtained from a normalized log-transform of the edge centralities:
\[
s^e_{uv} = \log w^e_{uv}, \qquad
p^e_{uv} = \min\!\left( \frac{s^e_{\max} - s^e_{uv}}{s^e_{\max} - \mu^e_s} \cdot p_e,\; p_\tau \right),
\]
where s^e_{\max} and μ^e_s are the maximum and the average of the s^e_{uv} values, p_e controls the overall removal probability, and p_τ is a cut-off probability used to truncate extreme values.
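
A small NumPy sketch of this topology-level computation follows; taking μ^e_s as the mean of the s^e_uv values and the specific hyper-parameter defaults are assumptions for illustration only.

```python
import numpy as np

def edge_drop_probabilities(edges, centrality, p_e=0.3, p_tau=0.7):
    """p^e_uv for every edge, given node centrality scores phi_c (degree, eigenvector, PageRank, ...)."""
    # Edge centrality: average of the two endpoint centralities (undirected case).
    w = np.array([(centrality[u] + centrality[v]) / 2.0 for u, v in edges])
    s = np.log(w)                                    # s^e_uv = log w^e_uv
    p = (s.max() - s) / (s.max() - s.mean()) * p_e   # normalize by s^e_max and mu^e_s
    return np.minimum(p, p_tau)                      # truncate with the cut-off p_tau

def sample_edge_view(edges, probs, seed=0):
    """Keep edge (u, v) in the augmented set E~ with probability 1 - p^e_uv."""
    rng = np.random.default_rng(seed)
    keep = rng.random(len(probs)) >= probs
    return [e for e, k in zip(edges, keep) if k]
```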

We can use degree centrality, eigenvector centrality, or PageRank centrality as the node centrality measure φ_c.

Node attribute-level augmentation: we add noise to node attributes by randomly masking a fraction of the dimensions of the node features with zeros, masking the i-th dimension with probability p^f_i.

The probability p^f_i should reflect the importance of the i-th dimension of the node features. For each feature dimension we calculate a weight
\[
w^f_i = \sum_{u \in V} |x_{ui}| \cdot \varphi_c(u).
\]

We compute the masking probability in the same way as for edges:
\[
s^f_i = \log w^f_i, \qquad
p^f_i = \min\!\left( \frac{s^f_{\max} - s^f_i}{s^f_{\max} - \mu^f_s} \cdot p_f,\; p_\tau \right).
\]
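
Analogously, a minimal NumPy sketch of the attribute-level masking (same caveats: μ^f_s is taken as the mean of the s^f_i values, and the hyper-parameter defaults are arbitrary):

```python
import numpy as np

def feature_mask_probabilities(X, centrality, p_f=0.3, p_tau=0.7):
    """p^f_i for every feature dimension i, given node centrality scores phi_c."""
    w = np.abs(X).T @ centrality                     # w^f_i = sum_u |x_ui| * phi_c(u)
    s = np.log(w)                                    # s^f_i = log w^f_i
    p = (s.max() - s) / (s.max() - s.mean()) * p_f
    return np.minimum(p, p_tau)

def mask_features(X, probs, seed=0):
    """Zero out each feature dimension independently with probability p^f_i."""
    rng = np.random.default_rng(seed)
    keep = rng.random(X.shape[1]) >= probs           # 1 = keep the dimension
    return X * keep
```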

Canonical Correlation Analysis-based Contrastive Learning:

This introduces a non-contrastive and non-discriminative objective for self-supervised learning, inspired by Canonical Correlation Analysis methods.

Canonical Correlation Analysis: for two random variables x ∈ R^m and y ∈ R^n, their cross-covariance matrix is Σ_xy = Cov(x, y).

CCA aims at seeking two vectors a ∈ R^m and b ∈ R^n such that the correlation
\[
\rho = \operatorname{corr}\!\left(a^\top x,\, b^\top y\right)
     = \frac{a^\top \Sigma_{xy} b}{\sqrt{a^\top \Sigma_{xx} a}\,\sqrt{b^\top \Sigma_{yy} b}}
\]
is maximized.

Equivalently, the objective is
\[
\max_{a, b} \; a^\top \Sigma_{xy} b \quad \text{s.t.} \quad a^\top \Sigma_{xx} a = b^\top \Sigma_{yy} b = 1.
\]
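
As a quick numerical check of this definition, the toy NumPy snippet below (synthetic data and arbitrary projection vectors, purely for illustration) evaluates ρ for one candidate pair (a, b).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 3))                                      # samples of x in R^3
y = x @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(1000, 2))  # correlated y in R^2

C = np.cov(x, y, rowvar=False)                  # joint covariance of the stacked variables (x, y)
Sxx, Syy, Sxy = C[:3, :3], C[3:, 3:], C[:3, 3:]

a = rng.normal(size=3)                          # candidate projection vectors
b = rng.normal(size=2)
rho = (a @ Sxy @ b) / (np.sqrt(a @ Sxx @ a) * np.sqrt(b @ Syy @ b))
print(rho)                                      # correlation of a^T x and b^T y
```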

We can replace the linear transformations with neural networks. Concretely, assuming X_1 and X_2 are two views of the input data, the objective is
\[
\max_{\theta_1, \theta_2} \; \operatorname{Tr}\!\left( P_{\theta_1}(X_1)^\top P_{\theta_2}(X_2) \right)
\quad \text{s.t.} \quad
P_{\theta_1}(X_1)^\top P_{\theta_1}(X_1) = P_{\theta_2}(X_2)^\top P_{\theta_2}(X_2) = I,
\]
where P_{θ_1} and P_{θ_2} are two feedforward neural networks and I is an identity matrix.

Still, such a computation is expensive, and soft CCA removes the hard decorrelation constraint by adopting the following Lagrangian relaxation:
\[
\min_{\theta_1, \theta_2} \;
L_{\text{dist}}\!\left( P_{\theta_1}(X_1), P_{\theta_2}(X_2) \right)
+ \lambda \left( L_{\text{SDL}}\!\left( P_{\theta_1}(X_1) \right) + L_{\text{SDL}}\!\left( P_{\theta_2}(X_2) \right) \right),
\]
where L_dist measures the distance between the two views' representations and L_SDL is called the stochastic decorrelation loss.

On graphs, with the normalized node embeddings Z̃_A and Z̃_B of the two views, the loss becomes
\[
\mathcal{L} = \underbrace{\| \tilde{Z}_A - \tilde{Z}_B \|_F^2}_{\text{invariance term}}
+ \lambda \, \underbrace{\left( \| \tilde{Z}_A^\top \tilde{Z}_A - I \|_F^2 + \| \tilde{Z}_B^\top \tilde{Z}_B - I \|_F^2 \right)}_{\text{decorrelation term}}.
\]
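
A minimal NumPy sketch of this loss is given below; the column standardization with a 1/√N factor is the usual CCA-SSG-style normalization and is stated here as an assumption.

```python
import numpy as np

def standardize(Z):
    """Column-standardize embeddings so that Z~^T Z~ is comparable to the identity."""
    return (Z - Z.mean(axis=0)) / (Z.std(axis=0) * np.sqrt(Z.shape[0]))

def cca_ssg_loss(Za, Zb, lam=1e-3):
    """Invariance term + lambda * decorrelation terms on the two views' embeddings."""
    Za, Zb = standardize(Za), standardize(Zb)
    I = np.eye(Za.shape[1])
    invariance = np.linalg.norm(Za - Zb, 'fro') ** 2
    decorrelation = (np.linalg.norm(Za.T @ Za - I, 'fro') ** 2 +
                     np.linalg.norm(Zb.T @ Zb - I, 'fro') ** 2)
    return invariance + lam * decorrelation
```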

Geometric Scattering Transform for Graph Data

Transfer learning approaches provide evidence that suitably constructed deep filter banks should be able to extract task-agnostic semantic information from the data.

The geometric scattering transform is a deep filter bank for feature extraction on graphs, which generalizes the Euclidean scattering transform.

The scattering transform is constructed as a cascade of linear wavelet transforms and nonlinear complex modulus operations that provides features with guaranteed invariance to a predetermined Lie group of operations such as rotations, translations, or scaling.

Graph Wavelets

Graph wavelets are defined as the difference between lazy random walks that have propagated at different time scales.

Let G = (V, E, W) be a weighted graph, consisting of n vertices V = {v_1, ..., v_n}, edges E ⊆ {(v_l, v_m) : 1 ≤ l, m ≤ n}, and weights W = {w(v_l, v_m) > 0 : (v_l, v_m) ∈ E}.

The weighted degree of vertex v_l is deg(v_l) = Σ_m A(v_l, v_m), where A is the weighted adjacency matrix, and the corresponding diagonal n × n degree matrix D is given by D(v_l, v_l) = deg(v_l), D(v_l, v_m) = 0 for l ≠ m.

The n × n graph Laplacian matrix L_G = L on G is defined as L = D − A, and the normalized Laplacian is
\[
N = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}.
\]
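
For concreteness, a small NumPy helper that builds both Laplacians from a weighted adjacency matrix A (assuming no isolated vertices, so all degrees are positive):

```python
import numpy as np

def laplacians(A):
    """Return L = D - A and N = I - D^{-1/2} A D^{-1/2} for a weighted adjacency matrix A."""
    deg = A.sum(axis=1)                         # weighted degrees deg(v_l)
    L = np.diag(deg) - A                        # combinatorial Laplacian
    d_inv_sqrt = 1.0 / np.sqrt(deg)             # assumes deg(v_l) > 0 for all l
    N = np.eye(A.shape[0]) - A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return L, N
```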

The n × n lazy random walk matrix is
\[
P = \tfrac{1}{2}\left( I + A D^{-1} \right),
\]
and P^t governs the probability distribution of a lazy random walk after t steps.

The value P^t x(v_l) is a weighted average of x(v_l) with all values x(v_m) such that v_m is within t steps of v_l.

The n × n wavelet matrix at scale 2^j is defined as
\[
\Psi_j = P^{2^{j-1}} - P^{2^j} = P^{2^{j-1}} \left( I - P^{2^{j-1}} \right).
\]

The wavelet coefficients up to scale 2^J are
\[
\Psi^{(J)} x(v_l) = \left[ \Psi_j x(v_l) : 1 \le j \le J \right].
\]
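
A minimal NumPy sketch of this wavelet construction follows; the repeated squaring of P is just an implementation convenience for computing the dyadic powers.

```python
import numpy as np

def lazy_random_walk(A):
    """P = (1/2)(I + A D^{-1}) from the weighted adjacency matrix A."""
    D_inv = np.diag(1.0 / A.sum(axis=1))
    return 0.5 * (np.eye(A.shape[0]) + A @ D_inv)

def graph_wavelets(A, J):
    """Wavelet matrices Psi_j = P^{2^(j-1)} - P^{2^j} for scales j = 1, ..., J."""
    P_prev = lazy_random_walk(A)            # P^{2^0} = P
    wavelets = []
    for j in range(1, J + 1):
        P_next = P_prev @ P_prev            # P^{2^j} = (P^{2^(j-1)})^2
        wavelets.append(P_prev - P_next)    # Psi_j
        P_prev = P_next
    return wavelets                         # list of J matrices, each n x n
```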

Geometric Scattering on Graphs:

The zeroth-order scattering moments are given by the unnormalized q-th moments of x:
\[
S x(q) = \sum_{l=1}^{n} x(v_l)^q, \qquad 1 \le q \le Q.
\]

The first-order geometric scattering moments are given by:
\[
S x(j, q) = \sum_{l=1}^{n} \left| \Psi_j x(v_l) \right|^q, \qquad 1 \le j \le J, \; 1 \le q \le Q.
\]

The second-order geometric scattering moments are:
\[
S x(j, j', q) = \sum_{l=1}^{n} \left| \Psi_{j'} \left| \Psi_j x(v_l) \right| \right|^q, \qquad 1 \le j < j' \le J, \; 1 \le q \le Q.
\]
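
A minimal NumPy sketch collecting these moments into a single graph-level feature vector for a signal x, built on the graph_wavelets helper sketched earlier (J and Q are small illustrative choices):

```python
import numpy as np

def scattering_moments(x, wavelets, Q=4):
    """Zeroth-, first-, and second-order geometric scattering moments of a signal x."""
    J = len(wavelets)
    feats = [np.sum(x ** q) for q in range(1, Q + 1)]                 # S x(q)
    first = [np.abs(Psi @ x) for Psi in wavelets]                     # |Psi_j x|
    for j in range(J):
        feats += [np.sum(first[j] ** q) for q in range(1, Q + 1)]     # S x(j, q)
        for jp in range(j + 1, J):                                    # 1 <= j < j' <= J
            second = np.abs(wavelets[jp] @ first[j])                  # |Psi_j' |Psi_j x||
            feats += [np.sum(second ** q) for q in range(1, Q + 1)]   # S x(j, j', q)
    return np.array(feats)                                            # graph-level feature vector
```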

Thank You

