Hi, thanks for sharing the code!
I have a few questions about Section 3.1.2 (Duplex attention).
- I am confused by the notation in this section. For example, it says "Y = (K^{P\times d}, V^{P\times d}), where the values store the content of the Y variables (e.g. the randomly sampled latents for the case of GAN)". Does this mean that V^{P\times d} is sampled from the original variable Y? And how is the number P set in your code?
"keys track the centroids of the attention-based assignments from X to Y, which can be computed as
K=a_b(Y, X)", does it mean K is calculated by using the self-attention module but with (Y, X) as input? If so, how to understand “the keys track the centroid of the attention-based assignments from X to Y”? BTW, how to get the centroids? -
- For the update rule in duplex attention, what does the a() function mean? Does it denote an attention module like a_b() in Section 3.1.1, with X as queries, K as keys, and V as values? If so, since K is computed by another attention operation as mentioned in question 2 (the output of a_b(Y, X) is treated as the keys), the update rule contains two attention operations. Is that right? Is that what 'duplex' attention means?
- But I suspect I may be wrong, given the last paragraph of this section: "to support bidirectional interaction between elements, we can chain two reciprocal simplex attentions from X to Y and from Y to X, obtaining the duplex attention". So does it mean we first update Y using a simplex attention module u^a(Y, X), and then use this updated Y as the input of u^d(X, Y) to update X? Does the duplex attention module then contain three attention operations in total? (See the sketch after this list for my current understanding.)
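
To make my question concrete, here is a minimal PyTorch-style sketch of my current reading (three attention operations). This is my own pseudocode, not the repo's implementation: the function names are mine, and I have dropped the learned q/k/v projections and the gamma/beta modulation for brevity, so please correct whatever is wrong:

```python
import torch
import torch.nn.functional as F

def attend(Q, K, V):
    # Plain scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    scores = Q @ K.transpose(-2, -1) / (Q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ V

def duplex_attention(X, Y):
    # (1) My reading of u^a(Y, X): simplex attention that updates Y first,
    # with Y as queries and X as keys/values.
    Y = attend(Y, X, X)
    # (2) My reading of K = a_b(Y, X): keys as the "centroids of the
    # attention-based assignments". Without the per-step learned
    # projections (omitted here) this looks identical to step (1);
    # in the real model each step would have its own projections.
    K = attend(Y, X, X)
    V = Y  # "the values store the content of the Y variables"
    # (3) My reading of a(X, K, V): update X with X as queries,
    # K as keys, and V as values.
    return attend(X, K, V)

# Toy shapes: n image features and P latents, both of dimension d.
n, P, d = 64, 16, 32
X, Y = torch.randn(n, d), torch.randn(P, d)
print(duplex_attention(X, Y).shape)  # torch.Size([64, 32])
```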
Thanks a lot! :)