Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain


This is the official implementation of our IJCAI 2025 paper "Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain" (AGF).

📚 Related Works from Our Group

| Venue | Paper | Code |
|---|---|---|
| NeurIPS 2024 | Graph Convolutions Enrich the Self-attention in Transformers! | GitHub |
| ICML 2024 | Polynomial-based Self-Attention for Table Representation Learning | GitHub |
| AAAI 2024 | An Attentive Inductive Bias for Sequential Recommendation Beyond the Self-Attention | GitHub |
| ICLR 2024 | Learning Flexible Body Collision Dynamics with Hierarchical Contact Mesh Transformer | GitHub |

📌 TL;DR

We propose Attentive Graph Filter (AGF), a novel self-attention mechanism that interprets attention as learning graph filters in the singular value domain from the perspective of directed graph signal processing (GSP). AGF achieves linear complexity O(nd²) in the sequence length n while effectively leveraging both low- and high-frequency information, outperforming existing linear Transformers on various benchmarks.

🏗️ Method Overview

[Figure: AGF method overview]

Key Insight: Self-Attention is a Low-Pass Filter

Theorem 1. Let $M = \text{softmax}(Z)$ for any matrix $Z \in \mathbb{R}^{n \times n}$. Then $M$ inherently acts as a low-pass filter.

This means vanilla self-attention attenuates high-frequency information, limiting the expressive power of Transformers.
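
Theorem 1 can be checked numerically. The sketch below uses one common formalization of a low-pass filter (an assumption here, since the definition is not restated in this README): split a signal into its mean (DC) component and the remainder (HC), and watch the ratio of their norms shrink as $M$ is applied repeatedly. The helper names are hypothetical, and only the Python standard library is used.

```python
import math
import random

def softmax(row):
    """Numerically stable softmax of one row of scores."""
    m = max(row)
    exps = [math.exp(z - m) for z in row]
    total = sum(exps)
    return [e / total for e in exps]

def hc_dc_ratio(x):
    """Return ||HC[x]|| / ||DC[x]||, where DC[x] is the mean component of x."""
    mean = sum(x) / len(x)
    hc = math.sqrt(sum((v - mean) ** 2 for v in x))
    dc = math.sqrt(len(x)) * abs(mean)
    return hc / dc

def apply_repeatedly(M, x, steps):
    """Apply x <- M x `steps` times, recording the HC/DC ratio after each step."""
    ratios = [hc_dc_ratio(x)]
    for _ in range(steps):
        x = [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]
        ratios.append(hc_dc_ratio(x))
    return ratios

random.seed(0)
n = 8
# Arbitrary attention scores; any real-valued Z gives a row-stochastic softmax(Z).
Z = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
M = [softmax(row) for row in Z]
# A strongly oscillating input signal: 2, 0, 2, 0, ...
x = [2.0 if i % 2 == 0 else 0.0 for i in range(n)]

ratios = apply_repeatedly(M, x, steps=50)
print(f"initial ratio: {ratios[0]:.3f}, after 50 steps: {ratios[-1]:.2e}")
```

The oscillating part of the signal is damped at every step while the mean component survives, which is the behavior Theorem 1 attributes to softmax attention.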

Our Solution: Attentive Graph Filter (AGF)

AGF directly learns graph filters in the singular value domain: $\text{AGF}(X) = U(X)\,\Sigma(X)\,V(X)^{T} X W_v$, where:

  • $U(X) = \rho(XW_u) \in \mathbb{R}^{n \times d}$ (left singular vectors)
  • $\Sigma(X) = \sum_k \theta_k T_k(\text{diag}(\sigma(XW_s))) \in \mathbb{R}^{n \times d \times d}$ (filtered singular values)
  • $V(X)^{T} = \rho((XW_v)^T) \in \mathbb{R}^{d \times n}$ (right singular vectors)
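
The formulation above can be sketched in plain Python to show where the $O(nd^2)$ cost comes from: $V(X)^{T}$ is multiplied with the values first, giving a $d \times d$ matrix, so no $n \times n$ attention matrix is ever formed. This is a minimal sketch under stated assumptions, not the repository's PyTorch code: $\rho$ is taken to be a softmax, $\sigma$ a sigmoid, and $T_k$ the Chebyshev polynomials of the first kind, none of which is specified in this README. The formula reuses the symbol $W_v$ for the value projection; the sketch takes a separate, hypothetical `W_val` argument so either reading can be tried. All function names are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(v):
    """Numerically stable softmax of one vector."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def chebyshev_filter(s, theta):
    """Evaluate sum_k theta[k] * T_k(s) via T_k = 2 s T_{k-1} - T_{k-2} (assumed basis)."""
    t_prev, t_curr = 1.0, s                      # T_0(s), T_1(s)
    out = theta[0] * t_prev
    if len(theta) > 1:
        out += theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * s * t_curr - t_prev
        out += theta[k] * t_curr
    return out

def agf(X, W_u, W_s, W_v, W_val, theta):
    """Sketch of AGF(X) = U(X) Sigma(X) V(X)^T (X W_val) without an n x n matrix."""
    # U(X): softmax over the d features of each token (assumed rho), shape n x d.
    U = [softmax(row) for row in matmul(X, W_u)]
    # Diagonal of Sigma(X) per token: filtered sigmoid outputs, shape n x d.
    S = [[chebyshev_filter(1.0 / (1.0 + math.exp(-z)), theta) for z in row]
         for row in matmul(X, W_s)]
    # V(X)^T: softmax over the n tokens for each feature (assumed rho), shape d x n.
    Vt = [softmax(list(col)) for col in zip(*matmul(X, W_v))]
    # d x d summary of the values; this product costs O(n d^2).
    context = matmul(Vt, matmul(X, W_val))
    # Scale each row of U by that token's filtered singular values, then project.
    US = [[u * s for u, s in zip(u_row, s_row)] for u_row, s_row in zip(U, S)]
    return matmul(US, context)                   # shape n x d

random.seed(1)
n, d = 6, 3
X = [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
W_u, W_s, W_v, W_val = (
    [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(d)] for _ in range(4)
)
out = agf(X, W_u, W_s, W_v, W_val, theta=[0.5, -0.3, 0.1])
print(len(out), len(out[0]))
```

With `theta=[1.0]` the filter reduces to the identity on the singular values, and every output entry is a convex combination of the corresponding value column, which makes a convenient sanity check.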

Theorem 2. If the coefficients $\theta_k$ of a graph filter can take negative values and are learned adaptively, the graph filter will pass low- and high-frequency signals appropriately.
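
As a small worked example of why signed coefficients matter, take a first-order filter and assume, purely for illustration, that $T_k$ is the Chebyshev basis ($T_0(s) = 1$, $T_1(s) = s$) and that the singular values $s$ lie in $[0, 1]$. Neither choice is stated in this README.

$$
h(s) = \theta_0 T_0(s) + \theta_1 T_1(s) = \theta_0 + \theta_1 s
$$

$$
\theta = \left(\tfrac{1}{2}, \tfrac{1}{2}\right):\; h(0) = \tfrac{1}{2},\; h(1) = 1
\qquad
\theta = \left(\tfrac{1}{2}, -\tfrac{1}{2}\right):\; h(0) = \tfrac{1}{2},\; h(1) = 0
$$

With $\theta_1 \ge 0$ the response can only grow with $s$, so the components with the largest singular values always dominate. A negative $\theta_1$ reverses the slope and lets the filter emphasize the components with small singular values instead. Which end of the spectrum counts as low or high frequency depends on the convention used in the paper.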


⚙️ Installation & Quick Start

```bash
# Clone the repository
git clone https://github.com/hyowonwi/agf.git
cd agf
```

For detailed instructions on installation, dataset preparation, and running experiments, please refer to the README in each subdirectory:

| Task | Directory | README |
|---|---|---|
| Long Range Arena (LRA) | AGF_LRA/ | 📖 AGF_LRA/README.md |
| UEA Time Series Classification | AGF_UEA/ | 📖 AGF_UEA/README.md |

📝 Citation

If you find this work useful, please cite our paper:

@inproceedings{wi2025agf,
  title     = {Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain},
  author    = {Wi, Hyowon and Choi, Jeongwhan and Park, Noseong},
  booktitle = {Proceedings of the Thirty-Fourth International Joint Conference on
               Artificial Intelligence, {IJCAI-25}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  pages     = {6561--6569},
  year      = {2025},
  month     = {8},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2025/730},
  url       = {https://doi.org/10.24963/ijcai.2025/730},
}

If you have any questions, please open an issue or contact us at hyowon.wi@kaist.ac.kr or jeongwhan.choi@kaist.ac.kr.

⭐ Star this repository if you find it helpful!

About

[IJCAI'25] Official PyTorch Implementation of "Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain" (AGF)
