Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain

Hyowon Wi¹, Jeongwhan Choi¹, Noseong Park¹

¹KAIST

This is the official implementation of our IJCAI 2025 paper "Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain" (AGF).

📚 Related Works from Our Group

Venue	Paper	Code
NeurIPS 2024	Graph Convolutions Enrich the Self-attention in Transformers!
ICML 2024	Polynomial-based Self-Attention for Table Representation Learning
AAAI 2024	An Attentive Inductive Bias for Sequential Recommendation Beyond the Self-Attention
ICLR 2024	Learning Flexible Body Collision Dynamics with Hierarchical Contact Mesh Transformer

📌 TL;DR

We propose Attentive Graph Filter (AGF), a novel self-attention mechanism that interprets attention as learning graph filters in the singular value domain from the perspective of directed graph signal processing (GSP). AGF achieves linear complexity O(nd²) in the sequence length n while effectively leveraging both low- and high-frequency information, outperforming existing linear Transformers on various benchmarks.

🏗️ Method Overview

Key Insight: Self-Attention is a Low-Pass Filter

Theorem 1. Let $M = \text{softmax}(Z)$ for any matrix $Z \in \mathbb{R}^{n \times n}$. Then $M$ inherently acts as a low-pass filter.

This means vanilla self-attention attenuates high-frequency information, limiting the expressive power of Transformers.
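The low-pass behaviour can be checked numerically with a small experiment. The sketch below is illustrative only and is not taken from this repository: it builds a row-softmax matrix from an arbitrary `Z`, applies it repeatedly to a signal, and tracks how much energy remains outside the constant (DC) component. The helper names `softmax_rows` and `high_freq_ratio`, the choice of test signal, and the use of the mean as the DC component are assumptions made for this example.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def high_freq_ratio(x):
    """Energy outside the constant (DC) component, relative to the DC energy."""
    dc = np.full_like(x, x.mean())  # projection of x onto the constant vector
    return np.linalg.norm(x - dc) / np.linalg.norm(dc)

rng = np.random.default_rng(0)
n = 16
M = softmax_rows(rng.normal(size=(n, n)))  # arbitrary Z, as in Theorem 1

# Test signal (an assumption of this sketch): a DC offset plus the
# fastest-alternating pattern, i.e. 2, 0, 2, 0, ...
x = 1.0 + np.cos(np.pi * np.arange(n))

ratios = []
for _ in range(100):
    ratios.append(high_freq_ratio(x))
    x = M @ x  # apply the softmax matrix once more
ratios.append(high_freq_ratio(x))

print(f"initial ratio: {ratios[0]:.3f}, after 100 applications: {ratios[-1]:.3e}")
```

Because every row of `M` is a positive probability vector, repeated application averages the signal toward a constant, so the non-constant part of the signal vanishes while the DC part survives.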

Our Solution: Attentive Graph Filter (AGF)

AGF directly learns graph filters in the singular value domain: $\text{AGF}(X) = U(X)\,\Sigma(X)\,V(X)^{T} X W_v$, where:

$U(X) = \rho(XW_u) \in \mathbb{R}^{n \times d}$ (left singular vectors)
$\Sigma(X) = \sum_k \theta_k T_k(\text{diag}(\sigma(XW_s))) \in \mathbb{R}^{n \times d \times d}$ (filtered singular values)
$V(X)^{T} = \rho((XW_v)^T) \in \mathbb{R}^{d \times n}$ (right singular vectors)
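The formula above can be evaluated without ever forming an n × n matrix. The following NumPy sketch shows one way to do this; it is not the repository's implementation. Several choices are assumptions, because this README does not define them: $\rho$ is taken to be a softmax over the last axis, $\sigma$ a sigmoid, $T_k$ the Chebyshev polynomials of the first kind, and $\Sigma(X)$ is read as one diagonal d × d matrix per token that scales the corresponding row of $U(X)$. The function names `agf_sketch` and `chebyshev_filter` and the coefficient values are hypothetical. $W_v$ is used both for $V(X)$ and for the value projection, exactly as the formula is written.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def chebyshev_filter(s, theta):
    """Evaluate sum_k theta[k] * T_k(s) elementwise via the Chebyshev recurrence."""
    t_prev, t_curr = np.ones_like(s), s          # T_0 = 1, T_1 = s
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * s * t_curr - t_prev  # T_k = 2 s T_{k-1} - T_{k-2}
        out = out + theta[k] * t_curr
    return out

def agf_sketch(X, W_u, W_s, W_v, theta):
    U = softmax(X @ W_u, axis=-1)                  # (n, d), assumed rho = softmax
    Vt = softmax((X @ W_v).T, axis=-1)             # (d, n)
    G = chebyshev_filter(sigmoid(X @ W_s), theta)  # (n, d): diagonal of Sigma(X) per token
    context = Vt @ (X @ W_v)                       # (d, d), costs O(n d^2)
    return (U * G) @ context                       # (n, d), costs O(n d^2)

rng = np.random.default_rng(0)
n, d = 32, 4
X = rng.normal(size=(n, d))
W_u, W_s, W_v = (rng.normal(size=(d, d)) for _ in range(3))
theta = np.array([0.5, -1.0, 0.25])  # hypothetical coefficients, one of them negative

out = agf_sketch(X, W_u, W_s, W_v, theta)

# Reference: materialize the n x n operator explicitly (quadratic in n).
U = softmax(X @ W_u, axis=-1)
Vt = softmax((X @ W_v).T, axis=-1)
G = chebyshev_filter(sigmoid(X @ W_s), theta)
dense = ((U * G) @ Vt) @ (X @ W_v)
```

Both orderings give the same result by associativity of matrix multiplication. Multiplying $V(X)^{T}$ with the values first keeps every intermediate at size n × d or d × d, which is where the O(nd²) cost comes from.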

Theorem 2. If the coefficients $\theta_k$ of a graph filter can take negative values and are learned adaptively, the graph filter will pass low- and high-frequency signals appropriately.
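A toy calculation illustrates why the sign of the coefficients matters. The sketch below makes two simplifying assumptions that are not stated in this README: singular values lie in [0, 1] with larger values corresponding to lower-frequency components, and the filter is written in the monomial basis $g(s) = \sum_k \theta_k s^k$ instead of $T_k$. Under those assumptions, non-negative coefficients give a response that never decreases in $s$, so the filter cannot favour the high-frequency end, whereas a negative coefficient can invert the response. The function name `poly_response` and the coefficient values are hypothetical.

```python
def poly_response(s, theta):
    """Filter response g(s) = sum_k theta[k] * s**k (monomial basis)."""
    return sum(t * s**k for k, t in enumerate(theta))

low_s, high_s = 0.1, 0.9  # small vs. large singular value

nonneg = [0.2, 0.5, 0.3]  # all coefficients >= 0
mixed = [1.0, -1.0]       # g(s) = 1 - s, one negative coefficient

nonneg_gain = (poly_response(low_s, nonneg), poly_response(high_s, nonneg))
mixed_gain = (poly_response(low_s, mixed), poly_response(high_s, mixed))

print("non-negative theta: gain at s=0.1, s=0.9 ->", nonneg_gain)
print("mixed-sign theta:   gain at s=0.1, s=0.9 ->", mixed_gain)
```

With the non-negative coefficients the gain is larger at `s = 0.9` than at `s = 0.1`; with the mixed-sign coefficients the ordering is reversed.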

⚙️ Installation & Quick Start

# Clone the repository
git clone https://github.com/hyowonwi/agf.git
cd agf

For detailed instructions on installation, dataset preparation, and running experiments, please refer to the README in each subdirectory:

Task	Directory	README
Long Range Arena (LRA)	`AGF_LRA/`	📖 AGF_LRA/README.md
UEA Time Series Classification	`AGF_UEA/`	📖 AGF_UEA/README.md

📝 Citation

If you find this work useful, please cite our paper:

@inproceedings{wi2025agf,
  title     = {Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain},
  author    = {Wi, Hyowon and Choi, Jeongwhan and Park, Noseong},
  booktitle = {Proceedings of the Thirty-Fourth International Joint Conference on
               Artificial Intelligence, {IJCAI-25}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  pages     = {6561--6569},
  year      = {2025},
  month     = {8},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2025/730},
  url       = {https://doi.org/10.24963/ijcai.2025/730},
}

If you have any questions, please open an issue or contact us at hyowon.wi@kaist.ac.kr or jeongwhan.choi@kaist.ac.kr.

⭐ Star this repository if you find it helpful!
