Prism: Spectral-Aware Block-Sparse Attention

Overview

Prism is a training-free method to accelerate long-context LLM pre-filling. It addresses the "blind spot" in standard mean pooling caused by Rotary Positional Embeddings (RoPE) by disentangling attention into high-frequency and low-frequency bands.

Key Features:

Dual-Band Importance Estimation: Separates semantic (low-freq) and positional (high-freq) signals.
Energy-Based Calibration: Restores attenuated signals automatically.
Speed: Up to 5.1× speedup on 128K context with negligible accuracy loss.
Implementation: Purely block-level operations, built on custom Triton kernels.
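To make the mean-pooling "blind spot" concrete, the sketch below shows how averaging over a block of positions attenuates the fast-rotating RoPE channels while leaving the slow ones nearly intact. It is a toy illustration only, not Prism's estimator: the names `rope_frequencies` and `pooled_magnitude` are hypothetical, and it assumes the standard RoPE frequency schedule (base 10000), a head dimension of 128, and a block size of 128.

```python
import cmath

def rope_frequencies(head_dim, base=10000.0):
    # Standard RoPE schedule: channel pair i rotates by theta_i radians per token.
    # Index 0 is the fastest (highest-frequency) pair, the last index the slowest.
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def pooled_magnitude(theta, block_size, start=0):
    # Mean of the unit phasors e^{j * theta * p} over the positions of one block.
    # A result near 1.0 means mean pooling preserves the channel,
    # a result near 0.0 means the rotation cancels it out.
    total = sum(cmath.exp(1j * theta * (start + p)) for p in range(block_size))
    return abs(total / block_size)

freqs = rope_frequencies(head_dim=128)           # assumed head dimension
high = pooled_magnitude(freqs[0], block_size=128)    # fastest-rotating pair
low = pooled_magnitude(freqs[-1], block_size=128)    # slowest-rotating pair

print(f"high-frequency pair survives pooling with magnitude {high:.4f}")
print(f"low-frequency pair survives pooling with magnitude {low:.4f}")
```

Under these assumptions the high-frequency pair is almost entirely averaged away while the low-frequency pair keeps nearly all of its magnitude, which is the kind of attenuation the dual-band estimation and energy-based calibration are meant to address.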

Repository Structure

prism/
- prism.py: Core implementation of Prism.
- kernels/: Custom Triton kernels for efficient block importance estimation with Prism and block-sparse attention.
baselines/: Baseline implementations (e.g., MInference, FlexPrefill, XAttention).
eval/: Evaluation harnesses.
data/: Example data for demonstration.
scripts/: Shell scripts to reproduce the experiments in the paper.

Installation

# For core Prism implementation only
uv pip install -e .

# For lm-eval evals
uv pip install -e "eval/lm-evaluation-harness[hf,longbench,ruler]"

# For lmms-eval evals
uv pip install -e "eval/lmms-eval[qwen,metrics]"

# For baselines
# FlashAttention
uv pip install flash_attn --no-build-isolation
# MInference
uv pip install minference
# XAttention
git clone git@github.com:mit-han-lab/Block-Sparse-Attention.git
cd Block-Sparse-Attention && uv pip install -e .

Example Usage

A simple example of Prism, running Qwen3-0.6B on a RULER sample:

python -m prism.prism

Alternatively, patch your model with Prism for broader usage:

from prism import prism_attention_forward
from prism.utils.patch import apply_patch
from transformers import AutoModelForCausalLM

apply_patch(
    forward_fn=prism_attention_forward,
    model_id="Qwen/Qwen3-8B",
)

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")

Evaluation

To reproduce the evaluation results, use the scripts in the scripts/ directory.

# Example: Running LongBench Evaluation on Qwen3-8B
bash scripts/longbench.sh

Note: For RULER evaluation, we use Qwen3 with YaRN extrapolation, consistent with the official implementation. Please ensure your MODEL_ID points to a model path containing the modified config.json required for long-context processing.
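As a rough sketch of what the modified config.json could look like, the helper below adds a YaRN `rope_scaling` entry to a local copy of the model config. The helper name `enable_yarn` is hypothetical, and the default values (`factor=4.0`, `original_max_position_embeddings=32768`) are assumptions taken from the YaRN instructions in the Qwen3 model card, not values confirmed by this repository; check them against the evaluation scripts before relying on them.

```python
import json
from pathlib import Path

def enable_yarn(config_path, factor=4.0, original_max_position_embeddings=32768):
    """Add a YaRN rope_scaling entry to a Hugging Face config.json in place.

    The default values are assumptions based on the Qwen3 model card.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Keys follow the rope_scaling format used by Hugging Face Transformers.
    config["rope_scaling"] = {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": original_max_position_embeddings,
    }
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, after downloading the model to a local directory, call `enable_yarn("path/to/model/config.json")` (a placeholder path) and point MODEL_ID at that directory.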

Citation

If you find this work helpful, please consider citing our paper as follows:

@misc{wang2026prismspectralawareblocksparseattention,
      title={Prism: Spectral-Aware Block-Sparse Attention}, 
      author={Xinghao Wang and Pengyu Wang and Xiaoran Liu and Fangxu Liu and Jason Chu and Kai Song and Xipeng Qiu},
      year={2026},
      eprint={2602.08426},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.08426}, 
}
