Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
KV cache is expensive in long contexts, and so far I have not found any existing KV cache compression options here. A related issue is #10585, but it seems very high-level.
I am considering adding this feature, with first support for the default single-node setup (SnapKV + flashinfer + RadixCache), following the proposal below. SnapKV is selected as the PoC since it is a typical method in KVPress and has nearly the best performance among its methods.
- Migrate the essential code from KVPress and adapt the hook to compress the SGLang KV cache by the compression ratio. The `token_to_kv_pool` elements should already be compressed after applying this. Since the KVPress hook is designed around the torch module hook, and SGLang's backends do not follow this pattern, we can choose one of the following (see the sketch after this list):
  - apply the compression methods inside these backends; this is faster because the compressed KV goes directly to the cache pool, but each backend needs to implement compression itself
  - after the backend has written the cache into the pool, take it out, compress it, and insert it back; this gives more unified logic but would be slightly slower
- Update the `req_to_token` cache with the compressed tokens; the token count needs to be adjusted by the compression ratio.
- Add server args and integration controls.
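As a rough illustration of the second option above (compressing after the backend has written the pool) and of the `req_to_token` adjustment, here is a minimal sketch. The tensors and the `compress_request_kv` helper are simplified stand-ins, not SGLang's real `token_to_kv_pool` / `req_to_token` structures:

```python
import torch

def compress_request_kv(kv_pool_k: torch.Tensor,
                        kv_pool_v: torch.Tensor,
                        req_to_token: torch.Tensor,
                        seq_len: int,
                        compress_fn) -> int:
    """Read one request's KV out of the pool, compress it, write the kept
    entries back, and shrink the request's token map. Returns the new
    token count. All names here are illustrative stand-ins."""
    slots = req_to_token[:seq_len]                 # pool slot ids for this request
    k = kv_pool_k[slots]                           # [seq_len, n_head, n_dim]
    v = kv_pool_v[slots]

    # compress_fn: any press that maps [seq_len, ...] -> [kept_len, ...] and
    # returns the indices of the tokens it kept (e.g. a SnapKV-style selector).
    k_c, v_c, keep_idx = compress_fn(k, v)
    new_len = k_c.shape[0]

    kept_slots = slots[keep_idx]                   # reuse the kept slots in place
    kv_pool_k[kept_slots] = k_c                    # no-op for pure selection presses,
    kv_pool_v[kept_slots] = v_c                    # needed if values are merged/updated

    # Shrink the request's mapping; in a real integration the dropped slots
    # would be returned to the allocator here.
    req_to_token[:new_len] = kept_slots
    return new_len
```

The first option would instead call the compression inside each attention backend before the KV is written to the pool, which avoids the read-back at the cost of per-backend changes.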
The short idea of KVPress is to compress [seq_len, n_head, n_dim] to [seq_len * compress_ratio, n_head, n_dim] during prefill; the first version should make this change visible in the `req_to_token` list.
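For intuition, here is a minimal SnapKV-style selection that produces exactly that shape change. This is a simplified sketch, not the actual KVPress implementation: it uses the observation window's keys as a stand-in for its query states, omits the score pooling, and the function name and parameters are made up for illustration:

```python
import torch

def snapkv_select(key: torch.Tensor, value: torch.Tensor,
                  compress_ratio: float = 0.5, window_size: int = 32):
    """key/value: [seq_len, n_head, n_dim] -> roughly
    [seq_len * compress_ratio, n_head, n_dim], plus the kept indices."""
    seq_len, n_head, n_dim = key.shape
    kept_len = max(window_size, int(seq_len * compress_ratio))
    if seq_len <= kept_len:
        return key, value, torch.arange(seq_len, device=key.device)

    # Score earlier tokens by how strongly the last `window_size` positions
    # attend to them (real SnapKV uses the window's query states and pools
    # the scores; the window keys are a stand-in here).
    window_k = key[-window_size:]                             # [w, h, d]
    past_k = key[:-window_size]                               # [s-w, h, d]
    attn = torch.einsum("whd,shd->hws", window_k, past_k) / (n_dim ** 0.5)
    scores = attn.softmax(dim=-1).sum(dim=(0, 1))             # [s-w]

    top = scores.topk(kept_len - window_size).indices.sort().values
    window_idx = torch.arange(seq_len - window_size, seq_len, device=key.device)
    keep_idx = torch.cat([top, window_idx])                   # always keep the window
    return key[keep_idx], value[keep_idx], keep_idx
```

A function like this could serve as the `compress_fn` in the earlier sketch; the length of `keep_idx` is what the request's `req_to_token` entry would shrink to.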
Related resources
No response