
[Feature] Overlap Spec Support #11762

@hnyls2002


Motivation

Initial support for EAGLE speculative decoding with the overlap scheduler is already implemented. This issue tracks the roadmap for further optimizations and feature support. The initial skeleton code is in PR #11398.

The design illustration is here

Note

The --enable-beta-spec argument has been deprecated. To enable this feature, set the environment variable instead: export SGLANG_ENABLE_SPEC_V2=1
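For anyone scripting the launch, here is a minimal sketch of setting the environment variable from Python rather than a shell. The helper name `build_spec_v2_launch` and the model path placeholders are hypothetical. The speculative flags shown are the ones I believe the SGLang server accepts for EAGLE, so please check them against your installed version before relying on this.

```python
import os
import sys


def build_spec_v2_launch(model_path, draft_model_path, base_env=None):
    """Return (cmd, env) for launching a server with the spec v2 path enabled.

    Hypothetical helper: it only builds the command and environment,
    it does not start anything.
    """
    # Copy the environment so the caller's process is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    # Replaces the deprecated --enable-beta-spec argument.
    env["SGLANG_ENABLE_SPEC_V2"] = "1"

    cmd = [
        sys.executable, "-m", "sglang.launch_server",
        "--model-path", model_path,
        # Assumed flags; verify with `python -m sglang.launch_server --help`.
        "--speculative-algorithm", "EAGLE",
        "--speculative-draft-model-path", draft_model_path,
    ]
    return cmd, env


if __name__ == "__main__":
    cmd, env = build_spec_v2_launch("<target-model>", "<draft-model>")
    print("SGLANG_ENABLE_SPEC_V2=" + env["SGLANG_ENABLE_SPEC_V2"], " ".join(cmd))
    # To actually start the server:
    #   import subprocess; subprocess.run(cmd, env=env, check=True)
```

Passing `env` explicitly to `subprocess.run` keeps the setting scoped to the server process instead of leaking into the parent shell.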


page size & topk support

memory allocation

  • over-allocation optimization @hnyls2002
  • over-allocation with page size > 1 + topk > 1
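To make the page size > 1 case concrete, here is a rough sketch of the page arithmetic behind over-allocation. It is not the SGLang allocator: the function names are hypothetical, and it assumes the scheduler reserves KV-cache pages for the worst-case number of accepted tokens before the verify result is known, then releases whatever turns out to be unused. It also ignores the extra slots a topk > 1 draft tree would need.

```python
def pages_for(num_tokens: int, page_size: int) -> int:
    """Number of pages needed to hold num_tokens (ceiling division)."""
    return -(-num_tokens // page_size)


def pages_to_reserve(seq_len: int, max_accept: int, page_size: int) -> int:
    """Extra pages to reserve so up to max_accept new tokens fit.

    seq_len is the number of tokens already committed for the request.
    """
    return pages_for(seq_len + max_accept, page_size) - pages_for(seq_len, page_size)


def pages_to_release(seq_len: int, max_accept: int, accepted: int, page_size: int) -> int:
    """Pages that can be returned once the real accepted count is known."""
    if not 0 <= accepted <= max_accept:
        raise ValueError("accepted must be within [0, max_accept]")
    reserved_total = pages_for(seq_len + max_accept, page_size)
    used_total = pages_for(seq_len + accepted, page_size)
    return reserved_total - used_total


if __name__ == "__main__":
    # 10 committed tokens, pages of 4 slots, at most 5 tokens accepted per step.
    print(pages_to_reserve(10, 5, 4))
    print(pages_to_release(10, 5, 2, 4))
```

With page size 1 the reservation equals `max_accept` and the surplus is simply the rejected tokens. With larger pages the partially filled last page can absorb some of the new tokens, which is why the two cases are worth tracking separately.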

Attention backend support

sampling

  • xgrammar support
  • penalty support
  • logprob support

speculative methods

DP attention support

  • Support idle batch @ch-wan
  • Cover testcases with DP attention + overlap + spec @ch-wan

EP support

  • Check compatibility with DeepEP / EP @fzyzcjy
  • Cover testcases with EP + overlap + spec @fzyzcjy

PD disaggregation

Aggressive Optimizations

  • Enable a separate plan_stream

