Open
Labels: enhancement, good first issue, help wanted, high priority
Description
Motivation
We have already implemented initial support for EAGLE speculative decoding with the overlap scheduler. This issue is the roadmap for further optimizations and feature support. The initial skeleton code is in PR #11398.
The design illustration is here
Note
The --enable-beta-spec argument has been deprecated. To enable this feature, set the environment variable instead: export SGLANG_ENABLE_SPEC_V2=1
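For anyone scripting the launch, here is a minimal sketch of setting the variable from Python instead of the shell. Only SGLANG_ENABLE_SPEC_V2=1 comes from the note above. The helper name build_launch, the placeholder model paths, and the launch flags are assumptions for illustration, so check the flags against your installed SGLang version. The script only builds and prints the command; it does not start a server.

```python
import os
import shlex
from typing import Dict, List, Tuple


def build_launch(model_path: str, draft_model_path: str) -> Tuple[Dict[str, str], List[str]]:
    """Return (env, argv) for a spec-v2 launch without starting anything.

    Hypothetical helper: the CLI flags below are assumed, not taken from this issue.
    """
    env = dict(os.environ)
    # From the note above: this env var replaces the deprecated --enable-beta-spec.
    env["SGLANG_ENABLE_SPEC_V2"] = "1"
    argv = [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,                          # assumed flag
        "--speculative-algorithm", "EAGLE",                  # assumed flag
        "--speculative-draft-model-path", draft_model_path,  # assumed flag
    ]
    return env, argv


env, argv = build_launch("<target-model>", "<draft-model>")
print("SGLANG_ENABLE_SPEC_V2=" + env["SGLANG_ENABLE_SPEC_V2"], shlex.join(argv))
```

To actually launch, the pair could be passed to subprocess.run(argv, env=env).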
page size & topk support
- Support page size > 1 @cicirori @hnyls2002 ([overlap-spec] support page size > 1 #11772)
- Support topk > 1 @vincentzed ([eagle overlap spec] wip impl top k > 1 in overlap eagle worker(v2) #11839)
- Support topk > 1 + page size > 1 @vincentzed
memory allocation
- over-allocation optimization @hnyls2002
- over-allocation with page size > 1 + topk > 1
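As a rough illustration of why over-allocation interacts with page size > 1: if a decode step reserves KV-cache slots for the worst case (every draft token accepted) and slots are handed out in whole pages, the reservation rounds up to a page boundary. The sketch below is not SGLang's allocator. The function name and its parameters are hypothetical, and it assumes the sequence already owns enough pages for its current length.

```python
def pages_to_reserve(seq_len: int, num_draft_tokens: int, page_size: int) -> int:
    """Extra pages needed so that seq_len + num_draft_tokens slots fit.

    Hypothetical helper: assumes the sequence already owns
    ceil(seq_len / page_size) pages and reserves for the worst case.
    """
    if page_size < 1 or seq_len < 0 or num_draft_tokens < 0:
        raise ValueError("page_size must be >= 1; lengths must be non-negative")
    owned = -(-seq_len // page_size)                        # ceil division
    needed = -(-(seq_len + num_draft_tokens) // page_size)  # ceil division
    return needed - owned


print(pages_to_reserve(seq_len=30, num_draft_tokens=4, page_size=1))   # 4
print(pages_to_reserve(seq_len=30, num_draft_tokens=4, page_size=16))  # 1
print(pages_to_reserve(seq_len=10, num_draft_tokens=4, page_size=16))  # 0
```

With page size 1 the reservation equals the number of draft tokens. With larger pages it depends on how much room is left in the last page, which is what makes the combined case harder to account for.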
Attention backend support
- Remove or make verify_done.synchronize() an option @hnyls2002
- Different attention backend support @Fridge003 @Qiaolin-Yu
sampling
- xgrammar support
- penalty support
- logprob support
speculative methods
- new speculative model worker interface (Abstraction for spec worker and code cleanup #11643)
- standalone speculative support @Qiaolin-Yu
- ngram speculative support @a4zhangfei
- Top SpecTpWorker for all speculative decoding backends @hnyls2002
- Make SpecTpWorker compatible with all TpModelWorker features.
- Specialize for the high-throughput case (num_step=1, topk=1, num_verify_forward_pass_tokens=2) @yukavio
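To make the high-throughput case above (num_step=1, topk=1, num_verify_forward_pass_tokens=2) concrete, here is a minimal sketch of greedy verification for a single draft chain. It is generic speculative decoding logic, not the SpecTpWorker implementation. The function name is hypothetical, and reading num_verify_forward_pass_tokens=2 as "the target model scores two positions per step" is an assumption.

```python
from typing import List


def verify_chain(draft_tokens: List[int], target_tokens: List[int]) -> List[int]:
    """Greedy verification of one draft chain (the topk=1 case).

    draft_tokens:  tokens proposed by the draft model (length = num_step).
    target_tokens: the target model's greedy choice at each verified position
                   (length = num_step + 1; the last entry follows the final draft token).
    Returns the tokens committed this step: the accepted draft prefix
    plus exactly one token chosen by the target model.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have one more entry than draft_tokens")
    committed: List[int] = []
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            break  # first mismatch ends the accepted prefix
        committed.append(draft)
    # The target's own token at the stop position is always emitted.
    committed.append(target_tokens[len(committed)])
    return committed


# num_step=1: one draft token, two positions scored by the target model.
print(verify_chain([7], [7, 9]))  # [7, 9]  draft accepted, plus one target token
print(verify_chain([7], [5, 9]))  # [5]     draft rejected, target token only
```

In this setting each step commits either one or two tokens, so the verify pass has a small fixed shape, which is presumably what makes it worth specializing.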
DP attention support
EP support
PD disaggregation
- Event loop adjust in Prefill / Decode worker @shaharmor98
- Cover testcases with PD-Disagg + overlap + spec @ShangmingCai
Aggressive Optimizations
- Enable a separate plan_stream