Beta spec-overlap for EAGLE #11398
Docstring generation was requested by @JustinTong0323.

* #11398 (comment)

The following files were modified:

* `python/sglang/srt/layers/attention/base_attn_backend.py`
* `python/sglang/srt/layers/attention/triton_backend.py`
* `python/sglang/srt/layers/logits_processor.py`
* `python/sglang/srt/managers/overlap_utils.py`
* `python/sglang/srt/managers/schedule_batch.py`
* `python/sglang/srt/managers/scheduler.py`
* `python/sglang/srt/managers/scheduler_metrics_mixin.py`
* `python/sglang/srt/managers/scheduler_output_processor_mixin.py`
* `python/sglang/srt/managers/tp_worker.py`
* `python/sglang/srt/model_executor/cuda_graph_runner.py`
* `python/sglang/srt/model_executor/forward_batch_info.py`
* `python/sglang/srt/server_args.py`
* `python/sglang/srt/speculative/eagle_info.py`
* `python/sglang/srt/speculative/eagle_info_v2.py`
* `python/sglang/srt/speculative/eagle_worker_v2.py`
* `python/sglang/srt/speculative/spec_utils.py`
* `test/srt/test_eagle_infer_beta.py`
Great work! I observed a significant performance improvement using the Triton backend. Are there any plans to support more attention backends, such as FA3 and FlashInfer?
Could you explain why this `plan_stream` is used? Will there be any problems if `forward_stream` is used directly?
It provides additional speedup through dual-stream execution. It is currently disabled by #11724.
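Why a second stream can help is easiest to see with a back-of-the-envelope timing model. This is not SGLang code. It assumes each step splits into a planning stage (for example, attention-metadata preparation, which is our assumption about what the plan stream hosts) and a forward stage with fixed costs, and that planning for step i + 1 depends only on the previous plan, not on the previous forward. `StageCosts`, `serial_time`, and `overlapped_time` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageCosts:
    """Hypothetical fixed per-step costs (ms) for two pipeline stages."""

    plan_ms: float  # e.g. metadata preparation on a separate plan stream (assumed)
    forward_ms: float  # model forward on the forward stream


def serial_time(costs: StageCosts, steps: int) -> float:
    """Single stream: every step pays plan + forward back to back."""
    return steps * (costs.plan_ms + costs.forward_ms)


def overlapped_time(costs: StageCosts, steps: int) -> float:
    """Two streams: planning for step i + 1 overlaps the forward of step i.

    The first plan and the last forward stay exposed; every other step is
    bounded by whichever stage is slower.
    """
    if steps <= 0:
        return 0.0
    slower = max(costs.plan_ms, costs.forward_ms)
    return costs.plan_ms + costs.forward_ms + (steps - 1) * slower


if __name__ == "__main__":
    costs = StageCosts(plan_ms=2.0, forward_ms=5.0)
    print(serial_time(costs, 10), overlapped_time(costs, 10))  # 70.0 52.0
```

With 2 ms of planning and 5 ms of forward over 10 steps, the serial schedule takes 70 ms and the overlapped one 52 ms, so the saving per step approaches the cost of the cheaper stage. The model ignores cross-stream synchronization overhead, which in practice eats into that gain.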
@Ximingwang-09, you are welcome to try #12128.
Co-authored-by: Lianmin Zheng <15100009+merrymercy@users.noreply.github.com> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
Initial results using `python -m sglang.test.send_one`:

Beta Eagle:

Main Eagle:
The current implementation completely separates the beta code path from the main code path. More features and optimizations will come soon, along with a roadmap and a design doc.
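A minimal sketch of how an environment-variable opt-in, such as the `SGLANG_ENABLE_SPEC_V2=1` flag from the CodeRabbit summary, can keep two code paths fully separate: a single check at the entry point selects either the main or the beta path, and nothing below that point mixes them. This is not SGLang's actual dispatch logic. `spec_v2_enabled` and `select_path` are hypothetical names, and the accepted truthy spellings are an assumption.

```python
import os
from typing import Callable, Mapping


def spec_v2_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical parser for the opt-in flag; treats "1"/"true"/"yes" as on."""
    return env.get("SGLANG_ENABLE_SPEC_V2", "").strip().lower() in {"1", "true", "yes"}


def select_path(
    env: Mapping[str, str],
    main_path: Callable[[], str],
    beta_path: Callable[[], str],
) -> str:
    """Default to the main path; take the beta path only when opted in.

    Deciding once, at the top, keeps the two implementations independent.
    """
    return beta_path() if spec_v2_enabled(env) else main_path()


if __name__ == "__main__":
    print(select_path({"SGLANG_ENABLE_SPEC_V2": "1"}, lambda: "main", lambda: "beta"))  # beta
```

Keeping the main path as the default means existing deployments behave as before unless the variable is explicitly set.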
Summary by CodeRabbit
New Features
Run `export SGLANG_ENABLE_SPEC_V2=1` to opt into beta speculative behavior.
Performance
Tests