Tags: li2zhi/vllm
[Bugfix] Disable TRTLLM attention when KV transfer is enabled (vllm-project#33192) Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
[BugFix][Spec Decoding] Fix negative accepted tokens metric crash (vllm-project#33729) Signed-off-by: Nick Hill <nickhill123@gmail.com>
[torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding (vllm-project#33624) Signed-off-by: Richard Zou <zou3519@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> (cherry picked from commit 5eac9a1)
[Docs] Adding links and intro to Speculators and LLM Compressor (vllm-project#32849) Signed-off-by: Aidan Reilly <aireilly@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Revert "Enable Cross layers KV cache layout at NIXL Connector (vllm-project#30207)" (vllm-project#33241) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Kevin H. Luu <khluu000@gmail.com> (cherry picked from commit 2e8de86)
Relax protobuf library version constraints (vllm-project#33202) Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com> (cherry picked from commit a97b5e2)
[AMD][Kernel][BugFix] Use correct scale in concat_and_cache_ds_mla_kernel when on gfx942 (vllm-project#32976) Signed-off-by: Randall Smith <ransmith@amd.com> Signed-off-by: Randall Smith <Randall.Smith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>
[Bugfix] Fix Dtypes for Pynccl Wrapper (vllm-project#33030) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> (cherry picked from commit 43a013c)