[BUG] [Qwen3-next] MTP+CG fail #24660

@vadiklyutiy

When I ran MTP with random input from vllm bench serve, I got the following failure:

 vllm serve $MODEL -tp 4 --served-model-name qwen3-next --tokenizer-mode auto --speculative-config '{"method": "qwen3_next_mtp", "num_speculative_tokens": 2}'
vllm bench serve   --backend vllm   --model $MODEL  --served-model-name qwen3-next  --endpoint /v1/completions   --dataset-name random   --random-input 2048   --random-output 1024   --max-concurrency 256   --num-prompt 256
(Worker_TP3 pid=2417047) ERROR 09-11 13:34:02 [multiproc_executor.py:654]   File "/home/scratch.vgimpelson_ent/vllm_qwen/vllm/config/__init__.py", line 3380, in pad_for_cudagraph
(Worker_TP3 pid=2417047) ERROR 09-11 13:34:02 [multiproc_executor.py:654]     return self.compilation_config.bs_to_padded_graph_size[batch_size]
(Worker_TP3 pid=2417047) ERROR 09-11 13:34:02 [multiproc_executor.py:654]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
(Worker_TP3 pid=2417047) ERROR 09-11 13:34:02 [multiproc_executor.py:654] IndexError: list index out of range

The problem is somewhere here:

https://github.com/vllm-project/vllm/blob/main/vllm/v1/attention/backends/gdn_attn.py#L211-L215

        if (self.use_full_cuda_graph and num_prefills == 0 and num_decodes == 0
                and num_spec_decodes <= self.decode_cudagraph_max_bs):
            num_total_tokens = self.vllm_config.pad_for_cudagraph(
                m.num_actual_tokens)
            batch_size = num_total_tokens // (self.num_spec + 1)

At the time of the failure:
num_spec_decodes = 228
m.num_actual_tokens = 228*3 = 684
self.decode_cudagraph_max_bs = 512

The if condition passes because 228 < 512, but self.vllm_config.pad_for_cudagraph fails because 228*3 = 684 is greater than cudagraph_max_bs.
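
To make the mismatch concrete, here is a minimal standalone sketch of the arithmetic. It does not use vLLM; `build_padding_table`, `MAX_CAPTURE_SIZE`, and the list of capture sizes are hypothetical stand-ins, and the sketch assumes the padding lookup is a list with one entry per size up to 512, which is consistent with the IndexError in the traceback.

```python
# Illustrative stand-ins only; none of these names are vLLM's real API.
MAX_CAPTURE_SIZE = 512   # assumed largest captured CUDA graph size
NUM_SPEC = 2             # num_speculative_tokens from the serve command


def build_padding_table(capture_sizes):
    """Return a list where table[n] is the smallest capture size >= n."""
    sizes = sorted(capture_sizes)
    return [next(s for s in sizes if s >= n) for n in range(sizes[-1] + 1)]


padding_table = build_padding_table([1, 2, 4, 8, 16, 32, 64, 128, 256, 512])

num_spec_decodes = 228
num_actual_tokens = num_spec_decodes * (NUM_SPEC + 1)  # 684 tokens

# Guard as in the quoted snippet: compares the request count to the limit.
passes_request_guard = num_spec_decodes <= MAX_CAPTURE_SIZE

# The lookup, however, is indexed by token count, which exceeds the table.
try:
    padding_table[num_actual_tokens]
    lookup_failed = False
except IndexError:
    lookup_failed = True

# One possible guard (an assumption, not a confirmed fix): compare tokens.
passes_token_guard = num_actual_tokens <= MAX_CAPTURE_SIZE

print(passes_request_guard, lookup_failed, passes_token_guard)
```

In this sketch the request-count guard passes while the token-indexed lookup raises, which mirrors the reported failure. Whether the right change is to compare the token count in the condition or to size the table differently is left open here.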

Originally posted by @vadiklyutiy in #24526 (comment)
