When I tried MTP with random input from vllm bench serve, I got the following failure.

vllm serve $MODEL -tp 4 --served-model-name qwen3-next --tokenizer-mode auto --speculative-config '{"method": "qwen3_next_mtp", "num_speculative_tokens": 2}'

vllm bench serve --backend vllm --model $MODEL --served-model-name qwen3-next --endpoint /v1/completions --dataset-name random --random-input 2048 --random-output 1024 --max-concurrency 256 --num-prompt 256

(Worker_TP3 pid=2417047) ERROR 09-11 13:34:02 [multiproc_executor.py:654]   File "/home/scratch.vgimpelson_ent/vllm_qwen/vllm/config/__init__.py", line 3380, in pad_for_cudagraph
(Worker_TP3 pid=2417047) ERROR 09-11 13:34:02 [multiproc_executor.py:654]     return self.compilation_config.bs_to_padded_graph_size[batch_size]
(Worker_TP3 pid=2417047) ERROR 09-11 13:34:02 [multiproc_executor.py:654]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
(Worker_TP3 pid=2417047) ERROR 09-11 13:34:02 [multiproc_executor.py:654] IndexError: list index out of range
The problem is somewhere around here:
https://github.com/vllm-project/vllm/blob/main/vllm/v1/attention/backends/gdn_attn.py#L211-L215
if (self.use_full_cuda_graph and num_prefills == 0 and num_decodes == 0
        and num_spec_decodes <= self.decode_cudagraph_max_bs):
    num_total_tokens = self.vllm_config.pad_for_cudagraph(
        m.num_actual_tokens)
    batch_size = num_total_tokens // (self.num_spec + 1)
During the failure:

num_spec_decodes = 228
m.num_actual_tokens = 228 * (num_spec + 1) = 228 * 3 = 684
self.decode_cudagraph_max_bs = 512

The if passes because 228 <= 512, but self.vllm_config.pad_for_cudagraph fails because 228 * 3 = 684 is greater than cudagraph_max_bs, so the lookup in bs_to_padded_graph_size goes out of range.
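Below is a minimal, self-contained sketch of the mismatch. The pad_for_cudagraph helper and the capture-size list are simplified stand-ins (not vLLM's real implementation), and the guard at the end is only one possible direction for a fix, not an actual patch.

```python
# Stand-in reproduction of the guard/padding mismatch described above.

def pad_for_cudagraph(num_tokens: int, capture_sizes: list[int]) -> int:
    """Return the smallest captured graph size >= num_tokens.

    Raises IndexError when num_tokens exceeds the largest captured size,
    loosely mirroring the bs_to_padded_graph_size lookup that fails above.
    """
    for size in capture_sizes:
        if size >= num_tokens:
            return size
    raise IndexError("list index out of range")


num_spec = 2                   # num_speculative_tokens from the MTP config
num_spec_decodes = 228         # from the failing run
num_actual_tokens = num_spec_decodes * (num_spec + 1)   # 684
decode_cudagraph_max_bs = 512  # largest captured batch size

# Illustrative capture sizes: multiples of 8 up to 512 (stand-in only).
capture_sizes = list(range(8, 513, 8))

# The current guard compares only the number of speculative decodes,
# so it passes (228 <= 512)...
assert num_spec_decodes <= decode_cudagraph_max_bs

# ...but the padding is applied to the token count, which is 684 > 512:
try:
    pad_for_cudagraph(num_actual_tokens, capture_sizes)
except IndexError as exc:
    print("reproduces the failure:", exc)

# One possible direction for a fix (an assumption, not the actual patch):
# gate the full-cudagraph path on the token count instead of the decode count.
fits_cudagraph = num_spec_decodes * (num_spec + 1) <= decode_cudagraph_max_bs
print("full-cudagraph path taken:", fits_cudagraph)   # False here
```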
Originally posted by @vadiklyutiy in #24526 (comment)