[TRTLLM-12436][feat] visual_gen: add CuTe DSL attention via exported binaries #13721

Merged
zhenhuaw-me merged 3 commits into
NVIDIA:main from
xrq-phys:ruqingx/visual_gen/qk16pv8_precompiled
May 29, 2026

Conversation

xrq-phys commented May 4, 2026

Collaborator

Summary by CodeRabbit

  • New Features

    • Added CuTe DSL attention backend support for visual generation models on Blackwell GPUs.
    • Introduced --quant_attention_mode option with NO_QUANT, QK16PV8, and SAGE modes, replacing the previous --enable_sage_attention flag.
    • Extended --attention_backend to include CUTEDSL choice alongside existing options.
  • Documentation

    • Updated visual generation examples with CuTe DSL backend usage and new quantization mode configurations.
  • Tests

    • Added new tests for CuTe DSL attention kernel execution and validation.
    • Expanded existing attention tests to cover new backends and quantization configurations.
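
As a rough illustration of the option change summarized above, the sketch below parses the two flags with `argparse` and rejects mismatched combinations. Only the flag names and the mode names come from this PR. The backend list is limited to the two backends named in the review (other existing choices are not shown here), the defaults are assumptions, and the mode-to-backend pairing is inferred from the walkthrough rather than copied from `args.py`.

```python
import argparse

# Backends named in this PR. The full list of pre-existing choices is not
# shown in the conversation, so only these two appear (assumption).
BACKENDS = ["TRTLLM", "CUTEDSL"]
QUANT_MODES = ["NO_QUANT", "QK16PV8", "SAGE"]

# Assumed pairing, inferred from the walkthrough: SAGE recipes are validated
# for TRTLLM and QK16PV8 for CUTEDSL.
REQUIRED_BACKEND = {"QK16PV8": "CUTEDSL", "SAGE": "TRTLLM"}


def parse_args(argv):
    """Parse the two attention flags and reject unsupported combinations."""
    parser = argparse.ArgumentParser(description="visual_gen CLI sketch")
    # Defaults below are illustrative assumptions, not the example scripts' defaults.
    parser.add_argument("--attention_backend", choices=BACKENDS, default="TRTLLM")
    parser.add_argument("--quant_attention_mode", choices=QUANT_MODES, default="NO_QUANT")
    args = parser.parse_args(argv)
    required = REQUIRED_BACKEND.get(args.quant_attention_mode)
    if required is not None and args.attention_backend != required:
        # Fail loudly instead of silently downgrading the quantization mode.
        parser.error(f"{args.quant_attention_mode} requires backend in ['{required}']")
    return args
```

With this sketch, `parse_args(["--attention_backend", "CUTEDSL", "--quant_attention_mode", "QK16PV8"])` succeeds, while combining `SAGE` with `CUTEDSL` exits with a usage error.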

Description

Adds CuTe DSL fmha.py exported cubins to the visual_gen workflow as backend option CUTEDSL, supporting:

  • Ragged attention on 16-bit datatypes
  • Ragged attention with Qk16Pv8 (Q & K in BF16, V in FP8, online P converted to FP8)

Currently, only head_dim=128 is supported due to downstream model requirements.

Integrated into Wan & FLUX.
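
For readers unfamiliar with the V side of Qk16Pv8, the following is a minimal sketch of per-tensor scaling into the FP8 E4M3 range, written in plain Python so it runs without a GPU. It is not the backend's code: the `448.0 / max_abs` form and the clamped divisor follow a review comment further down this conversation, the helper names are hypothetical, and no actual FP8 rounding is performed.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3 (e4m3fn variant)


def v_quant_scale(values, eps=1e-12):
    """Per-tensor scale that maps the largest |v| onto the FP8 E4M3 range."""
    max_abs = max((abs(x) for x in values), default=0.0)
    # Clamp the divisor so an all-zero V yields a finite scale, not infinity.
    return FP8_E4M3_MAX / max(max_abs, eps)


def scale_to_fp8_range(values):
    """Scale values into [-448, 448]; returns (scaled values, dequant factor)."""
    scale = v_quant_scale(values)
    scaled = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x * scale)) for x in values]
    # The dequant factor would be folded into the output scale after P @ V.
    return scaled, 1.0 / scale
```

The dequantization factor is returned alongside the scaled values because the attention output has to be multiplied by it to undo the scaling.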

Test Coverage

  • Added test_attention_cute_dsl.py with smoke tests for the CuTe DSL kernels
  • Added integration coverage to test_attention_integration.py and test_attention_perf.py

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@xrq-phys xrq-phys requested review from a team as code owners May 4, 2026 08:07
@xrq-phys xrq-phys requested review from arysef and nv-guomingz May 4, 2026 08:07

xrq-phys commented May 4, 2026

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #46628 [ run ] triggered by Bot. Commit: 8ba9953 Link to invocation

coderabbitai Bot commented May 4, 2026

Contributor
ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 91336eb6-eb7f-42ab-a9fd-604116991391

📥 Commits

Reviewing files that changed from the base of the PR and between 546a5b0 and 5fd65e1.

⛔ Files ignored due to path filters (64)
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
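
The 64 ignored binaries above follow a regular naming pattern, which the sketch below reproduces by enumerating the combinations. The pattern is inferred from this listing alone; the actual resolver in `fmha.py` may build names differently, and the helper name `cubin_relpath` is hypothetical.

```python
from itertools import product

ARCHS = ["aarch64", "x86_64"]
SMS = ["sm_100a", "sm_103a"]
# Mixed BF16/E4M3 tag (presumably the QK16PV8 variant) and plain BF16 tag.
DTYPE_TAGS = ["bf16_e4m3_bf16", "bf16"]
MASKS = ["causal", "nocausal"]


def cubin_relpath(arch, sm, dtype_tag, mask, lse, skipsm):
    """Build one relative path following the pattern seen in the listing."""
    parts = ["cute_dsl_fmha", dtype_tag, "h128", mask, "nonpersistent", "varlen"]
    if lse:
        parts.append("lse")
    if skipsm:
        parts.append("skipsm")
    parts.append("tvmffi")
    return f"cubins/{arch}/{sm}/" + "_".join(parts) + ".so"


# 2 host archs x 2 SM targets x 2 dtype tags x 2 masks x 2 lse x 2 skipsm
ALL_CUBINS = [
    cubin_relpath(*combo)
    for combo in product(ARCHS, SMS, DTYPE_TAGS, MASKS, [True, False], [True, False])
]
```

Six two-way choices give 2^6 = 64 files, which matches the count reported for the path filter.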
📒 Files selected for processing (21)
  • .gitattributes
  • .gitignore
  • examples/visual_gen/README.md
  • examples/visual_gen/visual_gen_flux.py
  • examples/visual_gen/visual_gen_wan_i2v.py
  • examples/visual_gen/visual_gen_wan_t2v.py
  • setup.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/__init__.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/cute_dsl.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/parallel.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/utils.py
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/__init__.py
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/__init__.py
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/__init__.py
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/fmha.py
  • tensorrt_llm/visual_gen/args.py
  • tests/integration/test_lists/test-db/l0_b200.yml
  • tests/unittest/_torch/visual_gen/test_attention_cute_dsl.py
  • tests/unittest/_torch/visual_gen/test_attention_integration.py
  • tests/unittest/_torch/visual_gen/test_attention_perf.py
  • tests/unittest/_torch/visual_gen/test_visual_gen_args.py

📝 Walkthrough

This PR introduces a new CuTe DSL FMHA attention backend for visual generation models with quantization mode support. It adds a configuration schema supporting backend-specific quantization recipes, implements CuTeDSLAttention with variable-length inputs and optional QK16PV8 quantization, integrates kernel-loading infrastructure, and updates the CLI examples and tests accordingly.

Changes

CuTe DSL Backend and Configuration Schema

| Layer | File(s) | Summary |
| --- | --- | --- |
| Configuration schema and backend-specific validation | tensorrt_llm/visual_gen/args.py | QuantAttentionConfig expands to support bf16 dtype and 0-based block sizes; AttentionConfig adds CUTEDSL backend; validator refactored to enforce SAGE_RECIPES for TRTLLM and QK16PV8_DTYPES for CUTEDSL. |
| CuTeDSLAttention implementation | tensorrt_llm/_torch/visual_gen/attention_backend/cute_dsl.py | New attention backend enforcing head_dim==128, with input preparation (dtype casting), optional QK16PV8 V-quantization, variable-length indirection tensors, and LSE output support. |
| FMHA kernel discovery and execution | tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/fmha.py | Cubin discovery for Blackwell GPUs, dtype-to-variant mapping with float8, cubin resolution and caching, main cute_dsl_fmha_fwd dispatcher supporting both TVM-FFI and CuTe tensor paths. |
| Backend factory integration | tensorrt_llm/_torch/visual_gen/attention_backend/utils.py, tensorrt_llm/_torch/visual_gen/attention_backend/__init__.py | Lazy imports and factory wiring for CuTeDSLAttention; docstring updates documenting CUTEDSL support and quantization config forwarding. |
| Package configuration | .gitattributes, .gitignore, setup.py, package __init__.py files | Git LFS tracking for cubins, negation rule for cubins directory, cubin artifacts added to wheel package_data, copyright year updated to 2026. |

User-Facing API and Examples

| Layer | File(s) | Summary |
| --- | --- | --- |
| CLI quantization mode | examples/visual_gen/visual_gen_flux.py, examples/visual_gen/visual_gen_wan_i2v.py, examples/visual_gen/visual_gen_wan_t2v.py | Added --quant_attention_mode argument (NO_QUANT/QK16PV8/SAGE), extended --attention_backend to include CUTEDSL, removed --enable_sage_attention flag, integrated mode selection into VisualGenArgs. |
| Documentation updates | examples/visual_gen/README.md, tensorrt_llm/_torch/visual_gen/attention_backend/parallel.py | Updated examples with CUTEDSL/QK16PV8 usage; expanded argument reference tables; documented CUTEDSL as LSE-capable backend for Attention2D parallelism. |

Test Suite Updates

| Layer | File(s) | Summary |
| --- | --- | --- |
| Integration test parametrization | tests/unittest/_torch/visual_gen/test_attention_integration.py | Extended create_model_config with quant_attention_config; added _require_attention_backend helper with GPU SM gating; parametrized self/cross-attention and WAN-shape tests over (head_dim, backend, quant config) combinations. |
| CuTe DSL kernel smoke tests | tests/unittest/_torch/visual_gen/test_attention_cute_dsl.py | New test module with CUDA/GPU-arch gating (sm_100a/sm_103a); helpers for variable-length indirection tensors and float8-aware tensor generation; reference SDPA with GQA head replication; kernel loading and forward execution smoke tests. |
| Performance benchmark updates | tests/unittest/_torch/visual_gen/test_attention_perf.py | Added CuTe DSL availability detection; extended WanAttentionPerformanceBenchmark methods to propagate quant_attention_config; parametrized self-attention tests over (head_dim, backend, quant config). |
| Configuration validation tests | tests/unittest/_torch/visual_gen/test_visual_gen_args.py | Updated backend rejection test to expect "requires backend in …" error; added test cases for TRTLLM/SAGE/CUTEDSL quantization config validation. |
| Test registry | tests/integration/test_lists/test-db/l0_b200.yml | Added test_attention_cute_dsl.py to Visual Gen test list; reordered existing attention tests. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • NVIDIA/TensorRT-LLM#14175: Updates VisualGen args.py configuration and validation infrastructure that this PR extends with CUTEDSL quantization recipe support.

Suggested reviewers

  • arysef
  • nv-guomingz
  • yuxianq
  • bobboli
  • laikhtewari
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The PR title accurately summarizes the main change: adding CuTe DSL attention support via precompiled CUTLASS binaries for visual_gen workflows, which is the primary focus across all modified files. |
| Description check | ✅ Passed | The PR description covers the essential aspects: what is being added (CuTe DSL fmha support with Qk16Pv8 variant), how it works (ragged attention, binaries via FA4), and test coverage details. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/visual_gen/attention_backend/flash_attn4.py`:
- Around line 123-127: In the QK16PV8 branch (check
self.context_quantization_mode == "QK16PV8"), guard against a zero max by
clamping the divisor when computing v_qscale: compute max_abs =
max(v.abs().amax(), tiny_eps) (or use torch.clamp_min) before doing v_qscale =
448.0 / max_abs, then multiply v by v_qscale and adjust scale_v as before;
update references to v_qscale, v, and scale_v accordingly so v * v_qscale never
multiplies by infinity when v is all zeros.
- Around line 35-43: Split the combined import try-except so _cute_precompiled
and _flash_attn_fwd are imported in two independent try/except blocks, storing
each exception separately (e.g. _cute_precompiled_import_error and
_flash_attn_fwd_import_error) and leaving the other symbol intact if one import
fails; remove the early guard that raises when _flash_attn_fwd is missing and
instead let _fwd_precompiled() attempt the precompiled path first, then after
_fwd_precompiled() returns None check both _flash_attn_fwd and _cute_precompiled
and raise only if both are None (include the respective import errors in the
raised message for diagnostics).
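
The import-splitting suggestion above can be sketched in isolation as follows. This is a generic pattern under stated assumptions, not the backend's code: the helper names are hypothetical, and real module names would replace the candidates passed in.

```python
import importlib


def optional_import(module_name):
    """Return (module, None) on success or (None, error) if the import fails."""
    try:
        return importlib.import_module(module_name), None
    except ImportError as exc:
        return None, exc


def require_any(candidates):
    """Return the first importable module; raise only if every candidate fails."""
    errors = {}
    for name in candidates:
        module, error = optional_import(name)
        if module is not None:
            return module
        # Keep each failure separately so the final message stays diagnostic.
        errors[name] = error
    details = "; ".join(f"{name}: {err}" for name, err in errors.items())
    raise ImportError(f"no attention implementation available ({details})")
```

Because each import is attempted independently, a missing first candidate no longer hides a working second one.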

In `@tensorrt_llm/_torch/visual_gen/config.py`:
- Around line 64-74: The _validate_context_quantization_for_backend_support
model validator in AttentionConfig currently silently downgrades
context_quantization_mode when an unsupported backend is used; instead, detect
when context_quantization_mode == "QK16PV8" and backend != "FA4" and raise a
ValueError with a clear message that includes the requested
context_quantization_mode and backend; keep the `@model_validator(mode="after")`
on _validate_context_quantization_for_backend_support and return self only on
success (do not mutate context_quantization_mode).

In `@tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/fmha.py`:
- Around line 224-241: _check_inputs currently misses validating batch-size
agreement for 4D tensors and does not verify all tensors are CUDA/on the same
device; update _check_inputs to (1) require that when q.dim() == 4, k.dim() and
v.dim() are also 4 and that q.shape[0] == k.shape[0] == v.shape[0] (so
non-varlen path uses consistent batch_size), and (2) assert that q, k, v, o are
CUDA tensors on the same device (compare tensor.device and tensor.device.type ==
"cuda" for each), raising ValueError if any check fails; preserve the existing
shape/dtype checks and the final return q.dim() == 3.
- Around line 269-273: When varlen is true, extend the simple numel() checks on
qo_indptr and kv_indptr to validate their contents and devices: ensure qo_indptr
and kv_indptr are on the same device as their corresponding Q/KV tensors, have
at least two elements, are non-decreasing and start with 0, and that
qo_indptr[-1] == total_q and kv_indptr[-1] == total_kv (use the actual total
token counts for Q/KV from the buffers) so offsets cannot walk past the buffers;
raise clear ValueError messages mentioning qo_indptr/kv_indptr when any of these
invariants fail.
- Around line 407-427: The launch uses torch.cuda.current_stream() without
specifying the device, which can pick the wrong GPU in multi-device setups;
change the stream acquisition to pass the device owning the tensors (e.g., use
torch.cuda.current_stream(device=q_device)) by deriving the device from the
input cute tensors (referencing q_cute/k_cute/v_cute/o_cute) and pass that
stream into kernel_fn (the call to kernel_fn should keep all args the same but
replace torch.cuda.current_stream().cuda_stream with
torch.cuda.current_stream(device=<device_from_q_cute>).cuda_stream so the kernel
launches on the device that owns q/k/v/o).
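
The `qo_indptr`/`kv_indptr` invariants requested above can be illustrated with plain Python lists. This is a sketch of the checks only: the helper names are hypothetical, and the device comparison from the comment is omitted because lists carry no device.

```python
from itertools import accumulate


def build_indptr(seq_lens):
    """Cumulative offsets: sequence i occupies rows indptr[i]:indptr[i + 1]."""
    return [0, *accumulate(seq_lens)]


def check_indptr(name, indptr, total_tokens):
    """Raise ValueError unless indptr is a valid offset array for total_tokens rows."""
    if len(indptr) < 2:
        raise ValueError(f"{name} must have at least two elements")
    if indptr[0] != 0:
        raise ValueError(f"{name} must start with 0")
    if any(b < a for a, b in zip(indptr, indptr[1:])):
        raise ValueError(f"{name} must be non-decreasing")
    if indptr[-1] != total_tokens:
        # A larger final offset would let the kernel walk past the packed buffer.
        raise ValueError(f"{name}[-1] must equal {total_tokens}, got {indptr[-1]}")
```

For example, sequence lengths `[3, 0, 5]` produce offsets `[0, 3, 3, 8]`, which pass the check against a packed buffer of 8 rows.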

In `@tests/unittest/_torch/visual_gen/test_attention_cute_precompiled.py`:
- Around line 28-37: The test only loads a varlen=True kernel but runtime calls
(FlashAttn4Attention._fwd_precompiled) request varlen=False; add a non-varlen
precompiled-kernel case by invoking get_cute_dsl_fmha_kernel with varlen=False
(and the same dtype/arch parameters used for varlen) and include a smoke test
that exercises the non-varlen forward API path so the test suite will fail if
the non-varlen binary is missing or mispackaged; ensure both the varlen=True and
varlen=False kernels are loaded in this test (and mirror the same addition for
the other block around lines 205-217).

In `@tests/unittest/_torch/visual_gen/test_attention_integration.py`:
- Around line 216-224: The FA4/QK16PV8 test cases can pass without exercising
the precompiled path because the code falls back from _fwd_precompiled() to
_fwd(); modify the test to assert that the precompiled dispatch was actually
used by instrumenting or patching the precompiled loader/dispatch entry (e.g.,
monkeypatch the module/function that returns the precompiled implementation or
wrap _fwd_precompiled()) so it records or raises if called, and then require
that calls for FA4/QK16PV8 hit that instrumented path (fail or skip the test if
the instrumented signal/counter is not observed); target functions to
patch/assert are the attention precompiled loader/dispatcher and the methods
_fwd_precompiled() and _fwd() used by the FA4 backend so the test fails unless
the precompiled branch executed.

In `@tests/unittest/_torch/visual_gen/test_attention_perf.py`:
- Around line 666-675: The parametrized test_self_attention_perf currently
passes QK16PV8 as context_quantization_mode but still runs timings when the
runtime falls back to plain FA4; update the test to detect which attention
kernel actually executed (e.g., by reading the kernel selection flag/counter or
instrumentation provided by the attention runtime) and explicitly skip or fail
the test if the selected backend is not the precompiled QK16PV8 path; modify the
test_self_attention_perf (and the analogous perf tests around the other
attention perf cases) to assert the kernel selection before measuring so
benchmarks only run when the intended QK16PV8 kernel is confirmed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 379b180a-d8bb-4b93-ab58-b1c8bef88a76

📥 Commits

Reviewing files that changed from the base of the PR and between f504047 and 8ba9953.

⛔ Files ignored due to path filters (192)
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
📒 Files selected for processing (15)
  • .gitattributes
  • .gitignore
  • examples/visual_gen/README.md
  • examples/visual_gen/visual_gen_flux.py
  • examples/visual_gen/visual_gen_wan_i2v.py
  • examples/visual_gen/visual_gen_wan_t2v.py
  • setup.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/flash_attn4.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/utils.py
  • tensorrt_llm/_torch/visual_gen/config.py
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/__init__.py
  • tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/fmha.py
  • tests/unittest/_torch/visual_gen/test_attention_cute_precompiled.py
  • tests/unittest/_torch/visual_gen/test_attention_integration.py
  • tests/unittest/_torch/visual_gen/test_attention_perf.py

Comment thread tensorrt_llm/_torch/visual_gen/attention_backend/flash_attn4.py
Comment thread tensorrt_llm/_torch/visual_gen/attention_backend/flash_attn4.py Outdated
Comment thread tensorrt_llm/_torch/visual_gen/config.py Outdated
Comment thread tensorrt_llm/_torch/visual_gen/jit_kernels/cute_dsl/fmha.py Outdated
Comment thread tests/unittest/_torch/visual_gen/test_attention_cute_precompiled.py Outdated
Comment thread tests/unittest/_torch/visual_gen/test_attention_integration.py
Comment thread tests/unittest/_torch/visual_gen/test_attention_perf.py Outdated
@tensorrt-cicd

Collaborator

PR_Github #46628 [ run ] completed with state SUCCESS. Commit: 8ba9953
/LLM/main/L0_MergeRequest_PR pipeline #36673 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@xrq-phys

xrq-phys commented May 4, 2026

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #46639 [ run ] triggered by Bot. Commit: 8ba9953 Link to invocation

@xrq-phys xrq-phys changed the title [None][feat] visual_gen: add CuTe DSL attention via exported binaries [TRTLLM-12436][feat] visual_gen: add CuTe DSL attention via exported binaries May 4, 2026
@tensorrt-cicd

Collaborator

PR_Github #46639 [ run ] completed with state SUCCESS. Commit: 8ba9953
/LLM/main/L0_MergeRequest_PR pipeline #36683 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@xrq-phys xrq-phys force-pushed the ruqingx/visual_gen/qk16pv8_precompiled branch from 8ba9953 to 46f2588 Compare May 5, 2026 09:09
@xrq-phys

xrq-phys commented May 5, 2026

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #46785 [ run ] triggered by Bot. Commit: 46f2588 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #46785 [ run ] completed with state SUCCESS. Commit: 46f2588
/LLM/main/L0_MergeRequest_PR pipeline #36807 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@xrq-phys

xrq-phys commented May 5, 2026

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #46825 [ run ] triggered by Bot. Commit: 46f2588 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #46825 [ run ] completed with state SUCCESS. Commit: 46f2588
/LLM/main/L0_MergeRequest_PR pipeline #36845 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@xrq-phys

xrq-phys commented May 6, 2026

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #47015 [ run ] triggered by Bot. Commit: 46f2588 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #47015 [ run ] completed with state SUCCESS. Commit: 46f2588
/LLM/main/L0_MergeRequest_PR pipeline #36990 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@xrq-phys

xrq-phys commented May 7, 2026

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #47082 [ run ] triggered by Bot. Commit: 46f2588 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #47082 [ run ] completed with state SUCCESS. Commit: 46f2588
/LLM/main/L0_MergeRequest_PR pipeline #37053 completed with status: 'SUCCESS'

CI Report

Link to invocation

@xrq-phys xrq-phys force-pushed the ruqingx/visual_gen/qk16pv8_precompiled branch from 46f2588 to 79572ac Compare May 15, 2026 12:47
@xrq-phys xrq-phys force-pushed the ruqingx/visual_gen/qk16pv8_precompiled branch from b24d303 to 041cc68 Compare May 26, 2026 07:59
@xrq-phys

Collaborator Author

/bot skip --comment "Remaining multi-GPU failures are irrelevant to visual_gen"

@xrq-phys

Collaborator Author

PR_Github #50307 [ run ] completed with state SUCCESS. Commit: 07dbb30
/LLM/main/L0_MergeRequest_PR pipeline #39837 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

All single-GPU & visual_gen-related tests should have passed there.

@tensorrt-cicd

Collaborator

PR_Github #50319 [ skip ] triggered by Bot. Commit: 041cc68 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #50319 [ skip ] completed with state SUCCESS. Commit: 041cc68
Skipping testing for commit 041cc68

Link to invocation

@chang-l chang-l left a comment

Collaborator

Can we also update docs/source/models/visual-generation.md to accommodate this new attention backend?

Also, can we add a new section, either parallel to or as a subsection under the quantization section, to describe our quantized attention variants: SAGE and QK16PV8?

@xrq-phys xrq-phys force-pushed the ruqingx/visual_gen/qk16pv8_precompiled branch from 041cc68 to 40be938 Compare May 27, 2026 02:04
@xrq-phys

Collaborator Author

/bot skip --comment "The remaining multi-GPU failures are irrelevant to visual_gen"

@tensorrt-cicd

Collaborator

PR_Github #50422 [ skip ] triggered by Bot. Commit: 40be938 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #50422 [ skip ] completed with state SUCCESS. Commit: 40be938
Skipping testing for commit 40be938

Link to invocation

Signed-off-by: Ruqing Xu <7891482+xrq-phys@users.noreply.github.com>
@xrq-phys xrq-phys force-pushed the ruqingx/visual_gen/qk16pv8_precompiled branch from 40be938 to aad1edc Compare May 28, 2026 03:34
@xrq-phys

Collaborator Author

Can we also update docs/source/models/visual-generation.md to accommodate this new attention backend?

Also, can we add a new section, either parallel to or as a subsection under the quantization section, to describe our quantized attention variants: SAGE and QK16PV8?

Added

@xrq-phys

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #50684 [ run ] triggered by Bot. Commit: aad1edc Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #50684 [ run ] completed with state SUCCESS. Commit: aad1edc
/LLM/main/L0_MergeRequest_PR pipeline #40172 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@xrq-phys

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #50779 [ run ] triggered by Bot. Commit: aad1edc Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #50779 [ run ] completed with state SUCCESS. Commit: aad1edc
/LLM/main/L0_MergeRequest_PR pipeline #40256 completed with status: 'SUCCESS'

CI Report

Link to invocation

@xrq-phys

Collaborator Author

@zhenhuaw-me Pipeline passed :)

Let's merge this one.

Comment thread docs/source/models/visual-generation.md Outdated
Comment thread docs/source/models/visual-generation.md Outdated
Comment thread examples/visual_gen/README.md
xrq-phys and others added 2 commits May 29, 2026 11:21
Co-authored-by: Zhenhua Wang <4936589+zhenhuaw-me@users.noreply.github.com>
Signed-off-by: RuQing Xu <7891482+xrq-phys@users.noreply.github.com>
Will prioritize API side usage.

Signed-off-by: Ruqing Xu <7891482+xrq-phys@users.noreply.github.com>
@xrq-phys

Collaborator Author

/bot reuse --comment "Dropped examples not covered by testing"

@github-actions

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Supports wildcard * for pattern matching (e.g., "*PerfSanity*" matches all stages containing PerfSanity). Examples: "A10-PyTorch-1, xxx", "PerfSanity". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Supports wildcard * for pattern matching. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx", --extra-stage "Post-Merge".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@xrq-phys

Collaborator Author

/bot reuse-pipeline --comment "Dropped examples not covered by testing"

@tensorrt-cicd

Collaborator

PR_Github #50938 [ reuse-pipeline ] triggered by Bot. Commit: 501f8a9 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #50938 [ reuse-pipeline ] completed with state SUCCESS. Commit: 501f8a9
Reusing PR_Github #50779 for commit 501f8a9

Link to invocation

@zhenhuaw-me zhenhuaw-me merged commit 1bb1a02 into NVIDIA:main May 29, 2026
7 checks passed
@github-actions

LFS objects already in storage (64 files) — no sync needed.

These LFS-tracked files are already present in this repository's LFS storage:

  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so
  • tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so
  • ...and 59 more

6 participants