[TRTLLM-12436][feat] visual_gen: add CuTe DSL attention via exported binaries by xrq-phys · Pull Request #13721 · NVIDIA/TensorRT-LLM

xrq-phys · 2026-05-04T08:07:20Z

Summary by CodeRabbit

New Features
- Added CuTe DSL attention backend support for visual generation models on Blackwell GPUs.
- Introduced --quant_attention_mode option with NO_QUANT, QK16PV8, and SAGE modes, replacing the previous --enable_sage_attention flag.
- Extended --attention_backend to include CUTEDSL choice alongside existing options.
Documentation
- Updated visual generation examples with CuTe DSL backend usage and new quantization mode configurations.
Tests
- Added new tests for CuTe DSL attention kernel execution and validation.
- Expanded existing attention tests to cover new backends and quantization configurations.

Description

Adds CuTe DSL fmha.py exported cubins to the visual_gen workflow as backend option CUTEDSL, supporting:

Ragged attention on 16-bit datatypes
Ragged attention with Qk16Pv8 (Q & K in BF16, V in FP8, online P converted to FP8)

Currently, only head_dim=128 is supported due to downstream model requirements.

Integrated into Wan & FLUX.

Test Coverage

Added test_attention_cute_dsl.py with smoke tests for the CuTe DSL kernels.
Added integration coverage to test_attention_integration.py and test_attention_perf.py.

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

xrq-phys · 2026-05-04T08:08:28Z

/bot run

tensorrt-cicd · 2026-05-04T08:15:41Z

PR_Github #46628 [ run ] triggered by Bot. Commit: 8ba9953 Link to invocation

coderabbitai · 2026-05-04T08:17:46Z

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 91336eb6-eb7f-42ab-a9fd-604116991391

📥 Commits

Reviewing files that changed from the base of the PR and between 546a5b0 and 5fd65e1.

⛔ Files ignored due to path filters (64)

tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
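
The 64 ignored binaries above follow a regular naming pattern. As a rough sketch, reconstructed only from the paths listed here and not taken from `fmha.py`, the names can be enumerated as below. `cubin_name`, `all_cubin_paths`, and their parameters are hypothetical, and reading the `bf16_e4m3_bf16` tag as the Qk16Pv8 variant is an assumption.

```python
from itertools import product

# Values observed in the ignored-file list above.
ARCHS = ("aarch64", "x86_64")
SM_TARGETS = ("sm_100a", "sm_103a")
DTYPE_TAGS = ("bf16", "bf16_e4m3_bf16")  # second tag presumably marks Qk16Pv8


def cubin_name(dtype_tag: str, causal: bool, lse: bool, skipsm: bool,
               head_dim: int = 128) -> str:
    """Build a file name matching the pattern seen in the list (hypothetical helper)."""
    parts = [
        "cute_dsl_fmha",
        dtype_tag,
        f"h{head_dim}",
        "causal" if causal else "nocausal",
        "nonpersistent",
        "varlen",
    ]
    if lse:
        parts.append("lse")
    if skipsm:
        parts.append("skipsm")
    parts.append("tvmffi")
    return "_".join(parts) + ".so"


def all_cubin_paths() -> list[str]:
    """Enumerate every arch / SM / dtype / causal / lse / skipsm combination."""
    flags = (True, False)
    return [
        f"cubins/{arch}/{sm}/{cubin_name(dtype, causal, lse, skipsm)}"
        for arch, sm, dtype, causal, lse, skipsm in product(
            ARCHS, SM_TARGETS, DTYPE_TAGS, flags, flags, flags
        )
    ]
```

Two architectures, two SM targets, two dtype tags, and three boolean switches give 2 × 2 × 2 × 2 × 2 × 2 = 64 files, which matches the count reported for the path filter.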

📒 Files selected for processing (21)

.gitattributes
.gitignore
examples/visual_gen/README.md
examples/visual_gen/visual_gen_flux.py
examples/visual_gen/visual_gen_wan_i2v.py
examples/visual_gen/visual_gen_wan_t2v.py
setup.py
tensorrt_llm/_torch/visual_gen/attention_backend/__init__.py
tensorrt_llm/_torch/visual_gen/attention_backend/cute_dsl.py
tensorrt_llm/_torch/visual_gen/attention_backend/parallel.py
tensorrt_llm/_torch/visual_gen/attention_backend/utils.py
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/__init__.py
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/__init__.py
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/__init__.py
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/fmha.py
tensorrt_llm/visual_gen/args.py
tests/integration/test_lists/test-db/l0_b200.yml
tests/unittest/_torch/visual_gen/test_attention_cute_dsl.py
tests/unittest/_torch/visual_gen/test_attention_integration.py
tests/unittest/_torch/visual_gen/test_attention_perf.py
tests/unittest/_torch/visual_gen/test_visual_gen_args.py

📝 Walkthrough


This PR introduces a new CuTe DSL FMHA attention backend for visual generation models, with quantization mode support. It adds a configuration schema for backend-specific quantization recipes, implements CuTeDSLAttention with variable-length inputs and optional QK16PV8 quantization, integrates the kernel-loading infrastructure, and updates the CLI examples and tests to cover the new backend.

Changes

CuTe DSL Backend and Configuration Schema

Layer / File(s)	Summary
Configuration schema and backend-specific validation `tensorrt_llm/visual_gen/args.py`	`QuantAttentionConfig` expands to support bf16 dtype and 0-based block sizes; `AttentionConfig` adds CUTEDSL backend; validator refactored to enforce `SAGE_RECIPES` for TRTLLM and `QK16PV8_DTYPES` for CUTEDSL.
CuTeDSLAttention implementation `tensorrt_llm/_torch/visual_gen/attention_backend/cute_dsl.py`	New attention backend enforcing `head_dim==128`, with input preparation (dtype casting), optional QK16PV8 V-quantization, variable-length indirection tensors, and LSE output support.
FMHA kernel discovery and execution `tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/fmha.py`	Cubin discovery for Blackwell GPUs, dtype-to-variant mapping with float8, cubin resolution and caching, main `cute_dsl_fmha_fwd` dispatcher supporting both TVM-FFI and CuTe tensor paths.
Backend factory integration `tensorrt_llm/_torch/visual_gen/attention_backend/utils.py`, `tensorrt_llm/_torch/visual_gen/attention_backend/__init__.py`	Lazy imports and factory wiring for CuTeDSLAttention; docstring updates documenting CUTEDSL support and quantization config forwarding.
Package configuration `.gitattributes`, `.gitignore`, `setup.py`, package `__init__.py` files	Git LFS tracking for cubins, negation rule for cubins directory, cubin artifacts added to wheel package_data, copyright year updated to 2026.

User-Facing API and Examples

Layer / File(s)	Summary
CLI quantization mode `examples/visual_gen/visual_gen_flux.py`, `examples/visual_gen/visual_gen_wan_i2v.py`, `examples/visual_gen/visual_gen_wan_t2v.py`	Added `--quant_attention_mode` argument (NO_QUANT/QK16PV8/SAGE), extended `--attention_backend` to include CUTEDSL, removed `--enable_sage_attention` flag, integrated mode selection into VisualGenArgs.
Documentation updates `examples/visual_gen/README.md`, `tensorrt_llm/_torch/visual_gen/attention_backend/parallel.py`	Updated examples with CUTEDSL/QK16PV8 usage; expanded argument reference tables; documented CUTEDSL as LSE-capable backend for Attention2D parallelism.
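
The CLI options summarized in this table can be sketched with `argparse`. This is a minimal illustration, not the code in the example scripts: the backend list is truncated to the two names that appear in this walkthrough, the defaults are assumptions, and the mode-to-backend pairing (SAGE with TRTLLM, QK16PV8 with CUTEDSL) is inferred from the validator description above. `REQUIRED_BACKEND` and `parse_args` are hypothetical names.

```python
import argparse

# Assumed pairing, inferred from the walkthrough: each quantized mode
# is only accepted together with one backend.
REQUIRED_BACKEND = {"SAGE": "TRTLLM", "QK16PV8": "CUTEDSL"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="visual_gen attention options (sketch)")
    parser.add_argument(
        "--attention_backend",
        choices=["TRTLLM", "CUTEDSL"],  # the real scripts expose more choices
        default="TRTLLM",               # assumed default
    )
    parser.add_argument(
        "--quant_attention_mode",
        choices=["NO_QUANT", "QK16PV8", "SAGE"],
        default="NO_QUANT",             # assumed default
    )
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    required = REQUIRED_BACKEND.get(args.quant_attention_mode)
    if required is not None and args.attention_backend != required:
        # parser.error prints the usage text and exits with status 2.
        parser.error(
            f"--quant_attention_mode {args.quant_attention_mode} "
            f"requires --attention_backend {required}"
        )
    return args
```

Rejecting an unsupported combination at parse time keeps the failure close to the user's command line instead of surfacing later inside the model configuration.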

Test Suite Updates

Layer / File(s)	Summary
Integration test parametrization `tests/unittest/_torch/visual_gen/test_attention_integration.py`	Extended `create_model_config` with `quant_attention_config`; added `_require_attention_backend` helper with GPU SM gating; parametrized self/cross-attention and WAN-shape tests over (head_dim, backend, quant config) combinations.
CuTe DSL kernel smoke tests `tests/unittest/_torch/visual_gen/test_attention_cute_dsl.py`	New test module with CUDA/GPU-arch gating (sm_100a/sm_103a); helpers for variable-length indirection tensors and float8-aware tensor generation; reference SDPA with GQA head replication; kernel loading and forward execution smoke tests.
Performance benchmark updates `tests/unittest/_torch/visual_gen/test_attention_perf.py`	Added CuTe DSL availability detection; extended `WanAttentionPerformanceBenchmark` methods to propagate `quant_attention_config`; parametrized self-attention tests over (head_dim, backend, quant config).
Configuration validation tests `tests/unittest/_torch/visual_gen/test_visual_gen_args.py`	Updated backend rejection test to expect "requires backend in …" error; added test cases for TRTLLM/SAGE/CUTEDSL quantization config validation.
Test registry `tests/integration/test_lists/test-db/l0_b200.yml`	Added `test_attention_cute_dsl.py` to Visual Gen test list; reordered existing attention tests.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#14175: Updates VisualGen args.py configuration and validation infrastructure that this PR extends with CUTEDSL quantization recipe support.

Suggested reviewers

arysef
nv-guomingz
yuxianq
bobboli
laikhtewari

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The PR title accurately summarizes the main change: adding CuTe DSL attention support via precompiled CUTLASS binaries for visual_gen workflows, which is the primary focus across all modified files.
Description check	✅ Passed	The PR description covers the essential aspects: what is being added (CuTe DSL fmha support with Qk16Pv8 variant), how it works (ragged attention, binaries via FA4), and test coverage details.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/visual_gen/attention_backend/flash_attn4.py`:
- Around line 123-127: In the QK16PV8 branch (check
self.context_quantization_mode == "QK16PV8"), guard against a zero max by
clamping the divisor when computing v_qscale: compute max_abs =
max(v.abs().amax(), tiny_eps) (or use torch.clamp_min) before doing v_qscale =
448.0 / max_abs, then multiply v by v_qscale and adjust scale_v as before;
update references to v_qscale, v, and scale_v accordingly so v * v_qscale never
multiplies by infinity when v is all zeros.
- Around line 35-43: Split the combined import try-except so _cute_precompiled
and _flash_attn_fwd are imported in two independent try/except blocks, storing
each exception separately (e.g. _cute_precompiled_import_error and
_flash_attn_fwd_import_error) and leaving the other symbol intact if one import
fails; remove the early guard that raises when _flash_attn_fwd is missing and
instead let _fwd_precompiled() attempt the precompiled path first, then after
_fwd_precompiled() returns None check both _flash_attn_fwd and _cute_precompiled
and raise only if both are None (include the respective import errors in the
raised message for diagnostics).
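
The first comment in this group concerns a division by zero when V is all zeros. A minimal sketch of the clamp in plain Python follows, with a list standing in for the tensor. `fp8_v_scale` and `TINY_EPS` are hypothetical names and the epsilon value is an assumption. The constant 448.0 is the one quoted in the comment and is the largest finite FP8 E4M3 value.

```python
FP8_E4M3_MAX = 448.0  # constant quoted in the review comment
TINY_EPS = 1e-12      # assumed floor; the comment only asks for "tiny_eps"


def fp8_v_scale(v: list[float]) -> float:
    """Return the quantization scale for V, guarding against an all-zero input."""
    observed_max = max((abs(x) for x in v), default=0.0)
    # Clamp the divisor so the scale stays finite when observed_max is 0.
    max_abs = max(observed_max, TINY_EPS)
    return FP8_E4M3_MAX / max_abs
```

In the tensor version, `torch.clamp_min` (as suggested in the comment) would play the role of the outer `max`.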

In `@tensorrt_llm/_torch/visual_gen/config.py`:
- Around line 64-74: The _validate_context_quantization_for_backend_support
model validator in AttentionConfig currently silently downgrades
context_quantization_mode when an unsupported backend is used; instead, detect
when context_quantization_mode == "QK16PV8" and backend != "FA4" and raise a
ValueError with a clear message that includes the requested
context_quantization_mode and backend; keep the `@model_validator`(mode="after")
on _validate_context_quantization_for_backend_support and return self only on
success (do not mutate context_quantization_mode).
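
The requested behavior, failing loudly instead of silently downgrading, can be sketched with a standard-library dataclass standing in for the Pydantic model. `AttentionConfigSketch` and its default values are hypothetical. The field names and the FA4 requirement are taken from the comment above.

```python
from dataclasses import dataclass


@dataclass
class AttentionConfigSketch:
    """Stand-in for AttentionConfig; __post_init__ mimics an 'after' validator."""

    backend: str = "VANILLA"                 # assumed default
    context_quantization_mode: str = "NONE"  # assumed default

    def __post_init__(self) -> None:
        if self.context_quantization_mode == "QK16PV8" and self.backend != "FA4":
            # Raise instead of mutating context_quantization_mode.
            raise ValueError(
                f"context_quantization_mode={self.context_quantization_mode!r} "
                f"requires backend 'FA4', got backend={self.backend!r}"
            )
```

With this shape, a misconfigured run stops at construction time and the error message names both the requested mode and the offending backend.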

In `@tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/fmha.py`:
- Around line 224-241: _check_inputs currently misses validating batch-size
agreement for 4D tensors and does not verify all tensors are CUDA/on the same
device; update _check_inputs to (1) require that when q.dim() == 4, k.dim() and
v.dim() are also 4 and that q.shape[0] == k.shape[0] == v.shape[0] (so
non-varlen path uses consistent batch_size), and (2) assert that q, k, v, o are
CUDA tensors on the same device (compare tensor.device and tensor.device.type ==
"cuda" for each), raising ValueError if any check fails; preserve the existing
shape/dtype checks and the final return q.dim() == 3.
- Around line 269-273: When varlen is true, extend the simple numel() checks on
qo_indptr and kv_indptr to validate their contents and devices: ensure qo_indptr
and kv_indptr are on the same device as their corresponding Q/KV tensors, have
at least two elements, are non-decreasing and start with 0, and that
qo_indptr[-1] == total_q and kv_indptr[-1] == total_kv (use the actual total
token counts for Q/KV from the buffers) so offsets cannot walk past the buffers;
raise clear ValueError messages mentioning qo_indptr/kv_indptr when any of these
invariants fail.
- Around line 407-427: The launch uses torch.cuda.current_stream() without
specifying the device, which can pick the wrong GPU in multi-device setups;
change the stream acquisition to pass the device owning the tensors (e.g., use
torch.cuda.current_stream(device=q_device)) by deriving the device from the
input cute tensors (referencing q_cute/k_cute/v_cute/o_cute) and pass that
stream into kernel_fn (the call to kernel_fn should keep all args the same but
replace torch.cuda.current_stream().cuda_stream with
torch.cuda.current_stream(device=<device_from_q_cute>).cuda_stream so the kernel
launches on the device that owns q/k/v/o).
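
The `qo_indptr` / `kv_indptr` invariants listed in the second comment of this group can be sketched in plain Python. Lists stand in for the index tensors, the device check is omitted because it needs real tensors, and `validate_indptr` is a hypothetical helper name.

```python
def validate_indptr(name: str, indptr: list[int], total_tokens: int) -> None:
    """Check that a varlen offset array cannot walk past its token buffer."""
    if len(indptr) < 2:
        raise ValueError(f"{name} must have at least two elements, got {len(indptr)}")
    if indptr[0] != 0:
        raise ValueError(f"{name} must start with 0, got {indptr[0]}")
    # Every offset must be >= the previous one.
    if any(nxt < cur for cur, nxt in zip(indptr, indptr[1:])):
        raise ValueError(f"{name} must be non-decreasing")
    if indptr[-1] != total_tokens:
        raise ValueError(
            f"{name}[-1]={indptr[-1]} does not match total token count {total_tokens}"
        )
```

For example, `validate_indptr("qo_indptr", [0, 3, 7], total_tokens=7)` passes, while `[0, 5, 3]` is rejected because the offsets decrease.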

In `@tests/unittest/_torch/visual_gen/test_attention_cute_precompiled.py`:
- Around line 28-37: The test only loads a varlen=True kernel but runtime calls
(FlashAttn4Attention._fwd_precompiled) request varlen=False; add a non-varlen
precompiled-kernel case by invoking get_cute_dsl_fmha_kernel with varlen=False
(and the same dtype/arch parameters used for varlen) and include a smoke test
that exercises the non-varlen forward API path so the test suite will fail if
the non-varlen binary is missing or mispackaged; ensure both the varlen=True and
varlen=False kernels are loaded in this test (and mirror the same addition for
the other block around lines 205-217).

In `@tests/unittest/_torch/visual_gen/test_attention_integration.py`:
- Around line 216-224: The FA4/QK16PV8 test cases can pass without exercising
the precompiled path because the code falls back from _fwd_precompiled() to
_fwd(); modify the test to assert that the precompiled dispatch was actually
used by instrumenting or patching the precompiled loader/dispatch entry (e.g.,
monkeypatch the module/function that returns the precompiled implementation or
wrap _fwd_precompiled()) so it records or raises if called, and then require
that calls for FA4/QK16PV8 hit that instrumented path (fail or skip the test if
the instrumented signal/counter is not observed); target functions to
patch/assert are the attention precompiled loader/dispatcher and the methods
_fwd_precompiled() and _fwd() used by the FA4 backend so the test fails unless
the precompiled branch executed.
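
One way to make such a test fail on a silent fallback is to spy on both methods with `unittest.mock`. The sketch below uses a toy backend. `ToyBackend`, `FallbackOnlyBackend`, and `run_requiring_precompiled` are hypothetical, and only the method names `_fwd_precompiled` and `_fwd` come from the comment above.

```python
from unittest import mock


class ToyBackend:
    """Toy stand-in: tries the precompiled path, falls back when it returns None."""

    def _fwd_precompiled(self, x):
        return x * 2

    def _fwd(self, x):
        return x * 2

    def forward(self, x):
        out = self._fwd_precompiled(x)
        return out if out is not None else self._fwd(x)


class FallbackOnlyBackend(ToyBackend):
    """Simulates a missing precompiled kernel."""

    def _fwd_precompiled(self, x):
        return None


def run_requiring_precompiled(backend, x):
    """Run forward() and fail unless only the precompiled path produced the result."""
    with mock.patch.object(
        backend, "_fwd_precompiled", wraps=backend._fwd_precompiled
    ) as precompiled_spy, mock.patch.object(
        backend, "_fwd", wraps=backend._fwd
    ) as fallback_spy:
        out = backend.forward(x)
    assert precompiled_spy.called, "precompiled path was never attempted"
    assert not fallback_spy.called, "fell back to _fwd(); precompiled kernel unused"
    return out
```

Because the spies wrap the real methods, the numerical result is unchanged; the helper only adds the assertion that the fallback was not taken.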

In `@tests/unittest/_torch/visual_gen/test_attention_perf.py`:
- Around line 666-675: The parametrized test_self_attention_perf currently
passes QK16PV8 as context_quantization_mode but still runs timings when the
runtime falls back to plain FA4; update the test to detect which attention
kernel actually executed (e.g., by reading the kernel selection flag/counter or
instrumentation provided by the attention runtime) and explicitly skip or fail
the test if the selected backend is not the precompiled QK16PV8 path; modify the
test_self_attention_perf (and the analogous perf tests around the other
attention perf cases) to assert the kernel selection before measuring so
benchmarks only run when the intended QK16PV8 kernel is confirmed.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 379b180a-d8bb-4b93-ab58-b1c8bef88a76

📥 Commits

Reviewing files that changed from the base of the PR and between f504047 and 8ba9953.

⛔ Files ignored due to path filters (192)

tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/aarch64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_100a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_103a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.so is excluded by !**/*.so
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.o is excluded by !**/*.o
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.so is excluded by !**/*.so
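
The excluded paths above appear to follow a regular naming scheme: one directory per host architecture and SM target, and one `.o`/`.so` pair per kernel variant. The sketch below enumerates that grid as inferred from the visible paths only. The helper names (`variant_name`, `excluded_paths`) are hypothetical and are not part of the PR, and the assumption that every combination exists is an inference from the listing rather than something the thread states.

```python
from itertools import product

# Values below are read off the excluded-file listing; treating them as a
# full Cartesian grid is an assumption, not something the PR confirms.
ROOT = "tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins"
ARCHES = ("aarch64", "x86_64")
SM_TARGETS = ("sm_100a", "sm_103a", "sm_110a")
DTYPE_TAGS = ("bf16_e4m3_bf16", "bf16")
EXTENSIONS = ("o", "so")


def variant_name(dtype_tag: str, causal: bool, lse: bool, skipsm: bool) -> str:
    """Build a file stem in the pattern seen in the listing (hypothetical helper)."""
    parts = [
        "cute_dsl_fmha",
        dtype_tag,
        "h128",
        "causal" if causal else "nocausal",
        "nonpersistent",
        "varlen",
    ]
    if lse:
        parts.append("lse")
    if skipsm:
        parts.append("skipsm")
    parts.append("tvmffi")
    return "_".join(parts)


def excluded_paths():
    """Yield every path the inferred grid would produce."""
    flags = (False, True)
    for arch, sm, dtype_tag, causal, lse, skipsm, ext in product(
        ARCHES, SM_TARGETS, DTYPE_TAGS, flags, flags, flags, EXTENSIONS
    ):
        stem = variant_name(dtype_tag, causal, lse, skipsm)
        yield f"{ROOT}/{arch}/{sm}/{stem}.{ext}"


if __name__ == "__main__":
    paths = list(excluded_paths())
    print(len(paths), "paths in the inferred grid")
    print(paths[0])
```

Under these assumptions each architecture/SM directory holds 32 files (2 dtype tags, causal or not, with or without `lse`, with or without `skipsm`, two extensions), which matches the number of entries listed for the directories that are shown in full.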

📒 Files selected for processing (15)

.gitattributes
.gitignore
examples/visual_gen/README.md
examples/visual_gen/visual_gen_flux.py
examples/visual_gen/visual_gen_wan_i2v.py
examples/visual_gen/visual_gen_wan_t2v.py
setup.py
tensorrt_llm/_torch/visual_gen/attention_backend/flash_attn4.py
tensorrt_llm/_torch/visual_gen/attention_backend/utils.py
tensorrt_llm/_torch/visual_gen/config.py
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/__init__.py
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/fmha.py
tests/unittest/_torch/visual_gen/test_attention_cute_precompiled.py
tests/unittest/_torch/visual_gen/test_attention_integration.py
tests/unittest/_torch/visual_gen/test_attention_perf.py

tensorrt-cicd · 2026-05-04T11:29:30Z

PR_Github #46628 [ run ] completed with state SUCCESS. Commit: 8ba9953
/LLM/main/L0_MergeRequest_PR pipeline #36673 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

xrq-phys · 2026-05-04T12:16:48Z

/bot run

tensorrt-cicd · 2026-05-04T12:24:01Z

PR_Github #46639 [ run ] triggered by Bot. Commit: 8ba9953 Link to invocation

tensorrt-cicd · 2026-05-04T15:08:08Z

PR_Github #46639 [ run ] completed with state SUCCESS. Commit: 8ba9953
/LLM/main/L0_MergeRequest_PR pipeline #36683 completed with status: 'FAILURE'

xrq-phys · 2026-05-05T09:11:35Z

/bot run

tensorrt-cicd · 2026-05-05T09:17:47Z

PR_Github #46785 [ run ] triggered by Bot. Commit: 46f2588 Link to invocation

tensorrt-cicd · 2026-05-05T11:44:12Z

PR_Github #46785 [ run ] completed with state SUCCESS. Commit: 46f2588
/LLM/main/L0_MergeRequest_PR pipeline #36807 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

xrq-phys · 2026-05-05T15:15:44Z

/bot run

tensorrt-cicd · 2026-05-05T15:23:24Z

PR_Github #46825 [ run ] triggered by Bot. Commit: 46f2588 Link to invocation

tensorrt-cicd · 2026-05-06T13:53:39Z

PR_Github #46825 [ run ] completed with state SUCCESS. Commit: 46f2588
/LLM/main/L0_MergeRequest_PR pipeline #36845 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

xrq-phys · 2026-05-06T14:03:51Z

/bot run

tensorrt-cicd · 2026-05-06T14:10:34Z

PR_Github #47015 [ run ] triggered by Bot. Commit: 46f2588 Link to invocation

tensorrt-cicd · 2026-05-06T18:59:34Z

PR_Github #47015 [ run ] completed with state SUCCESS. Commit: 46f2588
/LLM/main/L0_MergeRequest_PR pipeline #36990 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

xrq-phys · 2026-05-07T02:35:55Z

/bot run

tensorrt-cicd · 2026-05-07T02:41:31Z

PR_Github #47082 [ run ] triggered by Bot. Commit: 46f2588 Link to invocation

tensorrt-cicd · 2026-05-07T03:59:35Z

PR_Github #47082 [ run ] completed with state SUCCESS. Commit: 46f2588
/LLM/main/L0_MergeRequest_PR pipeline #37053 completed with status: 'SUCCESS'

CI Report

Link to invocation

xrq-phys · 2026-05-26T07:59:51Z

/bot skip --comment "Remaining multi-GPU failures are irrelevant to visual_gen"

xrq-phys · 2026-05-26T08:00:20Z

PR_Github #50307 [ run ] completed with state SUCCESS. Commit: 07dbb30
/LLM/main/L0_MergeRequest_PR pipeline #39837 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR

If you cannot view the failures, ask the CI triggerer to share details

Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

All single-GPU & visual_gen-related tests should have passed there.

tensorrt-cicd · 2026-05-26T08:06:02Z

PR_Github #50319 [ skip ] triggered by Bot. Commit: 041cc68 Link to invocation

tensorrt-cicd · 2026-05-26T08:12:41Z

PR_Github #50319 [ skip ] completed with state SUCCESS. Commit: 041cc68
Skipping testing for commit 041cc68

Link to invocation

chang-l

Can we also update docs/source/models/visual-generation.md to accommodate this new attention backend?

Also, can we add a new section, either parallel to or as a subsection under quantization section, to describe our quantized attn variants: SAGE and QK16PV8?

xrq-phys · 2026-05-27T02:04:54Z

/bot skip --comment "The remaining multi-GPU failures are irrelevant to visual_gen"

tensorrt-cicd · 2026-05-27T02:11:05Z

PR_Github #50422 [ skip ] triggered by Bot. Commit: 40be938 Link to invocation

tensorrt-cicd · 2026-05-27T02:17:26Z

PR_Github #50422 [ skip ] completed with state SUCCESS. Commit: 40be938
Skipping testing for commit 40be938

Link to invocation

xrq-phys · 2026-05-28T03:35:37Z

> Can we also update docs/source/models/visual-generation.md to accommodate this new attention backend?
>
> Also, can we add a new section, either parallel to or as a subsection under quantization section, to describe our quantized attn variants: SAGE and QK16PV8?

Added

xrq-phys · 2026-05-28T03:36:06Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-28T03:42:31Z

PR_Github #50684 [ run ] triggered by Bot. Commit: aad1edc Link to invocation

tensorrt-cicd · 2026-05-28T11:17:13Z

PR_Github #50684 [ run ] completed with state SUCCESS. Commit: aad1edc
/LLM/main/L0_MergeRequest_PR pipeline #40172 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

xrq-phys · 2026-05-28T13:07:34Z

/bot run

tensorrt-cicd · 2026-05-28T13:13:32Z

PR_Github #50779 [ run ] triggered by Bot. Commit: aad1edc Link to invocation

tensorrt-cicd · 2026-05-28T13:55:11Z

PR_Github #50779 [ run ] completed with state SUCCESS. Commit: aad1edc
/LLM/main/L0_MergeRequest_PR pipeline #40256 completed with status: 'SUCCESS'

CI Report

Link to invocation

xrq-phys · 2026-05-28T14:05:28Z

@zhenhuaw-me Pipeline passed :)

Let's merge this one.

xrq-phys · 2026-05-29T02:29:09Z

/bot reuse --comment "Dropped examples not covered by testing"

github-actions · 2026-05-29T02:29:17Z

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Supports wildcard * for pattern matching (e.g., "*PerfSanity*" matches all stages containing PerfSanity). Examples: "A10-PyTorch-1, xxx", "PerfSanity". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Supports wildcard * for pattern matching. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx", --extra-stage "Post-Merge".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

xrq-phys · 2026-05-29T02:29:33Z

/bot reuse-pipeline --comment "Dropped examples not covered by testing"

tensorrt-cicd · 2026-05-29T02:36:21Z

PR_Github #50938 [ reuse-pipeline ] triggered by Bot. Commit: 501f8a9 Link to invocation

tensorrt-cicd · 2026-05-29T02:42:56Z

PR_Github #50938 [ reuse-pipeline ] completed with state SUCCESS. Commit: 501f8a9
Reusing PR_Github #50779 for commit 501f8a9

Link to invocation

github-actions · 2026-05-29T02:58:31Z

✅ LFS objects already in storage (64 files) — no sync needed.

These LFS-tracked files are already present in this repository's LFS storage:

tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.so
tensorrt_llm/_torch/visual_gen/cute_dsl_kernels/blackwell/attention/cubins/aarch64/sm_100a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.so
...and 59 more
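
For readers scanning the LFS file list above: the cubin file names appear to follow a fixed token order (the `cute_dsl_fmha` prefix, one or more dtype tokens, `h<head_dim>`, `causal` or `nocausal`, a run of feature flags, and a `tvmffi` suffix). The sketch below splits a name into those parts. `CubinName` and `parse_cubin_name` are hypothetical helpers, not part of this PR, and the token layout is an assumption inferred only from the file names shown in this thread; the meaning of each dtype or flag token is not interpreted.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Tuple


class CubinName(NamedTuple):
    """Fields recovered from a cubin file name (hypothetical structure)."""
    dtypes: Tuple[str, ...]   # e.g. ("bf16",) or ("bf16", "e4m3", "bf16")
    head_dim: int             # taken from the h<N> token
    causal: bool              # True for "causal", False for "nocausal"
    flags: Tuple[str, ...]    # remaining tokens, left uninterpreted


def parse_cubin_name(path: str) -> CubinName:
    """Split a cute_dsl_fmha_*_tvmffi.{so,o} name into its tokens.

    Assumes the token order observed in this thread; raises ValueError
    for names that do not match it.
    """
    stem = PurePosixPath(path).stem  # drop directories and the extension
    tokens = stem.split("_")
    if tokens[:3] != ["cute", "dsl", "fmha"] or tokens[-1] != "tvmffi":
        raise ValueError(f"unexpected prefix or suffix: {stem!r}")

    body = tokens[3:-1]
    # The first token shaped like h<digits> marks the end of the dtype tokens.
    head_idx = next(
        (i for i, tok in enumerate(body) if re.fullmatch(r"h\d+", tok)), None
    )
    if head_idx is None or head_idx == 0 or head_idx + 1 >= len(body):
        raise ValueError(f"missing dtype, head-dim, or mask token: {stem!r}")

    mask = body[head_idx + 1]
    if mask not in ("causal", "nocausal"):
        raise ValueError(f"unexpected mask token {mask!r} in {stem!r}")

    return CubinName(
        dtypes=tuple(body[:head_idx]),
        head_dim=int(body[head_idx][1:]),
        causal=(mask == "causal"),
        flags=tuple(body[head_idx + 2:]),
    )
```

Calling `parse_cubin_name` on any path from the list above returns the dtype tokens, the head dimension, the mask mode, and the trailing flags as separate fields, which can help when checking which kernel variants are present for a given architecture directory.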

xrq-phys requested review from a team as code owners May 4, 2026 08:07

xrq-phys requested review from arysef and nv-guomingz May 4, 2026 08:07

github-actions Bot assigned xrq-phys May 4, 2026

coderabbitai Bot reviewed May 4, 2026

xrq-phys changed the title ~~[None][feat] visual_gen: add CuTe DSL attention via exported binaries~~ [TRTLLM-12436][feat] visual_gen: add CuTe DSL attention via exported binaries May 4, 2026

chang-l added the VisualGen label May 4, 2026

xrq-phys force-pushed the ruqingx/visual_gen/qk16pv8_precompiled branch from 8ba9953 to 46f2588 Compare May 5, 2026 09:09

xrq-phys force-pushed the ruqingx/visual_gen/qk16pv8_precompiled branch from 46f2588 to 79572ac Compare May 15, 2026 12:47

xrq-phys force-pushed the ruqingx/visual_gen/qk16pv8_precompiled branch from b24d303 to 041cc68 Compare May 26, 2026 07:59

chang-l reviewed May 26, 2026

xrq-phys force-pushed the ruqingx/visual_gen/qk16pv8_precompiled branch from 041cc68 to 40be938 Compare May 27, 2026 02:04

visual_gen: add CuTe DSL attention via exported binaries

aad1edc

Signed-off-by: Ruqing Xu <7891482+xrq-phys@users.noreply.github.com>

xrq-phys force-pushed the ruqingx/visual_gen/qk16pv8_precompiled branch from 40be938 to aad1edc Compare May 28, 2026 03:34

zhenhuaw-me reviewed May 29, 2026

Comment thread docs/source/models/visual-generation.md Outdated

Comment thread docs/source/models/visual-generation.md Outdated

Comment thread examples/visual_gen/README.md

xrq-phys and others added 2 commits May 29, 2026 11:21

Drop example code

6703dc1

Co-authored-by: Zhenhua Wang <4936589+zhenhuaw-me@users.noreply.github.com> Signed-off-by: RuQing Xu <7891482+xrq-phys@users.noreply.github.com>

Drop example-side changes.

501f8a9

Will prioritize API side usage. Signed-off-by: Ruqing Xu <7891482+xrq-phys@users.noreply.github.com>

zhenhuaw-me merged commit 1bb1a02 into NVIDIA:main May 29, 2026
7 checks passed
