[TRTLLM-12436][feat] visual_gen: add CuTe DSL attention via exported binaries #13721
/bot run
PR_Github #46628 [ run ] triggered by Bot. Commit:
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID:
⛔ Files ignored due to path filters (64)
📒 Files selected for processing (21)
📝 Walkthrough
This PR introduces a new CuTe DSL FMHA attention backend for visual generation models, with quantization mode support. It adds a configuration schema that supports backend-specific quantization recipes, implements CuTeDSLAttention with variable-length inputs and optional QK16PV8 quantization, integrates the kernel-loading infrastructure, and updates CLI examples and tests for comprehensive coverage.
Changes
CuTe DSL Backend and Configuration Schema
User-Facing API and Examples
Test Suite Updates
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Warning: Review ran into problems
🔥 Problems
Timed out fetching pipeline failures after 30000ms
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/visual_gen/attention_backend/flash_attn4.py`:
- Around line 123-127: In the QK16PV8 branch (check
self.context_quantization_mode == "QK16PV8"), guard against a zero max by
clamping the divisor when computing v_qscale: compute max_abs =
max(v.abs().amax(), tiny_eps) (or use torch.clamp_min) before doing v_qscale =
448.0 / max_abs, then multiply v by v_qscale and adjust scale_v as before;
update references to v_qscale, v, and scale_v accordingly so v * v_qscale never
multiplies by infinity when v is all zeros.
- Around line 35-43: Split the combined import try-except so _cute_precompiled
and _flash_attn_fwd are imported in two independent try/except blocks, storing
each exception separately (e.g. _cute_precompiled_import_error and
_flash_attn_fwd_import_error) and leaving the other symbol intact if one import
fails; remove the early guard that raises when _flash_attn_fwd is missing and
instead let _fwd_precompiled() attempt the precompiled path first, then after
_fwd_precompiled() returns None check both _flash_attn_fwd and _cute_precompiled
and raise only if both are None (include the respective import errors in the
raised message for diagnostics).
In `@tensorrt_llm/_torch/visual_gen/config.py`:
- Around line 64-74: The _validate_context_quantization_for_backend_support
model validator in AttentionConfig currently silently downgrades
context_quantization_mode when an unsupported backend is used; instead, detect
when context_quantization_mode == "QK16PV8" and backend != "FA4" and raise a
ValueError with a clear message that includes the requested
context_quantization_mode and backend; keep the `@model_validator`(mode="after")
on _validate_context_quantization_for_backend_support and return self only on
success (do not mutate context_quantization_mode).
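A minimal sketch of the fail-fast behaviour requested for config.py. To stay runnable with the standard library only, it uses a dataclass with `__post_init__` as a stand-in for the pydantic `@model_validator(mode="after")` hook; the class name `AttentionConfigSketch` and the field defaults are assumptions, not the real AttentionConfig.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttentionConfigSketch:
    # Field names follow the review comment; the defaults are assumptions.
    backend: str = "VANILLA"
    context_quantization_mode: Optional[str] = None

    def __post_init__(self):
        # Stand-in for the @model_validator(mode="after") hook: reject the
        # unsupported combination instead of silently resetting the mode.
        if self.context_quantization_mode == "QK16PV8" and self.backend != "FA4":
            raise ValueError(
                f"context_quantization_mode={self.context_quantization_mode!r} "
                f"is not supported by backend={self.backend!r}; use backend='FA4'."
            )
```

Raising here means a misconfigured run stops at construction time rather than producing unquantized results under a quantized label.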
In `@tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/fmha.py`:
- Around line 224-241: _check_inputs currently misses validating batch-size
agreement for 4D tensors and does not verify all tensors are CUDA/on the same
device; update _check_inputs to (1) require that when q.dim() == 4, k.dim() and
v.dim() are also 4 and that q.shape[0] == k.shape[0] == v.shape[0] (so
non-varlen path uses consistent batch_size), and (2) assert that q, k, v, o are
CUDA tensors on the same device (compare tensor.device and tensor.device.type ==
"cuda" for each), raising ValueError if any check fails; preserve the existing
shape/dtype checks and the final return q.dim() == 3.
- Around line 269-273: When varlen is true, extend the simple numel() checks on
qo_indptr and kv_indptr to validate their contents and devices: ensure qo_indptr
and kv_indptr are on the same device as their corresponding Q/KV tensors, have
at least two elements, are non-decreasing and start with 0, and that
qo_indptr[-1] == total_q and kv_indptr[-1] == total_kv (use the actual total
token counts for Q/KV from the buffers) so offsets cannot walk past the buffers;
raise clear ValueError messages mentioning qo_indptr/kv_indptr when any of these
invariants fail.
- Around line 407-427: The launch uses torch.cuda.current_stream() without
specifying the device, which can pick the wrong GPU in multi-device setups;
change the stream acquisition to pass the device owning the tensors (e.g., use
torch.cuda.current_stream(device=q_device)) by deriving the device from the
input cute tensors (referencing q_cute/k_cute/v_cute/o_cute) and pass that
stream into kernel_fn (the call to kernel_fn should keep all args the same but
replace torch.cuda.current_stream().cuda_stream with
torch.cuda.current_stream(device=<device_from_q_cute>).cuda_stream so the kernel
launches on the device that owns q/k/v/o).
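The indptr invariants listed for the varlen path can be sketched as one helper. This is an illustration only: `validate_indptr` is a hypothetical name, it takes the offsets as a plain list of ints, and it leaves out the device check, which needs real tensors.

```python
def validate_indptr(name, indptr, total_tokens):
    """Check a varlen offset array given as a plain list of ints."""
    if len(indptr) < 2:
        raise ValueError(f"{name} must have at least two elements, got {len(indptr)}")
    if indptr[0] != 0:
        raise ValueError(f"{name} must start with 0, got {indptr[0]}")
    # Each offset must be at least as large as the one before it.
    for prev, cur in zip(indptr, indptr[1:]):
        if cur < prev:
            raise ValueError(
                f"{name} must be non-decreasing, got {prev} followed by {cur}"
            )
    # The last offset must equal the token count of the packed buffer,
    # otherwise the kernel could read past the end.
    if indptr[-1] != total_tokens:
        raise ValueError(
            f"{name}[-1] must equal the total token count {total_tokens}, "
            f"got {indptr[-1]}"
        )
```

A caller would run it once for `qo_indptr` against the Q token count and once for `kv_indptr` against the KV token count before launching the kernel.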
In `@tests/unittest/_torch/visual_gen/test_attention_cute_precompiled.py`:
- Around line 28-37: The test only loads a varlen=True kernel but runtime calls
(FlashAttn4Attention._fwd_precompiled) request varlen=False; add a non-varlen
precompiled-kernel case by invoking get_cute_dsl_fmha_kernel with varlen=False
(and the same dtype/arch parameters used for varlen) and include a smoke test
that exercises the non-varlen forward API path so the test suite will fail if
the non-varlen binary is missing or mispackaged; ensure both the varlen=True and
varlen=False kernels are loaded in this test (and mirror the same addition for
the other block around lines 205-217).
In `@tests/unittest/_torch/visual_gen/test_attention_integration.py`:
- Around line 216-224: The FA4/QK16PV8 test cases can pass without exercising
the precompiled path because the code falls back from _fwd_precompiled() to
_fwd(); modify the test to assert that the precompiled dispatch was actually
used by instrumenting or patching the precompiled loader/dispatch entry (e.g.,
monkeypatch the module/function that returns the precompiled implementation or
wrap _fwd_precompiled()) so it records or raises if called, and then require
that calls for FA4/QK16PV8 hit that instrumented path (fail or skip the test if
the instrumented signal/counter is not observed); target functions to
patch/assert are the attention precompiled loader/dispatcher and the methods
_fwd_precompiled() and _fwd() used by the FA4 backend so the test fails unless
the precompiled branch executed.
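One way to observe which branch ran is to wrap both methods with `unittest.mock` spies. The sketch below uses a toy class, `ToyFA4Backend`, invented for illustration; only the method names `_fwd_precompiled()` and `_fwd()` come from the comment above, and the real backend's signatures may differ.

```python
from unittest import mock


class ToyFA4Backend:
    """Toy stand-in: tries the precompiled path, then falls back."""

    def __init__(self, has_precompiled):
        self.has_precompiled = has_precompiled

    def _fwd_precompiled(self, q):
        # Returns None when no precompiled kernel is available.
        return ("precompiled", q) if self.has_precompiled else None

    def _fwd(self, q):
        return ("fallback", q)

    def forward(self, q):
        out = self._fwd_precompiled(q)
        return out if out is not None else self._fwd(q)


def ran_precompiled(backend, q):
    """Return True only if forward() used the precompiled path alone."""
    with mock.patch.object(
        backend, "_fwd_precompiled", wraps=backend._fwd_precompiled
    ) as pre, mock.patch.object(backend, "_fwd", wraps=backend._fwd) as fallback:
        backend.forward(q)
    return pre.called and not fallback.called
```

In a test, `assert ran_precompiled(backend, q)` would then fail whenever the fallback silently took over.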
In `@tests/unittest/_torch/visual_gen/test_attention_perf.py`:
- Around line 666-675: The parametrized test_self_attention_perf currently
passes QK16PV8 as context_quantization_mode but still runs timings when the
runtime falls back to plain FA4; update the test to detect which attention
kernel actually executed (e.g., by reading the kernel selection flag/counter or
instrumentation provided by the attention runtime) and explicitly skip or fail
the test if the selected backend is not the precompiled QK16PV8 path; modify the
test_self_attention_perf (and the analogous perf tests around the other
attention perf cases) to assert the kernel selection before measuring so
benchmarks only run when the intended QK16PV8 kernel is confirmed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 379b180a-d8bb-4b93-ab58-b1c8bef88a76
⛔ Files ignored due to path filters (192)
The entries visible in this (truncated) listing are precompiled kernel binaries, each excluded by `!**/*.o` or `!**/*.so`:
- Directory: tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/<arch>/<sm>/, with <arch> in {aarch64, x86_64} and <sm> in {sm_100a, sm_103a, sm_110a}
- File name: cute_dsl_fmha_bf16[_e4m3_bf16]_h128_{causal,nocausal}_nonpersistent_varlen[_lse][_skipsm]_tvmffi.{o,so}
by!**/*.otensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.sois excluded by!**/*.sotensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.ois excluded by!**/*.otensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_causal_nonpersistent_varlen_tvmffi.sois excluded by!**/*.sotensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.ois excluded by!**/*.otensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.sois excluded by!**/*.sotensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.ois excluded by!**/*.otensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.sois excluded by!**/*.sotensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.ois excluded by!**/*.otensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.sois excluded by!**/*.sotensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.ois excluded by!**/*.otensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_e4m3_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.sois excluded 
by!**/*.sotensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.ois excluded by!**/*.otensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_skipsm_tvmffi.sois excluded by!**/*.sotensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.ois excluded by!**/*.otensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_lse_tvmffi.sois excluded by!**/*.sotensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.ois excluded by!**/*.otensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_skipsm_tvmffi.sois excluded by!**/*.sotensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.ois excluded by!**/*.otensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_causal_nonpersistent_varlen_tvmffi.sois excluded by!**/*.sotensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.ois excluded by!**/*.otensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_skipsm_tvmffi.sois excluded by!**/*.sotensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.ois excluded 
by!**/*.otensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_lse_tvmffi.sois excluded by!**/*.sotensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.ois excluded by!**/*.otensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_skipsm_tvmffi.sois excluded by!**/*.sotensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.ois excluded by!**/*.otensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/cubins/x86_64/sm_110a/cute_dsl_fmha_bf16_h128_nocausal_nonpersistent_varlen_tvmffi.sois excluded by!**/*.so
📒 Files selected for processing (15)
.gitattributes
.gitignore
examples/visual_gen/README.md
examples/visual_gen/visual_gen_flux.py
examples/visual_gen/visual_gen_wan_i2v.py
examples/visual_gen/visual_gen_wan_t2v.py
setup.py
tensorrt_llm/_torch/visual_gen/attention_backend/flash_attn4.py
tensorrt_llm/_torch/visual_gen/attention_backend/utils.py
tensorrt_llm/_torch/visual_gen/config.py
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/__init__.py
tensorrt_llm/_torch/visual_gen/jit_kernels/cute_precompiled/fmha.py
tests/unittest/_torch/visual_gen/test_attention_cute_precompiled.py
tests/unittest/_torch/visual_gen/test_attention_integration.py
tests/unittest/_torch/visual_gen/test_attention_perf.py
PR_Github #46628 [ run ] completed with state
/bot run
PR_Github #46639 [ run ] triggered by Bot. Commit:
PR_Github #46639 [ run ] completed with state
8ba9953 to 46f2588
/bot run
PR_Github #46785 [ run ] triggered by Bot. Commit:
PR_Github #46785 [ run ] completed with state
/bot run
PR_Github #46825 [ run ] triggered by Bot. Commit:
PR_Github #46825 [ run ] completed with state
/bot run
PR_Github #47015 [ run ] triggered by Bot. Commit:
PR_Github #47015 [ run ] completed with state
/bot run
PR_Github #47082 [ run ] triggered by Bot. Commit:
PR_Github #47082 [ run ] completed with state
46f2588 to 79572ac
b24d303 to 041cc68
/bot skip --comment "Remaining multi-GPU failures are irrelevant to visual_gen"
All single-GPU & visual_gen-related tests should have passed there.
PR_Github #50319 [ skip ] triggered by Bot. Commit:
PR_Github #50319 [ skip ] completed with state
chang-l left a comment
Can we also update docs/source/models/visual-generation.md to cover this new attention backend?
Also, can we add a new section, either parallel to the quantization section or as a subsection under it, describing our quantized attention variants, SAGE and QK16PV8?
041cc68 to 40be938
/bot skip --comment "The remaining multi-GPU failures are irrelevant to visual_gen"
PR_Github #50422 [ skip ] triggered by Bot. Commit:
PR_Github #50422 [ skip ] completed with state
Signed-off-by: Ruqing Xu <7891482+xrq-phys@users.noreply.github.com>
40be938 to aad1edc
Added
/bot run --disable-fail-fast
PR_Github #50684 [ run ] triggered by Bot. Commit:
PR_Github #50684 [ run ] completed with state
/bot run
PR_Github #50779 [ run ] triggered by Bot. Commit:
PR_Github #50779 [ run ] completed with state
@zhenhuaw-me Pipeline passed :) Let's merge this one.
Co-authored-by: Zhenhua Wang <4936589+zhenhuaw-me@users.noreply.github.com> Signed-off-by: RuQing Xu <7891482+xrq-phys@users.noreply.github.com>
Will prioritize API side usage. Signed-off-by: Ruqing Xu <7891482+xrq-phys@users.noreply.github.com>
/bot reuse --comment "Dropped examples not covered by testing"
GitHub Bot Help
Provide a user-friendly way for developers to interact with a Jenkins server. See details below for each supported subcommand.
Launch build/test pipelines. All previously running jobs will be killed.
kill
Kill all running builds associated with pull request.
skip
Skip testing for latest commit on pull request.
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
/bot reuse-pipeline --comment "Dropped examples not covered by testing"
PR_Github #50938 [ reuse-pipeline ] triggered by Bot. Commit:
PR_Github #50938 [ reuse-pipeline ] completed with state
✅ LFS objects already in storage (64 files) — no sync needed. These LFS-tracked files are already present in this repository's LFS storage:
Summary by CodeRabbit
New Features
--quant_attention_mode option with NO_QUANT, QK16PV8, and SAGE modes, replacing the previous --enable_sage_attention flag.
--attention_backend now includes a CUTEDSL choice alongside the existing options.
Documentation
Tests
Description
Adds CuTe DSL fmha.py exported cubins to the visual_gen workflow as backend option CUTEDSL, supporting:
Currently, only head_dim=128 is supported due to downstream model requirements.
Integrated into Wan & FLUX.
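The option names above can be illustrated with a minimal sketch, not the code from this PR. It assumes an argparse-based example script; `build_parser`, `validate_attention_args`, the `--head_dim` flag, and the `None` default for `--attention_backend` are hypothetical names and values introduced only for illustration. Only the `CUTEDSL` backend name, the three quantization modes, and the `head_dim=128` restriction come from this thread.

```python
import argparse

# Quantized attention modes named in the PR summary.
QUANT_ATTENTION_MODES = ("NO_QUANT", "QK16PV8", "SAGE")
# The description states that only head_dim=128 is currently supported.
CUTEDSL_SUPPORTED_HEAD_DIM = 128


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the option names mentioned in this PR."""
    parser = argparse.ArgumentParser(description="illustrative visual_gen CLI sketch")
    # Left as a free-form string: the full list of backends is not given here.
    parser.add_argument("--attention_backend", type=str, default=None)
    parser.add_argument(
        "--quant_attention_mode",
        choices=QUANT_ATTENTION_MODES,
        default="NO_QUANT",
    )
    # Hypothetical flag: in practice head_dim would come from the model config.
    parser.add_argument("--head_dim", type=int, default=128)
    return parser


def validate_attention_args(args: argparse.Namespace) -> argparse.Namespace:
    """Reject CUTEDSL early when head_dim is not the supported value."""
    if (
        args.attention_backend == "CUTEDSL"
        and args.head_dim != CUTEDSL_SUPPORTED_HEAD_DIM
    ):
        raise ValueError(
            f"CUTEDSL currently supports only head_dim={CUTEDSL_SUPPORTED_HEAD_DIM}, "
            f"got {args.head_dim}"
        )
    return args


if __name__ == "__main__":
    parsed = validate_attention_args(build_parser().parse_args())
    print(parsed.attention_backend, parsed.quant_attention_mode)
```

Failing fast at argument validation, rather than at kernel lookup, is one possible design choice; the actual backend may instead handle unsupported shapes elsewhere.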
Test Coverage
test_attention_cute_dsl.py for smoke tests
CuTe DSL coverage in test_attention_integration.py and test_attention_perf.py
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.