[None] [refactor] Unify compressed-tensors quant config parsing by DomBrown · Pull Request #14468 · NVIDIA/TensorRT-LLM

DomBrown · 2026-05-22T16:52:13Z

Summary by CodeRabbit

New Features
- Enhanced support for handling quantization configurations from Hugging Face checkpoints using the compressed-tensors format.
Bug Fixes
- Improved validation and error handling for NVFP4 and FP8 quantization schemes with group sizing constraints.
Tests
- Added comprehensive unit tests for quantization configuration parsing and validation.

Description

During review of #13559 it was noted that the compressed-tensors quant config parsing was duplicated. This PR deduplicates the code and adds relevant testing.

Test Coverage

tests/unittest/models/test_quant_config_utils.py
tests/unittest/llmapi/test_kv_cache_dtype_override.py

PR Checklist

Please review the following before submitting your PR:

- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions)
- If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
- Any new dependencies have been scanned for license and vulnerabilities
- CODEOWNERS updated if ownership changes
- Documentation updated as needed
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

DomBrown · 2026-05-22T16:58:05Z

/bot run

tensorrt-cicd · 2026-05-22T17:03:27Z

PR_Github #49966 [ run ] triggered by Bot. Commit: 5455a78 Link to invocation

coderabbitai · 2026-05-22T17:04:25Z

📝 Walkthrough

This PR refactors compressed-tensors quantization config parsing logic from two locations into a reusable utility function update_quant_config_from_compressed_tensors, integrates it into callers, and provides comprehensive test coverage for success and failure cases.

Changes

Compressed-tensors quantization config refactoring

| Layer / File(s) | Summary |
| --- | --- |
| New compressed-tensors config utility implementation `tensorrt_llm/models/quant_config_utils.py` | New utility function parses `config_groups` weight and input-activation strategies, maps them to `QuantAlgo` variants (NVFP4, FP8 block scales, FP8 per-channel-per-token), validates group sizes, handles optional FP8 KV cache scheme, and updates module exclusion lists from `modules_to_not_convert` and `ignore` fields. |
| Integration with torch model_config `tensorrt_llm/_torch/model_config.py` | Import utility and replace inline compressed-tensors config parsing in the quantization method branch with a single utility call. |
| Integration with llmapi utilities `tensorrt_llm/llmapi/llm_utils.py` | Import utility and replace inline compressed-tensors strategy parsing, KV cache processing, and module exclusion merging with a utility call in HF checkpoint handling. |
| Comprehensive unit test suite for utility function `tests/unittest/models/test_quant_config_utils.py` | Tests validate NVFP4 and FP8 parsing, KV cache scheme handling, module exclusion behavior, and failure cases including unsupported strategies, invalid group sizes, missing config, and KV cache conflicts. |
| Integration tests for KV cache scenarios `tests/unittest/llmapi/test_kv_cache_dtype_override.py` | Tests verify `ModelLoader._update_from_hf_quant_config()` parses compressed-tensors KV cache schemes correctly and rejects conflicting configurations. |
| Test infrastructure updates `tests/integration/test_lists/test-db/l0_a10.yml` | Add new unit test module to A10 pre-merge test list. |
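
To make the flow in the table above concrete, here is a minimal, self-contained sketch of what a unified parser of this kind might look like. It is not the code from `tensorrt_llm/models/quant_config_utils.py`. `SketchAlgo`, `SketchQuantConfig`, and `update_from_compressed_tensors` are hypothetical stand-ins, and the strategy names for NVFP4, the required group sizes, the `kv_cache_scheme` key, and the conflict rule for the KV cache are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class SketchAlgo(Enum):
    # Hypothetical stand-in for QuantAlgo; only the variants named in the
    # walkthrough are modelled, plus FP8 for the KV cache.
    NVFP4 = "NVFP4"
    FP8_BLOCK_SCALES = "FP8_BLOCK_SCALES"
    FP8_PER_CHANNEL_PER_TOKEN = "FP8_PER_CHANNEL_PER_TOKEN"
    FP8 = "FP8"


@dataclass
class SketchQuantConfig:
    # Hypothetical stand-in for QuantConfig with just the fields used here.
    quant_algo: Optional[SketchAlgo] = None
    kv_cache_quant_algo: Optional[SketchAlgo] = None
    group_size: Optional[int] = None
    exclude_modules: List[str] = field(default_factory=list)


# ASSUMPTION: (weights strategy, input strategy) -> (algo, required group size).
# The FP8 pairs follow the review comment; the NVFP4 row is a guess.
_STRATEGY_TABLE = {
    ("tensor_group", "tensor_group"): (SketchAlgo.NVFP4, 16),
    ("block", "group"): (SketchAlgo.FP8_BLOCK_SCALES, 128),
    ("channel", "token"): (SketchAlgo.FP8_PER_CHANNEL_PER_TOKEN, None),
}


def update_from_compressed_tensors(
        cfg: SketchQuantConfig, hf: Dict[str, Any]) -> SketchQuantConfig:
    groups = hf.get("config_groups")
    if groups is None:
        raise ValueError("config_groups is not set.")
    weights = groups["group_0"]["weights"]
    inputs = groups["group_0"]["input_activations"]

    # Map the strategy pair to an algorithm, rejecting unknown pairs.
    key = (weights["strategy"], inputs["strategy"])
    if key not in _STRATEGY_TABLE:
        raise ValueError(f"Unsupported strategy pair: {key}")
    algo, required_group = _STRATEGY_TABLE[key]

    # Validate the group size only for schemes that constrain it.
    if required_group is not None:
        group_size = inputs.get("group_size")
        if group_size != required_group:
            raise ValueError(
                f"{algo.value} expects group_size={required_group}, "
                f"got {group_size}.")
        cfg.group_size = group_size
    cfg.quant_algo = algo

    # Optional FP8 KV cache scheme (key name and conflict rule are assumed).
    kv = hf.get("kv_cache_scheme")
    if kv is not None:
        if kv.get("num_bits") != 8 or kv.get("type") != "float":
            raise ValueError(f"Unsupported kv_cache_scheme: {kv}")
        if cfg.kv_cache_quant_algo not in (None, SketchAlgo.FP8):
            raise ValueError("kv_cache_scheme conflicts with existing setting.")
        cfg.kv_cache_quant_algo = SketchAlgo.FP8

    # Merge both exclusion fields, preserving order and dropping duplicates.
    extra = (hf.get("modules_to_not_convert") or []) + (hf.get("ignore") or [])
    for name in extra:
        if name not in cfg.exclude_modules:
            cfg.exclude_modules.append(name)
    return cfg
```

Keeping the strategy pairs in a single lookup table is one way to let both callers share the same validation, which is the deduplication this PR is after; the real utility may structure it differently.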

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#13559: Earlier enhancements to load_hf_quant_config for compressed-tensors logic that the current PR's refactored utility now consolidates and reuses.

Suggested reviewers

mikeiovine
syuoni
nekorobov
fredricz-20070104
arysef
Funatiq

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.53% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: refactoring to unify/deduplicate compressed-tensors quant config parsing across multiple files. |
| Description check | ✅ Passed | The description clearly explains the issue (duplicated code identified in `#13559`) and solution (deduplication), lists relevant tests, and includes a completed checklist. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

🧹 Nitpick comments (2)

tensorrt_llm/models/quant_config_utils.py (1)

30-33: ⚡ Quick win

Consider defensive access for nested config keys.

If the HF config is malformed (missing group_0, weights, or input_activations), these direct accesses will raise a KeyError with a less helpful message than a validated ValueError. This is especially relevant for a public utility function that may receive varied external configs.

🛡️ Proposed fix to add defensive validation

     config_groups = hf_quant_config.get("config_groups")
     if config_groups is None:
         raise ValueError(f"config_groups is not set in {hf_quant_config}.")
 
-    weights_quant_config = config_groups["group_0"]["weights"]
-    inputs_quant_config = config_groups["group_0"]["input_activations"]
+    group_0 = config_groups.get("group_0")
+    if group_0 is None:
+        raise ValueError("config_groups must contain 'group_0'.")
+    weights_quant_config = group_0.get("weights")
+    inputs_quant_config = group_0.get("input_activations")
+    if weights_quant_config is None or inputs_quant_config is None:
+        raise ValueError("group_0 must contain 'weights' and 'input_activations'.")
     weights_quant_strategy = weights_quant_config["strategy"]
     inputs_quant_strategy = inputs_quant_config["strategy"]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/models/quant_config_utils.py` around lines 30 - 33, Validate
presence of expected nested keys in config_groups before direct indexing: check
that "group_0" exists and that within it both "weights" and "input_activations"
exist; if any are missing, raise a clear ValueError describing which key is
missing rather than allowing a KeyError. Then safely read weights_quant_config =
config_groups["group_0"]["weights"] and inputs_quant_config =
config_groups["group_0"]["input_activations"], and proceed to extract
weights_quant_strategy and inputs_quant_strategy from those dicts. Reference the
variables config_groups, "group_0", weights_quant_config, inputs_quant_config,
weights_quant_strategy, and inputs_quant_strategy when implementing the
validation and error messages.
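
As an illustration of the defensive access suggested above, the following sketch wraps each nested lookup in a small helper that names the missing key. `require_key` and `read_group_0_strategies` are hypothetical names introduced here, not functions from the PR, and the error wording is an assumption.

```python
from collections.abc import Mapping
from typing import Any, Tuple


def require_key(mapping: Any, key: str, where: str) -> Any:
    """Return mapping[key], or raise a ValueError that names the missing key."""
    if not isinstance(mapping, Mapping) or key not in mapping:
        raise ValueError(f"Missing '{key}' in {where}.")
    return mapping[key]


def read_group_0_strategies(hf_quant_config: Mapping) -> Tuple[str, str]:
    """Read the weight and input-activation strategies from group_0."""
    config_groups = require_key(hf_quant_config, "config_groups",
                                "quantization config")
    group_0 = require_key(config_groups, "group_0", "config_groups")
    weights = require_key(group_0, "weights", "group_0")
    inputs = require_key(group_0, "input_activations", "group_0")
    # A null sub-config is not a mapping, so it is reported as a missing
    # 'strategy' instead of surfacing as a TypeError.
    return (
        require_key(weights, "strategy", "group_0.weights"),
        require_key(inputs, "strategy", "group_0.input_activations"),
    )
```

Compared with chained `.get()` calls, a single helper keeps the error message specific to the level at which the lookup failed.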

tests/unittest/models/test_quant_config_utils.py (1)

150-186: 💤 Low value

Consider extending parametrization to cover FP8 input strategy validation.

The current test cases cover unsupported weights strategies and NVFP4 input strategy mismatches, but don't exercise the FP8-specific input strategy validation (Lines 37-38 and 41-42 of the utility: channel requires "token", block requires "group").

🧪 Optional additional test cases

 @pytest.mark.parametrize(
     "weights,input_activations,error_match",
     [
         (
             {
                 "num_bits": 8,
                 "strategy": "tensor",
             },
             {
                 "num_bits": 8,
                 "strategy": "token",
             },
             "Unsupported weights_quant_strategy",
         ),
+        (
+            {
+                "num_bits": 8,
+                "strategy": "channel",
+            },
+            {
+                "num_bits": 8,
+                "strategy": "group",  # should be "token"
+            },
+            "Unsupported inputs_quant_strategy",
+        ),
+        (
+            {
+                "num_bits": 8,
+                "strategy": "block",
+            },
+            {
+                "num_bits": 8,
+                "strategy": "token",  # should be "group"
+                "group_size": 128,
+            },
+            "Unsupported inputs_quant_strategy",
+        ),
         (
             {
                 "num_bits": 4,

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/models/test_quant_config_utils.py` around lines 150 - 186, Add
parametrized cases to
test_update_quant_config_from_compressed_tensors_rejects_strategies that
exercise FP8-specific input strategy validation: call
update_quant_config_from_compressed_tensors (with QuantConfig() and
_compressed_tensors_config) using 8-bit weights configs paired with the wrong
input_activations strategy (one case with weights strategy "channel" and input
strategy "group", where "token" is required, and one case with weights strategy
"block" and input strategy "token", where "group" is required), and assert that
each raises a ValueError matching "Unsupported inputs_quant_strategy".
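
The rejection cases discussed above can be sketched as a table-driven check. This version avoids a pytest dependency so it runs standalone; `check_strategies` is a hypothetical stand-in for the validation inside the real utility, and only the two error-message prefixes are taken from the review comment.

```python
import re

# ASSUMPTION: required input strategy per FP8 weights strategy, as described
# in the review comment (channel requires "token", block requires "group").
REQUIRED_INPUT_STRATEGY = {"channel": "token", "block": "group"}


def check_strategies(weights: dict, input_activations: dict) -> None:
    """Hypothetical validator: raise ValueError on an unsupported pairing."""
    w = weights["strategy"]
    if w not in REQUIRED_INPUT_STRATEGY:
        raise ValueError(f"Unsupported weights_quant_strategy: {w}")
    i = input_activations["strategy"]
    if i != REQUIRED_INPUT_STRATEGY[w]:
        raise ValueError(f"Unsupported inputs_quant_strategy: {i}")


# Each entry: (weights config, input_activations config, expected error regex).
REJECTED_CASES = [
    ({"num_bits": 8, "strategy": "tensor"},
     {"num_bits": 8, "strategy": "token"},
     "Unsupported weights_quant_strategy"),
    ({"num_bits": 8, "strategy": "channel"},
     {"num_bits": 8, "strategy": "group"},  # should be "token"
     "Unsupported inputs_quant_strategy"),
    ({"num_bits": 8, "strategy": "block"},
     {"num_bits": 8, "strategy": "token", "group_size": 128},  # should be "group"
     "Unsupported inputs_quant_strategy"),
]


def run_rejection_cases() -> list:
    """Return one bool per case: True if the expected ValueError was raised."""
    results = []
    for weights, inputs, pattern in REJECTED_CASES:
        try:
            check_strategies(weights, inputs)
        except ValueError as exc:
            results.append(re.search(pattern, str(exc)) is not None)
        else:
            results.append(False)
    return results
```

Under pytest, the same table would typically be passed to `pytest.mark.parametrize`, with `pytest.raises(ValueError, match=error_match)` taking the place of the manual try/except.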

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1b879797-585d-4d04-a195-3f920962c15c

📥 Commits

Reviewing files that changed from the base of the PR and between 30b87fc and 5455a78.

📒 Files selected for processing (6)

tensorrt_llm/_torch/model_config.py
tensorrt_llm/llmapi/llm_utils.py
tensorrt_llm/models/quant_config_utils.py
tests/integration/test_lists/test-db/l0_a10.yml
tests/unittest/llmapi/test_kv_cache_dtype_override.py
tests/unittest/models/test_quant_config_utils.py

tensorrt-cicd · 2026-05-22T20:29:31Z

PR_Github #49966 [ run ] completed with state SUCCESS. Commit: 5455a78
/LLM/main/L0_MergeRequest_PR pipeline #39531 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

DomBrown · 2026-05-22T21:21:41Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-22T21:27:05Z

PR_Github #49990 [ run ] triggered by Bot. Commit: 5455a78 Link to invocation

tensorrt-cicd · 2026-05-23T02:03:31Z

PR_Github #49990 [ run ] completed with state FAILURE. Commit: 5455a78
/LLM/main/L0_MergeRequest_PR pipeline #39555 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

DomBrown · 2026-05-23T08:16:00Z

/bot run

tensorrt-cicd · 2026-05-23T08:22:22Z

PR_Github #50038 [ run ] triggered by Bot. Commit: 5455a78 Link to invocation

tensorrt-cicd · 2026-05-23T11:42:55Z

PR_Github #50038 [ run ] completed with state SUCCESS. Commit: 5455a78
/LLM/main/L0_MergeRequest_PR pipeline #39601 completed with status: 'SUCCESS'

CI Report

Link to invocation

refactor: unify compressed-tensors quant config parsing

5455a78

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

github-actions Bot assigned DomBrown May 22, 2026

DomBrown changed the title ~~[None] [Refactor] Unify compressed-tensors quant config parsing~~ [None] [refactor] Unify compressed-tensors quant config parsing May 22, 2026

DomBrown marked this pull request as ready for review May 22, 2026 16:59

DomBrown requested review from a team as code owners May 22, 2026 16:59

DomBrown requested review from hyukn, schetlur-nv and suyoggupta May 22, 2026 16:59

coderabbitai Bot reviewed May 22, 2026

dcampora approved these changes May 26, 2026

hchings approved these changes May 26, 2026

juney-nvidia approved these changes May 26, 2026

juney-nvidia merged commit c7e7fc5 into NVIDIA:main May 26, 2026
15 of 18 checks passed

bmarimuthu-nv pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request May 28, 2026

[None] [refactor] Unify compressed-tensors quant config parsing (NVID…

095f968

…IA#14468) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

DomBrown deleted the dev/refactor_compressed_tensors branch June 1, 2026 08:28
