
[None] [refactor] Unify compressed-tensors quant config parsing #14468

Merged
juney-nvidia merged 1 commit into NVIDIA:main from DomBrown:dev/refactor_compressed_tensors on May 26, 2026

Conversation


@DomBrown DomBrown commented May 22, 2026

Collaborator

Summary by CodeRabbit

  • New Features

    • Enhanced support for handling quantization configurations from Hugging Face checkpoints using the compressed-tensors format.
  • Bug Fixes

    • Improved validation and error handling for NVFP4 and FP8 quantization schemes with group sizing constraints.
  • Tests

    • Added comprehensive unit tests for quantization configuration parsing and validation.


Description

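During review of #13559, it was noted that the compressed-tensors quant config parsing was duplicated between tensorrt_llm/_torch/model_config.py and tensorrt_llm/llmapi/llm_utils.py. This PR consolidates that logic into a single shared utility in tensorrt_llm/models/quant_config_utils.py and adds tests for it.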

Test Coverage

tests/unittest/models/test_quant_config_utils.py
tests/unittest/llmapi/test_kv_cache_dtype_override.py

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
@DomBrown DomBrown changed the title from "[None] [Refactor] Unify compressed-tensors quant config parsing" to "[None] [refactor] Unify compressed-tensors quant config parsing" on May 22, 2026
@DomBrown

Collaborator Author

/bot run

@DomBrown DomBrown marked this pull request as ready for review May 22, 2026 16:59
@DomBrown DomBrown requested review from a team as code owners May 22, 2026 16:59
@tensorrt-cicd

Collaborator

PR_Github #49966 [ run ] triggered by Bot. Commit: 5455a78 Link to invocation


coderabbitai Bot commented May 22, 2026

Contributor
📝 Walkthrough


This PR refactors compressed-tensors quantization config parsing logic from two locations into a reusable utility function update_quant_config_from_compressed_tensors, integrates it into callers, and provides comprehensive test coverage for success and failure cases.

Changes

Compressed-tensors quantization config refactoring

  • New compressed-tensors config utility implementation (tensorrt_llm/models/quant_config_utils.py): New utility function parses config_groups weight and input-activation strategies, maps them to QuantAlgo variants (NVFP4, FP8 block scales, FP8 per-channel-per-token), validates group sizes, handles optional FP8 KV cache scheme, and updates module exclusion lists from modules_to_not_convert and ignore fields.
  • Integration with torch model_config (tensorrt_llm/_torch/model_config.py): Import utility and replace inline compressed-tensors config parsing in the quantization method branch with a single utility call.
  • Integration with llmapi utilities (tensorrt_llm/llmapi/llm_utils.py): Import utility and replace inline compressed-tensors strategy parsing, KV cache processing, and module exclusion merging with a utility call in HF checkpoint handling.
  • Comprehensive unit test suite for utility function (tests/unittest/models/test_quant_config_utils.py): Tests validate NVFP4 and FP8 parsing, KV cache scheme handling, module exclusion behavior, and failure cases including unsupported strategies, invalid group sizes, missing config, and KV cache conflicts.
  • Integration tests for KV cache scenarios (tests/unittest/llmapi/test_kv_cache_dtype_override.py): Tests verify ModelLoader._update_from_hf_quant_config() parses compressed-tensors KV cache schemes correctly and rejects conflicting configurations.
  • Test infrastructure updates (tests/integration/test_lists/test-db/l0_a10.yml): Add new unit test module to A10 pre-merge test list.
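To make the walkthrough's description of update_quant_config_from_compressed_tensors concrete, here is a minimal sketch of that kind of mapping. It is not the PR's implementation: Algo, SketchQuantConfig and update_from_compressed_tensors_sketch are hypothetical stand-ins for QuantAlgo, QuantConfig and the real utility. The "tensor_group" strategy string for NVFP4, the group sizes (16 for NVFP4, 128 for FP8 block scales) and the kv_cache_scheme fields that are checked are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class Algo(Enum):
    # Hypothetical stand-in for the QuantAlgo variants named in the walkthrough.
    NVFP4 = "NVFP4"
    FP8 = "FP8"
    FP8_BLOCK_SCALES = "FP8_BLOCK_SCALES"
    FP8_PER_CHANNEL_PER_TOKEN = "FP8_PER_CHANNEL_PER_TOKEN"


@dataclass
class SketchQuantConfig:
    # Hypothetical stand-in for QuantConfig, holding only the fields used here.
    quant_algo: Optional[Algo] = None
    group_size: Optional[int] = None
    kv_cache_quant_algo: Optional[Algo] = None
    exclude_modules: List[str] = field(default_factory=list)


def update_from_compressed_tensors_sketch(
        cfg: SketchQuantConfig, hf_quant_config: Dict[str, Any]) -> None:
    config_groups = hf_quant_config.get("config_groups")
    if config_groups is None:
        raise ValueError(f"config_groups is not set in {hf_quant_config}.")
    weights = config_groups["group_0"]["weights"]
    inputs = config_groups["group_0"]["input_activations"]
    bits = weights["num_bits"]
    w_strategy, a_strategy = weights["strategy"], inputs["strategy"]

    # Map the (weights, input activations) strategy pair to one algorithm.
    if bits == 4 and w_strategy == "tensor_group":
        # Assumption: NVFP4 pairs tensor_group weights and inputs, group size 16.
        if a_strategy != "tensor_group":
            raise ValueError(f"Unsupported inputs_quant_strategy: {a_strategy}")
        if weights.get("group_size") != 16:
            raise ValueError("NVFP4 requires group_size == 16.")
        cfg.quant_algo, cfg.group_size = Algo.NVFP4, 16
    elif bits == 8 and w_strategy == "channel":
        if a_strategy != "token":
            raise ValueError(f"Unsupported inputs_quant_strategy: {a_strategy}")
        cfg.quant_algo = Algo.FP8_PER_CHANNEL_PER_TOKEN
    elif bits == 8 and w_strategy == "block":
        if a_strategy != "group":
            raise ValueError(f"Unsupported inputs_quant_strategy: {a_strategy}")
        # Assumption: only group_size == 128 is accepted for FP8 block scales.
        if inputs.get("group_size") != 128:
            raise ValueError("FP8 block scales require group_size == 128.")
        cfg.quant_algo, cfg.group_size = Algo.FP8_BLOCK_SCALES, 128
    else:
        raise ValueError(f"Unsupported weights_quant_strategy: {w_strategy}")

    # Optional KV cache scheme: only 8-bit float (FP8) is accepted here, and an
    # already-configured, different KV cache algorithm is treated as a conflict.
    kv_scheme = hf_quant_config.get("kv_cache_scheme")
    if kv_scheme is not None:
        if kv_scheme.get("num_bits") != 8 or kv_scheme.get("type") != "float":
            raise ValueError(f"Unsupported kv_cache_scheme: {kv_scheme}")
        if cfg.kv_cache_quant_algo not in (None, Algo.FP8):
            raise ValueError("kv_cache_scheme conflicts with the configured KV cache algo.")
        cfg.kv_cache_quant_algo = Algo.FP8

    # Merge both exclusion lists, skipping duplicates and preserving order.
    for key in ("modules_to_not_convert", "ignore"):
        for name in hf_quant_config.get(key) or []:
            if name not in cfg.exclude_modules:
                cfg.exclude_modules.append(name)
```

Under these assumptions, passing 8-bit "channel" weights with "token" inputs, a kv_cache_scheme of 8-bit floats, and ignore=["lm_head"] leaves quant_algo as FP8_PER_CHANNEL_PER_TOKEN, kv_cache_quant_algo as FP8 and exclude_modules as ["lm_head"]. Keeping every rule in one function is what lets both callers share identical validation.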

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/TensorRT-LLM#13559: Earlier enhancements to load_hf_quant_config for compressed-tensors logic that the current PR's refactored utility now consolidates and reuses.

Suggested reviewers

  • mikeiovine
  • syuoni
  • nekorobov
  • fredricz-20070104
  • arysef
  • Funatiq
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 10.53%, which is insufficient. The required threshold is 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title accurately summarizes the main change: refactoring to unify/deduplicate compressed-tensors quant config parsing across multiple files.
  • Description check: ✅ Passed. The description clearly explains the issue (duplicated code identified in #13559) and solution (deduplication), lists relevant tests, and includes a completed checklist.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


@coderabbitai coderabbitai Bot left a comment

Contributor


🧹 Nitpick comments (2)
tensorrt_llm/models/quant_config_utils.py (1)

30-33: ⚡ Quick win

Consider defensive access for nested config keys.

If the HF config is malformed (missing group_0, weights, or input_activations), these direct accesses will raise a KeyError with a less helpful message than a validated ValueError. This is especially relevant for a public utility function that may receive varied external configs.

🛡️ Proposed fix to add defensive validation
     config_groups = hf_quant_config.get("config_groups")
     if config_groups is None:
         raise ValueError(f"config_groups is not set in {hf_quant_config}.")
 
-    weights_quant_config = config_groups["group_0"]["weights"]
-    inputs_quant_config = config_groups["group_0"]["input_activations"]
+    group_0 = config_groups.get("group_0")
+    if group_0 is None:
+        raise ValueError("config_groups must contain 'group_0'.")
+    weights_quant_config = group_0.get("weights")
+    inputs_quant_config = group_0.get("input_activations")
+    if weights_quant_config is None or inputs_quant_config is None:
+        raise ValueError("group_0 must contain 'weights' and 'input_activations'.")
     weights_quant_strategy = weights_quant_config["strategy"]
     inputs_quant_strategy = inputs_quant_config["strategy"]
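The validation proposed above can be tried in isolation. This is a minimal sketch, not the code that was merged: _require_key and read_group_0_strategies are hypothetical helpers. Unlike the suggested diff, it names the specific missing key (including "strategy"), as the accompanying AI-agent prompt asks.

```python
from typing import Any, Dict, Tuple


def _require_key(mapping: Any, key: str, where: str) -> Any:
    """Return mapping[key], or raise a ValueError naming the missing key and its parent."""
    if not isinstance(mapping, dict) or key not in mapping:
        raise ValueError(f"{where} must contain '{key}'.")
    return mapping[key]


def read_group_0_strategies(hf_quant_config: Dict[str, Any]) -> Tuple[str, str]:
    """Return (weights_quant_strategy, inputs_quant_strategy) from config_groups.group_0."""
    config_groups = hf_quant_config.get("config_groups")
    if config_groups is None:
        raise ValueError(f"config_groups is not set in {hf_quant_config}.")
    group_0 = _require_key(config_groups, "group_0", "config_groups")
    weights_quant_config = _require_key(group_0, "weights", "group_0")
    inputs_quant_config = _require_key(group_0, "input_activations", "group_0")
    # Each strategy lookup is validated too, so a malformed leaf also gets a clear error.
    weights_quant_strategy = _require_key(weights_quant_config, "strategy", "weights")
    inputs_quant_strategy = _require_key(inputs_quant_config, "strategy", "input_activations")
    return weights_quant_strategy, inputs_quant_strategy
```

Routing every lookup through one helper keeps the error messages uniform and means a newly required key costs one line rather than another hand-written None check.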
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/models/quant_config_utils.py` around lines 30 - 33, Validate
presence of expected nested keys in config_groups before direct indexing: check
that "group_0" exists and that within it both "weights" and "input_activations"
exist; if any are missing, raise a clear ValueError describing which key is
missing rather than allowing a KeyError. Then safely read weights_quant_config =
config_groups["group_0"]["weights"] and inputs_quant_config =
config_groups["group_0"]["input_activations"], and proceed to extract
weights_quant_strategy and inputs_quant_strategy from those dicts. Reference the
variables config_groups, "group_0", weights_quant_config, inputs_quant_config,
weights_quant_strategy, and inputs_quant_strategy when implementing the
validation and error messages.
tests/unittest/models/test_quant_config_utils.py (1)

150-186: 💤 Low value

Consider extending parametrization to cover FP8 input strategy validation.

The current test cases cover unsupported weights strategies and NVFP4 input strategy mismatches, but they don't exercise the FP8-specific input strategy validation (Lines 37-38 and 41-42 of the utility: "channel" weights require "token" inputs, and "block" weights require "group" inputs).

🧪 Optional additional test cases
 @pytest.mark.parametrize(
     "weights,input_activations,error_match",
     [
         (
             {
                 "num_bits": 8,
                 "strategy": "tensor",
             },
             {
                 "num_bits": 8,
                 "strategy": "token",
             },
             "Unsupported weights_quant_strategy",
         ),
+        (
+            {
+                "num_bits": 8,
+                "strategy": "channel",
+            },
+            {
+                "num_bits": 8,
+                "strategy": "group",  # should be "token"
+            },
+            "Unsupported inputs_quant_strategy",
+        ),
+        (
+            {
+                "num_bits": 8,
+                "strategy": "block",
+            },
+            {
+                "num_bits": 8,
+                "strategy": "token",  # should be "group"
+                "group_size": 128,
+            },
+            "Unsupported inputs_quant_strategy",
+        ),
         (
             {
                 "num_bits": 4,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/models/test_quant_config_utils.py` around lines 150 - 186, Add
parametrized cases to
test_update_quant_config_from_compressed_tensors_rejects_strategies that
exercise FP8-specific validation: call
update_quant_config_from_compressed_tensors (with QuantConfig() and
_compressed_tensors_config) using an FP8 weights config (e.g., type
"float" with 8 bits) paired with a mismatched input_activations strategy (one
case with "channel" weights and non-"token" inputs, and one case with "block"
weights and non-"group" inputs), and assert that the ValueError message matches
"Unsupported inputs_quant_strategy" for these FP8 input strategy mismatches.
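The FP8 pairing rule stated in this finding can be written as a lookup table, with the proposed rejection cases expressed as data. This is a hypothetical, pytest-free sketch: FP8_REQUIRED_INPUT_STRATEGY, check_fp8_strategies and run_reject_cases are illustrative names, not the utility's API. In the real test file, these tuples would instead feed the existing @pytest.mark.parametrize list.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the FP8 pairing rule described in the review:
# "channel" weights need "token" inputs; "block" weights need "group" inputs.
FP8_REQUIRED_INPUT_STRATEGY: Dict[str, str] = {"channel": "token", "block": "group"}


def check_fp8_strategies(weights_strategy: str, inputs_strategy: str) -> None:
    """Raise ValueError with the message fragments the proposed test cases match on."""
    required = FP8_REQUIRED_INPUT_STRATEGY.get(weights_strategy)
    if required is None:
        raise ValueError(f"Unsupported weights_quant_strategy: {weights_strategy}")
    if inputs_strategy != required:
        raise ValueError(f"Unsupported inputs_quant_strategy: {inputs_strategy}")


# (weights strategy, inputs strategy, expected error fragment), mirroring the diff above.
REJECT_CASES: List[Tuple[str, str, str]] = [
    ("tensor", "token", "Unsupported weights_quant_strategy"),
    ("channel", "group", "Unsupported inputs_quant_strategy"),
    ("block", "token", "Unsupported inputs_quant_strategy"),
]


def run_reject_cases() -> int:
    """Return how many cases raised a ValueError containing the expected fragment."""
    passed = 0
    for weights_strategy, inputs_strategy, fragment in REJECT_CASES:
        try:
            check_fp8_strategies(weights_strategy, inputs_strategy)
        except ValueError as exc:
            if fragment in str(exc):
                passed += 1
    return passed
```

Encoding the rule as a table makes each invalid pairing a one-line data entry, which is the same reason the review suggests extending the parametrize list rather than adding new test functions.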

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1b879797-585d-4d04-a195-3f920962c15c

📥 Commits

Reviewing files that changed from the base of the PR and between 30b87fc and 5455a78.

📒 Files selected for processing (6)
  • tensorrt_llm/_torch/model_config.py
  • tensorrt_llm/llmapi/llm_utils.py
  • tensorrt_llm/models/quant_config_utils.py
  • tests/integration/test_lists/test-db/l0_a10.yml
  • tests/unittest/llmapi/test_kv_cache_dtype_override.py
  • tests/unittest/models/test_quant_config_utils.py

@tensorrt-cicd

Collaborator

PR_Github #49966 [ run ] completed with state SUCCESS. Commit: 5455a78
/LLM/main/L0_MergeRequest_PR pipeline #39531 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@DomBrown

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #49990 [ run ] triggered by Bot. Commit: 5455a78 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #49990 [ run ] completed with state FAILURE. Commit: 5455a78
/LLM/main/L0_MergeRequest_PR pipeline #39555 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@DomBrown

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #50038 [ run ] triggered by Bot. Commit: 5455a78 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #50038 [ run ] completed with state SUCCESS. Commit: 5455a78
/LLM/main/L0_MergeRequest_PR pipeline #39601 completed with status: 'SUCCESS'

CI Report

Link to invocation

@juney-nvidia juney-nvidia merged commit c7e7fc5 into NVIDIA:main May 26, 2026
15 of 18 checks passed
bmarimuthu-nv pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request May 28, 2026
…IA#14468)

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
@DomBrown DomBrown deleted the dev/refactor_compressed_tensors branch June 1, 2026 08:28

Labels

None yet

Projects

None yet


5 participants