[None] [refactor] Unify compressed-tensors quant config parsing#14468
Conversation
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
|
/bot run |
|
PR_Github #49966 [ run ] triggered by Bot. Commit: |
📝 WalkthroughWalkthroughThis PR refactors compressed-tensors quantization config parsing logic from two locations into a reusable utility function ChangesCompressed-tensors quantization config refactoring
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tensorrt_llm/models/quant_config_utils.py (1)
30-33: ⚡ Quick winConsider defensive access for nested config keys.
If the HF config is malformed (missing
group_0,weights, orinput_activations), these direct accesses will raise aKeyErrorwith a less helpful message than a validatedValueError. This is especially relevant for a public utility function that may receive varied external configs.🛡️ Proposed fix to add defensive validation
config_groups = hf_quant_config.get("config_groups") if config_groups is None: raise ValueError(f"config_groups is not set in {hf_quant_config}.") - weights_quant_config = config_groups["group_0"]["weights"] - inputs_quant_config = config_groups["group_0"]["input_activations"] + group_0 = config_groups.get("group_0") + if group_0 is None: + raise ValueError("config_groups must contain 'group_0'.") + weights_quant_config = group_0.get("weights") + inputs_quant_config = group_0.get("input_activations") + if weights_quant_config is None or inputs_quant_config is None: + raise ValueError("group_0 must contain 'weights' and 'input_activations'.") weights_quant_strategy = weights_quant_config["strategy"] inputs_quant_strategy = inputs_quant_config["strategy"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/models/quant_config_utils.py` around lines 30 - 33, Validate presence of expected nested keys in config_groups before direct indexing: check that "group_0" exists and that within it both "weights" and "input_activations" exist; if any are missing, raise a clear ValueError describing which key is missing rather than allowing a KeyError. Then safely read weights_quant_config = config_groups["group_0"]["weights"] and inputs_quant_config = config_groups["group_0"]["input_activations"], and proceed to extract weights_quant_strategy and inputs_quant_strategy from those dicts. Reference the variables config_groups, "group_0", weights_quant_config, inputs_quant_config, weights_quant_strategy, and inputs_quant_strategy when implementing the validation and error messages.tests/unittest/models/test_quant_config_utils.py (1)
150-186: 💤 Low valueConsider extending parametrization to cover FP8 input strategy validation.
The current test cases cover unsupported weights strategies and NVFP4 input strategy mismatches, but don't exercise the FP8-specific input strategy validation (Lines 37-38 and 41-42 of the utility: channel requires "token", block requires "group").
🧪 Optional additional test cases
`@pytest.mark.parametrize`( "weights,input_activations,error_match", [ ( { "num_bits": 8, "strategy": "tensor", }, { "num_bits": 8, "strategy": "token", }, "Unsupported weights_quant_strategy", ), + ( + { + "num_bits": 8, + "strategy": "channel", + }, + { + "num_bits": 8, + "strategy": "group", # should be "token" + }, + "Unsupported inputs_quant_strategy", + ), + ( + { + "num_bits": 8, + "strategy": "block", + }, + { + "num_bits": 8, + "strategy": "token", # should be "group" + "group_size": 128, + }, + "Unsupported inputs_quant_strategy", + ), ( { "num_bits": 4,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/models/test_quant_config_utils.py` around lines 150 - 186, Add parametrized cases to test_update_quant_config_from_compressed_tensors_rejects_strategies that exercise FP8-specific validation: call update_quant_config_from_compressed_tensors (with QuantConfig() and _compressed_tensors_config) using a weights config representing FP8 (e.g., type "float" with 8 bits / FP8) and input_activations with the wrong strategies for FP8 (one case using "channel" where FP8 requires "token", and one case using "block" where FP8 requires "group"), and assert the appropriate ValueError match messages for FP8 input strategy mismatches.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tensorrt_llm/models/quant_config_utils.py`:
- Around line 30-33: Validate presence of expected nested keys in config_groups
before direct indexing: check that "group_0" exists and that within it both
"weights" and "input_activations" exist; if any are missing, raise a clear
ValueError describing which key is missing rather than allowing a KeyError. Then
safely read weights_quant_config = config_groups["group_0"]["weights"] and
inputs_quant_config = config_groups["group_0"]["input_activations"], and proceed
to extract weights_quant_strategy and inputs_quant_strategy from those dicts.
Reference the variables config_groups, "group_0", weights_quant_config,
inputs_quant_config, weights_quant_strategy, and inputs_quant_strategy when
implementing the validation and error messages.
In `@tests/unittest/models/test_quant_config_utils.py`:
- Around line 150-186: Add parametrized cases to
test_update_quant_config_from_compressed_tensors_rejects_strategies that
exercise FP8-specific validation: call
update_quant_config_from_compressed_tensors (with QuantConfig() and
_compressed_tensors_config) using a weights config representing FP8 (e.g., type
"float" with 8 bits / FP8) and input_activations with the wrong strategies for
FP8 (one case using "channel" where FP8 requires "token", and one case using
"block" where FP8 requires "group"), and assert the appropriate ValueError match
messages for FP8 input strategy mismatches.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1b879797-585d-4d04-a195-3f920962c15c
📒 Files selected for processing (6)
tensorrt_llm/_torch/model_config.pytensorrt_llm/llmapi/llm_utils.pytensorrt_llm/models/quant_config_utils.pytests/integration/test_lists/test-db/l0_a10.ymltests/unittest/llmapi/test_kv_cache_dtype_override.pytests/unittest/models/test_quant_config_utils.py
|
PR_Github #49966 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #49990 [ run ] triggered by Bot. Commit: |
|
PR_Github #49990 [ run ] completed with state
|
|
/bot run |
|
PR_Github #50038 [ run ] triggered by Bot. Commit: |
|
PR_Github #50038 [ run ] completed with state |
…IA#14468) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Description
During review of #13559 it was noted that the compressed-tensors quant config parsing was duplicated. This PR deduplicates the code and adds relevant testing.
Test Coverage
tests/unittest/models/test_quant_config_utils.pytests/unittest/llmapi/test_kv_cache_dtype_override.pyPR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either
api-compatibleorapi-breaking. Forapi-breaking, includeBREAKINGin the PR title.Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.