Add hqq support for Int4TilePackedTo4dTensor #2912
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2912
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 12aeb58 with merge base 568c193.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
(force-pushed from cf6526f to 4705e33)
Review thread on torchao/quantization/quantize_/workflows/int4/int4_tile_packed_to_4d_tensor.py (outdated; resolved)
(force-pushed from 9c7f7e1 to 953f1f9)
```python
class Int4ChooseQParamsAlgorithm(str, Enum):
    """Variant of quantization algorithm to calculate scale and zero_point

    * tinygemm: the choose qparams algorithm native for tinygemm kernel
    """
```
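For context, here is a minimal self-contained sketch of the enum under discussion. The option names (`TINYGEMM`, `HQQ`) are taken from this PR's summary; the in-tree docstrings are longer, so treat this as illustrative rather than the actual source:

```python
from enum import Enum


class Int4ChooseQParamsAlgorithm(str, Enum):
    """Variant of quantization algorithm used to calculate scale and zero_point.

    Sketch only -- option names come from this PR's summary.
    """

    TINYGEMM = "tinygemm"  # choose-qparams convention native to the tinygemm kernel
    HQQ = "hqq"            # half-quadratic quantization; typically better accuracy
```

Deriving from `str` as well as `Enum` means the members compare equal to their string values, which keeps serialized configs readable.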
Can we describe what this actually does?
Sure, updated with some pseudo code describing the core logic
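As a rough illustration of what such pseudo code might look like: the tinygemm convention uses a *floating-point* zero_point placed at the midpoint of the quant range, rather than an integer zero_point. This is a hedged NumPy sketch of that core logic, not torchao's actual quant primitive (the function name is made up for illustration):

```python
import numpy as np


def choose_qparams_tinygemm_sketch(w, group_size=32, n_bits=4):
    """Sketch of tinygemm-style choose-qparams (illustrative only).

    Per group of `group_size` elements:
      scale      = (max - min) / (2**n_bits - 1)
      zero_point = min + scale * 2**(n_bits - 1)   # float, at the mid-point
    """
    qmax = 2**n_bits - 1
    mid = 2 ** (n_bits - 1)
    w = w.reshape(-1, group_size)
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = np.maximum((wmax - wmin) / qmax, 1e-8)
    zero_point = wmin + scale * mid
    q = np.clip(np.round((w - wmin) / scale), 0, qmax)
    # dequant convention: (q - mid) * scale + zero_point == q * scale + wmin
    deq = (q - mid) * scale + zero_point
    return q, scale, zero_point, deq
```

The mid-point convention means dequantization only needs one fused multiply-add per element inside the kernel.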
(force-pushed from ad5e93c to b8db855)
See inline comment
```
`packing_format`: the packing format for the int4 tensor, available from version 2 and above
`version`: version of the config to use; a different subset of the above args is valid for version 1 and for version 2; default is 1, see Note for more details

Note:
```
cc @mergennachin I added some more docs here, please let me know if this helps. Only a subset of args will be used for each version right now.
Documentation is good, but why not also add an assertion?
If ignored fields are present, don't you want to throw an exception for the developer?
Okay, talked offline with @jerryzh168.
The current approach seems fine. The increased complexity doesn't seem worth it, since we would have to change the types to Optional.
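To illustrate the trade-off being discussed: rejecting version-2-only args under version 1 requires an "unset" sentinel, which in practice means making those fields Optional. A hypothetical sketch (class name, fields, and defaults are illustrative, not torchao's actual config):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Int4ConfigSketch:
    # Hypothetical config: names and defaults are illustrative only.
    group_size: int = 128
    version: int = 1
    # Must be Optional so "not provided" is distinguishable from a real
    # value -- this is the added complexity mentioned above.
    packing_format: Optional[str] = None

    def __post_init__(self):
        if self.version == 1 and self.packing_format is not None:
            raise ValueError("packing_format is only valid for version >= 2")
```

With plain non-Optional fields there is no way to tell "user passed the default" apart from "user didn't pass anything", so the assertion can't be implemented without this type change.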
Summary:

* Added an `Int4ChooseQparamsAlgorithm` enum that has `TINYGEMM` and `HQQ` options; by default tensors will use the `TINYGEMM` option
* Enabled the `Int4ChooseQparamsAlgorithm.HQQ` option for `Int4TilePackedTo4dTensor`. Instead of calling the tinygemm quant primitive ops to quantize the high precision tensor, the `use_hqq=True` path quantizes with `_choose_qparams_and_quantize_affine_hqq`, which helps improve accuracy for int4 weight-only quantization while still reusing the tinygemm kernel for speedup

Test Plan:

```
python test/quantization/quantize_/workflows/int4/test_int4_tile_packed_to_4d_tensor.py
```

Accuracy test (sanity check) to make sure hqq improves accuracy:

```
sh release.sh --model_id Qwen/Qwen3-8B --quants INT4 --push_to_hub
```

no hqq checkpoint: https://huggingface.co/jerryzh168/Qwen3-8B-INT4-non-hqq
hqq checkpoint: https://huggingface.co/jerryzh168/Qwen3-8B-INT4

```
export MODEL=jerryzh168/Qwen3-8B-INT4-non-hqq
export TASK=mmlu
lm_eval --model hf --model_args pretrained=$MODEL --tasks $TASK --device cuda:0 --batch_size auto
```

| Groups            | Version | Filter | n-shot | Metric |   | Value  |   | Stderr |
|-------------------|--------:|--------|--------|--------|---|-------:|---|-------:|
| mmlu              |       2 | none   |        | acc    | ↑ | 0.7019 | ± | 0.0036 |
| - humanities      |       2 | none   |        | acc    | ↑ | 0.6036 | ± | 0.0066 |
| - other           |       2 | none   |        | acc    | ↑ | 0.7403 | ± | 0.0076 |
| - social sciences |       2 | none   |        | acc    | ↑ | 0.8083 | ± | 0.0070 |
| - stem            |       2 | none   |        | acc    | ↑ | 0.7069 | ± | 0.0078 |

```
export MODEL=jerryzh168/Qwen3-8B-INT4
lm_eval --model hf --model_args pretrained=$MODEL --tasks $TASK --device cuda:0 --batch_size auto
```

| Groups            | Version | Filter | n-shot | Metric |   | Value  |   | Stderr |
|-------------------|--------:|--------|--------|--------|---|-------:|---|-------:|
| mmlu              |       2 | none   |        | acc    | ↑ | 0.7040 | ± | 0.0036 |
| - humanities      |       2 | none   |        | acc    | ↑ | 0.5962 | ± | 0.0065 |
| - other           |       2 | none   |        | acc    | ↑ | 0.7470 | ± | 0.0075 |
| - social sciences |       2 | none   |        | acc    | ↑ | 0.8177 | ± | 0.0069 |
| - stem            |       2 | none   |        | acc    | ↑ | 0.7114 | ± | 0.0078 |

hqq improves the accuracy for mmlu slightly.

Reviewers:

Subscribers:

Tasks:

Tags:
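For readers unfamiliar with HQQ: the core idea is to keep the same affine quantization form but refine the zero-point by alternating minimization, with a sparsity-promoting prior on the residual so that outlier errors are absorbed. The following is a heavily simplified NumPy sketch of that idea (l1 soft-thresholding in place of the paper's lp proximal operator; this is NOT torchao's `_choose_qparams_and_quantize_affine_hqq`, and the function name is made up):

```python
import numpy as np


def quantize_hqq_sketch(w, group_size=32, n_bits=4, iters=20, lam=0.05):
    """Simplified HQQ-style int4 quantization (illustrative only)."""
    qmin, qmax = 0, 2**n_bits - 1
    w = w.reshape(-1, group_size)
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = np.maximum((wmax - wmin) / (qmax - qmin), 1e-8)
    zero = -wmin / scale  # initial plain-affine zero-point

    def quant(z):
        return np.clip(np.round(w / scale + z), qmin, qmax)

    def dequant(q, z):
        return (q - z) * scale

    for _ in range(iters):
        q = quant(zero)
        err = w - dequant(q, zero)
        # l1 prox: soft-threshold the residual into a sparse "outlier" part
        e = np.sign(err) * np.maximum(np.abs(err) - lam, 0.0)
        # refit the zero-point so dequant matches (w - e) on average per group
        zero = np.mean(q - (w - e) / scale, axis=1, keepdims=True)

    q = quant(zero)
    return q, scale, zero, dequant(q, zero)
```

Because only scale/zero_point change while the quantized layout stays the same, the result can still be packed for and consumed by the tinygemm kernel, which is why the PR gets the accuracy benefit without giving up the speedup.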
I'm merging this now to unblock future PRs; please feel free to add more comments if there is anything else that should be fixed.