Bump `Int4WeightOnlyConfig` version to 2 by jerryzh168 · Pull Request #2949 · pytorch/ao

jerryzh168 · 2025-09-05T23:33:17Z

Summary:
Int4WeightOnlyConfig currently supports versions 1 and 2, with 1 as the default. This PR changes the default to 2 and updates the callsites accordingly:
- For Int4WeightOnlyConfig callsites that passed an explicit version=2, we removed the argument, since 2 is now the default.
- For Int4WeightOnlyConfig callsites that use the old configuration, we added an explicit version=1. These callsites can be migrated to version 2 separately (note: adding the explicit version=1 is done in Add version=1 for calls to int4 weight only config #2958).
- For READMEs, we migrated the usage to version 2 directly.

Also added deprecation warnings in:
- quant_api, for the v1 path of Int4WeightOnlyConfig
- the different layouts (TensorCoreTiledLayout, MarlinSparseLayout, Int4CPULayout and Int4XPULayout), for quantized checkpoints with a v1 config

Also added deprecation warning tests for v1:
- single linear: https://huggingface.co/torchao-testing/single-linear-Int4WeightOnlyConfig-v1-0.14.dev (testing checkpoint only)
- opt-125m: https://huggingface.co/torchao-testing/opt-125m-Int4WeightOnlyConfig-v1-0.14.dev (testing both checkpoint and config)
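
The default flip described in the summary can be illustrated with a small self-contained sketch. `SketchInt4Config` is a hypothetical stand-in written for this example, not torchao's `Int4WeightOnlyConfig`; it only shows how a callsite that omits `version` picks up the new default, while a callsite pinned to `version=1` keeps working and triggers a warning.

```python
import warnings
from dataclasses import dataclass


@dataclass
class SketchInt4Config:
    """Hypothetical stand-in for a versioned config; not torchao's class."""

    group_size: int = 128
    version: int = 2  # default after this PR; it was 1 before

    def __post_init__(self):
        if self.version == 1:
            # The old path still works but is flagged as deprecated.
            warnings.warn(
                "version 1 is deprecated, please migrate to version 2",
                UserWarning,
                stacklevel=2,
            )


# Callsite that relies on the default: now gets version 2.
new_style = SketchInt4Config(group_size=128)

# Callsite pinned to the old behavior: explicit version=1, which warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = SketchInt4Config(group_size=128, version=1)
```

Pinning `version=1` at existing callsites keeps their behavior unchanged across the default bump, so each callsite can be migrated on its own schedule.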

Deprecation Note:

We updated the implementation of the int4 Tensor, so this PR bumps the default version from 1 to 2 for these two configs.

from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "torchao-testing/opt-125m-Int4WeightOnlyConfig-v1-0.14.dev"
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="bfloat16",
    device_map="cuda",
)

/data/users/jerryzh/ao/torchao/core/config.py:250: UserWarning: Stored version is not the same as current default version of the config: stored_version=1, current_default_version=2, please check the deprecation warning
  warnings.warn(
/data/users/jerryzh/ao/torchao/dtypes/uintx/tensor_core_tiled_layout.py:241: UserWarning: Models quantized with version 1 of Int4WeightOnlyConfig is deprecated and will no longer be supported in a future release, please upgrade torchao and quantize again, or download a newer torchao checkpoint, see https://github.com/pytorch/ao/issues/2948 for more details
  warnings.warn(
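
The first warning in the log above comes from comparing the version stored with the checkpoint against the current default. A minimal sketch of such a check follows; `check_stored_version` is a hypothetical helper for illustration, and only the message text is taken from the log. It is not the code in torchao/core/config.py.

```python
import warnings


def check_stored_version(stored_version: int, current_default_version: int) -> bool:
    """Return True when the versions match; warn and return False otherwise.

    Hypothetical helper: only the message wording mirrors the log above.
    """
    if stored_version == current_default_version:
        return True
    warnings.warn(
        "Stored version is not the same as current default version of the config: "
        f"stored_version={stored_version}, "
        f"current_default_version={current_default_version}, "
        "please check the deprecation warning",
        UserWarning,
        stacklevel=2,
    )
    return False
```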

Suggestion: upgrade torchao to 0.14 or later and generate the checkpoint again:

from torchao.quantization import Int4WeightOnlyConfig, quantize_
quantize_(model, Int4WeightOnlyConfig(group_size=128))

Or download the checkpoint again (please let us know if the checkpoint is not updated)

Please see #2948 for more details around the deprecation.
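
To find out whether a given checkpoint is affected, one option is to record the warnings raised while loading it and look for the deprecation text. The sketch below assumes the warning message contains the phrase shown in the log above; `load_and_flag_v1` and `V1_MARKER` are hypothetical names introduced for this example.

```python
import warnings
from typing import Any, Callable, Tuple

# Phrase taken from the deprecation warning shown in the log above.
V1_MARKER = "version 1 of Int4WeightOnlyConfig"


def load_and_flag_v1(load_fn: Callable[[], Any]) -> Tuple[Any, bool]:
    """Run load_fn and report whether a v1 deprecation warning was raised.

    Warnings raised inside load_fn are recorded instead of printed.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = load_fn()
    is_v1 = any(V1_MARKER in str(w.message) for w in caught)
    return result, is_v1
```

A possible use is to wrap the `from_pretrained` call from the deprecation note in a lambda, pass it as `load_fn`, and re-quantize when the returned flag is true.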

Test Plan:
Regression tests:
python test/dtypes/test_affine_quantized.py
python test/quantization/test_quant_api.py
python test/quantization/quantize_/workflows/int4/test_int4_marlin_sparse_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_opaque_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_plain_int32_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_preshuffled_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_tile_packed_to_4d_tensor.py
python test/integration/test_load_and_run_checkpoint.py

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot · 2025-09-05T23:33:20Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2949

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d2168f2 with merge base c452495:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Summary:
This is in preparation for the version bump in #2949. Added version=1 for both `int4_weight_only` and `Int4WeightOnlyConfig`.

Test Plan:
regression tests with CI

Reviewers: Subscribers: Tasks: Tags:

metascroy · 2025-09-09T00:51:45Z

 _int4_quant_code = """
 from torchao.quantization import Int4WeightOnlyConfig
-quant_config = Int4WeightOnlyConfig(group_size=128, packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq", version=2)
+quant_config = Int4WeightOnlyConfig(group_size=128, packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq")


It's called int4_packing_format now, no?

yeah that's true, I have updated locally, will push change together with other things

jerryzh168 · 2025-09-09T00:51:47Z

 _int4_quant_code = """
 from torchao.quantization import Int4WeightOnlyConfig
-quant_config = Int4WeightOnlyConfig(group_size=128, packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq", version=2)
+quant_config = Int4WeightOnlyConfig(group_size=128, packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq")


Just found we also need to update packing_format to int4_packing_format. I have made the change locally and can push these changes before landing.
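
The two callsite changes discussed in this thread (dropping the now-redundant version=2 and renaming packing_format to int4_packing_format) can be sketched as a small keyword-argument rewrite. `migrate_int4_kwargs` is a hypothetical helper for illustration only; it is not part of torchao.

```python
def migrate_int4_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs updated for the new default and argument name."""
    migrated = dict(kwargs)
    # Rename the argument, as pointed out in the review comments.
    if "packing_format" in migrated:
        migrated["int4_packing_format"] = migrated.pop("packing_format")
    # version=2 is the default after this PR, so it no longer needs to be passed.
    if migrated.get("version") == 2:
        del migrated["version"]
    return migrated


old_kwargs = {
    "group_size": 128,
    "packing_format": "tile_packed_to_4d",
    "int4_choose_qparams_algorithm": "hqq",
    "version": 2,
}
new_kwargs = migrate_int4_kwargs(old_kwargs)
```

An explicit version=1 is left untouched, which matches the PR's approach of pinning old callsites rather than migrating them in the same change.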

metascroy · 2025-09-09T00:52:48Z

Should you import to fbcode to see if you break any internal tests?

facebook-github-bot · 2025-09-09T00:55:02Z

@jerryzh168 has imported this pull request. If you are a Meta employee, you can view this in D81985661.

jerryzh168 · 2025-09-09T01:14:09Z

looks like there are some conflicts in importing, I'll unlink and merge for now, will rely on diff train

Summary:
Int4WeightOnlyConfig currently supports versions 1 and 2, with 1 as the default. This PR changes the default to 2 and updates the callsites.
For the Int4WeightOnlyConfig callsites that use the old configuration, we added an explicit `version=1`; these callsites can be migrated to version 2 separately.
For READMEs, we migrated the usage to version 2 directly.

Deprecation: TODO

Test Plan:
Regression tests:
python test/dtypes/test_affine_quantized.py
python test/quantization/test_quant_api.py
python test/quantization/quantize_/workflows/int4/test_int4_marlin_sparse_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_opaque_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_plain_int32_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_preshuffled_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_tile_packed_to_4d_tensor.py

Reviewers: Subscribers: Tasks: Tags:

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 5, 2025

jerryzh168 mentioned this pull request Sep 5, 2025

Deprecation for Int4WeightOnlyConfig (version 1) and the models #2948

Closed

jerryzh168 force-pushed the bump-int4-version branch from 2341ca6 to e00fe75 Compare September 8, 2025 18:08

jerryzh168 added the module: deprecation Use this tag if this PR deprecates a feature label Sep 8, 2025

jerryzh168 force-pushed the bump-int4-version branch from e00fe75 to 0ad98af Compare September 8, 2025 19:22

jerryzh168 mentioned this pull request Sep 8, 2025

Add version=1 for calls to int4 weight only config #2958

Merged

jerryzh168 force-pushed the bump-int4-version branch 2 times, most recently from edca31d to 5301a7e Compare September 8, 2025 23:17

jerryzh168 requested review from andrewor14, metascroy and vkuzo September 8, 2025 23:18

jerryzh168 force-pushed the bump-int4-version branch from 5301a7e to 9364280 Compare September 8, 2025 23:21

jerryzh168 marked this pull request as ready for review September 8, 2025 23:21

jerryzh168 changed the title ~~Bump int4 weight only config version to 2~~ Bump Int4WeightOnlyConfig version to 2 Sep 8, 2025

metascroy reviewed Sep 9, 2025


jerryzh168 commented Sep 9, 2025


metascroy approved these changes Sep 9, 2025


jerryzh168 force-pushed the bump-int4-version branch from 9364280 to 7e8b47d Compare September 9, 2025 00:54

jerryzh168 force-pushed the bump-int4-version branch from 7e8b47d to d2168f2 Compare September 9, 2025 02:39

jerryzh168 merged commit b10876b into main Sep 9, 2025
18 checks passed

jerryzh168 merged 1 commit into main from bump-int4-version

3 participants
