Tags: pytorch/ao
[ROCm] MXFP8 MoE: persistent grouped kernel, F.scaled_mm dense, correctness + tests
- Persistent grouped-MM kernel (grid = num_CUs * ctas_per_cu) that walks experts
in-kernel via a global tile counter (see the sketch after this list). This avoids
silent row-dropping under the (M+E-1)//E per-expert bound and keeps the
dispatcher torch.compile-clean.
- Dense MXFP8 path: dispatch to F.scaled_mm with BlockWise1x32.
- Wgrad: retune default tile to (BN=256, BK=256, BM=64, nw=8).
- K-tail and scale-tail masking; m_mask bounded by group_end and global M.
- torch.compile: register pad/unpad helpers via torch.library.custom_op
(sketch after this list); skip nonstrict_trace on ROCm.
- mx_linear / MXFP8TrainingOpConfig: drop is_ROCM() auto-switch; expose
mxfp8_dim1_cast_kernel_choice as explicit arg (CUDA default).
- bench_2d_3d_grouped_gemm.py: run on MI350+ via bench_mxfp8_grouped_mm_rocm;
fix the flops formula to 2 * M * N * K (sketch at the end of this entry).
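A minimal host-side sketch of the tile-walk idea above, assuming a group_offsets
layout and hypothetical helper names (the real implementation is a Triton kernel):
each persistent CTA claims tile IDs from a shared counter and maps them to
(expert, m_tile, n_tile) using the actual per-group row counts, so no expert's
tail rows are dropped.

    # Hypothetical host-side stand-in for the persistent kernel's scheduling
    # loop; `group_offsets` and the helper names are assumptions.
    def expert_tiles(group_offsets, N, BLOCK_M, BLOCK_N):
        """List (expert, m_tile, n_tile) for every tile that owns real rows."""
        tiles = []
        for e in range(len(group_offsets) - 1):
            rows = group_offsets[e + 1] - group_offsets[e]   # rows of expert e
            m_tiles = (rows + BLOCK_M - 1) // BLOCK_M        # per-expert, not (M+E-1)//E
            n_tiles = (N + BLOCK_N - 1) // BLOCK_N
            tiles += [(e, mt, nt) for mt in range(m_tiles) for nt in range(n_tiles)]
        return tiles

    def persistent_walk(num_ctas, group_offsets, N, BLOCK_M=128, BLOCK_N=128):
        """Each 'CTA' strides a global tile counter until all tiles are claimed."""
        tiles = expert_tiles(group_offsets, N, BLOCK_M, BLOCK_N)
        schedule = {cta: [] for cta in range(num_ctas)}
        for tile_id, tile in enumerate(tiles):               # global tile counter
            schedule[tile_id % num_ctas].append(tile)        # strided claim
        return schedule

    # 3 experts with uneven row counts (5, 130, 0); N = 256.
    print(persistent_walk(num_ctas=4, group_offsets=[0, 5, 135, 135], N=256))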
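A minimal sketch of the torch.library.custom_op pattern used for the pad/unpad
helpers; the op name, namespace, and padding rule below are assumptions, not the
actual torchao helpers. The register_fake (meta) implementation is what lets
torch.compile trace the op without executing it.

    import torch
    import torch.nn.functional as F

    # Hedged sketch: "torchao_sketch::pad_rows" is a hypothetical op, not the
    # real helper registered in torchao.
    @torch.library.custom_op("torchao_sketch::pad_rows", mutates_args=())
    def pad_rows(x: torch.Tensor, multiple: int) -> torch.Tensor:
        """Eager impl: pad the row dim of a 2D tensor up to a multiple."""
        pad = (-x.shape[0]) % multiple
        return F.pad(x, (0, 0, 0, pad))

    @pad_rows.register_fake
    def _(x: torch.Tensor, multiple: int) -> torch.Tensor:
        # Shape-only version so FakeTensor / torch.compile can trace through it.
        pad = (-x.shape[0]) % multiple
        return x.new_empty((x.shape[0] + pad, x.shape[1]))

    compiled = torch.compile(lambda t: pad_rows(t, 128))
    assert compiled(torch.randn(100, 64)).shape == (128, 64)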
Tested on MI355X / gfx950 / ROCm 7.1 / Triton 3.7:
Accuracy: test/prototype/moe_training/test_mxfp8_grouped_mm.py
-> 129 passed, 16 skipped.
SQNR margins: out >= 27.6 (>= 27), in_grad >= 25.2 (>= 25),
w_grad >= 25.5 (>= 24).
Perf: benchmarks/prototype/moe_training/bench_2d_3d_grouped_gemm.py
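For reference, the corrected flop accounting behind the 2 * M * N * K formula,
as a hedged sketch (the helper name and the per-group M layout are assumptions,
not the benchmark's API): each group's M_g x K by K x N matmul contributes
2 * M_g * N * K flops, summed over groups.

    # Hypothetical helper illustrating the 2*M*N*K accounting per group.
    def grouped_mm_tflops(m_per_group, N, K, seconds):
        flops = sum(2 * m * N * K for m in m_per_group)
        return flops / seconds / 1e12

    # e.g. 8 experts, 8192 rows each, N = K = 4096, measured at 2.5 ms:
    print(f"{grouped_mm_tflops([8192] * 8, 4096, 4096, 2.5e-3):.0f} TFLOP/s")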
Remove unnecessary Torch dependency from ExecuTorch ops build
The ExecuTorch ops target defines TORCHAO_SHARED_KERNELS_BUILD_EXECUTORCH=1, which means shared kernel headers use ExecuTorch includes, not Torch includes. The find_package(Torch) and TORCH_INCLUDE_DIRS were therefore unnecessary and caused CMake configure failures in standalone ExecuTorch builds where PyTorch is not discoverable via find_package.