Tags: pytorch/ao

v0.14.0-rc1

enable select for NVFP4Tensor (#3117)
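
A rough, hedged illustration of what this enables follows; the module path mirrors the prototype test layout, and the constructor name `NVFP4Tensor.to_nvfp4` is an assumption rather than confirmed API:

```python
# Hedged sketch only: `NVFP4Tensor.to_nvfp4` is an assumed constructor name;
# the module path mirrors test/prototype/mx_formats/test_nvfp4_tensor.py.
import torch

from torchao.prototype.mx_formats.nvfp4_tensor import NVFP4Tensor

x_hp = torch.randn(128, 64, dtype=torch.bfloat16, device="cuda")
x_nvfp4 = NVFP4Tensor.to_nvfp4(x_hp)

# Integer indexing dispatches to aten.select.int, which this release implements
# for NVFP4Tensor, so selecting a single row of the quantized tensor now works.
row = x_nvfp4[0]
print(row.shape)  # expected: torch.Size([64])
```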

v0.13.0-rc8

better check for mxfp8 cuda kernel presence (#2933)

Summary:

Short term fix for #2932.
If torchao was built without CUDA 10.0 (such as in our CI), this ensures
that:
a. only callsites which actually use the mxfp8 dim1 kernel see the error
   message; using NVFP4 no longer hits this error.
b. the error message points to the GitHub issue for more info on the
   workaround (for now, build from source).
A minimal sketch of this callsite-level check follows below.
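
This sketch is illustrative only; the extension module name `mxfp8_cuda` and the wrapper function are hypothetical stand-ins, not the actual torchao code:

```python
# Illustrative sketch only; the extension module and function names are hypothetical.
try:
    from torchao.prototype import mxfp8_cuda  # compiled extension, absent in some builds
except ImportError:
    mxfp8_cuda = None


def cast_dim1_mxfp8(x):
    # Only callers of the mxfp8 dim1 CUDA kernel see this error; NVFP4 code paths never do.
    if mxfp8_cuda is None:
        raise RuntimeError(
            "mxfp8 dim1 CUDA kernel was not built; see "
            "https://github.com/pytorch/ao/issues/2932 for the workaround "
            "(for now, build torchao from source)."
        )
    return mxfp8_cuda.cast_dim1(x)
```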

Test Plan:

1. hardcode the mxfp8 kernel to be excluded from the build:
https://github.com/pytorch/ao/blob/85557135c93d3429320a4a360c0ee9cb49f84a00/setup.py#L641
2. build torchao from source, verify `torchao/prototype` does not have
   any `.so` files
3. run nvfp4 tests, verify they now pass: `pytest test/prototype/mx_formats/test_nvfp4_tensor.py -s -x`
4. run mxfp8 linear tests, verify the new error message is displayed for
   dim1 kernel tests: `pytest test/prototype/mx_formats/test_mx_linear.py -s -x -k test_linear_eager_vs_hp`
5. undo the change in (1), rebuild torchao, verify all mx tests pass: `pytest test/prototype/mx_formats/ -s -x`

v0.13.0

another fix for torch version (#2922)

Summary:

`torch.__version__` has unexpected behavior when comparing to a string:

```python
(Pdb) torch.__version__
'2.9.0.dev20250902+cu128'
(Pdb) str(torch.__version__)
'2.9.0.dev20250902+cu128'
(Pdb) '2.9.0.dev20250902+cu128' >= '2.9'
True
(Pdb) torch.__version__ >= '2.9'
False
(Pdb) torch.__version__ >= (2, 9)
False
(Pdb) torch.__version__ >= (2, 9, 0)
False
(Pdb) str(torch.__version__) >= '2.9'
True
```

To unblock the release, for now compare `str(torch.__version__)` to
force the behavior we want for `torch==2.9.x`. We should make this
more robust, saving that for a future PR.
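
A minimal sketch of the workaround, assuming nothing beyond plain string comparison (illustrative, not the exact `__init__.py` code):

```python
# Illustrative sketch of the short-term fix: force a plain string comparison so
# torch 2.9.x (including nightlies like "2.9.0.dev20250902+cu128") compares >= "2.9".
import torch

is_torch_2_9 = str(torch.__version__) >= "2.9"
# Note: lexicographic comparison is not robust (e.g. "2.10" < "2.9"), which is
# why the message above calls for a more robust check in a future PR.
print(is_torch_2_9)
```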

Test Plan:

```
1. install torchao 0.13.0 from pip
2. install torch 2.8.0, verify torchao imports without errors
3. install torch 2.9.x, verify torchao imports correctly and a warning
   for skipping c++ kernel import is shown
```

v0.13.0-rc7

Exclude libcuda.so from auditwheel repair (#2927)

* Exclude libcuda.so from auditwheel repair

* Update build_wheels_linux.yml

* Update post_build_script.sh

v0.13.0-rc6

another fix for torch version (#2922)

v0.13.0-rc5

change missing ops printout back to debug (#2921)

Summary:

Undoes part of #2908 so that the message about missing `.so` files is a
debug print instead of a warning, since this always happens for builds
without executorch ops.

Keeps the version mismatch log as a warning.
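
A hedged sketch of the resulting logging split; the logger name and helper below are illustrative, not the actual torchao code:

```python
# Illustrative sketch only; names are hypothetical.
import logging

logger = logging.getLogger("torchao")


def report_extension_status(so_files_found: bool, torch_version_mismatch: bool) -> None:
    if not so_files_found:
        # Expected for builds without executorch ops, so keep it at debug level.
        logger.debug("Skipping import of cpp extensions: no .so files found")
    if torch_version_mismatch:
        # A real incompatibility, so this stays a warning.
        logger.warning("torchao cpp extensions were built against an incompatible torch version")
```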

Test Plan:

Make this change locally in an install of torchao on an H100, verify
warning no longer prints.

v0.13.0-rc4

fix torchao version check on torch version (#2918)

Summary:

Fix error in #2908. The version string for
PyTorch 2.8 reads "2.8.0...", so we need to compare `>= 2.9` to properly
gate out PyTorch 2.9.
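
A quick illustration of why `>= 2.9` is the right cutoff, using plain string comparisons (not the actual torchao check):

```python
# "2.8.0..." does not compare >= "2.9", so PyTorch 2.8 is no longer gated out,
# while a 2.9 nightly string does compare >= "2.9" and is still gated.
assert not ("2.8.0+cu128" >= "2.9")
assert "2.9.0.dev20250902+cu128" >= "2.9"
```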

Test Plan:

1. make this change in a locally installed `__init__.py` file of torchao
   downloaded via pip
2. install PyTorch 2.8.0
3. import torchao, verify warning was not hit

v0.13.0-rc3

torchao init: do not load .so files for known incompatible torch version (#2908)

Summary:

Short term fix for #2901 to unblock
the 0.13.0 release.

Long version:
1. torchao's c++ kernels are not using libtorch and therefore are not
   guaranteed to work across different PyTorch versions
2. looks like we got lucky with (1) as torchao kernels just happened to
   work across PyTorch versions <= 2.8, but PyTorch nightlies in 2.9
   introduce a breaking ABI change (I don't know what specifically).
   Therefore, if we build torchao with torch 2.8, and then import it in
   an environment with torch 2.9+, the Python process will crash with
   `Aborted (core dumped)`.

For now, I just gate out the "known broken" case where we detect
that the torch version used to build torchao is < 2.9, and the torch
version in the environment when torchao is imported is >= 2.9. If this
is detected, this PR skips importing the `.so` files and logs a warning,
to at least have most of the torchao Python API still work and give the
user some information about how to get the custom kernels working.

For future releases, we'll need to make this more robust - leaving that
for future PRs.
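
A minimal sketch of the gate described above; how torchao records the torch version it was built against is an assumption here, and the names are illustrative:

```python
# Illustrative sketch; BUILT_AGAINST_TORCH is a hypothetical stand-in for
# however torchao records the torch version it was compiled with.
import logging

import torch

BUILT_AGAINST_TORCH = "2.8.0"  # assumed for illustration


def should_skip_cpp_extensions() -> bool:
    # Known-broken case: built with torch < 2.9 but imported under torch >= 2.9.
    return str(BUILT_AGAINST_TORCH) < "2.9" and str(torch.__version__) >= "2.9"


if should_skip_cpp_extensions():
    logging.warning(
        "Skipping import of cpp extensions due to incompatible torch version "
        f"{torch.__version__}"
    )
else:
    pass  # load the .so files as usual
```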

Test Plan:

```bash
# install the 0.13.0 RC, built with PyTorch 2.8
with-proxy pip install torchao==0.13.0 --extra-index-url https://download.pytorch.org/whl/test/cu128

# copy over these changes to the local __init__.py file in the installation:
# ~/.conda/envs/pytorch_nightly/lib/python3.11/site-packages/torchao/__init__.py

# install PyTorch 2.9.x nightly
with-proxy pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128

# import torchao, verify no more crash and the warning message is emitted
(pytorch_nightly) [vasiliy@devgpu007.eag6 ~/local]$ python -X faulthandler -c "import torch; print(torch.__version__); import torchao"
2.9.0.dev20250829+cu128
Skipping import of cpp extensions due to incompatible torch version 2.9.0.dev20250829+cu128 for torchao version 0.13.0+cu128
```

v0.13.0-rc2

Support QAT int4 v1 path for BC (#2888)

**Summary:** `Int4WeightOnlyConfig` supports version 1 (targeting
tinygemm) and version 2 (targeting fbgemm). However, the latter
requires a new dependency (fbgemm_gpu_genai >= 1.2.0), which is
problematic for torchao integrations with other frameworks.
For now, we should continue to support the v1 path for BC.
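
A hedged usage sketch, assuming the config exposes a `version` field to pick between the two paths; the exact field name and defaults may differ in the release:

```python
# Hedged sketch: the `version` argument is an assumption based on the summary
# above; check the Int4WeightOnlyConfig docstring for the exact interface.
import torch

from torchao.quantization import Int4WeightOnlyConfig, quantize_

model = torch.nn.Sequential(torch.nn.Linear(256, 256)).to(torch.bfloat16).cuda()

# v1 path (tinygemm), kept for backward compatibility; no fbgemm_gpu_genai needed.
quantize_(model, Int4WeightOnlyConfig(group_size=128, version=1))
```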

**Test Plan:**
```
python test/quantization/test_qat.py -k test_infer_int4_weight_only_config
```

v0.13.0-rc1

Add NVFP4 QAT (#2666)

* [bc-breaking] Generalize FakeQuantizeConfig beyond intx

**Summary:** The existing `FakeQuantizeConfig` performs only
intx quantization, but we plan to extend QAT to other dtypes
such as fp8 and nvfp4 in the near future. This is the necessary
refactor before that. Specifically:

```
# New abstract class
FakeQuantizeConfigBase
# Rename
FakeQuantizeConfig -> IntxFakeQuantizeConfig
```

In the future, we will have other types of `FakeQuantizeConfigBase`
for float dtypes that users can pass in instead of the existing
Intx one.

**BC-breaking notes:** For BC, we keep around the old names to
reference the new ones. However, this commit is still BC-breaking
in the sense that a few APIs now accept the abstract
`FakeQuantizeConfigBase` instead. For the most part, this abstract
class will be hidden from the user.

Before:
```
activation_config = FakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = FakeQuantizeConfig(torch.int4, group_size=32)
```

After:
```
activation_config = IntxFakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = IntxFakeQuantizeConfig(torch.int4, group_size=32)
```
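
A minimal sketch of what keeping the old names around can look like (illustrative only; the real aliasing and any deprecation machinery may differ):

```python
# Illustrative only: the old name stays importable but simply points at the
# renamed class, so existing user code keeps working.
import torch

from torchao.quantization.qat import IntxFakeQuantizeConfig

FakeQuantizeConfig = IntxFakeQuantizeConfig  # old name kept for BC

config = FakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
assert isinstance(config, IntxFakeQuantizeConfig)
```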

**Test Plan:**
python test/quantization/test_qat.py

[ghstack-poisoned]

* New multi-step QAT API

**Summary:** This commit adds a new multi-step QAT API with the
main goal of simplifying the existing UX. The new API uses the
same `QATConfig` for both the prepare and convert steps, and
automatically infers the fake quantization configs based on
a PTQ base config provided by the user:

```
from torchao.quantization import (
    quantize_,
    Int8DynamicActivationInt4WeightConfig
)
from torchao.quantization.qat import QATConfig

# prepare
base_config = Int8DynamicActivationInt4WeightConfig(group_size=32)
qat_config = QATConfig(base_config, step="prepare")
quantize_(m, qat_config)

# train (not shown)

# convert
quantize_(m, QATConfig(base_config, step="convert"))
```

The main improvements include:
- A single config for both prepare and convert steps
- A single quantize_ for convert (instead of 2)
- No chance for incompatible prepare vs convert configs
- Much less boilerplate code for the most common use case
- Simpler config names

For less common use cases such as experimentation, users can
still specify arbitrary fake quantization configs for
activations and/or weights as before. This is still important
since there may not always be a corresponding PTQ base config.
For example:

```
from torchao.quantization import quantize_
from torchao.quantization.qat import IntxFakeQuantizeConfig, QATConfig

activation_config = IntxFakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = IntxFakeQuantizeConfig(torch.int4, group_size=32)
qat_config = QATConfig(
    activation_config=activation_config,
    weight_config=weight_config,
    step="prepare",
)
quantize_(model, qat_config)

# train and convert same as above (not shown)
```

**BC-breaking notes:** This change by itself is technically not
BC-breaking since we keep around the old path, but will become
so when we deprecate and remove the old path in the future.

Before:
```
# prepare
activation_config = IntxFakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = IntxFakeQuantizeConfig(torch.int4, group_size=32)
qat_config = IntXQuantizationAwareTrainingConfig(activation_config, weight_config)
quantize_(model, qat_config)

# train (not shown)

# convert
quantize_(model, FromIntXQuantizationAwareTrainingConfig())
quantize_(model, Int8DynamicActivationInt4WeightConfig(group_size=32))
```

After: (see above)

**Test Plan:**
```
python test/quantization/test_qat.py
```

[ghstack-poisoned]

* Update on "New multi-step QAT API"


**Summary:** This commit adds a new multi-step QAT API with the
main goal of simplifying the existing UX. The new API uses the
same `QATConfig` for both the prepare and convert steps, and
automatically infers the fake quantization configs based on
a PTQ base config provided by the user:

```
from torchao.quantization import (
    quantize_,
    Int8DynamicActivationInt4WeightConfig
)
from torchao.quantization.qat import QATConfig

# prepare
base_config = Int8DynamicActivationInt4WeightConfig(group_size=32)
quantize_(m, QATConfig(base_config, step="prepare"))

# train (not shown)

# convert
quantize_(m, QATConfig(base_config, step="convert"))
```

The main improvements include:
- A single config for both prepare and convert steps
- A single quantize_ for convert (instead of 2)
- No chance for incompatible prepare vs convert configs
- Much less boilerplate code for most common use case
- Simpler config names

For less common use cases such as experimentation, users can
still specify arbitrary fake quantization configs for
activations and/or weights as before. This is still important
since there may not always be a corresponding PTQ base config.
For example:

```
from torchao.quantization import quantize_
from torchao.quantization.qat import IntxFakeQuantizeConfig, QATConfig

# prepare
activation_config = IntxFakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = IntxFakeQuantizeConfig(torch.int4, group_size=32)
qat_config = QATConfig(
    activation_config=activation_config,
    weight_config=weight_config,
    step="prepare",
)
quantize_(model, qat_config)

# train and convert same as above (not shown)
```

**BC-breaking notes:** This change by itself is technically not
BC-breaking since we keep around the old path, but will become
so when we deprecate and remove the old path in the future.

Before:
```
# prepare
activation_config = IntxFakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = IntxFakeQuantizeConfig(torch.int4, group_size=32)
qat_config = IntXQuantizationAwareTrainingConfig(activation_config, weight_config),
quantize_(model, qat_config)

# train (not shown)

# convert
quantize_(model, FromIntXQuantizationAwareTrainingConfig())
quantize_(model, Int8DynamicActivationInt4WeightConfig(group_size=32))
```

After: (see above)

**Test Plan:**
```
python test/quantization/test_qat.py
```

[ghstack-poisoned]

* Update on "New multi-step QAT API"


**Summary:** This commit adds a new multi-step QAT API with the
main goal of simplifying the existing UX. The new API uses the
same `QATConfig` for both the prepare and convert steps, and
automatically infers the fake quantization configs based on
a PTQ base config provided by the user:

```
from torchao.quantization import (
    quantize_,
    Int8DynamicActivationInt4WeightConfig
)
from torchao.quantization.qat import QATConfig

# prepare
base_config = Int8DynamicActivationInt4WeightConfig(group_size=32)
quantize_(m, QATConfig(base_config, step="prepare"))

# train (not shown)

# convert
quantize_(m, QATConfig(base_config, step="convert"))
```

The main improvements include:
- A single config for both prepare and convert steps
- A single quantize_ for convert (instead of 2)
- No chance for incompatible prepare vs convert configs
- Much less boilerplate code for most common use case
- Simpler config names

For less common use cases such as experimentation, users can
still specify arbitrary fake quantization configs for
activations and/or weights as before. This is still important
since there may not always be a corresponding PTQ base config.
For example:

```
from torchao.quantization import quantize_
from torchao.quantization.qat import IntxFakeQuantizeConfig, QATConfig

# prepare
activation_config = IntxFakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = IntxFakeQuantizeConfig(torch.int4, group_size=32)
qat_config = QATConfig(
    activation_config=activation_config,
    weight_config=weight_config,
    step="prepare",
)
quantize_(model, qat_config)

# train and convert same as above (not shown)
```

**BC-breaking notes:** This change by itself is technically not
BC-breaking since we keep around the old path, but will become
so when we deprecate and remove the old path in the future.

Before:
```
# prepare
activation_config = IntxFakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = IntxFakeQuantizeConfig(torch.int4, group_size=32)
qat_config = IntXQuantizationAwareTrainingConfig(activation_config, weight_config),
quantize_(model, qat_config)

# train (not shown)

# convert
quantize_(model, FromIntXQuantizationAwareTrainingConfig())
quantize_(model, Int8DynamicActivationInt4WeightConfig(group_size=32))
```

After: (see above)

**Test Plan:**
```
python test/quantization/test_qat.py
```

[ghstack-poisoned]

* Deprecate old QAT APIs

**Summary:** Deprecates QAT APIs that should no longer be used.
Prints a helpful deprecation warning to help users migrate.

**Test Plan:**
```
python test/quantization/test_qat.py -k test_qat_api_deprecation
```

Also manual testing:
```
>>> from torchao.quantization.qat import IntXQuantizationAwareTrainingConfig
>>> IntXQuantizationAwareTrainingConfig()
'IntXQuantizationAwareTrainingConfig' is deprecated and will be removed in a future release. Please use the following API instead:

    base_config = Int8DynamicActivationInt4WeightConfig(group_size=32)
    quantize_(model, QATConfig(base_config, step="prepare"))
    # train (not shown)
    quantize_(model, QATConfig(base_config, step="convert"))

Alternatively, if you prefer to pass in fake quantization configs:

    activation_config = IntxFakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
    weight_config = IntxFakeQuantizeConfig(torch.int4, group_size=32)
    qat_config = QATConfig(
        activation_config=activation_config,
        weight_config=weight_config,
        step="prepare",
    )
    quantize_(model, qat_config)

Please see #2630 for more details.

IntXQuantizationAwareTrainingConfig(activation_config=None, weight_config=None)
```

[ghstack-poisoned]

* Add NVFP4 QAT

**Summary:** This commit adds a QAT flow for NVFP4, following the
numerics in `NVFP4Tensor` closely but without the dtype casting,
swizzling, and the packing/unpacking. Users can call this flow as follows:

```
from torchao.quantization import quantize_
from torchao.quantization.qat import NVFP4FakeQuantizeConfig, QATConfig

qat_config = QATConfig(
    activation_config=NVFP4FakeQuantizeConfig(),
    weight_config=NVFP4FakeQuantizeConfig(),
    step="prepare",
)
quantize_(model, qat_config)
```

**Test Plan:**
```
python test/quantization/test_qat.py -k test_qat_nvfp4
```

Initial benchmarks on fine-tuning Qwen3-1.7B on oasst1 for 3 epochs:
```
# Without QAT
| Tasks  |Version|Filter|n-shot|    Metric     |   | Value |   |Stderr|
|--------|------:|------|------|---------------|---|------:|---|------|
|wikitext|      2|none  |None  |bits_per_byte  |↓  | 0.7927|±  |   N/A|
|        |       |none  |None  |byte_perplexity|↓  | 1.7323|±  |   N/A|
|        |       |none  |None  |word_perplexity|↓  |18.8815|±  |   N/A|

# With QAT
| Tasks  |Version|Filter|n-shot|    Metric     |   | Value |   |Stderr|
|--------|------:|------|------|---------------|---|------:|---|------|
|wikitext|      2|none  |None  |bits_per_byte  |↓  | 0.7921|±  |   N/A|
|        |       |none  |None  |byte_perplexity|↓  | 1.7316|±  |   N/A|
|        |       |none  |None  |word_perplexity|↓  |18.8409|±  |   N/A|
```

Initial benchmarks on fine-tuning Qwen3-1.7B on alpaca for 3 epochs:
```
# Without QAT
| Tasks  |Version|Filter|n-shot|    Metric     |   | Value |   |Stderr|
|--------|------:|------|------|---------------|---|------:|---|------|
|wikitext|      2|none  |None  |bits_per_byte  |↓  | 0.8322|±  |   N/A|
|        |       |none  |None  |byte_perplexity|↓  | 1.7804|±  |   N/A|
|        |       |none  |None  |word_perplexity|↓  |21.8611|±  |   N/A|

# With QAT
| Tasks  |Version|Filter|n-shot|    Metric     |   | Value |   |Stderr|
|--------|------:|------|------|---------------|---|------:|---|------|
|wikitext|      2|none  |None  |bits_per_byte  |↓  | 0.8271|±  |   N/A|
|        |       |none  |None  |byte_perplexity|↓  | 1.7741|±  |   N/A|
|        |       |none  |None  |word_perplexity|↓  |21.4467|±  |   N/A|
```

[ghstack-poisoned]