Add int8 static quantization workflow by jcaip · Pull Request #3442 · pytorch/ao

jcaip · 2025-12-05T00:26:31Z

This PR adds in a new static quant workflow based off of Int8Tensor (#3407).

It introduces a new config, Int8StaticActivationInt8WeightConfig, which requires a scale tensor and a granularity:

static_config = Int8StaticActivationInt8WeightConfig(
    scale=int8_input.scale.detach(), granularity=PerRow()
)
quantize_(model_static_quant, static_config)

Currently, only symmetric PerRow and PerTensor quantization is supported.

This scale tensor is stored on the weight Int8Tensor under activation_scale, and is used to create a new activation Int8Tensor for static quantization.

It would be nice to store this scale tensor in QuantizeTensorToInt8Kwargs but unfortunately this breaks dynamo tracing, as we store the quant kwargs as an object for the weight tensor and we are unable to fakeify them properly.

As a result, we need to keep track of the scale and pass it separately, outside of this kwargs object.
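The difference between dynamic and static activation quantization described above can be sketched in a few lines of dependency-free Python. This is only an illustration of the idea, not the Int8Tensor implementation: the helper names (`symmetric_scale`, `dynamic_quantize`, `static_quantize`) are hypothetical, and nested lists stand in for tensors. Dynamic quantization derives a per-row scale from each input it sees, while static quantization reuses a scale captured once (for example during calibration), so values outside the calibrated range are clamped.

```python
from typing import List, Tuple

INT8_MIN, INT8_MAX = -128, 127


def symmetric_scale(row: List[float]) -> float:
    """Symmetric int8 scale for one row: the largest |value| maps to 127."""
    amax = max(abs(v) for v in row)
    return amax / INT8_MAX if amax > 0 else 1.0


def quantize_row(row: List[float], scale: float) -> List[int]:
    """Round to the nearest integer and clamp into the int8 range."""
    return [max(INT8_MIN, min(INT8_MAX, round(v / scale))) for v in row]


def dynamic_quantize(x: List[List[float]]) -> Tuple[List[List[int]], List[float]]:
    """Dynamic per-row quantization: scales are recomputed from the input itself."""
    scales = [symmetric_scale(row) for row in x]
    return [quantize_row(row, s) for row, s in zip(x, scales)], scales


def static_quantize(x: List[List[float]], scales: List[float]) -> List[List[int]]:
    """Static per-row quantization: scales are supplied up front and reused as-is."""
    return [quantize_row(row, s) for row, s in zip(x, scales)]


# Capture scales once on calibration data, then reuse them on new inputs.
calibration = [[1.0, -2.0, 0.5], [0.1, 0.2, -0.4]]
_, static_scales = dynamic_quantize(calibration)
new_input = [[4.0, 0.0, -1.0], [0.05, 0.0, 0.0]]
quantized = static_quantize(new_input, static_scales)
```

In this sketch the first element of `new_input` (4.0) exceeds the calibrated maximum of 2.0 for its row, so it saturates at the int8 limit. That is the usual trade-off of a static scale: no per-call reduction over the activation, at the cost of depending on representative calibration data.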

Summary: This PR creates a new Int8Tensor and updates the configs to use the new Int8Tensor flow

Test Plan:

To ensure BC:
```
pytest test/quantization/test_quant_api.py
```
To test new Int8Tensor:
```
pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py
```

pytorch-bot · 2025-12-05T00:26:34Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3442

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b5309eb with merge base c4273fe:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jcaip · 2025-12-08T22:28:16Z

+        else:
+            # Scale can be provided in the case of static quant
+            assert scale.ndim == hp_tensor.ndim
+            if isinstance(granularity, PerTensor):


note: I changed these checks in #3468

I think we typically check the shape of the scale tensor as well, like these:

ao/torchao/float8/inference.py

Lines 181 to 198 in 08e5e20

def _is_rowwise_scaled(x: torch.Tensor) -> bool:
    """Checks if a quantized tensor is rowwise scaled

    Args:
        x: quantized tensor (should have `block_size` attribute)
    """
    assert hasattr(x, "block_size"), "Expecting input to have `block_size` attribute"
    return tuple(x.block_size) == (1,) * (x.dim() - 1) + (x.shape[-1],)


def _is_tensorwise_scaled(x: torch.Tensor) -> bool:
    """Checks if a quantized tensor is tensorwise scaled

    Args:
        x: quantized tensor (should have `block_size` attribute)
    """
    assert hasattr(x, "block_size"), "Expecting input to have `block_size` attribute"
    return all(
        x.block_size[i] == -1 or x.block_size[i] == x.shape[i] for i in range(x.ndim)
    )
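To make the `block_size` convention in the two helpers above concrete, here is a small sketch that works on plain shape tuples instead of tensors. The function names are hypothetical, and the `scale_shape_for` rule (one scale entry per block, same number of dimensions as the input) is an assumption chosen to match the `scale.ndim == hp_tensor.ndim` assert quoted earlier in this thread, not something taken from the torchao source.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def per_row_block_size(shape: Shape) -> Shape:
    """Row-wise: each block spans one full row along the last dimension."""
    return (1,) * (len(shape) - 1) + (shape[-1],)


def per_tensor_block_size(shape: Shape) -> Shape:
    """Tensor-wise: a single block covers the whole tensor."""
    return tuple(shape)


def is_rowwise(block_size: Shape, shape: Shape) -> bool:
    """Same predicate as _is_rowwise_scaled, expressed on tuples."""
    return tuple(block_size) == per_row_block_size(shape)


def is_tensorwise(block_size: Shape, shape: Shape) -> bool:
    """Same predicate as _is_tensorwise_scaled; -1 means 'the whole dimension'."""
    return all(b == -1 or b == s for b, s in zip(block_size, shape))


def scale_shape_for(block_size: Shape, shape: Shape) -> Shape:
    """Assumed rule: one scale per block, keeping the input's number of dims."""
    return tuple(1 if b == -1 else s // b for b, s in zip(block_size, shape))


weight_shape = (4, 8)
row_scale_shape = scale_shape_for(per_row_block_size(weight_shape), weight_shape)
tensor_scale_shape = scale_shape_for(per_tensor_block_size(weight_shape), weight_shape)
```

One consequence worth noting: for an input with a single row, such as shape `(1, 8)`, the row-wise and tensor-wise block sizes coincide, so both predicates return true. Checks that need to distinguish the two granularities cannot rely on `block_size` alone in that case.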

jerryzh168 · 2025-12-08T23:56:13Z

+        if isinstance(self.granularity, PerTensor):
+            assert self.scale.numel() == 1


nit: also check the shapes, and check PerRow as well

I think we might want to allow the scale to be None, so that Int8StaticActivationInt8WeightConfig() can be passed as a base config. We can discuss on #3468.

OK, makes sense, sounds good. I think this is easier for users; otherwise they have to run a separate flow to get the scale here.
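The validation discussed in this thread (check the shape for both PerTensor and PerRow, and tolerate a missing scale) could look roughly like the sketch below. Everything here is hypothetical: `PerTensor` and `PerRow` are local stand-in classes rather than the torchao granularity types, `check_static_scale` is not a torchao function, and the expected PerRow shape (all leading dimensions kept, last dimension reduced to 1) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

Shape = Tuple[int, ...]


@dataclass(frozen=True)
class PerTensor:
    """Stand-in for a per-tensor granularity marker (not the torchao class)."""


@dataclass(frozen=True)
class PerRow:
    """Stand-in for a per-row granularity marker (not the torchao class)."""


def check_static_scale(
    scale_shape: Optional[Shape],
    hp_shape: Shape,
    granularity: Union[PerTensor, PerRow],
) -> None:
    """Raise ValueError if a static scale does not match the granularity."""
    if scale_shape is None:
        # No scale yet, e.g. the config is used as a base config and the
        # scale is expected to be filled in later by a calibration step.
        return
    if len(scale_shape) != len(hp_shape):
        raise ValueError(
            f"scale has {len(scale_shape)} dims, expected {len(hp_shape)}"
        )
    if isinstance(granularity, PerTensor):
        expected = (1,) * len(hp_shape)
    elif isinstance(granularity, PerRow):
        expected = tuple(hp_shape[:-1]) + (1,)
    else:
        raise ValueError(f"unsupported granularity: {granularity!r}")
    if tuple(scale_shape) != expected:
        raise ValueError(f"scale shape {scale_shape} != expected {expected}")


# Usage: these calls pass silently; a mismatched shape would raise ValueError.
check_static_scale((1, 1), (16, 64), PerTensor())
check_static_scale((16, 1), (16, 64), PerRow())
check_static_scale(None, (16, 64), PerRow())
```

Checking the full shape rather than only `numel() == 1` catches cases such as a per-row scale of shape `(1, 16)` being passed where `(16, 1)` is needed, which would otherwise broadcast without error and produce wrong results.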

jerryzh168

lg, see some comments inline

* Int8Tensor migration

  Summary: This PR creates a new Int8Tensor and updates the configs to use the new Int8Tensor flow

  Test Plan:

  To ensure BC:
  ```
  pytest test/quantization/test_quant_api.py
  ```
  To test new Int8Tensor:
  ```
  pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py
  ```
* ruff fixes
* add init
* fix ruff again
* update
* wip
* undo update tests
* fix ruff
* fix varname
* fix typing
* add tests
* fix dtype
* fix ci
* address granularity cr
* update _choose_quant_func_and_quantize_tensor
* make block size required attribute
* made dtype required as well
* address nits
* skip per tensor weight only test for now
* add static quant
* add static quant
* update
* static quant working eager + compile
* remove file
* added asserts
* undo smoothquant change
* fix return
* address cr feedback

This PR hooks up the static quant workflow added in #3442 to the prototype smoothquant API. You can use the new flow as follows:

```python
from torchao.prototype.smoothquant import SmoothQuantConfig
from torchao.prototype.smoothquant.core import SmoothQuantStep
from torchao.quantization.granularity import PerRow
from torchao.quantization.quant_api import (
    Int8StaticActivationInt8WeightConfig,
    quantize_,
)

# `model` and `x` (the calibration inputs) are assumed to be defined already.
config = SmoothQuantConfig(
    base_config=Int8StaticActivationInt8WeightConfig(granularity=PerRow()),
    step=SmoothQuantStep.PREPARE,
    alpha=0.5,
)
quantize_(model, config)

# Perform calibration with test data
model(*x)

config.step = SmoothQuantStep.CONVERT
quantize_(model, config)

# model will now be statically quantized with the inputs used in smoothquant observer.
model(*x)
```

* Int8Tensor migration

  Summary: This PR creates a new Int8Tensor and updates the configs to use the new Int8Tensor flow

  Test Plan:

  To ensure BC:
  ```
  pytest test/quantization/test_quant_api.py
  ```
  To test new Int8Tensor:
  ```
  pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py
  ```
* ruff fixes
* add init
* fix ruff again
* update
* wip
* undo update tests
* fix ruff
* fix varname
* fix typing
* add tests
* fix dtype
* fix ci
* address granularity cr
* update _choose_quant_func_and_quantize_tensor
* make block size required attribute
* made dtype required as well
* address nits
* skip per tensor weight only test for now
* add static quant
* add static quant
* update
* static quant working eager + compile
* remove file
* added asserts
* undo smoothquant change
* fix return
* got smoothquant + int8 static working
* generalized smoothquat code
* free tests
* fix static scale check
* update
* address cr feedback
* Hook up static quant workflow to prototype smoothquant API

  This PR hooks up the static quant workflow added in #3442 to the prototype smoothquant API. You can use the new flow as follows:

  ```python
  from torchao.prototype.smoothquant import SmoothQuantConfig
  from torchao.prototype.smoothquant.core import SmoothQuantStep
  from torchao.quantization.granularity import PerRow
  from torchao.quantization.quant_api import (
      Int8StaticActivationInt8WeightConfig,
      quantize_,
  )

  # `model` and `x` (the calibration inputs) are assumed to be defined already.
  config = SmoothQuantConfig(
      base_config=Int8StaticActivationInt8WeightConfig(granularity=PerRow()),
      step=SmoothQuantStep.PREPARE,
      alpha=0.5,
  )
  quantize_(model, config)

  # Perform calibration with test data
  model(*x)

  config.step = SmoothQuantStep.CONVERT
  quantize_(model, config)

  # model will now be statically quantized with the inputs used in smoothquant observer.
  model(*x)
  ```
* fix ruff
* fix test to use threshold for sqnr

jcaip added 23 commits December 1, 2025 12:55

* ruff fixes (0b73aed)
* add init (1e49945)
* fix ruff again (669b6ee)
* update (9071526)
* wip (1539e0f)
* Merge branch 'main' into jcaip/int8-tensor (d9a2b1b)
* undo update tests (673f228)
* fix ruff (739fd64)
* fix varname (750db1a)
* fix typing (9410488)
* add tests (45a3a76)
* fix dtype (4e2f09c)
* fix ci (dd80cca)
* address granularity cr (7f73062)
* update _choose_quant_func_and_quantize_tensor (ac6a2b6)
* make block size required attribute (f28df4a)
* made dtype required as well (328585e)
* address nits (ce4d568)
* skip per tensor weight only test for now (a665d45)
* add static quant (0338016)
* add static quant (ee39691)
* update (9eb0aa9)

meta-cla Bot added the CLA Signed label Dec 5, 2025

jcaip changed the base branch from main to jcaip/int8-tensor December 5, 2025 00:27

jcaip added 4 commits December 6, 2025 15:19

* static quant working eager + compile (d4a1514)
* remove file (3cdea56)
* added asserts (fa9022d)
* undo smoothquant change (8ce5cde)
* fix return (6f64121)

jcaip added the topic: new feature label Dec 7, 2025

jcaip marked this pull request as ready for review December 7, 2025 00:05

jcaip changed the title ~~int8 static quant~~ Add int8 static quantization workflow Dec 7, 2025

* Merge branch 'main' into jcaip/static-quant-rebased (8ae921d)

jcaip changed the base branch from jcaip/int8-tensor to main December 7, 2025 00:07

jcaip closed this Dec 7, 2025

jcaip reopened this Dec 7, 2025

jcaip mentioned this pull request Dec 8, 2025

enable smoothquant for int8 static tensor #3468

Merged

jcaip commented Dec 8, 2025

jcaip requested a review from jerryzh168 December 8, 2025 22:28

jerryzh168 reviewed Dec 8, 2025

Comment thread torchao/quantization/quantize_/workflows/int8/int8_tensor.py Outdated

jerryzh168 reviewed Dec 8, 2025

Comment thread test/quantization/quantize_/workflows/int8/test_int8_tensor.py Outdated

jerryzh168 reviewed Dec 8, 2025

Comment thread torchao/quantization/quant_api.py Outdated

jerryzh168 reviewed Dec 9, 2025

Comment thread torchao/quantization/quant_api.py Outdated

jerryzh168 approved these changes Dec 9, 2025

* address cr feedback (b5309eb)

jcaip merged commit f99105a into main Dec 9, 2025
20 checks passed

jcaip merged 30 commits into main from jcaip/static-quant-rebased
