Add int8 static quantization workflow #3442
Summary: This PR creates a new Int8Tensor and updates the configs to use the new Int8Tensor flow.

Test Plan:

To ensure BC:
```
pytest test/quantization/test_quant_api.py
```
To test new Int8Tensor:
```
pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py
```
Dr. CI: artifacts and rendered test results are at hud.pytorch.org/pr/pytorch/ao/3442. No failures as of commit b5309eb with merge base c4273fe.
    else:
        # Scale can be provided in the case of static quant
        assert scale.ndim == hp_tensor.ndim
        if isinstance(granularity, PerTensor):
I think we typically check the shape of the scale tensor as well, like this:

ao/torchao/float8/inference.py, lines 181 to 198 in 08e5e20

    if isinstance(self.granularity, PerTensor):
        assert self.scale.numel() == 1
nit: also check the shapes, and check PerRow as well
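The shape checks requested in this thread could look roughly like the sketch below. It is a minimal illustration, not the code that landed in the PR: `expected_scale_shape` and `check_static_scale` are hypothetical helpers, granularities are plain strings rather than torchao's `PerTensor` / `PerRow` classes, and "per row" is assumed to mean one scale per row with the last dimension reduced to size 1.

```python
from typing import Tuple


def expected_scale_shape(hp_shape: Tuple[int, ...], granularity: str) -> Tuple[int, ...]:
    """Hypothetical helper: scale shape that keeps the same ndim as the tensor."""
    if granularity == "per_tensor":
        # A single scale, broadcastable against every dimension.
        return (1,) * len(hp_shape)
    if granularity == "per_row":
        # Assumption: one scale per row, last dimension reduced to size 1.
        return tuple(hp_shape[:-1]) + (1,)
    raise ValueError(f"unsupported granularity: {granularity}")


def check_static_scale(
    scale_shape: Tuple[int, ...], hp_shape: Tuple[int, ...], granularity: str
) -> None:
    """Validate a user-provided static scale against the tensor it will quantize."""
    assert len(scale_shape) == len(hp_shape), (
        f"scale ndim {len(scale_shape)} != tensor ndim {len(hp_shape)}"
    )
    expected = expected_scale_shape(hp_shape, granularity)
    assert tuple(scale_shape) == expected, (
        f"expected scale shape {expected} for {granularity}, got {tuple(scale_shape)}"
    )
```

With torch tensors, the same checks would take `scale.shape` and `hp_tensor.shape` as inputs.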
I think we might want to allow the scale to be None, so that Int8StaticActivationInt8WeightConfig() can be passed as a base config; we can discuss on #3468
OK, makes sense, sounds good. I think this is easier for users; otherwise they have to run a separate flow to get the scale here
jerryzh168 left a comment
lg, see some comments inline
* Int8Tensor migration

  Summary: This PR creates a new Int8Tensor and updates the configs to use the new Int8Tensor flow.

  Test Plan:

  To ensure BC:
  ```
  pytest test/quantization/test_quant_api.py
  ```
  To test new Int8Tensor:
  ```
  pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py
  ```
* ruff fixes
* add init
* fix ruff again
* update
* wip
* undo update tests
* fix ruff
* fix varname
* fix typing
* add tests
* fix dtype
* fix ci
* address granularity cr
* update _choose_quant_func_and_quantize_tensor
* make block size required attribute
* made dtype required as well
* address nits
* skip per tensor weight only test for now
* add static quant
* add static quant
* update
* static quant working eager + compile
* remove file
* added asserts
* undo smoothquant change
* fix return
* address cr feedback
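This PR hooks up the static quant workflow added in #3442 to the prototype smoothquant API.

You can use the new flow as follows:

```python
# Import paths for quantize_, PerRow and SmoothQuantStep are assumed;
# adjust them to your torchao version.
from torchao.quantization import quantize_
from torchao.quantization.granularity import PerRow
from torchao.quantization.quant_api import (
    Int8StaticActivationInt8WeightConfig,
)
from torchao.prototype.smoothquant import (
    SmoothQuantConfig,
)
from torchao.prototype.smoothquant.core import SmoothQuantStep

# `model` and the calibration inputs `x` are assumed to be defined already.
config = SmoothQuantConfig(
    base_config=Int8StaticActivationInt8WeightConfig(granularity=PerRow()),
    step=SmoothQuantStep.PREPARE,
    alpha=0.5,
)
quantize_(model, config)

# Perform calibration with test data
model(*x)

config.step = SmoothQuantStep.CONVERT
quantize_(model, config)

# model will now be statically quantized with the inputs used in smoothquant observer.
model(*x)
```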
* Int8Tensor migration

  Summary: This PR creates a new Int8Tensor and updates the configs to use the new Int8Tensor flow.

  Test Plan:

  To ensure BC:
  ```
  pytest test/quantization/test_quant_api.py
  ```
  To test new Int8Tensor:
  ```
  pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py
  ```
* ruff fixes
* add init
* fix ruff again
* update
* wip
* undo update tests
* fix ruff
* fix varname
* fix typing
* add tests
* fix dtype
* fix ci
* address granularity cr
* update _choose_quant_func_and_quantize_tensor
* make block size required attribute
* made dtype required as well
* address nits
* skip per tensor weight only test for now
* add static quant
* add static quant
* update
* static quant working eager + compile
* remove file
* added asserts
* undo smoothquant change
* fix return
* got smoothquant + int8 static working
* generalized smoothquant code
* free tests
* fix static scale check
* update
* address cr feedback
* Hook up static quant workflow to prototype smoothquant API

  This PR hooks up the static quant workflow added in #3442 to the prototype smoothquant API.

  You can use the new flow as follows:

  ```python
  # Import paths for quantize_, PerRow and SmoothQuantStep are assumed;
  # adjust them to your torchao version.
  from torchao.quantization import quantize_
  from torchao.quantization.granularity import PerRow
  from torchao.quantization.quant_api import (
      Int8StaticActivationInt8WeightConfig,
  )
  from torchao.prototype.smoothquant import (
      SmoothQuantConfig,
  )
  from torchao.prototype.smoothquant.core import SmoothQuantStep

  # `model` and the calibration inputs `x` are assumed to be defined already.
  config = SmoothQuantConfig(
      base_config=Int8StaticActivationInt8WeightConfig(granularity=PerRow()),
      step=SmoothQuantStep.PREPARE,
      alpha=0.5,
  )
  quantize_(model, config)

  # Perform calibration with test data
  model(*x)

  config.step = SmoothQuantStep.CONVERT
  quantize_(model, config)

  # model will now be statically quantized with the inputs used in smoothquant observer.
  model(*x)
  ```
* fix ruff
* fix test to use threshold for sqnr
This PR adds a new static quant workflow based on Int8Tensor (#3407).

It introduces a new config, `Int8StaticActivationInt8WeightConfig`, which requires a scale tensor and a granularity. Currently, only PerRow and PerTensor symmetric quantization are supported.

This scale tensor is stored on the weight Int8Tensor under `activation_scale`, and is used to create a new activation Int8Tensor for static quantization.

It would be nice to store this scale tensor in `QuantizeTensorToInt8Kwargs`, but unfortunately this breaks dynamo tracing: we store the quant kwargs as an object on the weight tensor, and we are unable to fakeify them properly. As a result, we need to keep track of the scale and pass it outside of this `Kwargs` object.
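To make the data flow concrete, here is a toy, pure-Python sketch of the idea described above: a precomputed activation scale travels with the quantized weight and is reused at inference time, rather than being recomputed per input as in dynamic quantization. Everything in it is an illustrative assumption (`ToyInt8Weight`, `calibrate_activation_scale`, `quantize_weight`, `static_linear`, per-tensor activation scale, per-row weight scales); it is not torchao's Int8Tensor implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

INT8_MIN, INT8_MAX = -128, 127


def quantize_symmetric(values: List[float], scale: float) -> List[int]:
    """Symmetric int8 quantization: round(v / scale), clamped to [-128, 127]."""
    return [max(INT8_MIN, min(INT8_MAX, round(v / scale))) for v in values]


@dataclass
class ToyInt8Weight:
    """Hypothetical stand-in for a quantized weight that carries the activation scale."""
    qdata: List[List[int]]                    # int8 values, one row per output feature
    scale: List[float]                        # per-row weight scales
    activation_scale: Optional[float] = None  # static per-tensor activation scale


def calibrate_activation_scale(batches: List[List[float]]) -> float:
    """Derive a static per-tensor scale from calibration data (abs-max / 127)."""
    amax = max(abs(v) for batch in batches for v in batch)
    return amax / INT8_MAX


def quantize_weight(w_rows: List[List[float]], activation_scale: float) -> ToyInt8Weight:
    """Quantize each weight row with its own scale and attach the activation scale."""
    qdata, scales = [], []
    for row in w_rows:
        s = max(abs(v) for v in row) / INT8_MAX or 1.0  # avoid a zero scale
        scales.append(s)
        qdata.append(quantize_symmetric(row, s))
    return ToyInt8Weight(qdata, scales, activation_scale)


def static_linear(x: List[float], w: ToyInt8Weight) -> List[float]:
    """y = x @ W^T using the stored scale; no statistics are computed from x."""
    xq = quantize_symmetric(x, w.activation_scale)
    out = []
    for row, s_w in zip(w.qdata, w.scale):
        acc = sum(a * b for a, b in zip(xq, row))     # integer accumulation
        out.append(acc * w.activation_scale * s_w)    # rescale back to float
    return out
```

In this sketch the scale is a plain attribute on the weight object, which mirrors the PR's choice to keep it outside the kwargs object; the dynamo tracing constraint itself is not modeled here.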