fix(Int8DynActInt4WeightQuantizer): use scales_precision for runtime modules #4459
Anai-Guo wants to merge 2 commits
…modules

_convert_for_runtime passed self.precision as the scales_precision argument to replace_linear_8da4w, so the runtime Int8DynActInt4WeightLinear modules were created with the wrong scales dtype. This diverged from _create_quantized_state_dict, which uses self.scales_precision, producing an inconsistency between quantize(model).state_dict() and _create_quantized_state_dict(model). Pass self.scales_precision instead.

Fixes pytorch#2571
Dr. CI: ❌ 3 New Failures, 2 Unclassified Failures as of commit de16b05 with merge base 28e6aca. Artifacts and rendered test results are at hud.pytorch.org/pr/pytorch/ao/4459.
Dr. CI could not classify the unclassified failures because the workflow did not run on the merge base. They may be pre-existing on trunk or introduced by this PR.
Asserts that Int8DynActInt4WeightQuantizer, when given a scales_precision distinct from precision, builds runtime Int8DynActInt4WeightLinear modules whose scales/zeros buffers use scales_precision. The test fails on the pre-fix behavior, which passed self.precision as the scales_precision argument of replace_linear_8da4w.

Signed-off-by: Tai An <antai12232931@outlook.com>
@claude review

Claude finished @andrewor14's task in 56s.
Summary
`Int8DynActInt4WeightQuantizer._convert_for_runtime` passed `self.precision` as the `scales_precision` argument of `replace_linear_8da4w`, so the runtime `Int8DynActInt4WeightLinear` modules were built with the wrong scales dtype. The code even carried a `# TODO: this should be self.scales_precision?` next to the line.

This diverges from `_create_quantized_state_dict`, which uses `self.scales_precision` for the scales. As reported in #2571, that mismatch makes `quantize(model).state_dict()` inconsistent with `_create_quantized_state_dict(model)` whenever `precision != scales_precision`.

Fix
Pass `self.scales_precision` (instead of `self.precision`) as the `scales_precision` argument, matching `_create_quantized_state_dict`.

Fixes #2571
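To make the one-argument change concrete, here is a minimal toy model of the bug and the fix. It does not use torchao: `ToyQuantizer`, `ToyLinear8da4w`, `toy_replace_linear`, and the `fixed` flag are hypothetical stand-ins invented for illustration, and plain strings play the role of torch dtypes.

```python
from dataclasses import dataclass

# Toy stand-ins only: strings play the role of torch dtypes, and none of
# these classes or functions are the real torchao implementations.


@dataclass
class ToyLinear8da4w:
    precision: str
    scales_precision: str


def toy_replace_linear(precision: str, scales_precision: str) -> ToyLinear8da4w:
    """Hypothetical analogue of replace_linear_8da4w: builds the runtime module."""
    return ToyLinear8da4w(precision=precision, scales_precision=scales_precision)


@dataclass
class ToyQuantizer:
    precision: str = "float32"
    scales_precision: str = "float32"

    def state_dict_scales_dtype(self) -> str:
        # Mirrors the state-dict path described above, which uses scales_precision.
        return self.scales_precision

    def convert_for_runtime(self, fixed: bool) -> ToyLinear8da4w:
        # Pre-fix: self.precision is forwarded as the scales_precision argument.
        # Post-fix: self.scales_precision is forwarded instead.
        scales = self.scales_precision if fixed else self.precision
        return toy_replace_linear(self.precision, scales)


q = ToyQuantizer(precision="float32", scales_precision="bfloat16")
before = q.convert_for_runtime(fixed=False)
after = q.convert_for_runtime(fixed=True)

print(before.scales_precision == q.state_dict_scales_dtype())  # False
print(after.scales_precision == q.state_dict_scales_dtype())   # True
```

The mismatch only shows up when the two precisions differ, which is presumably why a default configuration would not expose it.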
Test plan
Construct `Int8DynActInt4WeightQuantizer(precision=torch.float32, scales_precision=torch.bfloat16)`, quantize a small linear model, and confirm the runtime module scales dtype now matches `_create_quantized_state_dict` output.

🤖 Generated with Claude Code
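The comparison in the test plan can be sketched as a small dtype check over two state dicts. This is an illustrative helper, not the test added in this PR: `dtype_mismatches` and the key names `linear.scales` and `linear.weight` are assumptions, and `SimpleNamespace` objects stand in for tensors so the sketch runs without torch.

```python
from types import SimpleNamespace


def dtype_mismatches(runtime_sd, reference_sd):
    """Return {key: (runtime_dtype, reference_dtype)} for shared keys whose dtypes differ."""
    return {
        key: (runtime_sd[key].dtype, reference_sd[key].dtype)
        for key in runtime_sd.keys() & reference_sd.keys()
        if runtime_sd[key].dtype != reference_sd[key].dtype
    }


# Stand-in entries: any object with a .dtype attribute works, so real tensors
# from the two state dicts could be passed in the same way.
reference = {
    "linear.scales": SimpleNamespace(dtype="bfloat16"),
    "linear.weight": SimpleNamespace(dtype="int8"),
}
pre_fix = {
    "linear.scales": SimpleNamespace(dtype="float32"),  # built from precision
    "linear.weight": SimpleNamespace(dtype="int8"),
}
post_fix = {
    "linear.scales": SimpleNamespace(dtype="bfloat16"),  # built from scales_precision
    "linear.weight": SimpleNamespace(dtype="int8"),
}

print(dtype_mismatches(pre_fix, reference))   # {'linear.scales': ('float32', 'bfloat16')}
print(dtype_mismatches(post_fix, reference))  # {}
```

With torch available, the same helper could in principle be given `quantize(model).state_dict()` and the output of `_create_quantized_state_dict(model)`; an empty result would indicate that the two paths agree on dtypes.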