Enable dispatch to tinygemm int4 and int8 kernels for quantized tensor #230
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/230
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit 2a8dc5d with merge base e7bbbd2.
(This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Waiting for #227 to land before this can actually be tested.
None if len(args) == 2 else args[2],
)
if weight_qtensor.input_quant_func is not None:
if weight_qtensor.input_quant_func is None:
Let's please remove input_quant_func and discuss using AffineQuantizedTensor as an organizing principle :)
This is used for testing 8da4w right now; can this be deferred until there is a better alternative?
cpuhrsch left a comment:
Setting request changes just so we don't forget to remove input_quant_func
Will do in the next PR.
# TODO: add padding support
class TinygemmAffineQuantizedTensor(AffineQuantizedTensor):
cpuhrsch left a comment:
CI is red and let's please not add a TinygemmAffineQuantizedTensor
Sorry, CI should be fixed now. The separate class for tinygemm is there because tinygemm quantizes things differently; we need to think about how to unify this.
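The thread does not spell out how tinygemm quantizes differently. One common way two int4 affine schemes can differ, shown here purely as an assumption and not as a description of the actual tinygemm kernels, is whether the zero point lives in the integer domain or in the floating-point domain. All function names below are hypothetical:

```python
def dequant_int_zero_point(q, scale, zero_point_int):
    # Integer-domain zero point: x = (q - zero_point) * scale
    return (q - zero_point_int) * scale


def dequant_float_zero_point(q, scale, zero_point_float, n_bits=4):
    # Float-domain zero point (assumed convention):
    # x = (q - mid) * scale + zero_point, with mid = 2 ** (n_bits - 1)
    mid = 2 ** (n_bits - 1)
    return (q - mid) * scale + zero_point_float


def int_to_float_zero_point(scale, zero_point_int, n_bits=4):
    # Equating the two formulas above gives
    # zero_point_float = (mid - zero_point_int) * scale
    mid = 2 ** (n_bits - 1)
    return (mid - zero_point_int) * scale


# Illustrative values only.
scale, zp_int = 0.5, 3
zp_float = int_to_float_zero_point(scale, zp_int)

# Both conventions agree on every unsigned int4 code after conversion.
for q in range(16):
    a = dequant_int_zero_point(q, scale, zp_int)
    b = dequant_float_zero_point(q, scale, zp_float)
    assert abs(a - b) < 1e-9
```

If the mismatch is of this kind, a conversion like `int_to_float_zero_point` is one way the two representations could be reconciled; the thread leaves the actual approach open.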
Force-pushed from b5950a3 to 449a5d8.
Force-pushed from 61befaa to 943bf13.
…ed tensor
Summary:
This adds some dispatch to the tinygemm kernels for CUDA, although the implementation mismatch problem for tinygemm needs to be resolved first.
Test Plan:
python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_int4
python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_int8
Reviewers:
Subscribers:
Tasks:
Tags:
_replace_with_custom_fn_if_matches_filter(
model,
lambda linear_mod: dynamic_quant(linear_mod, (torch.randn(1, linear_mod.in_features))),
lambda linear_mod: dynamic_quant(linear_mod, (torch.randn(1, linear_mod.in_features),)),
Interesting. Why is that extra comma needed now?
By all means don't be blocked on this comment haha
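On the comma question: in Python, parentheses around a single expression only group it, so `(x)` is just `x`, while the trailing comma in `(x,)` builds a one-element tuple. A minimal sketch, using a plain list as a stand-in for the `torch.randn(...)` tensor (the names `example_input`, `grouped`, and `as_tuple` are illustrative only):

```python
# Stand-in for torch.randn(1, linear_mod.in_features); any object works here.
example_input = [0.1, 0.2, 0.3]

grouped = (example_input)     # parentheses only group: this is the list itself
as_tuple = (example_input,)   # trailing comma: a tuple containing the list

print(type(grouped).__name__)
print(type(as_tuple).__name__)
print(len(as_tuple))
```

So the line with the comma passes a tuple of example inputs rather than a bare tensor. Whether `dynamic_quant` requires a tuple is not stated in the thread.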
self.assertTrue(torch.equal(scale, scale_ref))
torch.testing.assert_close(zero_point_float, zero_point_ref, rtol=0.00001, atol=torch.max(scale)*0.03)
self.assertTrue(torch.equal(zero_point, zero_point_ref))
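For readers of the `assert_close` line: `torch.testing.assert_close` accepts a mismatch when |actual - expected| ≤ atol + rtol * |expected|, so `atol=torch.max(scale)*0.03` lets each zero point differ by up to about 3% of the largest scale. A pure-Python sketch of that rule follows; the helper `within_tolerance` and the sample numbers are hypothetical and not taken from the test:

```python
def within_tolerance(actual, expected, rtol, atol):
    """Element-wise closeness rule: |actual - expected| <= atol + rtol * |expected|."""
    return all(
        abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )


# Hypothetical per-group scales and zero points, for illustration only.
scale = [0.02, 0.05, 0.04]
zero_point_ref = [1.00, -0.50, 0.25]
zero_point_float = [1.001, -0.5005, 0.2495]

# Mirrors atol=torch.max(scale)*0.03 from the test line.
atol = max(scale) * 0.03

print(within_tolerance(zero_point_float, zero_point_ref, rtol=1e-5, atol=atol))
```

Tying the absolute tolerance to the scale means the allowed zero-point error grows and shrinks with the quantization step size instead of being a fixed constant.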
* clean up gguf loading. Move model loading to meta.
* remove cpu
* Fix CI and validation scripts (pytorch#154)
* missing device (pytorch#232)
* Use generator args to group all arguments to generator (pytorch#231)
* prompt
* chat_mode, num_samples
* Move more generator args to use dataclass (pytorch#233)
* prompt
* chat_mode, num_samples
* move more args
* more gen args
* update
* args
* undo some changes
* typos
* Minor lint fixes (pytorch#236)
* remove redundancy & remove int4 linear test from ET tests (pytorch#237)
* remove redundancy
* no int4 linear on ET
* small changes
---------
Co-authored-by: Guang Yang <42389959+guangy10@users.noreply.github.com>
Co-authored-by: Michael Gschwind <61328285+mikekgfb@users.noreply.github.com>
Co-authored-by: Mergen Nachin <mnachin@meta.com>
Summary:
This adds some dispatch to the tinygemm kernels for CUDA, although the implementation mismatch problem for tinygemm needs to be resolved first.
Test Plan:
TODO
Reviewers:
Subscribers:
Tasks:
Tags: