Update GemLite to support vLLM V1 #2199

mobicham · 2025-05-12T12:31:05Z

Updates the GemLite integration to make it compatible with vLLM V1.
I also corrected the unpacking, which should use the output feature size, and added symmetric A16W8 support: the arguments already allowed it, but it was not implemented.
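To see why unpacking needs the output feature size, consider a toy layout. This is an assumption made only for this sketch and is not GemLite's actual packing format. Low-bit values are packed along the output dimension into 32-bit words, so the last word can contain padding slots. The packed shape alone cannot tell how many of those slots are real rows. `pack_rows` and `unpack_rows` are hypothetical helpers written for this example.

```python
from typing import List

Matrix = List[List[int]]


def pack_rows(q: Matrix, bits: int, word_bits: int = 32) -> Matrix:
    """Toy packer (assumed layout): pack an (out_features x in_features) matrix
    of unsigned `bits`-wide integers along the output (row) dimension."""
    per_word = word_bits // bits            # values stored per packed word
    mask = (1 << bits) - 1
    out_features, in_features = len(q), len(q[0])
    n_words = -(-out_features // per_word)  # ceil division; last word may be padded
    packed = [[0] * in_features for _ in range(n_words)]
    for r in range(out_features):
        word, slot = divmod(r, per_word)
        for c in range(in_features):
            packed[word][c] |= (q[r][c] & mask) << (slot * bits)
    return packed


def unpack_rows(packed: Matrix, bits: int, out_features: int, word_bits: int = 32) -> Matrix:
    """Inverse of pack_rows. The number of real rows is not recoverable from
    the packed shape, so the caller must pass out_features explicitly."""
    per_word = word_bits // bits
    mask = (1 << bits) - 1
    in_features = len(packed[0])
    out = [[0] * in_features for _ in range(out_features)]
    for r in range(out_features):
        word, slot = divmod(r, per_word)
        for c in range(in_features):
            out[r][c] = (packed[word][c] >> (slot * bits)) & mask
    return out
```

Take 10 output rows of 4-bit values. `pack_rows` produces 2 words per column, which gives 16 slots. Unpacking with `out_features=10` recovers the matrix exactly. Passing the input feature size instead would silently return a matrix with the wrong shape.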

```python
# pip install git+https://github.com/mobiusml/gemlite/ --upgrade
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig
from torchao.quantization import GemliteUIntXWeightOnlyConfig

model_id = "<model-id>"  # placeholder: set to the Hugging Face model ID to quantize

# A16W4: 4-bit weights with group size 128, fp16 activations
quant_config = TorchAoConfig(quant_type=GemliteUIntXWeightOnlyConfig(bit_width=4, group_size=128))
# A16W8: symmetric 8-bit weights (group_size=None), fp16 activations
# quant_config = TorchAoConfig(quant_type=GemliteUIntXWeightOnlyConfig(bit_width=8, group_size=None))

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="sdpa",
    device_map="cuda",
    quantization_config=quant_config,
)
```
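The A16W8 option above uses symmetric 8-bit weights. Below is a sketch of a generic symmetric int8 scheme. It assumes that `group_size=None` means one scale per output channel, and it is not GemLite's kernel code. Symmetric means the zero point is fixed at 0, so each row needs only a scale. The helper names are hypothetical.

```python
from typing import List, Tuple


def quantize_sym_int8(w: List[List[float]]) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-output-channel int8 quantization (assumed scheme):
    one scale per row, zero point 0, values clamped to [-127, 127]."""
    q, scales = [], []
    for row in w:
        max_abs = max(abs(x) for x in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid division by zero
        q.append([max(-127, min(127, round(x / scale))) for x in row])
        scales.append(scale)
    return q, scales


def dequantize_sym_int8(q: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Reconstruct approximate weights as q * scale (no zero-point term)."""
    return [[v * s for v in row] for row, s in zip(q, scales)]
```

The largest-magnitude weight in each row maps to +/-127. The reconstruction error per element is at most half a scale step.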

pytorch-bot · 2025-05-12T12:31:09Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2199

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

[PREEMPTIVE] Removal of ephemeral variants on scale-config.yml

✅ No Failures

As of commit 30679a1 with merge base 66eb801:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jerryzh168 · 2025-05-12T21:53:08Z

There are some errors:

RuntimeError: Command docker exec -t bc8b6eb3f9f983a73e1ff5d5a73edd71676f13cb84d9435da52f59231e4631f1 /exec failed with exit code 2
2025-05-12T12:34:46.7879759Z GemliteUIntXWeightOnlyConfig(
2025-05-12T12:34:46.7880371Z E TypeError: __init__() got an unexpected keyword argument 'packing_bitwidth'
2025-05-12T12:34:46.7881071Z =========================== short test summary info ============================
2025-05-12T12:34:46.7882046Z ERROR test/quantization/test_config_serialization.py - TypeError: __init__() got an unexpected keyword argument 'packing_bitwidth'
2025-05-12T12:34:46.7882899Z !!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!

mobicham · 2025-05-13T07:26:59Z

@jerryzh168 Yes, sorry, I missed that one. It should be fixed now.

jerryzh168

Looks good. Could you also test in vLLM to get a sense of the speedup? You can test an 8B model and compare against the baseline with benchmark_latency, I think, like this: https://huggingface.co/pytorch/Phi-4-mini-instruct-int4wo-hqq#benchmark_latency

mobicham · 2025-05-13T20:54:28Z

I think we need to re-export the models, since I changed the meta_args argument. But I can test with locally quantized ao models.

The thing is, I am working on a different branch right now that will also change meta_args, so we can re-export the models after that.

* update to forward_functional()
* add 8-bit symmetric case
* ruff
* fix test

mobicham added 3 commits May 12, 2025 12:05

update to forward_functional() (1982e3a)

add 8-bit symmetric case (3ade6c0)

ruff (e808405)

facebook-github-bot added the CLA Signed label May 12, 2025

fix test (30679a1)

jerryzh168 added the topic: improvement label May 13, 2025

jerryzh168 approved these changes May 13, 2025


jerryzh168 merged commit 8e33b70 into pytorch:main May 21, 2025
18 of 19 checks passed

mobicham deleted the vllm_v1_update branch June 2, 2025 10:06

liangel-02 pushed a commit that referenced this pull request Aug 25, 2025

Update GemLite to support vLLM V1 (#2199)

a3aab85

* update to forward_functional()
* add 8-bit symmetric case
* ruff
* fix test
