Update GemLite to support vLLM V1 #2199
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2199
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: if your PR is affected, please view it.
✅ No Failures as of commit 30679a1 with merge base 66eb801.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
there are some errors:
@jerryzh168 yes sorry I missed that one, should be fixed now.
looks good, could you test in vllm as well to get a sense of speedup? you can test some 8B model, and compare to baseline with benchmark_latency I think, like this: https://huggingface.co/pytorch/Phi-4-mini-instruct-int4wo-hqq#benchmark_latency
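For anyone running that comparison, a minimal offline-timing sketch along these lines may be easier to iterate on than the full benchmark script. It assumes vLLM's `LLM`/`SamplingParams` API and uses placeholder model IDs that are not part of this PR; the `benchmark_latency.py` script from the linked model card remains the more standard tool.

```python
# Rough latency comparison sketch. The model IDs below are placeholders,
# not checkpoints from this PR. In practice, run each model in a separate
# process (or use benchmarks/benchmark_latency.py) so GPU memory is freed
# between runs.
import time

from vllm import LLM, SamplingParams

PROMPT = "Explain the difference between latency and throughput."
PARAMS = SamplingParams(max_tokens=128, temperature=0.0)


def avg_latency(model_id: str, num_iters: int = 5) -> float:
    llm = LLM(model=model_id)
    llm.generate([PROMPT], PARAMS)  # warmup pass, excluded from timing
    start = time.perf_counter()
    for _ in range(num_iters):
        llm.generate([PROMPT], PARAMS)
    return (time.perf_counter() - start) / num_iters


print("baseline :", avg_latency("meta-llama/Llama-3.1-8B-Instruct"))
print("quantized:", avg_latency("your-org/your-8b-gemlite-checkpoint"))
```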
I think we need to re-export the models, I changed the … The thing is, I am working on a different branch right now which would also change …
* update to forward_functional()
* add 8-bit symmetric case
* ruff
* fix test
Updated the GemLite integration to make it compatible with vLLM V1.
I also corrected the unpacking, which should use the output feature size, and added symmetric A16W8 support, since the arguments already allow it but it was not implemented.
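As a generic illustration (not GemLite's actual kernels or the torchao code path), symmetric A16W8 here means int8 weights with a per-channel scale and no zero point, consumed against fp16 activations, and the unpacking fix amounts to slicing the unpacked tensor by the output feature size rather than by a size derived from the packed, possibly padded dimension. A minimal sketch with made-up shapes:

```python
# Generic illustration only; this is not GemLite's actual packing format.
import torch


def quantize_symmetric_int8(w: torch.Tensor):
    """Per-output-channel symmetric int8 quantization: one scale per row, no zero point."""
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-6) / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale


def unpack_uint4(packed: torch.Tensor, out_features: int) -> torch.Tensor:
    """Unpack two 4-bit values per byte along the output-feature axis.

    Slicing with out_features (instead of a size inferred from the packed,
    possibly padded dimension) is the kind of mismatch the description refers to.
    """
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    unpacked = torch.stack([low, high], dim=1).reshape(-1, packed.shape[-1])
    return unpacked[:out_features]


w = torch.randn(256, 512)                      # (out_features, in_features)
q, scale = quantize_symmetric_int8(w)
w_hat = q.to(torch.float16) * scale.to(torch.float16)   # A16W8 dequantized weights

packed = torch.randint(0, 256, (128, 512), dtype=torch.uint8)  # 256 rows packed two-per-byte
w4 = unpack_uint4(packed, out_features=256)
```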