It seems that the lyraW4AFP8 kernel can be used to run inference on this model (https://huggingface.co/TMElyralab/DeepSeek-R1-AWQ-W4AFP8). For other models, however, how can we obtain the quantized weights?
If you could share the code used for model quantization, it would be a great help to the open-source community (e.g. sglang).