
Could you share the code/scripts used to quantize a model? #1

@ZhuJiaqi9905

It seems that the lyraW4AFP8 kernel can be used to run inference on this model (https://huggingface.co/TMElyralab/DeepSeek-R1-AWQ-W4AFP8). However, for other models, how can we obtain the quantized weights?
If you could share the code used to quantize the model, it would be a great help to the open-source community (e.g., sglang).
