AWQ quantization

AWQ is a post-training quantization (PTQ) method that provides efficient and accurate low-bit (INT3/INT4) weight quantization for LLMs.

Orion models can be quantized easily with AWQ. Follow the step-by-step tutorial below.

To run AWQ, we will use AutoAWQ.

Do Quantization

The quant.py script is provided for you to perform AWQ quantization:

python quant.py --model_path /base_model \
    --save_path /quantized_model --group_size 128 --version "gemm"
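
For readers curious about what a script of this kind might do internally, here is a minimal sketch built on AutoAWQ's `from_pretrained` / `quantize` / `save_quantized` workflow. It is not the repository's quant.py. The helper names `build_quant_config`, `parse_args`, and `quantize_model` are hypothetical, the 4-bit width and `zero_point=True` are assumptions, and the sketch presumes the installed AutoAWQ build supports the Orion architecture.

```python
import argparse


def build_quant_config(group_size: int, version: str, w_bit: int = 4) -> dict:
    """Build the quant_config dict passed to AutoAWQ's quantize()."""
    return {
        "zero_point": True,           # assumption: asymmetric quantization
        "q_group_size": group_size,   # weights share one scale per group
        "w_bit": w_bit,               # assumption: 4-bit weights
        "version": version.upper(),   # AutoAWQ's examples spell this "GEMM"
    }


def parse_args(argv=None) -> argparse.Namespace:
    """Mirror the flags used in the command above (hypothetical parser)."""
    parser = argparse.ArgumentParser(description="AWQ quantization sketch")
    parser.add_argument("--model_path", required=True)
    parser.add_argument("--save_path", required=True)
    parser.add_argument("--group_size", type=int, default=128)
    parser.add_argument("--version", default="gemm")
    return parser.parse_args(argv)


def quantize_model(model_path: str, save_path: str, quant_config: dict) -> None:
    """Load the base model, quantize it with AWQ, and save the result."""
    # Heavy imports stay local so the helpers above work without a GPU stack.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    # Runs calibration and converts the weights to low-bit form.
    model.quantize(tokenizer, quant_config=quant_config)

    # Save weights and tokenizer side by side so the output directory is loadable.
    model.save_quantized(save_path)
    tokenizer.save_pretrained(save_path)
```

With the parsed arguments in hand, calling `quantize_model(args.model_path, args.save_path, build_quant_config(args.group_size, args.version))` would correspond to the command shown above.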

Run Quantized Model

You can run a quantized model using the eval_quant.py script:

python eval_quant.py --model /quantized_model --trust_remote_code
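
To load a quantized checkpoint directly from Python, a sketch along the following lines may be useful. It relies on AutoAWQ's `from_quantized` loader and is not the repository's evaluation script. The function names `load_quantized` and `generate_text`, the `device` default, and the choice of `fuse_layers=False` are assumptions made for illustration.

```python
def load_quantized(model_path: str, trust_remote_code: bool = True):
    """Load an AWQ-quantized model and its tokenizer from a local directory."""
    # Imported lazily so this module can be inspected without AutoAWQ installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_quantized(
        model_path,
        fuse_layers=False,  # assumption: skip fused kernels for custom architectures
        trust_remote_code=trust_remote_code,
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_path, trust_remote_code=trust_remote_code
    )
    return model, tokenizer


def generate_text(model, tokenizer, prompt: str,
                  max_new_tokens: int = 64, device: str = "cuda") -> str:
    """Tokenize a prompt, generate a continuation, and decode it to text."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A typical call would be `model, tokenizer = load_quantized("/quantized_model")` followed by `generate_text(model, tokenizer, "Hello")`, assuming a CUDA device is available.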