Releases · thu-pacman/chitu
v0.3.4
v0.3.3
v0.3.2
What's new:
[NEW] Qwen3 Models:
The following Qwen3 models are now available for use:
- Qwen3-0.6B
- Qwen3-1.7B
- Qwen3-4B
- Qwen3-8B
- Qwen3-14B
- Qwen3-30B-A3B
- Qwen3-32B
- Qwen3-235B-A22B
Usage: append the `models=Qwen3-<desired_model_size>` argument when starting Chitu. For example, to use Qwen3-32B, start Chitu with `models=Qwen3-32B`.
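These `key=value` options (including dotted keys such as `infer.op_impl=...` in later notes) appear to follow the Hydra/OmegaConf command-line override style. As a minimal illustrative sketch — not Chitu's actual code — such arguments can be parsed like this:

```python
# Minimal sketch (not Chitu's implementation): parsing key=value override
# arguments such as `models=Qwen3-32B` from a command line.
def parse_overrides(argv: list[str]) -> dict[str, str]:
    """Split each `key=value` argument into a config dict entry."""
    overrides = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {arg!r}")
        overrides[key] = value
    return overrides

print(parse_overrides(["models=Qwen3-32B", "infer.op_impl=muxi_custom_kernel"]))
# → {'models': 'Qwen3-32B', 'infer.op_impl': 'muxi_custom_kernel'}
```

Consult the project README for the actual launch entrypoint; only the `models=...` override itself comes from these release notes.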
v0.3.1
Better support for MetaX (沐曦) GPUs:
- Support for both Llama-like models and DeepSeek models. Tested with DeepSeek-R1-Distill-Llama-70B and DeepSeek-R1-671B using bf16, fp16, and soft fp8 precision.
- New `infer.op_impl=muxi_custom_kernel` mode optimized for small batches.
v0.3.0
Added online conversion from FP4 to FP8 and BF16, enabling the FP4-quantized version of DeepSeek-R1 671B to run on non-Blackwell GPUs.
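To illustrate the idea behind such online upconversion (this is a sketch, not Chitu's kernel code, and it assumes the common E2M1 FP4 encoding with two values packed per byte), FP4 codes can be expanded to full precision via a small lookup table:

```python
# Illustrative sketch (not Chitu's implementation): decoding FP4 (E2M1)
# codes to full-precision floats, the core step of converting FP4 weights
# to a higher precision on GPUs without native FP4 support.

# The 8 magnitudes representable in E2M1 FP4:
# 1 sign bit + 2 exponent bits + 1 mantissa bit, exponent bias 1.
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_fp4(nibble: int) -> float:
    """Decode a 4-bit E2M1 code (0..15); bit 3 is the sign."""
    sign = -1.0 if nibble & 0b1000 else 1.0
    return sign * FP4_E2M1[nibble & 0b0111]

def dequantize(packed: bytes, scale: float = 1.0) -> list[float]:
    """Unpack two FP4 values per byte (low nibble first) and apply a scale."""
    out = []
    for b in packed:
        out.append(decode_fp4(b & 0x0F) * scale)
        out.append(decode_fp4(b >> 4) * scale)
    return out

print(dequantize(bytes([0x2F])))
# → [-6.0, 1.0]  (low nibble 0xF = -6.0, high nibble 0x2 = 1.0)
```

Real kernels do this in bulk on-device (and fold in per-group quantization scales), but the table-lookup structure is the same.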
v0.2.3
v0.2.2
v0.2.1
What's new:
- [HIGHLIGHT] Hybrid CPU+GPU inference (compatible with multi-GPU and multi-request).
- Support of new models (see below for full list).
- Multiple optimizations to operator kernels.
Officially supported models:
- [NEW] QwQ-32B-FP8 (https://huggingface.co/qingcheng-ai/QWQ-32B-FP8)
  Usage: append `models=QwQ-32B-FP8` when starting Chitu
- [NEW] QwQ-32B-AWQ (https://huggingface.co/Qwen/QwQ-32B-AWQ)
  Usage: append `models=QwQ-32B-AWQ` when starting Chitu
- [NEW] Llama-3.3-70B-Instruct (https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)
  Usage: append `models=Llama-3.3-70B-Instruct` when starting Chitu
- [NEW] DeepSeek-R1-Distill-Llama-70B (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B)
  Usage: append `models=DeepSeek-R1-Distill-Llama-70B` when starting Chitu
- Qwen2.5-32B (https://huggingface.co/Qwen/Qwen2.5-32B)
  Usage: append `models=Qwen2.5-32B` when starting Chitu
- QwQ-32B (https://huggingface.co/Qwen/QwQ-32B)
  Usage: append `models=QwQ-32B` when starting Chitu
- Mixtral-8x7B-Instruct-v0.1 (https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
  Usage: append `models=Mixtral-8x7B-Instruct-v0.1` when starting Chitu
- Qwen2-72B-Instruct (https://huggingface.co/Qwen/Qwen2-72B-Instruct)
  Usage: append `models=Qwen2-72B-Instruct` when starting Chitu
- Meta-Llama-3-8B-Instruct-original (https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct; please use its "original" checkpoint)
  Usage: append `models=Meta-Llama-3-8B-Instruct-original` when starting Chitu
- glm-4-9b-chat (https://huggingface.co/THUDM/glm-4-9b-chat)
  Usage: append `models=glm-4-9b-chat` when starting Chitu
- DeepSeek-R1 (https://huggingface.co/deepseek-ai/DeepSeek-R1)
  Usage: append `models=DeepSeek-R1` when starting Chitu
- DeepSeek-R1-Distill-Qwen-14B (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)
  Usage: append `models=DeepSeek-R1-Distill-Qwen-14B` when starting Chitu
- Qwen2-7B-Instruct (https://huggingface.co/Qwen/Qwen2-7B-Instruct)
  Usage: append `models=Qwen2-7B-Instruct` when starting Chitu
- DeepSeek-R1-bf16 (https://huggingface.co/opensourcerelease/DeepSeek-R1-bf16)
  Usage: append `models=DeepSeek-R1-bf16` when starting Chitu
- DeepSeek-V3 (https://huggingface.co/deepseek-ai/DeepSeek-V3)
  Usage: append `models=DeepSeek-V3` when starting Chitu