Releases: thu-pacman/chitu

v0.3.4

29 May 16:29

What's new:

  • Fused MoE kernel for Qwen3 MoE models.
  • Optimized metadata communication in data parallelism (DP) and pipeline parallelism (PP).
  • Users can now explicitly configure the PP micro-batch size (see the sketch below).
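
A minimal launch sketch, assuming Chitu's usual Hydra-style command-line overrides; the entry point, model, and parallel sizes are illustrative, and infer.pp_micro_batch_size is a hypothetical name for the new option, not a confirmed key:

    # infer.pp_micro_batch_size is a hypothetical key name; check the docs for the real one.
    torchrun --nproc_per_node 8 -m chitu \
        models=<model_name> \
        infer.pp_size=2 infer.tp_size=4 \
        infer.pp_micro_batch_size=4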

v0.3.3

22 May 11:28

Initial support for Ascend NPU.

v0.3.2

15 May 09:45

What's new:

[NEW] Qwen3 Models:
The following Qwen3 models are now supported:

  • Qwen3-0.6B
  • Qwen3-1.7B
  • Qwen3-4B
  • Qwen3-8B
  • Qwen3-14B
  • Qwen3-30B-A3B
  • Qwen3-32B
  • Qwen3-235B-A22B

Usage: pass the models=Qwen3-<desired_model_size> argument when starting Chitu. For example, to run Qwen3-32B, launch with models=Qwen3-32B, as shown in the sketch below.
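
A minimal launch sketch, assuming the same Hydra-style overrides used elsewhere in these notes; the torchrun entry point, checkpoint path, and parallelism settings are illustrative placeholders, while models=Qwen3-32B is the documented argument:

    # Entry point, path, and parallel sizes are illustrative; models=Qwen3-32B is documented above.
    torchrun --nproc_per_node 8 -m chitu \
        models=Qwen3-32B \
        models.ckpt_dir=/path/to/Qwen3-32B \
        infer.tp_size=8 infer.pp_size=1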

v0.3.1

30 Apr 07:21

Better support for MetaX (沐曦) GPUs:

  • Support for both Llama-like models and DeepSeek models, tested with DeepSeek-R1-Distill-Llama-70B and DeepSeek-R1-671B in bf16, fp16, and soft fp8 precision.
  • New infer.op_impl=muxi_custom_kernel mode optimized for small batches (see the sketch below).
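
A minimal sketch of enabling the new mode; infer.op_impl=muxi_custom_kernel comes from this release, while the entry point and remaining arguments are illustrative placeholders:

    # Only infer.op_impl=muxi_custom_kernel is from this release; the rest is illustrative.
    torchrun --nproc_per_node 8 -m chitu \
        models=DeepSeek-R1-Distill-Llama-70B \
        infer.op_impl=muxi_custom_kernel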

v0.3.0

29 Apr 09:59

Added online conversion from FP4 to FP8 and BF16, enabling the FP4-quantized version of DeepSeek-R1 671B to run on non-Blackwell GPUs.

v0.2.3

24 Apr 11:04

Multiple bugs fixed.

v0.2.2

22 Apr 13:48

Performance improvements for hybrid CPU+GPU inference.

v0.2.1

20 Apr 08:30

What's new:

  • [HIGHLIGHT] Hybrid CPU+GPU inference (compatible with multi-GPU and multi-request serving).
  • Support for new models (see below for the full list).
  • Multiple optimizations to operator kernels.

Officially supported models:

v0.2.0

18 Apr 16:57

(This release has been yanked)

v0.1.2

01 Apr 14:01

HOT FIX: Fixed a major performance regression when CUDA graph is enabled (via infer.use_cuda_graph=True); see the sketch below.
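
For reference, a minimal sketch of enabling CUDA graph with the same override style; only infer.use_cuda_graph=True is confirmed by this note, and the entry point and model are illustrative placeholders:

    # infer.use_cuda_graph=True is the documented option; everything else is illustrative.
    torchrun --nproc_per_node 1 -m chitu \
        models=<model_name> \
        infer.use_cuda_graph=True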