Releases: thu-pacman/chitu
v0.5.0
Multiple improvements for cluster deployment performance:
- Better support for hybrid DP+TP+EP parallelism.
- Load-balancing strategy for MoE.
- Performance optimizations for pre-processing and post-processing.
- Multiple bug fixes.
Official Docker images:
- NVIDIA: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-nvidia:v0.5.0
- Muxi: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-muxi:v0.5.0
- Ascend A2: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-ascend:v0.5.0
- Ascend A3: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-ascend-a3:v0.5.0
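For reference, the images can be fetched with standard Docker commands. The run line below is only a sketch, assuming the NVIDIA container toolkit is installed and that the image can be entered with a shell (neither is stated in these notes); the model mount path is a placeholder.

```bash
# Pull the v0.5.0 NVIDIA image (registry path as listed above).
docker pull qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-nvidia:v0.5.0

# Sketch: start an interactive shell with all GPUs visible.
# --gpus all assumes the NVIDIA container toolkit; /your/local/model/path is a placeholder.
docker run --rm -it --gpus all \
  -v /your/local/model/path:/models \
  qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-nvidia:v0.5.0 \
  /bin/bash
```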
v0.4.3
Fixed some performance issues.
Official Docker images:
- NVIDIA: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-nvidia:v0.4.3
- Muxi: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-muxi:v0.4.3
- Ascend: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-ascend:v0.4.3
v0.4.2
- Added support for new models:
  - Seed-OSS-36B-Instruct
  - DeepSeek-V3.1
  - See SUPPORTED_MODELS for details
- Performance optimizations:
  - Support for Chunked Prefill
  - Support for using DeepEP to optimize EP communication (see the sketch after this list)
    - Requires an extra installation of nvshmem (see the installation guide)
    - CUDA Graph can be enabled when using DeepEP
- Fixed some bugs
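A minimal sketch of how these optimizations might be switched on as launch-time overrides, assuming the torchrun-based launch style from the project README. The switch names infer.use_chunked_prefill and infer.use_deep_ep are hypothetical placeholders (the actual names are in the documentation, not these notes); the model and parallelism arguments follow the style used elsewhere in these release notes.

```bash
# Hypothetical sketch: infer.use_chunked_prefill and infer.use_deep_ep are
# placeholder names, not taken from these notes; check the docs for the real switches.
# DeepEP additionally requires nvshmem to be installed (see the installation guide).
torchrun --nproc_per_node 8 -m chitu \
  models=DeepSeek-V3.1 \
  models.ckpt_dir=/your/local/model/path \
  infer.tp_size=8 \
  infer.ep_size=8 \
  infer.use_chunked_prefill=True \
  infer.use_deep_ep=True
```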
Official Docker images:
- NVIDIA: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-nvidia:v0.4.2
- Muxi: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-muxi:v0.4.2
- Ascend: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-ascend:v0.4.2
v0.4.1
- Supported Expert Parallelism (EP). Enable it by setting infer.ep_size (which currently must equal infer.tp_size, parallelizing the attention part with TP at the same degree of parallelism); see the sketch below.
- Supported PD-disaggregated inference (requires additional dependencies; for now, please build the image manually based on the Dockerfile, following the mooncake configuration guideline).
- Supported hardware FP4 computation on NVIDIA Blackwell GPUs (requires additional dependencies; available when building from blackwell.Dockerfile).
- Added support for some new models. See chitu/docs/en/SUPPORTED_MODELS.md at public-main · thu-pacman/chitu for details.
- Fixed multiple bugs.
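For example, on an 8-GPU node EP can be enabled with overrides like the following. This is a sketch assuming the torchrun-based launch style from the project README; the model arguments are placeholders, and only the infer.ep_size = infer.tp_size relationship is taken from the item above.

```bash
# Enable Expert Parallelism: infer.ep_size currently has to equal infer.tp_size.
# The launch command and model arguments are placeholders; adjust to your setup.
torchrun --nproc_per_node 8 -m chitu \
  models=DeepSeek-R1 \
  models.ckpt_dir=/your/local/model/path \
  infer.tp_size=8 \
  infer.ep_size=8
```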
Official Docker images:
- NVIDIA: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-nvidia:v0.4.1
- Muxi: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-muxi:v0.4.1
- Ascend: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-ascend:v0.4.1
v0.4.0
v0.4.0 marks a significant improvement over v0.3.x in performance and availability. We recommend that all medium-sized deployments (about 1-4 servers) upgrade to this version.
Highlighted changes:
- Optimizations for platforms including NVIDIA GPUs, Ascend NPUs, MetaX GPUs, and Hygon DCUs.
- Optimizations for models including DeepSeek-R1, Qwen3-32B, Kimi K2, GLM-4.5.
Official Docker images:
- NVIDIA: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-nvidia:v0.4.0
- Muxi: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-muxi:v0.4.0
- Ascend: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-ascend:v0.4.0
v0.3.9
Newly supported models on multiple platforms, including NVIDIA GPUs and Ascend NPUs:
- GLM-4.5: To use, append the models=GLM-4.5 and models.ckpt_dir=/your/local/model/path command-line arguments when starting Chitu (see the sketch after this list).
- GLM-4.5-Air: To use, append the models=GLM-4.5-Air and models.ckpt_dir=/your/local/model/path command-line arguments when starting Chitu.
- Kimi-K2-Instruct: To use, append the models=Kimi-K2-Instruct and models.ckpt_dir=/your/local/model/path command-line arguments when starting Chitu.
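Concretely, the two arguments are appended to whatever launch command you already use. A minimal sketch for GLM-4.5, assuming the torchrun-based launch style from the project README (only the two model arguments come from the list above):

```bash
# Select GLM-4.5 and point Chitu at a local checkpoint directory.
# The launch command itself is an assumption; only the two model arguments
# below are taken from the release note.
torchrun --nproc_per_node 8 -m chitu \
  models=GLM-4.5 \
  models.ckpt_dir=/your/local/model/path
```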
Official Docker images:
- NVIDIA: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-nvidia:latest
- Muxi: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-muxi:latest
- Ascend: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-ascend:latest
v0.3.8
What's new:
- Performance has been further optimized.
- DeepSeek models quantized with mixed precision (DeepSeek-R1-mix) are now released.
- FP4 models are now compatible with Ascend 910B2.
Official Docker images:
- NVIDIA: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-nvidia:latest
- Muxi: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-muxi:latest
- Ascend: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-ascend:latest
v0.3.7
What's new:
- Initial support for Hygon DCU.
- Optimized post-processing performance.
- Added launch argument validation.
Official Docker images:
- NVIDIA: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-nvidia:latest
- Muxi: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-muxi:latest
- Ascend: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-ascend:latest
v0.3.6
What's new:
- Support for recent GLM models.
- Automatic configuration of the KV Cache manager and task scheduler.
- Fixed some known issues.
Official Docker images:
- NVIDIA: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-nvidia:v0.3.6
- Muxi: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-muxi:v0.3.6
- Ascend: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-ascend:v0.3.6
v0.3.5
What's new:
- Support for Ascend NPU aclgraph for higher performance.
- Fixed some known issues.
Official Docker images:
- NVIDIA: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-nvidia:0.3.5
- Muxi: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-muxi:0.3.5
- Ascend: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-ascend:0.3.5