Releases: THUDM/slime
v0.2.4
v0.2.4 is here! Thanks to everyone who contributed to this release.
Major Updates
In addition to a broad set of bug fixes and stability improvements, v0.2.4 brings several major updates:
- Profiling and observability improvements
Added a rollout trace timeline viewer and W&B reporting for dynamic ITL / TTFT percentile metrics.
- Router stack unified on sgl-router
Consolidated the router stack onto sgl-router and removed slime-router.
- Expanded multimodal and model support
Improved support for GLM-4.6V / GLM4V, Multimodal OPD, and Qwen3.5-related workflows.
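For readers unfamiliar with the new observability metrics: TTFT (time to first token) and ITL (inter-token latency) are derived from per-token timestamps. A minimal illustrative sketch of how such percentile metrics can be computed — this is a generic example, not slime's actual reporting code:

```python
# Illustrative sketch only: shows how TTFT / ITL percentile metrics are
# typically derived from per-token timestamps; not slime's actual code.

def ttft_and_itl(request_start: float, token_times: list[float]) -> tuple[float, list[float]]:
    """TTFT = delay to first token; ITL = gaps between consecutive tokens."""
    ttft = token_times[0] - request_start
    itl = [b - a for a, b in zip(token_times, token_times[1:])]
    return ttft, itl

def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (p in [0, 100])."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, max(0, round(p / 100 * (len(ordered) - 1))))
    return ordered[idx]

# Example: one request whose tokens arrive at the given timestamps.
ttft, itl = ttft_and_itl(0.0, [0.5, 0.6, 0.8, 0.9])
print(ttft)  # 0.5
print(percentile(itl, 50))
```

In practice such per-request values are aggregated across a rollout and reported as p50/p90/p99 series to W&B.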
Other Notable Changes
- Fixed CUDA IPC cache leaks during weight updates
- Fixed SP/CP gradient inflation in FLA layers
What's Changed
- feat: add GLM-4.6V MoE VL bridge with CP support by @zhuzilin in #1715
- fix: resolve rope_theta from rope_parameters dict in HF config validation by @zhuzilin in #1720
- [docker] patches for glm4.6v, kimi k2.5 and dsa cp only by @zhuzilin in #1722
- Fix CUDA IPC cache leaks during weight updates by @zhuzilin in #1731
- [docker] update megatron by @zhuzilin in #1729
- [docker] Fix IndexCache with mla model by @zhuzilin in #1736
- [slime-router] support pd disaggregation and remove radix tree middleware by @zhuzilin in #1735
- Fix glm4v megatron bridge by @zhuzilin in #1738
- [docker] update sglang patch by @zhuzilin in #1743
- feat: GLM4V multimodal support improvements by @zhuzilin in #1745
- feat: placeholder worker type, metrics router, and GPQA letter range by @zhuzilin in #1746
- always enable_metrics and remove dp context by @zhuzilin in #1747
- fix: resolve SP/CP gradient inflation in FLA (linear attention) layers by @zhuzilin in #1748
- Update MTP example configs, rename GLM-4.5 to GLM-4.7, clean scripts by @zhuzilin in #1749
- Support qwen3.5 loss mask for multi-turn SFT by @huang3eng in #1742
- fix: propagate moe_token_dispatcher_type in bridge model provider by @nanjiangwill in #1737
- fix: resolve rope_theta from rope_parameters in DeepseekV32Bridge by @stevewx in #1734
- chore: translate remaining Chinese comments to English by @WangHong-yang in #1726
- feat: add Qwen3.5-4B model support by @shihaohou in #1721
- fix: http_utils. disable system proxy for internal SGLang httpx clients by @DongzhuoranZhou in #1714
- fix: auto-detect GPUs in qwen3-4b script by @ailuntz in #1700
- fix: quote $MOE_LAYER_FREQ by @lawrence-harmonic in #1689
- disable router health_check and allow prompt_data is None by @zhuzilin in #1751
- small fix on qwen3-235b-a22b launch script by @Zhuohao-Li in #1719
- sync internal bugfix by @zhuzilin in #1765
- Fix uploading sglang metrics to wandb by @zhuzilin in #1768
- use zhuzilin/sgl-router for sglang-router by @zhuzilin in #1770
- [docker] update sgl-router by @zhuzilin in #1772
- [Multimodal] Add Multimodal OPD support by @coding-famer in #1760
- refactor: remove slime router by @zhuzilin in #1773
- Add rollout trace timeline viewer by @zhuzilin in #1776
- [Fix] Fix duplicate Megatron LR scheduler resume when optimizer state is not loaded by @kaysonyu in #1775
- Support FP8 conversion for Qwen3.5 by @peterjc123 in #1769
- fix typo by @albaNnaksqr in #1759
- [Fix]Fix some bugs/clean up by @coding-famer in #1756
- (fix):not have encoder_only attr cause run failed by @wangyufak in #1741
New Contributors
- @stevewx made their first contribution in #1734
- @WangHong-yang made their first contribution in #1726
- @shihaohou made their first contribution in #1721
- @DongzhuoranZhou made their first contribution in #1714
- @ailuntz made their first contribution in #1700
- @peterjc123 made their first contribution in #1769
- @albaNnaksqr made their first contribution in #1759
- @wangyufak made their first contribution in #1741
Full Changelog: v0.2.3...v0.2.4
v0.2.3
v0.2.3 is here! Thanks to everyone who contributed to this release.
Major Updates
In addition to a broad set of bug fixes and stability improvements, v0.2.3 brings several major updates:
- YAML-based sglang_config support for engine group configuration
This makes rollout setup much more flexible: you can now configure different parallelism strategies for PD disaggregation, enable EPD-style deployments, or even serve multiple heterogeneous models within one rollout setup more cleanly.
- Expanded model support, including GLM5, GLM-4.7-Flash, and Qwen3.5
- Dependency and runtime updates, including SGLang v0.5.9 docker support and multiple fixes for PD, NSA, HiCache, CP+PP, etc.
Other Notable Changes
- Added consistent hashing routing for multi-turn rollout
- Removed FSDP support so we can focus maintenance effort on the training and rollout paths we actively invest in
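Consistent hashing keeps each multi-turn conversation pinned to the same rollout engine (so its prefix/KV cache stays warm) while minimizing remapping when engines join or leave. A minimal illustrative sketch of the technique — the class, the hash-ring details, and the session-key choice are assumptions for illustration, not slime's actual router code:

```python
# Illustrative consistent-hash ring: routes a session id to a stable
# engine, with minimal reshuffling when engines are added or removed.
# Generic sketch of the technique, not slime's router implementation.
import bisect
import hashlib

class ConsistentHashRouter:
    def __init__(self, engines: list[str], replicas: int = 100):
        self._ring: list[tuple[int, str]] = []
        for engine in engines:
            for i in range(replicas):  # virtual nodes smooth the load
                self._ring.append((self._hash(f"{engine}#{i}"), engine))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def route(self, session_id: str) -> str:
        """Pick the first ring node clockwise from the session's hash."""
        h = self._hash(session_id)
        idx = bisect.bisect(self._keys, h) % len(self._keys)
        return self._ring[idx][1]

router = ConsistentHashRouter(["engine-0", "engine-1", "engine-2"])
# A given conversation always lands on the same engine:
assert router.route("conv-42") == router.route("conv-42")
```

The virtual-node count trades balance for ring size; with it, removing one engine remaps only the sessions that were pinned to it.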
What's Changed
- fix: fix sglang regression by @nanjiangwill in #1363
- [docker] upgrade fla to 0.4.1 by @zhuzilin in #1452
- Allow passing pp_size by @zhuzilin in #1454
- Revert "fix: fix sglang regression" by @zhuzilin in #1457
- update #1457 by @nanjiangwill in #1458
- [model] Add support for GLM4.7 Flash by @zhuzilin in #1460
- [script] Add example script for GLM4.7 Flash by @zhuzilin in #1467
- add lb default by @lilei199908 in #1465
- support non-symmetric int4 qat by @zhuzilin in #1472
- add convert hf to int4 without calibration dataset by @xieck13 in #1489
- [docker] fix nsa + hicache by @zhuzilin in #1494
- fix compute logprobs memory leak bug by @lilei199908 in #1506
- [Fix] support converting torch_dist to hf for qwen3vl dense model by @p1k0pan in #1491
- renamed qwen3-vl.py to qwen3_vl.py to fix typo by @gxlvera in #1512
- sync internal features by @zhuzilin in #1513
- [docker] allow retract to 0 req during PD by @zhuzilin in #1515
- Pass through correct Megatron model provider PP args by @hari-hm in #1486
- fix: support mtp for qwen3-next by @huang3eng in #1503
- [Doc] Add doc for slime router by @Hecate0821 in #1499
- Add support for vlm checkpoints conversion by @cklxx in #1475
- fix: use aread() to fully consume HTTP response body by @ann-qin-lu in #1488
- Add response.aclose() and fix lint by @zhuzilin in #1520
- [docker] alleviate pd memory leakage by @zhuzilin in #1525
- Update convert to INT4 script path by @gxlvera in #1528
- [docker] add alloc_extend_torch_fallback for long context by @zhuzilin in #1530
- [docker] remove seq_len in _get_k_and_s_triton_kernel by @zhuzilin in #1531
- [Feature] Add megatron version for on policy distillation by @yitianlian in #1538
- Make DataSource implement __len__ to standardize the data source contract by @TSunny007 in #1518
- [Fix] Add fsdp assert for OPD by @yitianlian in #1545
- [bug] fix: gracefully handle datapoints with no multimodal input in a multimodal dataset by @hleehlee-amazon in #1535
- fix sglang hicache nsa bugs by @lilei199908 in #1549
- [docker] fix nsa retract by @zhuzilin in #1566
- [Multimodal] make multimodal processing robust by @coding-famer in #1516
- [Fix] Fix multimodal_train_inputs handling for mixed text-multimodal datasets by @coding-famer in #1559
- [Feature] add convert_torch_dist_to_hf_bridge.py by @coding-famer in #1573
- [Fix] Fix resuming from Megatron checkpoint when using bridge by @coding-famer in #1577
- [Feature] Add Profile Config by @yitianlian in #1561
- [Feature] Add consistent hashing routing policy for rollout by @yitianlian in #1588
- [Fix] Minor fix for support distribute mode by @yitianlian in #1589
- [bug] fix: fix type hint in Sample class by @hleehlee-amazon in #1551
- Add support for GLM5 by @zhuzilin in #1599
- fix(examples): update strands_sglang example to strands-sglang v0.2.x API by @Lawhy in #1593
- refactor: separate sglang argparse from megatron with two-phase parsing by @zhuzilin in #1600
- Add issue templates by @zhuzilin in #1602
- [cleanup] remove metric checker and long tests by @zhuzilin in #1603
- fix: use getattr for sglang params when rollout_only is disabled by @zhuzilin in #1604
- refactor: extract RolloutServerGroup and start_rollout_server by @zhuzilin in #1605
- refactor: delegate engine state and operations to RolloutServerGroup by @zhuzilin in #1606
- fix: sync docs with current implementation by @zhuzilin in #1608
- fix: restrict SymPy parser scope to prevent arbitrary code execution by @Hardik-369 in #1587
- refactor: split PD disaggregation into separate EngineGroups by @zhuzilin in #1609
- feat: move OPD to slime/rollout, add CI test and docs by @zhuzilin in #1610
- docs: add CI section to developer guide (EN + ZH) by @zhuzilin in #1612
- refactor: make EngineGroup ops non-blocking and batch ray.get at RolloutServer level by @zhuzilin in #1613
- Fix: disable allow_abbrev in _pre_parse_mode to prevent --load collision by @coding-famer in #1616
- fix: skip memory check for non-communication query functions in ReloadableProcessGroup by @zhuzilin in #1625
- Fix: Handle quantization formats during weight synchronization in Megatron bridge. by @GeLee-Q in #1624
- Add retries, which are not built into aiohttp by @joyliu-q in #1617
- [docker] upgrade to v0.5.9 by @zhuzilin in #1626
- Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #1590
- add no colocate update critic only by @lilei199908 in #1567
- remove redundant log_rollout_data by @zhuzilin in #1629
- feat: add --sglang-config YAML for engine group configuration by @zhuzilin in #1614
- Fix #1595: pass rollout_id explicitly to offload_train by @yitianlian in #1631
- Fix #1615: tolerate abort_request connection failures by @yitianlian in #1632
- [docker] support pp + cp for dsa model by @zhuzilin in #1634
- [docker] fix sglang upgrade bug by @zhuzilin in #1639
- Add Qwen3.5 model support (27B dense and 35B-A3B MoE) by @zhuzilin in #1641
- [docker] fix int4 qat for upgraded sglang by @zhuzilin in #1642
- Add GLM-4.7-Flash example docs and 8xH100 training script by @zhuzilin in #1645
- [docker] supports bf16 deepep by @zhuzilin in #1651
- Add slime skills for rollout, reward, filter, eval config, and CI by @yitianlian in #1646
- [Feature] Add plugin contract test suite by @yitianlian in #1652
- support gpt-oss by @zhuzilin in #1658
- [docker] bugfixes on cp + pp by @zhuzilin in #1659
- [docker] remove true on policy patches by @zhuzilin in #1661
- [fix]: Qwen3.5-35B-A3B 8-GPU: set TP size to 2 for num_query_groups=2 by @none0663 in #1662
- Remove FSDP support by @zhuzilin in #1664
- docs: add OpenClaw-RL to projects built upon slime by @yinjjiew in #1635
- Support setting update weights in sglang_config by @zhuzilin in #1665
- [fix] Fix numerical accuracy issue in dynamic sampling filter by @Django-Jiang in #1674
- sync from internal by @zhuzilin in #1677
- bugfixes from community by @zhuzilin in #1678
- Fix: pass return_tensors in text_kwargs for transformers>=5.0.0 compatibility by @coding-famer in #1648
- Fix missing packed_seq_params in bshd qkv...
v0.2.2
v0.2.2 is here! Thanks to everyone who contributed to this release.
Major Updates
In addition to multiple memory and performance improvements, v0.2.2 adds support for:
- Int4-QAT training
- Full R3 (Rollout Routing Replay) support with DeepEP and MTP
- Dependency upgrades: SGLang v0.5.7 and the Megatron dev branch
What's Changed
- add ckpt load save ci by @lilei199908 in #1104
- Add --rollout-all-samples-process-path for RLVE by @zhuzilin in #1107
- feat: support Qwen3 Moe BackEnd Kernel by @attack204 in #1071
- fix max response/context/prompt len by @lilei199908 in #1110
- fix max len by @lilei199908 in #1112
- [docker] remove amem and support deepep + r3 by @zhuzilin in #1115
- [Fix] Fix early return in init rollout engine by @yitianlian in #1118
- [Fix] Add sglang patch for weight version update by @yitianlian in #1119
- fix: improve tokenization by @nanjiangwill in #1113
- [Feature] Add CI test for weight version update by @yitianlian in #1120
- [docker] optimize r3 with base64 encode by @zhuzilin in #1124
- [docker] fix r3 gather buffer by @zhuzilin in #1129
- [docker] support mtp for r3 by @zhuzilin in #1131
- [Fix] Fix some bugs in retool example by @yitianlian in #1130
- Add finalize_model_grads_with_empty_cache by @zhuzilin in #1133
- Feat: add usage docs for fsdp by @lin0303-siyuan in #1092
- Reserve more ports for new sglang dp attn impl by @zhuzilin in #1142
- Blog: fix the path of the Blog's architecture image by @ShanningZhuang in #1125
- Support async save and add extra save at the end of the training by @zhuzilin in #1143
- fix: fix GemmeRMSNorm.forward() bug by @nanjiangwill in #1121
- [WIP][FSDP] Support FSDP for Qwen3Next by @rucnyz in #1116
- Megatron VLM Support (1/N) by @Zhuohao-Li in #1123
- Update deprecated huggingface-cli and fix broken links by @Lyken17 in #1147
- Added FSDP checkpoint handling to convert_torch_dist_to_hf.py by @cklxx in #1101
- minor fix for megatron compatibility by @zhuzilin in #1149
- Remove config_mapping to use megatron-bridge by @zhuzilin in #1166
- Avoids repeated work. by @qqwqqw689 in #1163
- Make tools/convert_torch_dist_to_hf.py not rely on megatron by @zhuzilin in #1167
- support converting dpsk mtp layer by @zhuzilin in #1169
- [FSDP] Add Masked importance sampling by @zijiexia in #1122
- [TIS/MIS] fix and add better metric by @ChangyiYang in #1174
- Fix optimizer schedule resume by @lr-tsinghua11 in #1152
- [docker] upgrade to megatron dev branch by @zhuzilin in #1153
- Minor fix by @lancerts in #1165
- Fix forward of Qwen3VLTextRotaryEmbedding in Megatron-Bridge by @zhuzilin in #1179
- Reuse the text llm config for qwen3 vl models by @zhuzilin in #1180
- Don't save AutoBridge in args by @zhuzilin in #1181
- [Fix] Fix port error in PD disaggregation setting by @yitianlian in #1175
- Fix prompt type bug in generate_with_search within examples/search-r1 by @jiahe7ay in #1182
- feat: support Qwen3 VL MoE by @nanjiangwill in #1171
- [Fix] Minor fix by @yitianlian in #1183
- Set parallel config for megatron bridge by @zhuzilin in #1184
- Fix tools/convert_hf_to_torch_dist.py by @zhuzilin in #1186
- Don't calculate entropy grad when coef is 0 by @zhuzilin in #1185
- Disable routing replay for critic by @zhuzilin in #1187
- Revert "Don't calculate entropy grad when coef is 0" by @zhuzilin in #1189
- Fix qwen3next for megatron dev branch by @zhuzilin in #1190
- fix: fix logging for rollout by @nanjiangwill in #1188
- sync internal features by @zhuzilin in #1192
- Fix check_weights api by @zhuzilin in #1194
- Add --custom-rollout-log-function-path and --custom-eval-rollout-log-function-path by @zhuzilin in #1196
- [Feature] Add more logging for health monitor by @yitianlian in #1195
- fix: SFT tools support by @maoquan-ms in #1198
- [Feature] Change default value of rollout health check by @yitianlian in #1197
- Megatron VLM Support w/ SFT (2/N) by @Zhuohao-Li in #1150
- tiny fix for sft script after tokenizer improvement by @Zhuohao-Li in #1201
- tests: add test for multi turn loss mask by @maoquan-ms in #1204
- Always pass loss masks to model by @zhuzilin in #1205
- [on-policy distillation] update reward function to fix potential token mismatches by @ahxt in #1128
- Add ci for mtp by @zhuzilin in #1207
- Fix mla tflops by @lilei199908 in #1209
- update docs by @zhuzilin in #1211
- update docs by @zhuzilin in #1214
- [Feature] Support 0.3.0 sglang router for fault tolerance by @yitianlian in #1215
- sync internal features by @zhuzilin in #1216
- feat: add custom logic for processing list[list[Sample]] to training data by @nanjiangwill in #1218
- add int4_quant cuda kernel by @Hyaloid in #1220
- update doc by @zhuzilin in #1224
- Improve AMD tutorial with complete model/data setup workflow by @Vivicai1005 in #1212
- update megatron patch by @zhuzilin in #1228
- sync from internal by @zhuzilin in #1229
- fix model saving bug in megatron by @zhuzilin in #1230
- add new status by @nanjiangwill in #1219
- update customization docs by @nanjiangwill in #1233
- Revert data processing of VLM by @zhuzilin in #1232
- [VLM] optimize VLM processing by @nanjiangwill in #1234
- feat: add custom pg_loss reducer by @ChangyiYang in #1235
- fix: typo "sgalng" → "sglang" in ROCm Dockerfiles by @yurekami in #1282
- sync bugfix from internal by @zhuzilin in #1284
- sync internal bugfix by @zhuzilin in #1286
- add bshd support by @yueming-yuan in #1285
- [docker] fix bugs on pd disaggregation and add --disable-draft-cuda-graph by @zhuzilin in #1288
- Add longest_effective_sample_tokens_per_sec metric by @zhuzilin in #1291
- [fix] conditionally pass kwargs for megatron-bridge VLM by @yueming-yuan in #1290
- [VLM] Bugfix: image_patch_size for vision preprocessing by @coding-famer in #1227
- feat: add --custom-model-provider-path argument by @yurekami in #1239
- [Feature/Fix] Support IPv6 host resolution and robust URI formatting by @Chen-GX in #859
- Fix missing trust_remote_code in HfWeightIteratorBridge by @SwordFaith in #1287
- fix: remove invalid None default and fix misleading underscore variable naming by @lancerts in #1283
- fix: remove duplicate Megatron-LM installation in build_conda.sh by @yurekami in #1238
- fix dev megatron ckpt save bugs by @lilei199908 in #1294
- [Fix] fix image_patch_size in processing_utils by @coding-famer in #1295
- support hicache for pd disaggregation by @zhuzilin in #1296
- Optimize data.py for efficient data loading by @ppraneth in #696
- Auto Sync Code by @miles-code-angel in #1303
- [VLM] end2end geo3k multi-turn RL of VLM Recipe by @gxlvera in https://github.com/THUD...
v0.2.1
Thanks to the incredible support and contributions from our community — v0.2.1 is here!
Major Updates
- VLM + FSDP: true on-policy training on Qwen3-VL (dense).
- PD-disaggregation support during rollout
- DP-attention support in rollout routing replay (R3)
- Upgraded to SGLang v0.5.6
What's Changed
- extract mla update weight logic out by @zhuzilin in #960
- support do all evals together by @zhuzilin in #959
- Add --rollout-sample-filter-path by @zhuzilin in #961
- [FSDP] Optimize FSDP2 Model Loading with Rank-0 Broadcast by @Hecate0821 in #915
- Add sample.remove_sample by @zhuzilin in #977
- add --eval-max-prompt-len by @zhuzilin in #978
- Add args check for max_context_len by @zhuzilin in #979
- Remove hard coded balance_abs_threshold by @zhuzilin in #981
- Tiny fix fp8_cast_bf16 not copying chat template by @fzyzcjy in #964
- Super tiny install dnsutils in dockerfile by @fzyzcjy in #965
- Super tiny sanity check checkpoint dir by @fzyzcjy in #966
- Fix convert_hf_to_torch_dist OOM by @fzyzcjy in #967
- Tiny support using environment variables in addition to arguments for all scripts by @fzyzcjy in #968
- Super tiny increase default timeout sec by @fzyzcjy in #969
- Fix random port in use error even though already have free port detection by @fzyzcjy in #970
- Super tiny enable draft-weights-cpu-backup to avoid MTP acc len issue by @fzyzcjy in #971
- Add generation function for benchmarking purpose by @fzyzcjy in #972
- Support zero host or device memory waste for weight update by @fzyzcjy in #973
- Add fp8 kv cache and tis in qwen3 30b a3b script by @fzyzcjy in #974
- Add GB200, MTP, benchmark, fp8 rollout mode to glm script by @fzyzcjy in #975
- [FSDP] Add private func indicator for better usage by @PopSoda2002 in #982
- [Bugfix] Rename save model by @PopSoda2002 in #983
- Fix: resolve variable shadowing bug in setup_model_and_optimizer by @fangzhensheng in #963
- remove unnecessary optimizer init by @zhuzilin in #984
- [release] bump to v0.2.0.post1 by @zhuzilin in #986
- fix scaling of per token loss by @zhuzilin in #987
- Add strands-agents example by @Lawhy in #976
- Add nemo skills evaluation by @guapisolo in #989
- [1/N] Tiny execute Ruff auto lint by @fzyzcjy in #991
- [2/N] Tiny manually fix for Ruff default ruleset and add to pre-commit by @fzyzcjy in #992
- [3/N] Enable B ruleset in Ruff by @fzyzcjy in #993
- [4/N] Tiny enable UP ruleset in Ruff by @fzyzcjy in #994
- Super tiny further fix lint error by @fzyzcjy in #995
- Add DataSource and --data-source-path by @zhuzilin in #912
- Fix per token loss scale and add e2e ci by @zhuzilin in #990
- [FSDP] Add script for FSDP Qwen3-4B by @Hecate0821 in #988
- Fixed bug in checking max_length for SFT #997 by @Surya-Gunukula in #998
- [ci] Add CI to make sure all dense parallel gives the same grad norm by @zhuzilin in #1000
- [Feature] Add off-policy sequence masking algorithm proposed in DeepSeek v3.2 by @yitianlian in #999
- [FSDP][3/N] support true_on_policy training for FSDP2 by @zhuzilin in #1001
- fix lint by @zhuzilin in #1002
- Fix bare except clause and remove redundant computation in ppo_utils by @lancerts in #1007
- fix: FSDP runnable for Qwen3-30b-a3b by @yueming-yuan in #1010
- move tis function outside by @zhuzilin in #1014
- Add backward impl for SiluAndMulFunction and MoeSumReduceFunction by @zhuzilin in #1015
- refactor: expose compute_metrics_from_samples as public by @lancerts in #1012
- Fix evaluation parameter parsing by @guapisolo in #1005
- pre-commit run --all-files by @lancerts in #1021
- fix: update deprecated import path in mcore2hf script by @Chen-GX in #1003
- [FSDP] Add gpt oss 20b script by @PopSoda2002 in #996
- Fix mimo speculative decoding oom by @guapisolo in #1024
- [FSDP, VLM] feat: add vlm training for FSDP by @nanjiangwill in #501
- [rollout] support disable trim samples when converting rollout samples to train datas by @GGGGGGXY in #1016
- Backward compatible for older megatron version by @zhuzilin in #1028
- extract all sglang deps in megatron actor to one file by @zhuzilin in #1029
- feat: Add Unbiased KL Estimation from DeepSeek-V3.2 by @kekmodel in #1004
- refactor: extract duplicated checkpoint interval logic into reusable helper by @lancerts in #1027
- Fix typo in sglang_rollout.py comment by @ChenmienTan in #980
- fix ci for nodes with proxy by @zhuzilin in #1035
- [FSPP] fix args error in apply_fsdp2 function by @ChangyiYang in #1041
- [FSDP] Support lr scheduler by @ChangyiYang in #1040
- [Fix] Fix some bugs when on/offload model by @yitianlian in #1038
- Improve debug output formatting in replay_reward_fn.py by @lancerts in #1033
- Support pd disaggregation with p and d of same config by @zhuzilin in #1046
- [rollout] Truncate last token for rollout routing replay by @Hecate0821 in #1045
- fix: modernize type hint and add distributed init checks in utils by @lancerts in #1049
- Fix the padding of rollout routing replay experts by @zhuzilin in #1052
- update sglang to 0.5.6 by @lilei199908 in #1051
- [docker] fix cudnn version by @zhuzilin in #1066
- [docker] fix megatron cpu adam load issue by @zhuzilin in #1070
- fix(examples): correct quotes and comment out ray cleanup commands in Qwen3-30B-A3B FP8 script by @pandengyao in #1069
- Fix typos and improve clarity in documentation and code comments by @lancerts in #1067
- fix: remove redundant gc.collect() and combine split f-strings by @lancerts in #1074
- [FSDP, VLM] feat: true on policy for VLM by @nanjiangwill in #1056
- [VLM, FSDP] Update Experiment Readme by @nanjiangwill in #1079
- split train data in-advance to reduce communication by @zhuzilin in #1078
- [Feature] PD Disaggregation Support by @yitianlian in #1080
- fix raw_reward upload in fsdp by @zhuzilin in #1084
- [FSDP][vlm] Add B200 doc by @PopSoda2002 in #1082
- Add recompute loss function and enable by default by @zhuzilin in #1083
- Empty cache before finalize_model_grads to prevent unexpected oom by @zhuzilin in #1086
- Revert "Empty cache before finalize_model_grads to prevent unexpected oom" by @zhuzilin in #1087
- Set --train-memory-margin-bytes to 1GB by default by @zhuzilin in #1088
- set recompute_loss_function to false by default by @zhuzilin in #1089
- [VLM] fix: fix non true-on-policy vlm regression by @nanjiangwill in #1093
- fix_load_ckpt by @lilei199908 in #1095
- fix actor init bugs by @lilei199908 in #1098
- Fix gqa model tflops compute by @zhuzilin in #1099
- Fix bug for convert_hf_to_torch_dist.py by @zhuzilin in #1100
- [release] bump to v0.2.1 by @lilei199908 in #1096
New Contributors
- @fangzhensheng made their first contribution in #963
- @Lawhy made their first contribution in https://github.com/TH...
v0.2.0.post1
Fix critical bug mentioned in #958.
What's Changed
- extract mla update weight logic out by @zhuzilin in #960
- support do all evals together by @zhuzilin in #959
- Add --rollout-sample-filter-path by @zhuzilin in #961
- [FSDP] Optimize FSDP2 Model Loading with Rank-0 Broadcast by @Hecate0821 in #915
- Add sample.remove_sample by @zhuzilin in #977
- add --eval-max-prompt-len by @zhuzilin in #978
- Add args check for max_context_len by @zhuzilin in #979
- Remove hard coded balance_abs_threshold by @zhuzilin in #981
- Tiny fix fp8_cast_bf16 not copying chat template by @fzyzcjy in #964
- Super tiny install dnsutils in dockerfile by @fzyzcjy in #965
- Super tiny sanity check checkpoint dir by @fzyzcjy in #966
- Fix convert_hf_to_torch_dist OOM by @fzyzcjy in #967
- Tiny support using environment variables in addition to arguments for all scripts by @fzyzcjy in #968
- Super tiny increase default timeout sec by @fzyzcjy in #969
- Fix random port in use error even though already have free port detection by @fzyzcjy in #970
- Super tiny enable draft-weights-cpu-backup to avoid MTP acc len issue by @fzyzcjy in #971
- Add generation function for benchmarking purpose by @fzyzcjy in #972
- Support zero host or device memory waste for weight update by @fzyzcjy in #973
- Add fp8 kv cache and tis in qwen3 30b a3b script by @fzyzcjy in #974
- Add GB200, MTP, benchmark, fp8 rollout mode to glm script by @fzyzcjy in #975
- [FSDP] Add private func indicator for better usage by @PopSoda2002 in #982
- [Bugfix] Rename save model by @PopSoda2002 in #983
- Fix: resolve variable shadowing bug in setup_model_and_optimizer by @fangzhensheng in #963
New Contributors
- @fangzhensheng made their first contribution in #963
Full Changelog: v0.2.0...v0.2.0.post1
v0.2.0
We are thrilled to announce the release of slime v0.2.0! Thanks to the incredible support and contributions from our community, slime has gained significant features and substantial performance enhancements in this version.
Major Updates
- FSDP Backend: Introduced a Fully Sharded Data Parallel (FSDP) based training backend for improved scalability.
- PPO Support: Added native support for Proximal Policy Optimization (PPO).
- MTP Training: Enabled training of the MTP (Multi-Token Prediction) during Reinforcement Learning.
- FP8 Full Stack: Support for both FP8 training and FP8 inference.
- Train-Inference Mismatch: Alleviated or even eliminated train-inference mismatch.
- Importance Sampling: Custom interface for train-infer importance sampling (e.g., MIS).
- Routing Replay: Added Rollout Routing Replay (R3) and Routing Replay (R2).
- True On-Policy Training: Enabled strictly on-policy training with dense models on the FSDP backend.
- Performance Improvements
- Memory Optimization: CUDA Graphs offload, asystem-amem integration.
- Faster Weight Updates: Significantly accelerated FP8 weight updates.
- Python-based Router: A new slime router implemented in pure Python for accessibility.
- Fault Tolerance: Added robustness with fault tolerance for the rollout engines.
- Custom Configs: Support for passing customized configurations via --config.
- [Experimental] Checkpoint Loading: Added support for Megatron-bridge based checkpoint loading.
- New Examples
- Fully Async Training
- Multi-Agent Scenarios
- On-Policy Distillation
- Retool
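The train-infer importance-sampling idea above (TIS/MIS) corrects for the mismatch between the policy that generated the rollout on the inference engine and the policy currently being trained. A generic illustration of the technique under assumed shapes — per-token log-prob lists — not slime's actual interface:

```python
# Generic sketch of train-infer importance sampling; not slime's API.
# Per-token ratios between trainer and rollout log-probs are truncated
# (TIS) and, MIS-style, tokens that drifted too far are masked out.
import math

def tis_weights(train_logprobs: list[float],
                rollout_logprobs: list[float],
                clip: float = 2.0) -> list[float]:
    """Per-token truncated importance weights: min(ratio, clip)."""
    return [min(math.exp(t - r), clip)
            for t, r in zip(train_logprobs, rollout_logprobs)]

def mis_mask(train_logprobs: list[float],
             rollout_logprobs: list[float],
             threshold: float = 5.0) -> list[bool]:
    """MIS-style mask: drop tokens whose ratio exceeds the threshold."""
    return [abs(t - r) <= math.log(threshold)
            for t, r in zip(train_logprobs, rollout_logprobs)]

# Token 2 drifted off-policy: its weight is clipped and it is masked out.
w = tis_weights([-1.0, -0.5, -0.2], [-1.0, -0.6, -3.0])
m = mis_mask([-1.0, -0.5, -0.2], [-1.0, -0.6, -3.0])
```

The custom interface mentioned above lets users plug in their own weighting/masking rule of this shape.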
What's Changed
- [Doc typo] Update amd_tutorial.md by @yushengsu-thu in #246
- [bugfix] use fp32 for rollout_log_probs by @zhuzilin in #245
- Complete the RayTrainGroup args string docs. by @MrAta in #248
- Update speculative decoding doc and sglang patch by @guapisolo in #250
- fix debug-rollout-only by @zyzshishui in #249
- retool in one commit by @maocheng23 in #237
- fix: modify the rotary-base of qwen-3b to 1000000 for consistency by @YuchenFan48 in #252
- update logging and fix typo by @maocheng23 in #254
- [bugfix] fix read data containing "tools" field by @Maybewuss in #255
- Revert "[bugfix] fix read data containing "tools" field" by @zhuzilin in #256
- add shell script for qwen3-32B task by @Gao016 in #253
- docs: Fix custom interface documentation errors by @GeLee-Q in #251
- [example] Add fully async example by @zhuzilin in #258
- added sphinx-based documentation by @FrankLeeeee in #262
- fixed build error for documentation by @FrankLeeeee in #263
- [bugfix] Fix bugs on multi samples from one prompt (multi-agent) by @zhuzilin in #260
- fixed sphinx configuration by @FrankLeeeee in #264
- [bugfix] fix read data containing "tools" field by @Maybewuss in #259
- add DeepWiki badge by @richardodliu in #265
- [doc] add example doc to the website by @zhuzilin in #267
- [doc] add blogs by @zhuzilin in #268
- Update actor_group.py by @zlH518 in #266
- [doc] prettify language conversion toggle by @zhuzilin in #270
- [example] add an example for multi-agent rl by @yinpeisu in #269
- [refactor] Add isort back and move global gloo to global util by @zhuzilin in #273
- [refactor] remove over_sampling_filter and extract some functions by @zhuzilin in #278
- [feat] init support for FSDP by @zhuzilin in #282
- Chatbot entry for Sphinx style docs by @jhinpan in #284
- Revert "Chatbot entry for Sphinx style docs" by @zhuzilin in #286
- Remove get_rollout_data from actor_group by @MrAta in #285
- Add docs logo by @jhinpan in #283
- [Hardware] AMD Dockerfile update - support up to d4a7741 (Sep 6, 2025) by @yushengsu-thu in #307
- [feat] init xtuner backend by @zhuzilin in #310
- [docker] update to sglang 0.5.2rc2 by @zhuzilin in #313
- Add model version attribute in each sample by @yitianlian in #271
- [nfc] cleanup for weight_version by @zhuzilin in #314
- Add raw reward metric in fdsp backend by @yitianlian in #315
- fix: check args.save when save_interval is set. by @SanftMonster in #308
- Fix comment for --load parameter in checkpoint configuration (Quick Start Doc) by @Arist12 in #306
- [refactor] bind numa and rename num_gpus_per_node by @zhuzilin in #316
- [xtuner] unroll TrainingWorker and TrainEngine by @zhuzilin in #322
- [xtuner] add wandb by @zhuzilin in #324
- [bugfix] fix no weight_version for aborted samples by @zhuzilin in #327
- [FSDP] Verify FSDP backend availability via uv install / pip install by @Zhuohao-Li in #325
- Add FSDP extras dependency and import test (#302) by @souhil25 in #303
- fix: small bug fix in the rollout_buffer_example.sh by @rbao2018 in #328
- [refactor] remove slime/backend/utils and extract slime_validate_args by @zhuzilin in #329
- feat: auto configure megatron from hf config. by @SanftMonster in #312
- Do not read ip if env is provided by @oraluben in #337
- [rm_hub] fix ground_truth type error in grade_answer_verl by @GGGGGGXY in #336
- [feat] use one global httpx.AsyncClient and remove --use-http2 by @zhuzilin in #338
- [Refactor] Merge rollout controller into rollout manager by @PopSoda2002 in #304
- add dockerfile and patch for b200 by @maocheng23 in #340
- [feat] init support for PPO by @zhuzilin in #342
- wrong expressions and typo by @ArtificialZeng in #343
- Add basic VLM data pipeline by @ppraneth in #335
- [FSDP] Add reference model support for correct KL loss computation #296 by @UbeCc in #344
- fix incorrect sft loss mask for qwen3 thinking series models. by @luppx in #330
- feature: ppo by @lilei199908 in #347
- [FIX] NVLINK detection method in scripts by @JustinTong0323 in #356
- fix lint by @JustinTong0323 in #358
- [feat] add --critic-lr and --num-critic-only-steps by @zhuzilin in #350
- [refactor] Add actor registry by @zhuzilin in #359
- Added GB200 patches for SGLang v0.5.2 by @sam571128 in #360
- [bugfix] fix the num_tokens used for per_token_loss in multi-turn training by @zhuzilin in #365
- [Feature] Support token in token out for multi turn tasks by @yitianlian in #242
- [router] support slime-router only by @zhuzilin in #366
- [router] extract middleware folder by @zhuzilin in #367
- [feat] support distributed post to enable more concurrent requests by @zhuzilin in #368
- [FEAT] Deterministic rollout by @JustinTong0323 in #361
- [reproducibility][docker] enable training reproducibility by @zhuzilin in #370
- [feat] enable use_flattened_tensor_bucket with quantization config by @zhuzilin in #374
- [fix] fix ppo bugs by @lilei199908 in #373
- docs: add B200/H-series GPU hardware support information by @Williamren97 in #380
- [model] fix run-qwen3-30B-A3B.sh by @yefei12 in #382
- Enable loss mask for sft by @UbeCc in #377
- [fix] fix paths in get_started.md by @hyleepp in #375
- [FSDP] Data Packing Implementation in FSDP backend by @jhinpan in #321
- [feat] add --use-routing-replay by @zhuzilin in #387
- fix bug for convert Qwen3-235B-A22B HF model weight to Megatron torch_dist format by @Gao016 in #386
- [FSDP] Add update weight class from distributed by @pop...
v0.1.0
Performance Optimizations
- SGLang: FP8 + DeepEP + speculative decoding
- Megatron: support for all parallel strategies (TP, PP, VPP, EP, CP, etc.) + DeepEP + CPU Adam.
- New Megatron offload strategy with better memory usage.
- Faster weight updates.
New Algorithm Supports
- GSPO
- TIS
- reinforce++ & reinforce++ base
Correctness
- CI for E2E GLM4 9B and Qwen3 30B-A3B training
- CI for Build Conda environment