Releases: inclusionAI/AReaL
v0.5.1
Highlights
This is a patch release on top of v0.5.0.
- A new docker image with `math-verify` and the latest `ruff`.
- Support for PPO critic models with the Megatron engine.
- Refactored FSDP/Megatron engine implementations.
- Efficient RPC tensor transfer with `RTensor` (aka the original `DistributedBatch`).
- Beam search support for vLLM.
What's Changed
- fix: change checkpoint cleanup flag to fix update_weights_from_disk in single-controller mode by @HwVanICI in #711
- fix: prevent port overflow in vLLM server with high data parallelism (fixes #652) by @HsiaoTsan in #653
- refactor: refactor train engine high level APIs by @aaaandychen in #658
- [Fix] Fix the bug that experiments cannot properly exit in the TIR example by @nuzant in #712
- chore: print more information in concat mode and handle empty tool calls for easy debugging by @nuzant in #713
- chore: trim tests in CI by @garrett4wade in #714
- refactor: enforce task_id creation, access, and manipulation in inference engines by @garrett4wade in #715
- refactor: redesign TrainEngine API with cleaner abstractions by @rchardx in #719
- [Testing] Add SFT/GRPO integration test for Megatron train engine. by @nuzant in #726
- [FEAT] VLLM support for VLM training by @HwVanICI in #698
- feat: Support beam_search in vllm backend by @ZiyiTsang in #721
- fix: update multi-turn math test configuration by @rchardx in #727
- fix: fix logic error in beam search support check by @rchardx in #728
- feat: add PPO Critic model support for MegatronEngine by @rchardx in #729
- feat: implement RTensor for metadata transfer in the single-controller mode by @garrett4wade in #731
- fix: fix multi-turn proxy example by @dhh1995 in #733
- minor fix: fix openai cache test, add it in CI test suite, and remove OOD todos/fixmes in Megatron engine by @garrett4wade in #732
- [Feat] XCCL-updates for single LoRA functionality for ascend-vLLM by @gursimar in #679
- fix: use group_size=1 for eval in proxy examples by @dhh1995 in #737
- feat: add ignore_eos and skip_special_tokens generation params by @rchardx in #738
- chore: update datasets to version 3.0.0 or higher for inner API compatibility by @ZiyiTsang in #720
- feat: build the docker image with math-verify and the latest ruff by @garrett4wade in #744
- bump v0.5.1 by @garrett4wade in #745
New Contributors
- @HsiaoTsan made their first contribution in #653
- @aaaandychen made their first contribution in #658
- @gursimar made their first contribution in #679
Full Changelog: v0.5.0...v0.5.1
v0.5.0
Highlights
The newly released v0.5.0 of AReaL introduces two core innovations: Seamless Agentic RL and the Single Controller architecture:
- **Seamless Agentic RL**: AReaL provides an agent training service via OpenAI-compatible APIs. This lets environment providers, algorithm developers, and system engineers collaborate through a zero-friction pipeline in complex engineering workflows, significantly boosting development efficiency and system maintainability.
- **Single Controller Architecture**: Eliminates long-tail latency and data imbalance issues inherent in SPMD (Single Program, Multiple Data) models. This layered design enhances inference scalability, enables fine-grained system-level control, and preserves algorithmic flexibility while minimizing code migration costs for algorithm developers.
Other changes include:
- **Performance & Scalability**: Major refactoring to streamline step detection, assignment logic, and workflow batching. Improved distributed training with fixes for NCCL timeouts, Gloo group barriers, and vocab-parallel logprobs for FSDP.
- **Model & Hardware Support**: Added single LoRA functionality for Ascend-vLLM and improved handling for Vision-Language Models (VLMs).
- **Fixes & Refinements**: Resolved numerous bugs related to data loading, reward timeouts, interaction caching, process cleanup, and tool call parsing. Significant code refactoring to merge duplicate logic, improve type hints, and centralize asset management. Project-wide code formatting switched to `ruff`.
Future Work
AReaL currently supports the basic Single Controller mode and Agentic RL training pipeline. Future enhancements include:
- Optimized data flow and distributed launch capabilities under Single Controller mode;
- Automatic scaling, fault recovery, and high-availability training;
- Improved training-inference performance in agent-centric scenarios.
What's Changed
- update readme for qwen3-vl by @garrett4wade in #578
- [FIX] add recipe directory to pre-commit checks by @fishcrap in #580
- [FIX] reduce reward timeout warning by @fishcrap in #579
- [FIX] fix compute logp temperature by @fishcrap in #581
- feat: rebuild step detection around global batches by @rchardx in #583
- chore: extend wait timeout and hardens config checks by @rchardx in #585
- feat: streamline step assignment logic by @rchardx in #584
- fix: Use background threads to commit tasks and fetch results in workflow executor by @garrett4wade in #587
- fix: reuse `aiohttp.ClientSession` in `agenerate` by @garrett4wade in #589
- chore: automates session tracing context by @rchardx in #591
- [feat] add Serializer for rpc server by @CormickKneey in #566
- doc: improve tracer documentation with custom phase support and improved plotting by @rchardx in #594
- [feature] Support concat export completions in proxy mode by @yulangz in #582
- Fix trainer to use backend information from allocation mode by @dhh1995 in #596
- fix: fix the stuck issue of `rollout_batch` by @garrett4wade in #595
- fix: extends NCCL group timeout coverage by @rchardx in #598
- chore: use typevar to type hint loaded config by @dhh1995 in #603
- fix: safely close all ClientSessions with ContextVar by @garrett4wade in #605
- chore: remove requirements.txt by @garrett4wade in #604
- [Feat] Add train/rollout offload support by @fishcrap in #590
- [FIx] Use gloo group barriers for distributed synchronization by @fishcrap in #607
- feat: adds scheduled profiler tracing by @rchardx in #608
- refactor: let `WorkflowExecutor.wait` return a list with `None` by @garrett4wade in #612
- [feat] add local scheduler for single controller mode by @daihaowz in #610
- refactor: separate `BatchTaskDispatcher` from `WorkflowExecutor` by @garrett4wade in #613
- chore: upload paper to the repo by @garrett4wade in #616
- chore: clarifies agent onboarding guide by @rchardx in #617
- refactor: improves async coordination by @rchardx in #618
- [FIX] fix `enable_offload` breaking change and add offload/onload API by @fishcrap in #625
- refact: update gconfig to update stop token ids in workflows instead of in example scripts by @dhh1995 in #626
- chore: improve workflow batching safeguards by @rchardx in #624
- chore: ensures worker threads exit cleanly by @rchardx in #630
- bug fix: correctly shuffling data with distributed sampler by @garrett4wade in #632
- rename CompletionCache to InteractionCache by @dhh1995 in #631
- refactor: merge base_hf_engine with fsdp_engine for code cleanup by @garrett4wade in #629
- chore: format all files under areal/utils with ruff by @garrett4wade in #635
- chore: format all tests with ruff by @garrett4wade in #636
- chore: format remaining files under `areal/` with ruff by @garrett4wade in #637
- ci: update ci formatter to ruff by @garrett4wade in #638
- chore: tunes NCCL IB settings by @rchardx in #640
- [feat] implement train controller for single controller by @daihaowz in #614
- fix: modify the default value of "shuffle" and "drop_last" for validation datasets by @garrett4wade in #633
- [Feat] Single LoRA functionality for ascend-vLLM by @HwVanICI in #621
- fix: prevent zombie vLLM processes when Ray launcher kills tasks by @zhshgmail in #623
- refactor: add `export_stats` as engine's method by @garrett4wade in #643
- [feat] impl rollout controller for single controller by @dingzhiqiang in #611
- feat: implement proximal log-probability approximation for decoupled PPO by @zhshgmail in #600
- fix: fixes CLI docs import order by @rchardx in #646
- refactor: refines PPO/GRPO loss by @rchardx in #650
- refactor: merge duplicate process termination functions into unified kill_process_tree by @garrett4wade in #648
- feat: simplify openAI agent integration and allow training with any customized agent by @garrett4wade in #657
- fix: tear down local inference servers when calling `destroy` by @garrett4wade in #659
- fix: vlm input slicing by @HwVanICI in #651
- refactor: move logprob and value computation into TrainEngine by @rchardx in #663
- fix: fix drop last for data loader with distributed sampler by @dhh1995 in #665
- [FIX] Initialize llm_addrs in Slurm launcher for SFT jobs by @fishcrap in #662
- refactor: apply PPOTrainer and SFTTrainer in example scripts by @garrett4wade in #660
- feature: implement vocab-parallel logprobs for FSDP by @rchardx in #667
- refact: expose workflow executor in inference engine by @dhh1995 in #676
- fix: raise AttributeError instead of returning None in Platform.getattr by @rchardx in #672
- fix: add missing device_control_env_var to CpuPlatform by @rchardx in #681
- fix: override workflow_executor property in MockInferenceEngine by @rchardx in #682
- refactor: make processing multi_modal_input generic by @HwVanICI in #678
- refactor: refactor attention mask generation logic for clarity by @rchardx in #685
- [Feat] Implement GRPO trainer and weight exchange for single-controller mode by @dingzhiqiang in #666
- refact: rename set_final_reward to set_last_reward, also fix openai gen args by excluding lora_name by @dhh1995 in #675
- fix: fix CPU offloading in FSDP grad clipping and weight updates by @rchardx in https://github.com/inclus...
v0.4.1
What's Changed
- feat: add `raise_timeout` parameter to allow quiet waiting for inference results by @garrett4wade in #547
- Fix batch size in example `examples/vlm/clevr_count_70k_grpo.yaml` by @wangruohui in #549
- chore: format dataset and reward folders with ruff by @garrett4wade in #551
- refactor: rename the `should_accept` argument in `rollout/prepare_batch` to `should_accept_fn` by @garrett4wade in #555
- chore: delete not-planned experimental features by @garrett4wade in #554
- feat: add grpo trainer and simplify gsm8k grpo example by @dhh1995 in #552
- feat: add launch_server and teardown_server in inference engine api by @garrett4wade in #550
- [Refactor] refactor `stats_tracker` usage in engines and examples by @nuzant in #556
- refactor: allow passing string paths and init kwargs as rollout workflows by @garrett4wade in #525
- feat: introduces session-centric tracing APIs by @rchardx in #539
- doc: Add notes about asynchronous RL training by @garrett4wade in #558
- format: ruff format examples directory by @fishcrap in #559
- feat: support proxy server and client for training openai-compatible agents by @dhh1995 in #500
- chore: change type annotations and minor fixes for single-controller mode by @garrett4wade in #560
- docs: add "Performance Profiling" guide to best practices by @rchardx in #538
- add README for proxy_agent by @yulangz in #561
- chore: extends engine perf instrumentation by @rchardx in #562
- [FEAT] add pause/resume generation for vLLM server by @fishcrap in #563
- doc: update AReaL design doc with the current dev status by @garrett4wade in #568
- doc: update documentation to align the current dev status by @garrett4wade in #570
- refactor: extend allocation mode to support allocation naming and composition by @garrett4wade in #565
- feat: align perf_tracer with task hierarchy by @rchardx in #569
- chore: add hint for the breaking change of allocation mode by @garrett4wade in #572
- [FIX] fix atrace_session_phase in workflow by @fishcrap in #573
- chore: Quick fix for GSPO missing in doc by @ZiyiTsang in #576
- ci: build docker images with GCP by @garrett4wade in #564
- refactor: restrict the usage scope of the `rollout_batch` method by @garrett4wade in #567
- chore: add issue template for questions by @garrett4wade in #571
- ci: automatically tag the dev image upon new releases by @garrett4wade in #574
- chore: remove the old script used for validating installation by @garrett4wade in #575
- [FEAT] Add Qwen3-VL model support for fsdp by @fishcrap in #557
- bump v0.4.1 by @garrett4wade in #577
New Contributors
- @wangruohui made their first contribution in #549
Full Changelog: v0.4.0...v0.4.1
v0.4.0
AReaL v0.4.0 Release Notes
We're excited to announce AReaL v0.4.0, a major release that brings stable infrastructure support for RL training of MoE models.
Overview
MoE Training
While we introduced the Megatron backend as an experimental feature last month, several critical issues prevented us from offering it as a stable release. These challenges included:
- Training precision alignment between inference and training
- Weight transfer complications
- Lack of validated end-to-end MoE model training in production
AReaL v0.4.0 comprehensively addresses these issues. In our experiments, we ran fully asynchronous agentic RL training (GRPO) of the Qwen 235B model on just 6 H200 nodes and encountered no crashes over several days of training.
Transitioning from FSDP to Megatron requires only 3-5 lines of code changes in your training script. For detailed guidance, see our tutorial on fine-tuning large MoE models.
Agent Framework Support
Beyond stable MoE training, we're expanding native support for agent frameworks like Camel-AI and openai-agents. Previously, AReaL's trainable agent was encapsulated in a RolloutWorkflow object, which required users to manually manipulate token IDs for each LLM interaction. While agent frameworks abstract away this complexity, they cannot capture token IDs or maintain the execution order of LLM interactions.
To solve this, AReaL v0.4.0 introduces `ArealOpenAI`, a drop-in client that mimics the `AsyncOpenAI` API. This client acts as an agent proxy that:
- Transparently captures token IDs from your agent
- Maintains execution order to ensure trajectory consistency
- Supports assigning rewards to individual conversations
- Enables reward discounting across conversation turns
While this feature is currently experimental, we encourage users to explore our latest documentation on agentic RL and give it a try.
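The proxy mechanism can be illustrated with a stdlib-only toy. Note that `RecordingClient`, `Turn`, and their methods are invented for this sketch and are not the real `ArealOpenAI` API; the point is only the pattern of recording token IDs in execution order and discounting a terminal reward backwards across turns:

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    # Token IDs the proxy captured for one LLM interaction.
    prompt_ids: list
    completion_ids: list
    reward: float = 0.0

@dataclass
class RecordingClient:
    """Toy stand-in for an agent proxy: records every LLM call in
    execution order so token IDs are never lost to the framework."""
    turns: list = field(default_factory=list)

    def chat(self, prompt_ids, completion_ids):
        # A real proxy would forward the request to the inference
        # server; here we only record the captured token IDs.
        self.turns.append(Turn(list(prompt_ids), list(completion_ids)))

    def assign_reward(self, final_reward, gamma=0.9):
        # Discount the final reward backwards, so a turn k steps
        # before the end receives gamma**k of the terminal reward.
        r = final_reward
        for turn in reversed(self.turns):
            turn.reward = r
            r *= gamma

client = RecordingClient()
client.chat([1, 2], [3, 4])
client.chat([1, 2, 3, 4], [5])
client.assign_reward(1.0, gamma=0.5)
print([t.reward for t in client.turns])  # [0.5, 1.0]
```

Because the proxy sits between the agent and the LLM, the agent framework's code stays unchanged while every interaction still ends up as trainable (token IDs, reward) pairs.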
Key Highlights
Stable MoE Training
- bf16 Megatron training recipes with aligned precision across components
- NCCL-based weight updates
Agent Framework Integration
- Native support for openai-agents SDK and Camel
Developer Experience
- Integration with modern tooling: `ruff` and `uv`
- Simplified installation: `uv pip install -e .[all]` installs all dependencies
New Algorithm
- GSPO support added
We're grateful for your continued support and feedback. Happy training!
What's Changed
- chore: add comprehensive agent operations guide to AGENTS.md by @rchardx in #440
- fix boba_grpo bug by @shun001 in #439
- fix KeyError: "full_loss_mask" without ulysses and synchronize the boba GRPO yaml config by @garrett4wade in #441
- Support weights update from distributed sources for Megatron by @rchardx in #413
- [Feature] Add patch to accelerate SGLang weight loading by @nuzant in #324
- fix update weights from disk in FSDP engine by @garrett4wade in #443
- feat. Add more optimizer choice by @ZiyiTsang in #431
- fix: move pause/continue_generation operations into update_weights by @rchardx in #446
- chore: add optimizer content within Doc by @ZiyiTsang in #447
- [Fix] Fix `examples/env/setup-pip-deps.sh` by @nuzant in #455
- feat: support additional bash cmds before running the training commands when using slurm by @dhh1995 in #452
- fix: Use DistributedSampler for dataloader instead of splitting dataset by @dhh1995 in #456
- fix: update weight format handling for MegatronEngine in recover.py by @rchardx in #458
- [FIX] fix default values of args in cli_args by @fishcrap in #448
- [Feature] Add AEnt support by @hanshen95 in #403
- refactor: migrate experimental Megatron components to main API; fix: fix several bugs in FSDP engine by @rchardx in #459
- fix: fix the deprecated usage of tuple slicing, wandb's start_method, and transformers's dtype by @rchardx in #462
- fix: fix a bug in critic model's forward when Ulysses SP is enabled by @rchardx in #461
- fix: fix the NaN ppl values in FSDP SFT when Ulysses is enabled by @rchardx in #463
- refactor: separate staleness control from workflow execution by @garrett4wade in #444
- refactor: move WorkflowExecutor from areal.api to areal.core by @garrett4wade in #467
- refactor: simplify rollout in training scripts with the `connect_engine` API by @garrett4wade in #451
- doc: add pull request template and contribution guide by @garrett4wade in #468
- refactor: merge duplicate codes in SGLang/vLLM engines by @garrett4wade in #445
- chore dev: use ruff in pre-commit and remove unused files by @garrett4wade in #471
- chore: fix CI formatting check by @garrett4wade in #472
- Fix ruff formatting in CI and replace isort in CI with ruff by @garrett4wade in #477
- fix: update import paths for math_parser in multiple files by @rchardx in #478
- refactor the async task submission logic in workflow executor into task runner by @garrett4wade in #473
- fix: access weight_update_mode from config.actor for GRPOConfig (#482) by @zhshgmail in #483
- [Feature] Use `uv` for package management and installation by @nuzant in #485
- [Feature] Add SkyPilot examples by @nuzant in #422
- fix: use unfused optimizers for VLM with tensor parallelism by @rchardx in #486
- [FEAT] integrates openai-agents by @fishcrap in #470
- feat: add performance tracing support by @rchardx in #487
- [Bug Fix] Revert generating `requirements.txt` step using `uv` in pre-commit by @nuzant in #488
- chore: log request timeout details by @rchardx in #490
- chore: format `areal/api` with ruff by @garrett4wade in #491
- chore: refreshes agent handbook by @rchardx in #492
- [FEAT] Integrate Camel-AI by @fishcrap in #474
- [Testing] Add auto CI on GCP and fix tests by @nuzant in #494
- [Doc] Add information about CI and tests in CONTRIBUTING.md by @nuzant in #499
- [FEAT] Support PyTorch DCP for FSDP by @fishcrap in #497
- add math_reflection_en notebook by @samjia2000 in #496
- Implement M2PO algorithm by @tsjyma in #480
- feat: add examples which traces performance data by @rchardx in #498
- [Feature] Automatically split layers into pipeline stages in MegatronEngine. by @nuzant in #504
- feat: support request-level tracing by @rchardx in #509
- [FEAT] Add experiment metadata (git commit) tracking and refactor version info by @fishcrap in #511
- fix: use per-rank jsonl instead of file lock in case that NFS does not support it by @rchardx in #513
- fix: remove the usage of _acquire_file_lock by @rchardx in #515
- feat: extract tool output from openai-agents sdk by @CormickKneey in #507
- [FEAT] Add pipeline parallel support for vLLM inference engine by @fishcrap in #510
- fix(launcher): improve error handling and node calculation in ray.py by @dafu-wu in #518
- feat: add metadata extraction and remapping for process and thread IDs in trace events by @rchardx in #519
- fix: enhance rollout statistics tracking with enqueued state by @rchardx in #522
- feat: implement GSPO (Group-level Sequential Policy Optimization) by @zhshgmail in #501
- [Doc] Add Megatron training tutorial doc. by @nuzant in #521
- [Bug Fix] Fix deploy docs CI by @nuzant in #526
- chore: refactor boba GRPO for tracing by @rchardx in #527
- Docs: Add CAMEL training tutorial and improve variable naming by @fishcrap in https://github.com/inc...
v0.3.4.post1
v0.3.4.post1 Patch Fix
- Fixed a "full_loss_mask" KeyError introduced in #434. The original PR was tested with Ulysses enabled but caused errors when Ulysses was disabled.
- Updated configuration and scripts in `boba_grpo.py` to reproduce legacy results.
What's Changed
- chore: add comprehensive agent operations guide to AGENTS.md by @rchardx in #440
- fix boba_grpo bug by @shun001 in #439
- fix KeyError: "full_loss_mask" without ulysses and synchronize the boba GRPO yaml config by @garrett4wade in #441
Full Changelog: v0.3.4...v0.3.4.post1
v0.3.4
AReaL v0.3.4 Release Note
Highlights
- Support NPU training with the vLLM inference backend
- New algorithm implementations: RLOO, REINFORCE++, PPO with critic models, RLHF reward modeling
- New RL examples: multi-turn math, the Tongyi DeepSearch agent (with nearly zero code changes compared with the official agent implementation), and tool-integrated reasoning
- Implemented LoRA with FSDP training support
- Enhanced documentation with hyperparameter explanations
What's Changed
- fix: prevent submitting duplicate data for rl evaluation by @garrett4wade in #356
- Support Gemma3 models (multimodal) by @rchardx in #350
- doc: add historical highlights by @garrett4wade in #357
- [Bug Fix] Fix a bug in Megatron weight loading from huggingface ckpts by @nuzant in #358
- chore: add gemma3 model test and some minor changes in scripts by @garrett4wade in #343
- FEAT: New chapter in Doc by @ZiyiTsang in #321
- [Emergency fix] Display error in Doc by @ZiyiTsang in #359
- chore: set `swanlab.mode` to `disabled` by default by @garrett4wade in #363
- Document a new example of examining rollout results in both Transformers and the inference engine by @rchardx in #361
- [Feature] Add support for Reward Model fine-tuning by @catnanami in #331
- chore: fix a typo in gsm8k example readme by @garrett4wade in #364
- [FIX] save config by @fishcrap in #365
- Fix typo in Doc by @ZiyiTsang in #367
- Add swanlab log for StatsLogger by @MayDomine in #362
- [Feature] Support gradient checkpointing options for MegatronEngine by @nuzant in #368
- chore: add CLI options in internal repo by @garrett4wade in #369
- doc: Add docstrings for engine API by @garrett4wade in #370
- Using Dict to replace TensorDict by @rchardx in #371
- add controller and scheduler api by @dingzhiqiang in #373
- refactor: unifying normalization for rewards and advantages in PPO by @garrett4wade in #372
- feat: add trajectory format checking and support lazy platform initialization. by @garrett4wade in #374
- doc: refine docstrings of controller API by @garrett4wade in #376
- Change the default value and behavior of expert_tensor_parallel_size in ParallelStrategy; Clean up the code by @rchardx in #378
- doc: add documentation to explain hyper-parameters by @garrett4wade in #381
- feat: add tir local example by @mjbmjb in #360
- [emergent fix] Fix doc gen pre-commit hook by @garrett4wade in #383
- Refactor FSDP Engine and its utilities for future expert parallelism by @rchardx in #384
- Fix missing imports in experimental/api/cli_args.py by @rchardx in #386
- [FIX] Remove training script requirement for LLM_SERVER_ONLY mode in launcher by @fishcrap in #387
- Lora Feature by @tutu0038-hk in #304
- Revert "Lora Feature" by @garrett4wade in #390
- feat: support NPU and vLLM by @dazhuangzhuang1024 in #351
- feat: support PPO training with critic models by @dhh1995 in #392
- impl distributed batch memory by @dingzhiqiang in #379
- fix the missing `dataclass` wrapper in vLLMConfig and update cli reference by @garrett4wade in #393
- feat: support lora (again) by @garrett4wade in #391
- fix: add back areal folder to PYTHONPATH for launcher by @dhh1995 in #396
- feat: support leave-one-out mean and unbiased std estimation for normalization by @garrett4wade in #394
- [Feature] Update `ArealOpenAI` APIs, adds an example that finetunes Tongyi-DeepResearch agent by @nuzant in #385
- chore: update readme by @garrett4wade in #395
- feat: support RLOO algorithm using leave_one_out norm and update docs by @dhh1995 in #397
- feat: save exp config to wandb by @dhh1995 in #402
- fix: fixed nccl -> xccl change for remaining examples by @dhh1995 in #400
- add .DS_Store to .gitignore by @dhh1995 in #406
- update: apply batch norm to advantages, move original normalization to reward_norm by @dhh1995 in #401
- fix stats logger bug by @dhh1995 in #407
- Add rpc interface by @yulangz in #377
- feat: support different KL estimators, support REINFORCE++ and REINFORCE++-baseline by @dhh1995 in #408
- doc: update supported algorithms in readme by @garrett4wade in #411
- [Feat] support vllm with slurm launcher by @fishcrap in #404
- chore: Move multi-turn math example outside the `math` folder and update README by @garrett4wade in #416
- fix vllm compatibility issue by @garrett4wade in #417
- disable radix cache for sglang by default by @garrett4wade in #421
- fix launching vllm with ray by @garrett4wade in #420
- fix unit tests for the next release by @garrett4wade in #418
- [FIX] fix launch error without training scripts in LLM_SERVER_ONLY mode by @fishcrap in #425
- [Feature] Allgather logprobs rather than logits in seqlen dim in FSDP Ulysses sp by @Moocharr in #375
- fix: fix FSDP tensor parallelism for PPO by @dhh1995 in #426
- Bump v0.3.4 by @garrett4wade in #430
- Revert "Bump v0.3.4" by @nuzant in #433
- Fix a bug of preparing Ulysses inputs for loss function in PPO by @rchardx in #434
- fix: launch_server.py: error: unrecognized arguments: --max-loaded-lor… by @zjrwtx in #428
- fix-influence-trajectories by @tangyc314 in #436
- [Bug Fix] Fix multi-turn math example config by @nuzant in #437
- Bump v0.3.4 by @garrett4wade in #438
New Contributors
- @catnanami made their first contribution in #331
- @MayDomine made their first contribution in #362
- @dingzhiqiang made their first contribution in #373
- @mjbmjb made their first contribution in #360
- @tutu0038-hk made their first contribution in #304
- @dazhuangzhuang1024 made their first contribution in #351
- @dhh1995 made their first contribution in #392
- @yulangz made their first contribution in #377
- @Moocharr made their first contribution in #375
- @zjrwtx made their first contribution in #428
- @tangyc314 made their first contribution in #436
Full Changelog: v0.3.3...v0.3.4
v0.3.3
Release Note
We're excited to announce AReaL v0.3.3, which stabilizes training for larger dense models with extended context lengths. This release includes essential improvements and new algorithms to deliver the best out-of-the-box experience for users.
Enhanced Parallelism Support
- Added hybrid parallelism with FSDP backend: tensor parallelism, Ulysses sequence parallelism, and sequence-parallel activation checkpointing
- Zero conversion required – use ANY Hugging Face model directly
- Memory efficient – support for long context lengths with reduced GPU activation memory
- Usage: `allocation_mode=sglang:d8 + fsdp:d2c2t2`
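Each letter/digit pair in such a spec names a parallelism axis and its degree (here presumably `d` = data parallel, `t` = tensor parallel, `c` = Ulysses sequence parallel). A toy parser sketches the idea; `parse_alloc` is illustrative and the actual AReaL allocation-mode grammar may differ:

```python
import re

def parse_alloc(spec):
    """Parse a toy 'backend:d8' / 'fsdp:d2c2t2' style spec into
    (backend, {axis: degree}).  Illustrative only: the real AReaL
    allocation-mode parser may accept a richer grammar."""
    backend, _, axes = spec.partition(":")
    degrees = {axis: int(n) for axis, n in re.findall(r"([a-z])(\d+)", axes)}
    return backend, degrees

print(parse_alloc("fsdp:d2c2t2"))  # ('fsdp', {'d': 2, 'c': 2, 't': 2})
```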
New Algorithm Features
- PPO with clip higher
- Dynamic sampling with variable batch sizes
- Over-length penalty mechanism
- Decoupled mean/std computation for advantage estimation
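Decoupling the mean and std (cf. the Dr. GRPO and LitePPO tricks referenced in the changelog below) means the two statistics need not come from the same population. A minimal sketch, assuming group-level means with a single batch-level std; `normalize_advantages` is an illustration, not AReaL's actual function:

```python
from statistics import mean, pstdev

def normalize_advantages(rewards_per_group, eps=1e-6):
    """Decoupled normalization sketch: subtract each group's own mean,
    but scale by one std computed over the whole batch.
    (Illustrative; AReaL exposes such options via configuration.)"""
    # Center each group around its own mean.
    centered = [[r - mean(g) for r in g] for g in rewards_per_group]
    # Compute a single scale from all centered values in the batch.
    flat = [a for g in centered for a in g]
    scale = pstdev(flat) + eps
    return [[a / scale for a in g] for g in centered]

advs = normalize_advantages([[1.0, 0.0], [3.0, 1.0]])
print(advs)
```

Using a batch-level std avoids the per-group variance blow-up when a group's rewards are nearly identical, while the per-group mean still keeps advantages centered within each prompt group.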
Hardware Compatibility
- We are ready to support additional hardware backends beyond NVIDIA GPUs (more announcements coming soon!)
What's Changed
- FEAT: Decoupled CLIP ratio (DAPO Trick-I) by @ZiyiTsang in #285
- Add agent-related logging logic in ppo actor & Update notebook example by @samjia2000 in #290
- FEAT: Dynamic_Sampling(DAPO Trick-II) by @ZiyiTsang in #294
- refactor: refactor examples structure, make fsdp and ulysses use independent device meshes by @garrett4wade in #297
- doc: update the doc of using ulysses sp by @garrett4wade in #298
- [TEST] megatron dcp save load test by @fishcrap in #306
- fix doc: update package installation method within container by @garrett4wade in #307
- refactor: group examples according to application by @garrett4wade in #305
- fix: add the missing group argument in data redistribution by @garrett4wade in #311
- use sourceTensor.detach().clone() rather than torch.tensor(sourceTensor) by @CormickKneey in #308
- add countdown example by @samjia2000 in #299
- Support tensor parallelism for FSDP engine by @rchardx in #309
- FEAT: Overlong_Reward_Penalty (DAPO Trick-III) by @ZiyiTsang in #295
- chore: add engine IDs to differentiate different ranks in logs by @garrett4wade in #314
- In remote engine, find sglang server using experiment name and trial name by @samjia2000 in #301
- [Bug Fix] Fix server_idx initialization in RemoteSGLangEngine by @nuzant in #318
- chore: Fix signatures of `rollout.initialize` in examples by @nuzant in #319
- chore: amend the `should_accept` argument in `rollout_batch` with docs by @garrett4wade in #316
- fix: The shape of attention_mask itself gets changed when removing pads by @jwhj in #325
- Fix the gradient norm clipping for FSDP engine by @rchardx in #320
- chore: raise error when using slurm with apptainer and images are not specified. by @nuzant in #329
- Apply sequence parallel to LayerNorm/RMSNorm layers by @rchardx in #330
- chore: add ci to close stale issues by @garrett4wade in #332
- Import missing AllocationMode by @rchardx in #333
- [Feat] Add device agnostic feature by @lowdy1 in #327
- Update pre-commit hooks and rerun against all the files by @rchardx in #334
- Decouple the mean&std advantage normalization (Trick Dr. GRPO and LitePPO) by @ZiyiTsang in #303
- fix fsdp engine: qwen3 TP q/k norm wrapping, gradient clipping, the scale of grad norm, and sft scripts by @garrett4wade in #335
- chore: preventing CI to close stale PRs by @garrett4wade in #337
- fix: revert the order of evaluation and recover in entrypoints, fix all unit-tests by @garrett4wade in #323
- [device agnostic] fix examples with the usage of `current_platform` by @garrett4wade in #338
- refactor: move `should_accept` to the `submit` method instead of the `wait` method by @garrett4wade in #339
- [Bug Fix] Fix loading Qwen2 1.5B with MegatronEngine and mbridge by @nuzant in #341
- [Feature] Add deterministic option for MegatronEngine by @nuzant in #340
- [device agnostic] chore: replace cuda with current_platform.device_type by @garrett4wade in #336
- Fix a bug of embeds_token for VL models; Refine some YAML configuration files by @rchardx in #342
- doc: update readme by @garrett4wade in #346
- doc: update readme by @garrett4wade in #347
- Fix examples by @fishcrap in #344
- [Tests] Fix some bugs in examples, add a unittest that runs every examples for one step. by @nuzant in #345
- Bump v0.3.3 by @garrett4wade in #349
New Contributors
- @CormickKneey made their first contribution in #308
- @rchardx made their first contribution in #309
- @jwhj made their first contribution in #325
- @lowdy1 made their first contribution in #327
Full Changelog: v0.3.2...v0.3.3
v0.3.2
Highlights
- **Enhanced Documentation with Best Practices**: We've expanded our documentation to include essential best practices for debugging agents and algorithms in isolation, plus guidance on handling OOM errors using Ulysses sequence parallelism. Explore the updated docs to get the most out of your workflows!
- **Intuitive Allocation Mode Construction**: We're excited to introduce a new approach to building allocation modes that's both incredibly intuitive and highly expressive. This foundation will enable you to specify complex parallel strategies for Megatron training, with full stable Megatron support coming in an upcoming release.
What's Changed
- [Doc] add best practices doc, including debugging and handling OOM by @fishcrap in #287
- doc: fix important and note section format by @garrett4wade in #288
- [Bug Fix] Fix bugs in FSDP ulysses sequence parallel and megatron engine. by @nuzant in #292
- bump to v0.3.2 by @garrett4wade in #293
Full Changelog: v0.3.1...v0.3.2
v0.3.1
Release Note
AReaL has been refactored from the legacy `realhf` codebase to the new `areal` codebase. These two directories are now independent, and our future development will focus primarily on the lightweight `areal` directory.
Major changes in v0.3.1 for the `areal` directory:
- Added support for RL with Megatron 5D parallelism based on Megatron Core 0.13.1. We can now fine-tune large MoE models with AReaL. We also optimized weight loading and saving of Megatron models to the Hugging Face format, achieving 20x and 5x speedup for loading and saving respectively.
- Added support for writing agentic RL workflows with the OpenAI client. Writing agentic RL is as easy as writing a standard agent!
- Added support for Ulysses sequence parallelism with FSDP to reduce peak memory usage.
- Added support for dp-attention, cross-node TP, and expert parallelism with SGLang.
- Added support for automatic failover.
- Created Jupyter notebook tutorials.
What's Changed
- Warning in the doc: How to run a synchronous configuration asynchronously. by @xssstory in #222
- chore: update stream notebook, issue template, and contribution guide by @garrett4wade in #227
- [Feature] [refactor] Slightly refactor inference engine IO data structures by @garrett4wade in #230
- [doc] Update gsm8k example hyperparameters by @EnderXie23 in #228
- [fix] Fix the unit-tests hanging bug by terminating the rollout thread by @garrett4wade in #231
- ci: migrate to isolated runners by @futrime in #229
- [feature] support fault recovery and rollout-only evaluation by @garrett4wade in #234
- [fix] Fix a minor iteration logic when using group_adv_norm by @garrett4wade in #225
- [fix] Fix remote name error in gsm8k_grpo.yaml by @EnderXie23 in #235
- test(areal): SFT integration tests by @futrime in #233
- Update requirements.txt by @garrett4wade in #237
- test(grpo): add GRPO integration tests by @futrime in #239
- [fix] Fix the name mismatch error of NCCL weight update for VLM and timeout error for computing rewards. by @garrett4wade in #236
- [Fix] Fix the launcher errors of Ray/SLURM by @garrett4wade in #242
- chore: add autoflake CI by @garrett4wade in #245
- [fix] Add verbose messages for CI pytest by @garrett4wade in #247
- [feat] Fix rollout completion order and allow stats logging during workflow execution. by @garrett4wade in #246
- feat: support openai-compatible rollout and add an unittest for prepare_mb_list by @garrett4wade in #248
- refactor: remove areal's dependency on realhf by @garrett4wade in #249
- [FEAT] Support Variable Shape of Multi-Modal Inputs for VLM Training by @JamesKrW in #244
- [fix] remove the dependency of realhf in areal by @garrett4wade in #252
- doc: add for writing workflows with the openai-compatible client by @garrett4wade in #254
- chore: highlight wechat in readme by @garrett4wade in #255
- [experimental] [feat] add megatron checkpointer and accelerate megatron weight loading by @garrett4wade in #253
- fix: fix incorrect imports from realhf that causes statistics naming error by @garrett4wade in #262
- [experimental] feat: megatron 5d parallel forward, reliable reward process executor, max length of dataset by @garrett4wade in #263
- add search agent jupyter notebook example by @samjia2000 in #264
- fix: fix ci unit-test after gh runner recovers by @garrett4wade in #268
- fix: replace `LLMRequest` with `ModelRequest` in the notebook by @ZiyiTsang in #271
- fix: replace `LLMRequest` with `ModelRequest` and format the asearcher notebook by @garrett4wade in #272
- fix: fix the parsing logic of LLM_SERVER_ONLY allocation mode by @GurrenLagann97 in #265
- feat: support sglang cross-node TP and dp-attention with slurm by @garrett4wade in #274
- [Doc] Fix a typo in a figure by @nuzant in #276
- feat: add megatron SFT example by @garrett4wade in #275
- feat: support sglang cross-node ep and dp_attn with all launchers by @garrett4wade in #277
- feat: support ulysses sequence parallel for FSDP by @garrett4wade in #278
- fix: revert setting context parallel size when using ulysses by @garrett4wade in #279
- feat: add a megatron grpo example by @garrett4wade in #281
- Bump to v0.3.1 by @garrett4wade in #283
New Contributors
- @EnderXie23 made their first contribution in #228
- @JamesKrW made their first contribution in #244
- @ZiyiTsang made their first contribution in #271
Full Changelog: v0.3.0-lite.post2...v0.3.1
AReaL-lite post2
What's Changed
- [fix] Fix the NCCL parameter synchronization bug with slurm and add more logging messages in workflow executor. by @garrett4wade in #215
- [fix] Fix a slurm launcher issue by @garrett4wade in #219
- [CRITICAL] Fix forward bug of LLM and VLM by @antoinegg1 in #218
Full Changelog: v0.3.0-lite.post1...v0.3.0-lite.post2