TorchRL v0.12.0

@vmoens

TorchRL v0.12.0 Release Notes

Highlights

New algorithms. Five new config-based trainers — DQN, DDPG, IQL, CQL, and TD3 — are built on a new configuration system for reproducible algorithm setups (@vmoens, @bsprenger). PILCO (Probabilistic Inference for Learning Control) is now available as a built-in algorithm (@PSXBRosa, @vmoens). For diffusion-based behavioral cloning, a new DDPMModule diffusion actor and DiffusionBCLoss are included (@theap06). Async PPO infrastructure overlaps data collection and optimization (@vmoens).
Collector and data-flow improvements. A new high-throughput auto-batching inference server automatically batches requests from multiple environments, with pluggable transport backends (threading, multiprocessing, Ray, Monarch) and built-in weight-sync integration. Paired with the new AsyncBatchedCollector, it enables asynchronous data collection with automatic batching for maximum GPU utilization (@vmoens). The new TrajectoryBatcher and AsyncTrajectoryBatcher assemble trajectories efficiently from streaming environment transitions, including variable-length trajectories and padding (@theap06). On the parallel environment side, shared-memory done flags replace mp.Event for lower-latency step synchronization, and a fast-path device-transfer optimization reduces overhead in step_and_maybe_reset (@vmoens).
Inference backends. This release adds full SGLang integration alongside vLLM, with an SGLangWrapper policy module, an AsyncSGLang server-based inference path, NCCL weight synchronization, and GRPO support (@vmoens).
Replay buffer. StoreStorage is a new Redis/Dragonfly-backed storage backend that lets replay buffers share experience across processes and nodes (@vmoens).
Evaluation. A new Evaluator class provides a unified API for synchronous and asynchronous policy evaluation during training, with a process backend, collector-based stepping, weight sync via WeightSyncScheme, multi-model support, and a RayEvalWorker for distributed evaluation (@vmoens).
Environments and platform support. A new GenesisEnv wrapper integrates the Genesis physics simulator (@ParamThakkar123). Dreamer now supports pre-vectorized environments and ships with an IsaacLab environment factory, training script, and integration guide (@vmoens). MPS support improves through float64-to-float32 downcasting in ParallelEnv, SerialEnv, and collectors, fixing previously broken Apple Silicon GPU workflows (@bsprenger).

Installation

pip install torchrl==0.12.0

Requires PyTorch >= 2.1 and TensorDict >= 0.12.0.
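
A quick way to confirm an environment meets these minimums is to compare installed distribution versions against them. The sketch below uses only the standard library; the helper names (`version_tuple`, `meets_minimum`, `check_requirements`) are illustrative and not part of TorchRL, and the parsing deliberately ignores pre-release and local tags.

```python
import re
from importlib import metadata

# Minimums copied from the requirement line above; keys are PyPI distribution names.
MINIMUMS = {"torch": "2.1", "tensordict": "0.12.0"}


def version_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '2.1.0+cu121' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions after zero-padding both to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUMS) -> dict:
    """Map each distribution to True, False, or None (not installed)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    print(check_requirements())
```

A `False` or `None` entry in the report points at the package to upgrade or install before installing torchrl 0.12.0.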

Breaking Changes

Remove v0.12 deprecated APIs (#3670) @vmoens
- The local_init_rb parameter has been removed from Collector and MultiCollector. Storage-level initialization is now the only behavior.
- TransformedEnv(env=...) now raises TypeError. Use TransformedEnv(base_env=...) instead.
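
For codebases that build constructor arguments from config dictionaries, the two removals above can be handled in one place. The following is a minimal sketch; both helper functions are hypothetical and not part of TorchRL, and they only rename or drop the keywords named in this section.

```python
def migrate_transformed_env_kwargs(kwargs: dict) -> dict:
    """Rename the removed 'env' keyword to 'base_env' (hypothetical helper)."""
    migrated = dict(kwargs)
    if "env" in migrated:
        if "base_env" in migrated:
            raise ValueError("both 'env' and 'base_env' were given")
        # TransformedEnv(env=...) now raises TypeError; base_env is the keyword.
        migrated["base_env"] = migrated.pop("env")
    return migrated


def migrate_collector_kwargs(kwargs: dict) -> dict:
    """Drop the removed 'local_init_rb' keyword (hypothetical helper)."""
    migrated = dict(kwargs)
    # Storage-level initialization is now the only behavior, so the flag is gone.
    migrated.pop("local_init_rb", None)
    return migrated
```

Both helpers return a copy, so the original config dictionary stays untouched and can still be logged as-is.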

New Features

Auto-batching Inference Server

The new inference server automatically batches requests from multiple environments for efficient GPU inference. It is a key building block for scaling RL training across many parallel environments.

Core server and transport protocol (#3492)
Threading transport (#3493)
Multiprocessing transport (#3494)
Ray transport (#3495)
Monarch transport (#3496)
Weight sync integration (#3497)
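
To make the auto-batching idea concrete, here is a toy version built on the standard library. It is a sketch of the general technique only: the `BatchingServer` class, its parameters, and the threading-only transport are assumptions for illustration and do not reflect the TorchRL server's actual API.

```python
import queue
import threading
from concurrent.futures import Future


class BatchingServer:
    """Toy auto-batching server: groups queued requests into one batched call."""

    def __init__(self, batch_fn, max_batch_size=8, timeout=0.01):
        self.batch_fn = batch_fn              # maps a list of inputs to a list of outputs
        self.max_batch_size = max_batch_size
        self.timeout = timeout                # seconds to wait for further requests
        self._requests = queue.Queue()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def submit(self, item) -> Future:
        """Called by clients (for example one per environment)."""
        future = Future()
        self._requests.put((item, future))
        return future

    def _serve(self):
        while not self._stop.is_set():
            try:
                batch = [self._requests.get(timeout=self.timeout)]
            except queue.Empty:
                continue
            # Keep draining until the batch is full or no request arrives in time.
            while len(batch) < self.max_batch_size:
                try:
                    batch.append(self._requests.get(timeout=self.timeout))
                except queue.Empty:
                    break
            items, futures = zip(*batch)
            try:
                outputs = self.batch_fn(list(items))
            except Exception as exc:
                # Propagate the failure to every client waiting on this batch.
                for future in futures:
                    future.set_exception(exc)
                continue
            for future, output in zip(futures, outputs):
                future.set_result(output)

    def shutdown(self):
        self._stop.set()
        self._thread.join()
```

Each client calls `submit(obs)` and blocks on `future.result()`, while the server thread decides how many requests share one forward pass. The timeout trades latency for batch size: a longer wait fills larger batches but delays the first request in each batch.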

`AsyncBatchedCollector`

The new collector combines async environments with the auto-batching inference server for maximum throughput.

Async envs + auto-batching inference (#3498)
Coordinator loop and direct submission mode (#3499)
Backend params and performance optimizations (#3511)

Trajectory Batcher

TrajectoryBatcher for assembling trajectories from streaming transitions (#3584) @theap06
AsyncTrajectoryBatcher for asynchronous trajectory assembly (#3592) @theap06
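
The core idea of assembling trajectories from interleaved transitions can be sketched in plain Python. The `ToyTrajectoryBatcher` class below is hypothetical and is not the TorchRL implementation; it assumes each transition carries an environment id, a scalar observation, and a done flag, and it pads variable-length trajectories to a common length with a mask.

```python
from collections import defaultdict


class ToyTrajectoryBatcher:
    """Groups streamed transitions by env id and emits padded batches."""

    def __init__(self, batch_size, pad_value=0.0):
        self.batch_size = batch_size        # trajectories per emitted batch
        self.pad_value = pad_value
        self._open = defaultdict(list)      # env_id -> observations seen so far
        self._done = []                     # finished trajectories awaiting a batch

    def add(self, env_id, observation, done):
        """Add one transition; return a padded batch once enough are complete."""
        self._open[env_id].append(observation)
        if done:
            self._done.append(self._open.pop(env_id))
        if len(self._done) >= self.batch_size:
            ready = self._done[: self.batch_size]
            self._done = self._done[self.batch_size:]
            return self._pad(ready)
        return None

    def _pad(self, trajectories):
        max_len = max(len(t) for t in trajectories)
        data = [t + [self.pad_value] * (max_len - len(t)) for t in trajectories]
        # The mask marks real steps (True) versus padding (False).
        mask = [[i < len(t) for i in range(max_len)] for t in trajectories]
        return {"observation": data, "mask": mask}
```

Returning a mask alongside the padded data lets downstream losses ignore padded steps instead of treating the pad value as a real observation.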

SGLang Backend

Full SGLang support for LLM inference, mirroring the existing vLLM integration:

Base infrastructure (#3428)
AsyncSGLang server-based inference service (#3429)
SGLangWrapper policy module (#3430)
NCCL weight synchronization (#3431)
Module structure integration (#3432)
SGLang backend support in GRPO

Diffusion Policies

DDPMModule diffusion actor for denoising diffusion probabilistic models (#3596) @theap06
DiffusionBCLoss for diffusion-based behavioral cloning (#3604) @theap06

`Evaluator`

Evaluator class for sync/async evaluation (#3594)
Process backend, lazy init, and pending property (#3611)
Collector-based stepping backend (#3624)
Enable loggers to run as Ray actors (#3623)
Weight sync via WeightSyncScheme + multi-model support (#3627)
Isaac Lab Evaluator tests + init_fn plumbing for process backend (#3663)
RayEvalWorker for distributed async evaluation (#3474)
Named actors and from_name for RayEvalWorker (#3488)

Async PPO

Async PPO infrastructure for overlapping collection and optimization (#3661)
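
Overlapping collection and optimization is, at its simplest, a bounded producer/consumer pipeline. The sketch below illustrates that pattern with threads; `run_overlapped`, `collect_fn`, and `optimize_fn` are hypothetical names, and nothing here mirrors the actual Async PPO code.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the stream


def run_overlapped(collect_fn, optimize_fn, num_batches, max_pending=2):
    """Collect in a background thread while the calling thread optimizes."""
    # The bound limits how far collection can run ahead of optimization.
    pending = queue.Queue(maxsize=max_pending)

    def producer():
        try:
            for step in range(num_batches):
                pending.put(collect_fn(step))   # blocks while the queue is full
        finally:
            pending.put(_END)                   # always unblock the consumer

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    results = []
    while True:
        batch = pending.get()
        if batch is _END:
            break
        results.append(optimize_fn(batch))
    worker.join()
    return results
```

With `max_pending=1` the pipeline stays close to synchronous training; larger values increase overlap at the cost of optimizing on data gathered by an older policy.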

Config-based Trainers

New trainers with an integrated configuration system:

DQN Trainer (#3526)
DDPG Trainer (#3527)
IQL Trainer (#3528)
CQL Trainer (#3529)
TD3 Trainer (#3557) @bsprenger
Hook point to log average optimization losses in trainers (#3666)

Replay Buffer

StoreStorage for Redis/Dragonfly-backed replay buffers (#3516)
set_at_, set_, update_ methods on ReplayBuffer (#3590) @jashshah999
Support trajs_per_batch with replay_buffer on multi-process and distributed collectors (#3618)

LLM / GRPO

Token-in, token-out LLM wrapper mode (#3407)
GRPO improvements: new envs, vLLM V1 compat, log-prob fixes, training stability (#3580)
Namespace GRPO wandb metrics for auto-grouping (#3585)
Remove placement-group xfails and fix vLLM tokenizer compat (#3586)

Environments

GenesisEnv: wrapper for the Genesis physics simulator (#3536) @ParamThakkar123
FinancialRegimeEnv: a vectorized financial environment (#3384) @aneesh223
num_workers parameter for HabitatEnv (#3383) @ParamThakkar123
Dreamer: support pre-vectorized environments (#3483)
Dreamer: add IsaacLab environment factory (#3484)

Transforms

Inverse for VecNorm and VecNormV2 transforms (#3416) @ParamThakkar123
prevent_leaking_rng utility (#3401) @ParamThakkar123
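
An inverse for a normalization transform maps normalized values back to the original scale. The following is a minimal scalar sketch of that round trip, assuming a plain running mean and variance; the `RunningNorm` class is illustrative, and VecNorm's actual statistics and update rule may differ.

```python
import math


class RunningNorm:
    """Scalar running normalizer with an explicit inverse (Welford updates)."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0      # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def scale(self):
        # Population variance; falls back to 1.0 before any update.
        variance = self._m2 / self.count if self.count else 1.0
        return math.sqrt(variance + self.eps)

    def forward(self, x):
        return (x - self.mean) / self.scale

    def inverse(self, y):
        # Undo forward(): rescale, then shift back by the mean.
        return y * self.scale + self.mean
```

The round trip only holds when `forward` and `inverse` use the same statistics, so the inverse should be applied before any further `update` call changes the mean or scale.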

Logging

log_metrics method for efficient batch logging (#3452)
TensorDict support in log_metrics (#3455)

Specs

index_select support for TensorSpec (#3406) @ParamThakkar123
strict_shape parameter for QValueModule action shape enforcement (#3593) @Lidang-Jiang

Algorithms

PILCO (Probabilistic Inference for Learning Control) (#3582) @PSXBRosa, @vmoens

Collectors

Lazy-init RandomPolicy action_spec from env in collectors (#3664)

Other

__getattr__ in _dispatch_caller_parallel for transparent attribute access (#3389) @ParamThakkar123
scalar_output_mode for loss modules with reduction='none' (#3426)
ObsDecoder: out_channels parameter for grayscale decoding (#3472)
Ergonomic scalar assignment for loss buffers (#3612)
New memmap value for the CKPT_BACKEND environment variable (#3619) @theap06

Performance Improvements

GPU image transforms for Dreamer (~5.5x faster sampling)
SliceSampler: GPU-accelerated trajectory computation
Always enable prefetch for replay buffer
ParallelEnv: fast-path device transfer in step_and_maybe_reset
ParallelEnv: replace mp.Event with shared-memory done flags for lower latency
Lazy stack optimization for collector-to-buffer writes (#3438)
log_metrics usage in sota-implementations (#3454)
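
The shared-memory done flags listed above can be pictured as one byte per worker in a block that every process maps. Below is a rough sketch using `multiprocessing.shared_memory`; the `DoneFlags` class and its layout are assumptions for illustration, and the ParallelEnv internals are not shown in these notes.

```python
from multiprocessing import shared_memory


class DoneFlags:
    """One byte per worker in shared memory; 1 means 'step finished'."""

    def __init__(self, num_workers, name=None):
        create = name is None               # no name: create; name given: attach
        self._shm = shared_memory.SharedMemory(
            name=name, create=create, size=num_workers
        )
        self.num_workers = num_workers
        self._owner = create
        if create:
            self.reset()

    @property
    def name(self):
        """Block name that other processes pass in to attach."""
        return self._shm.name

    def mark_done(self, worker):
        self._shm.buf[worker] = 1

    def all_done(self):
        # Only the first num_workers bytes are used; the block may be larger.
        return all(self._shm.buf[i] == 1 for i in range(self.num_workers))

    def reset(self):
        self._shm.buf[: self.num_workers] = bytes(self.num_workers)

    def close(self):
        self._shm.close()
        if self._owner:
            self._shm.unlink()              # only the creator frees the block
```

A parent would create `DoneFlags(n)`, pass `flags.name` to each worker, and poll `all_done()` after dispatching a step. Polling a mapped byte avoids a separate synchronization object per worker, at the price of having to choose a polling strategy.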

Bug Fixes

MPS (Apple Silicon)

Downcast float64 to float32 in ParallelEnv/SerialEnv on MPS (#3551) @bsprenger
MPS float64->float32 downcast for tensors (#3548) @bsprenger
Fix masked_scatter shape preservation on MPS in collectors (#3473)

Collectors

Fix stale model reference in MultiCollector weight sync after device-cast (#3587)
Fix shared mem updater with many policies (#3442)
Fix missing raise, incorrect __torch_function__ return, and off-by-one in RayCollector (#3530) @jashshah999

Environments

Fix check_env_specs() when state_spec contains keys not in observation_spec (#3581) @theap06
Fix BraxEnv rejecting camera_id and render_kwargs (#3533)
StepCounter now tracks nested truncated and done states (#3405)
Fix ParallelEnv shutdown hang with shared-memory done flags (#3464)

Loss / Models

Fix broken SACLoss when there is more than one qvalue_network (#3500) @ParamThakkar123
Fix GPT2RewardModel.compute_reward_loss (#3521)
Fix ResNet call order in ImpalaNet (#3522)
Fix vLLM CompilationConfig compatibility and Windows CI pybind11 (#3673)

Specs / TensorDict

Fix MultiOneHot.to_numpy() returning scalar instead of array (#3589) @jashshah999
Fix shape mismatch in _set_index_in_td with trailing dims of 1 (#3517)
Set batch size in Composite.encode (#3411) @tobiabir

Transforms

LineariseRewards should not squeeze trailing dim (#3614) @mathieuorhan
Allow gradient flow through R3MTransform in training mode (#3607) @theap06

Other

Fix DataLoadingPrimer batch_size detection for NonTensorStack (#3532)
Fix compiled storage access (#3547)
Fix CUDA graph capture for Bounded spec projection (#3453)
Fix VideoRecorder support for grayscale (1-channel) observations (#3471)
Fix functools.partial warnings (#3465) @ParamThakkar123
Fix None handling in pendulum.py tutorial (#3595) @theap06
StepCounter._reset no longer uses output_spec (#3626)
Fix per-group WandB step logging (#3625)

Refactors

Refactor optimization API for multi-phase optimization (#3468) @bsprenger
Upgrade to torchcodec for video export (#3540)
Refactor NoisyLinear (#3082)
Upgrade meshgrid usage to address deprecation warning (#3412)

Documentation

Use new canonical collector names across docs, tutorials, and SOTA (#3665)
Tutorial on collector trajectory assembly internals (#3600) @coder-jayp
IsaacLab integration guide and setup script (#3486)
RayEvalWorker API reference docs (#3487)
SGLang backend documentation
TransformersWrapper/ChatEnv integration documentation (#3377)
EGL multi-GPU limitations in containers (#3456)

New Contributors

Welcome to the following first-time contributors!

@aneesh223 - FinancialRegimeEnv
@coder-jayp - Collector trajectory assembly tutorial
@jashshah999 - RayCollector fixes, ReplayBuffer API additions, MultiOneHot fix
@Lidang-Jiang - QValueModule strict_shape parameter
@mathieuorhan - LineariseRewards fix
@theap06 - Diffusion policies, trajectory batcher, R3MTransform fix, CKPT_BACKEND memmap
@thecaptain789 - Typo fix
@tobiabir - Composite.encode batch size fix

Full Changelog

v0.11.0...v0.12.0
