
TorchRL v0.12.0



@vmoens released this on 27 Apr 17:23 (commit 2b2bac7)

TorchRL v0.12.0 Release Notes

Highlights

  • New algorithms. Five new config-based trainers (DQN, DDPG, IQL, CQL, and TD3) are built on a new configuration system for reproducible algorithm setups (@vmoens, @bsprenger). PILCO (Probabilistic Inference for Learning Control) is now available as a built-in algorithm (@PSXBRosa, @vmoens). For diffusion-based behavioral cloning, the release adds a DDPMModule diffusion actor and a DiffusionBCLoss (@theap06). New async PPO infrastructure overlaps data collection with optimization (@vmoens).

  • Collector and data-flow improvements. A new high-throughput auto-batching inference server groups requests from multiple environments into batches, with pluggable transport backends (threading, multiprocessing, Ray, Monarch) and built-in weight-sync integration. Paired with the new AsyncBatchedCollector, it enables asynchronous data collection with automatic batching to keep the GPU saturated (@vmoens). The new TrajectoryBatcher and AsyncTrajectoryBatcher assemble trajectories from streaming environment transitions, handling variable-length trajectories through padding (@theap06). On the parallel-environment side, shared-memory done flags replace mp.Event for lower-latency step synchronization, and a fast-path device transfer reduces overhead in step_and_maybe_reset (@vmoens).

  • Inference backends. This release adds full SGLang integration alongside vLLM, with an SGLangWrapper policy module, an AsyncSGLang server-based inference path, NCCL weight synchronization, and GRPO support (@vmoens).

  • Replay buffer. StoreStorage is a new Redis/Dragonfly-backed storage backend that lets replay buffers share experience across processes and nodes (@vmoens).

  • Evaluation. A new Evaluator class provides a unified API for synchronous and asynchronous policy evaluation during training, with a process backend, collector-based stepping, weight sync via WeightSyncScheme, multi-model support, and a RayEvalWorker for distributed evaluation (@vmoens).

  • Environments and platform support. A new GenesisEnv wrapper integrates the Genesis physics simulator (@ParamThakkar123). Dreamer now supports pre-vectorized environments and ships with an IsaacLab environment factory, training script, and integration guide (@vmoens). ParallelEnv, SerialEnv, and collectors now downcast float64 to float32 on MPS, which lacks float64 support, fixing previously broken Apple Silicon GPU workflows (@bsprenger).

Installation

pip install torchrl==0.12.0

Requires PyTorch >= 2.1 and TensorDict >= 0.12.0.


Breaking Changes

  • Remove APIs scheduled for removal in v0.12 (#3670) @vmoens
    • The local_init_rb parameter has been removed from Collector and MultiCollector. Storage-level initialization is now the only behavior.
    • TransformedEnv(env=...) now raises TypeError. Use TransformedEnv(base_env=...) instead.

New Features

Auto-batching Inference Server

A new inference server that automatically batches requests from multiple environments for efficient GPU inference. This is a key building block for scaling RL training with many parallel environments.

  • Core server and transport protocol (#3492)
  • Threading transport (#3493)
  • Multiprocessing transport (#3494)
  • Ray transport (#3495)
  • Monarch transport (#3496)
  • Weight sync integration (#3497)
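The batching idea can be illustrated with a small, self-contained sketch. It is not TorchRL's server API: ToyBatchingServer, max_batch_size and timeout are hypothetical names, threads stand in for environments, and a plain list function stands in for a batched policy forward pass. Each environment submits one observation and blocks on a Future; the server thread drains the queue into a batch until it is full or no new request arrives within the timeout, then answers every request from a single policy call.

```python
import queue
import threading
from concurrent.futures import Future


class ToyBatchingServer:
    """Hypothetical sketch of an auto-batching inference server (not TorchRL's API)."""

    def __init__(self, policy_fn, max_batch_size=8, timeout=0.01):
        self.policy_fn = policy_fn            # maps a list of observations to a list of actions
        self.max_batch_size = max_batch_size  # upper bound on requests per forward pass
        self.timeout = timeout                # seconds to wait for more requests before flushing
        self.requests = queue.Queue()
        self.batch_sizes = []                 # recorded only so the behavior can be inspected
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def submit(self, obs):
        """Called by an environment; returns a Future resolved with the action."""
        fut = Future()
        self.requests.put((obs, fut))
        return fut

    def _serve(self):
        while not self._stop.is_set():
            try:
                batch = [self.requests.get(timeout=self.timeout)]
            except queue.Empty:
                continue
            # Keep collecting until the batch is full or the queue stays empty for `timeout`.
            while len(batch) < self.max_batch_size:
                try:
                    batch.append(self.requests.get(timeout=self.timeout))
                except queue.Empty:
                    break
            actions = self.policy_fn([obs for obs, _ in batch])  # one batched call
            self.batch_sizes.append(len(batch))
            for (_, fut), action in zip(batch, actions):
                fut.set_result(action)

    def close(self):
        self._stop.set()
        self._worker.join()


# Usage: ten "environments" share one server whose policy doubles its input.
server = ToyBatchingServer(lambda xs: [2 * x for x in xs], max_batch_size=4)
results = {}


def env_step(env_id):
    results[env_id] = server.submit(env_id).result()


envs = [threading.Thread(target=env_step, args=(i,)) for i in range(10)]
for t in envs:
    t.start()
for t in envs:
    t.join()
server.close()
print(results[3])  # 6
```

The timeout is the usual latency/throughput trade-off: waiting longer yields larger batches at the cost of per-step latency.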

AsyncBatchedCollector

A new collector that combines async environments with the auto-batching inference server for maximum throughput.

  • Async envs + auto-batching inference (#3498)
  • Coordinator loop and direct submission mode (#3499)
  • Backend params and performance optimizations (#3511)

Trajectory Batcher

  • TrajectoryBatcher for assembling trajectories from streaming transitions (#3584) @theap06
  • AsyncTrajectoryBatcher for asynchronous trajectory assembly (#3592) @theap06
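The bookkeeping behind trajectory assembly can be sketched in plain Python. This is an illustration under assumptions, not the TorchRL classes: ToyTrajectoryBatcher, its add method and the list-based layout are hypothetical. Transitions from several environments arrive interleaved, so the sketch keeps one open buffer per environment, closes it when done is set, and pads completed trajectories to a common length with a boolean mask marking the real steps.

```python
from collections import defaultdict


class ToyTrajectoryBatcher:
    """Hypothetical sketch: assemble padded trajectory batches from interleaved transitions."""

    def __init__(self, batch_size, pad_value=0.0):
        self.batch_size = batch_size      # completed trajectories per emitted batch
        self.pad_value = pad_value        # filler for steps past a trajectory's end
        self._open = defaultdict(list)    # env_id -> steps of its in-progress trajectory
        self._complete = []               # finished trajectories not yet emitted

    def add(self, env_id, obs, done):
        """Feed one transition; return (padded_obs, mask) once a batch is ready, else None."""
        self._open[env_id].append(obs)
        if done:
            self._complete.append(self._open.pop(env_id))
        if len(self._complete) < self.batch_size:
            return None
        batch = self._complete[: self.batch_size]
        self._complete = self._complete[self.batch_size :]
        return self._pad(batch)

    def _pad(self, trajs):
        max_len = max(len(t) for t in trajs)
        obs = [t + [self.pad_value] * (max_len - len(t)) for t in trajs]
        # mask[i][j] is True for real steps and False for padding.
        mask = [[True] * len(t) + [False] * (max_len - len(t)) for t in trajs]
        return obs, mask


# Two environments emit interleaved transitions: (env_id, obs, done).
batcher = ToyTrajectoryBatcher(batch_size=2)
stream = [(0, 1.0, False), (1, 10.0, False), (1, 11.0, True), (0, 2.0, False), (0, 3.0, True)]
ready = [out for out in (batcher.add(*tr) for tr in stream) if out is not None]
obs, mask = ready[0]
print(obs)   # [[10.0, 11.0, 0.0], [1.0, 2.0, 3.0]]
print(mask)  # [[True, True, False], [True, True, True]]
```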

SGLang Backend

Full SGLang support for LLM inference, mirroring the existing vLLM integration:

  • Base infrastructure (#3428)
  • AsyncSGLang server-based inference service (#3429)
  • SGLangWrapper policy module (#3430)
  • NCCL weight synchronization (#3431)
  • Module structure integration (#3432)
  • SGLang backend support in GRPO

Diffusion Policies

  • DDPMModule diffusion actor for denoising diffusion probabilistic models (#3596) @theap06
  • DiffusionBCLoss for diffusion-based behavioral cloning (#3604) @theap06
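For reference, the standard DDPM noise-prediction objective, written for behavioral cloning where the denoised quantity is an action $a_0$ conditioned on an observation $s$, is given below. Whether DDPMModule and DiffusionBCLoss use exactly this parameterization (noise schedule $\beta_t$, $T$ diffusion steps, $\epsilon$-prediction) is an assumption, not something these notes state.

```latex
\begin{aligned}
\bar\alpha_t &= \prod_{k=1}^{t} (1 - \beta_k), \\
q(a_t \mid a_0) &= \mathcal{N}\!\left(a_t;\ \sqrt{\bar\alpha_t}\, a_0,\ (1 - \bar\alpha_t)\, I\right), \\
\mathcal{L}(\theta) &= \mathbb{E}_{(s, a_0) \sim \mathcal{D},\ t \sim \mathcal{U}\{1, \dots, T\},\ \epsilon \sim \mathcal{N}(0, I)}
\left[ \left\lVert \epsilon - \epsilon_\theta\!\left( \sqrt{\bar\alpha_t}\, a_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,\ t,\ s \right) \right\rVert^2 \right].
\end{aligned}
```

At inference time a DDPM actor starts from Gaussian noise $a_T \sim \mathcal{N}(0, I)$ and applies the learned denoiser for $T$ reverse steps, conditioned on $s$, to produce an action.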

Evaluator

  • Evaluator class for sync/async evaluation (#3594)
  • Process backend, lazy init, and pending property (#3611)
  • Collector-based stepping backend (#3624)
  • Enable loggers to run as Ray actors (#3623)
  • Weight sync via WeightSyncScheme + multi-model support (#3627)
  • Isaac Lab Evaluator tests + init_fn plumbing for process backend (#3663)
  • RayEvalWorker for distributed async evaluation (#3474)
  • Named actors and from_name for RayEvalWorker (#3488)
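Asynchronous evaluation hinges on one detail: the evaluated weights must be a snapshot, because training keeps updating the live parameters. The sketch below shows that pattern with a background thread. ToyAsyncEvaluator, trigger and eval_fn are hypothetical; only the pending property name echoes the Evaluator change listed above, and its real semantics may differ.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class ToyAsyncEvaluator:
    """Hypothetical sketch: evaluate a frozen copy of the weights in the background."""

    def __init__(self, eval_fn):
        self.eval_fn = eval_fn                        # maps a weight snapshot to a score
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._future = None

    @property
    def pending(self):
        """True while a previously triggered evaluation is still running."""
        return self._future is not None and not self._future.done()

    def trigger(self, weights, step):
        if self.pending:
            return False                              # skip rather than queue up evaluations
        snapshot = copy.deepcopy(weights)             # decouple from further training updates
        self._future = self._pool.submit(lambda: (step, self.eval_fn(snapshot)))
        return True

    def result(self):
        """Block until the latest evaluation finishes; returns (step, score) or None."""
        return None if self._future is None else self._future.result()

    def shutdown(self):
        self._pool.shutdown(wait=True)


weights = {"w": 1.0}
evaluator = ToyAsyncEvaluator(lambda params: params["w"] * 10)
evaluator.trigger(weights, step=0)
weights["w"] = 5.0            # training continues and mutates the live weights
print(evaluator.result())     # (0, 10.0)
evaluator.shutdown()
```

Skipping a trigger while an evaluation is pending, instead of queueing it, keeps evaluation from falling progressively behind training.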

Async PPO

  • Async PPO infrastructure for overlapping collection and optimization (#3661)
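Overlapping collection and optimization is essentially a producer/consumer pipeline. In the sketch below (hypothetical names throughout, with a version counter standing in for the PPO update), a collector thread tags each batch with the policy version it was collected with, and a bounded queue caps how far collection can run ahead of the learner, which in turn bounds how stale the data can get.

```python
import queue
import threading


def run_async_training(num_batches=6, max_queue=2):
    """Hypothetical sketch: a collector thread produces batches while the learner consumes them."""
    batches = queue.Queue(maxsize=max_queue)  # bounds how far collection can run ahead
    state = {"version": 0}                   # policy version, bumped by each optimizer step
    lock = threading.Lock()
    staleness = []                           # learner version minus collection version, per batch

    def collector():
        for _ in range(num_batches):
            with lock:
                version = state["version"]   # version of the weights used to collect
            batches.put({"policy_version": version})
        batches.put(None)                    # sentinel: collection finished

    thread = threading.Thread(target=collector)
    thread.start()
    while (batch := batches.get()) is not None:
        with lock:
            staleness.append(state["version"] - batch["policy_version"])
            state["version"] += 1            # stands in for a PPO update on this batch
    thread.join()
    return staleness


lags = run_async_training(num_batches=6, max_queue=2)
print(lags[0])  # 0
```

With at most max_queue batches queued plus one in the learner's hands, each batch is at most max_queue + 1 updates old; PPO's importance ratio is computed against the log-probabilities recorded at collection time, so a small bounded lag like this is commonly treated as tolerable, and the queue size trades throughput for freshness.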

Config-based Trainers

New trainers with an integrated configuration system:

  • DQN, DDPG, IQL, CQL, and TD3 trainers (@vmoens, @bsprenger)

Replay Buffer

  • StoreStorage for Redis/Dragonfly-backed replay buffers (#3516)
  • set_at_, set_, update_ methods on ReplayBuffer (#3590) @jashshah999
  • Support trajs_per_batch with replay_buffer on multi-process and distributed collectors (#3618)
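StoreStorage keeps replay data in an external key-value store so several processes can read and write the same buffer. A hypothetical sketch of that idea follows, with a plain dict standing in for a Redis client; ToyKVRingStorage and its key layout are assumptions, not StoreStorage's actual schema.

```python
class ToyKVRingStorage:
    """Hypothetical sketch: a fixed-capacity ring buffer living in a key-value store.

    `store` is any dict-like mapping; a plain dict stands in for a Redis client here.
    Since the cursor, size and items all live in the store, every handle connected to
    the same store sees the same buffer contents.
    """

    def __init__(self, store, capacity, prefix="rb"):
        self.store = store
        self.capacity = capacity
        self.prefix = prefix
        # setdefault keeps existing state when a second handle attaches to the store.
        self.store.setdefault(self._key("cursor"), 0)
        self.store.setdefault(self._key("size"), 0)

    def _key(self, name):
        return f"{self.prefix}:{name}"

    def add(self, item):
        # Not atomic: concurrent writers would need an atomic increment (e.g. Redis INCR).
        cursor = self.store[self._key("cursor")]
        self.store[self._key(f"item:{cursor}")] = item
        self.store[self._key("cursor")] = (cursor + 1) % self.capacity
        self.store[self._key("size")] = min(self.store[self._key("size")] + 1, self.capacity)

    def __len__(self):
        return self.store[self._key("size")]

    def get(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self.store[self._key(f"item:{index}")]


shared = {}                                       # stand-in for one Redis/Dragonfly instance
writer = ToyKVRingStorage(shared, capacity=3)
reader = ToyKVRingStorage(shared, capacity=3)     # a second handle (in practice, another process)
for step in range(5):
    writer.add(step)
print(len(reader), [reader.get(i) for i in range(len(reader))])  # 3 [3, 4, 2]
```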

LLM / GRPO

  • Token-in, token-out LLM wrapper mode (#3407)
  • GRPO improvements: new envs, vLLM V1 compat, log-prob fixes, training stability (#3580)
  • Namespace GRPO wandb metrics for auto-grouping (#3585)
  • Remove placement-group xfails and fix vLLM tokenizer compat (#3586)

Environments

Transforms

Logging

  • log_metrics method for efficient batch logging (#3452)
  • TensorDict support in log_metrics (#3455)
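The benefit of a batch call like log_metrics is that one call hands a whole dictionary of metrics to the backend instead of one call per scalar. The sketch below illustrates that, along with flattening nested groups (as a nested TensorDict would produce) into slash-separated names. ToyLogger and flatten_metrics are hypothetical, and the exact key format TorchRL's loggers use is an assumption.

```python
def flatten_metrics(metrics, prefix="", sep="/"):
    """Hypothetical helper: flatten nested metric dicts into 'group/name' keys."""
    flat = {}
    for key, value in metrics.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_metrics(value, name, sep))
        else:
            flat[name] = float(value)
    return flat


class ToyLogger:
    """Hypothetical logger; each call appends one record, like one backend write."""

    def __init__(self):
        self.records = []

    def log_scalar(self, name, value, step):
        self.records.append({"step": step, name: value})

    def log_metrics(self, metrics, step):
        self.records.append({"step": step, **flatten_metrics(metrics)})


metrics = {"train": {"loss": 0.5, "lr": 1e-3}, "eval": {"reward": 12}}
batched, per_scalar = ToyLogger(), ToyLogger()
batched.log_metrics(metrics, step=10)
for name, value in flatten_metrics(metrics).items():
    per_scalar.log_scalar(name, value, step=10)
print(len(batched.records), len(per_scalar.records))  # 1 3
```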

Specs

Algorithms

Collectors

  • Lazy-init RandomPolicy action_spec from env in collectors (#3664)

Other

  • __getattr__ in _dispatch_caller_parallel for transparent attribute access (#3389) @ParamThakkar123
  • scalar_output_mode for loss modules with reduction='none' (#3426)
  • ObsDecoder: out_channels parameter for grayscale decoding (#3472)
  • Ergonomic scalar assignment for loss buffers (#3612)
  • New memmap value for the CKPT_BACKEND environment variable (#3619) @theap06

Performance Improvements

  • GPU Image Transforms for Dreamer (~5.5x faster sampling)
  • SliceSampler: GPU-accelerated trajectory computation
  • Always enable prefetch for replay buffer
  • ParallelEnv: fast-path device transfer in step_and_maybe_reset
  • ParallelEnv: replace mp.Event with shared-memory done flags for lower latency
  • Lazy stack optimization for collector-to-buffer writes (#3438)
  • Use log_metrics in sota-implementations (#3454)
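Prefetching overlaps sampling the next batch with consuming the current one. A minimal sketch with a single background worker is shown here. ToyPrefetchSampler and its prefetch argument are hypothetical, uniform sampling without replacement stands in for whatever sampler a buffer uses, and this is not TorchRL's replay buffer implementation.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class ToyPrefetchSampler:
    """Hypothetical sketch: sample upcoming batches in the background while one is consumed."""

    def __init__(self, data, batch_size, prefetch=1, seed=0):
        self.data = data
        self.batch_size = batch_size
        self.rng = random.Random(seed)                  # only touched by the single worker thread
        self.pool = ThreadPoolExecutor(max_workers=1)
        # Start `prefetch` samples immediately so the first next() call need not wait.
        self.pending = [self.pool.submit(self._sample) for _ in range(prefetch)]

    def _sample(self):
        return self.rng.sample(self.data, self.batch_size)

    def next(self):
        batch = self.pending.pop(0).result()
        self.pending.append(self.pool.submit(self._sample))  # refill the pipeline right away
        return batch

    def close(self):
        self.pool.shutdown(wait=True)


sampler = ToyPrefetchSampler(list(range(100)), batch_size=8, prefetch=2)
batches = [sampler.next() for _ in range(3)]
sampler.close()
print([len(b) for b in batches])  # [8, 8, 8]
```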

Bug Fixes

MPS (Apple Silicon)

  • Downcast float64 to float32 in ParallelEnv/SerialEnv on MPS (#3551) @bsprenger
  • MPS float64->float32 downcast for tensors (#3548) @bsprenger
  • Fix masked_scatter shape preservation on MPS in collectors (#3473)

Collectors

  • Fix stale model reference in MultiCollector weight sync after device-cast (#3587)
  • Fix shared mem updater with many policies (#3442)
  • Fix missing raise, incorrect __torch_function__ return, and off-by-one in RayCollector (#3530) @jashshah999

Environments

  • Fix check_env_specs() when state_spec contains keys not in observation_spec (#3581) @theap06
  • Fix BraxEnv rejecting camera_id and render_kwargs (#3533)
  • StepCounter now tracks nested truncated and done states (#3405)
  • Fix ParallelEnv shutdown hang with shared-memory done flags (#3464)

Loss / Models

  • Fix broken SACLoss when there is more than one qvalue_network (#3500) @ParamThakkar123
  • Fix GPT2RewardModel.compute_reward_loss (#3521)
  • Fix resnet order call in ImpalaNet (#3522)
  • Fix vLLM CompilationConfig compatibility and a pybind11 issue in Windows CI (#3673)

Specs / TensorDict

  • Fix MultiOneHot.to_numpy() returning scalar instead of array (#3589) @jashshah999
  • Fix shape mismatch in _set_index_in_td with trailing dims of 1 (#3517)
  • Set batch size in Composite.encode (#3411) @tobiabir

Transforms

Other

  • Fix DataLoadingPrimer batch_size detection for NonTensorStack (#3532)
  • Fix compiled storage access (#3547)
  • Fix CUDA graph capture for Bounded spec projection (#3453)
  • Fix VideoRecorder support for grayscale (1-channel) observations (#3471)
  • Fix functools.partial warnings (#3465) @ParamThakkar123
  • Fix none handling in pendulum.py tutorial (#3595) @theap06
  • Fix StepCounter._reset so it no longer uses output_spec (#3626)
  • Fix per-group WandB step logging (#3625)

Refactors

  • Refactor optimization API for multi-phase optimization (#3468) @bsprenger
  • Upgrade to torchcodec for video export (#3540)
  • Refactor NoisyLinear (#3082)
  • Upgrade meshgrid usage to address deprecation warning (#3412)

Documentation

  • Use new canonical collector names across docs, tutorials, and SOTA (#3665)
  • Tutorial on collector trajectory assembly internals (#3600) @coder-jayp
  • IsaacLab integration guide and setup script (#3486)
  • RayEvalWorker API reference docs (#3487)
  • SGLang backend documentation
  • TransformersWrapper/ChatEnv integration documentation (#3377)
  • EGL multi-GPU limitations in containers (#3456)

New Contributors

Welcome to the following first-time contributors!


Full Changelog

v0.11.0...v0.12.0