This file is the Claude entrypoint for repository guidance.
For the canonical agent instructions, repository orientation, and workflow expectations, see AGENTS.md.
For full contribution flow, code style, and PR process, see CONTRIBUTING.md.
Keeping the detailed guidance in AGENTS.md avoids duplication and prevents the two files from drifting out of sync.
Single machine: Install via Docker or with bash requirements/install.sh embodied --model <model> --env <env> (set REPO_PATH and any asset paths). Ray may start automatically; otherwise run ray start --head. Use a config with cluster.num_nodes: 1 (e.g. from examples/embodiment/config/). Launch with bash examples/embodiment/run_embodiment.sh <config_name> or python examples/embodiment/train_embodied_agent.py --config-name <config_name>, and set any env vars the example needs (e.g. MUJOCO_GL=egl, ROBOT_PLATFORM).
Multiple machines: On each node, before running ray start, export RLINF_NODE_RANK=<0..N-1> (unique per node) and optionally RLINF_COMM_NET_DEVICES. Head: ray start --head --port=6379 --node-ip-address=<head_ip>. Workers: ray start --address=<head_ip>:6379. You can use ray_utils/start_ray.sh. Set cluster.num_nodes to the total number of nodes; optionally use node_groups and component_placement (see rlinf/scheduler/cluster/config.py and the heterogeneous cluster tutorial). Run the entry script only on the head; it attaches to the existing Ray cluster and schedules workers by placement.
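The per-node commands above can be generated rather than typed by hand. The sketch below only assembles the shell lines quoted in this section; the helper name ray_start_commands is hypothetical, and giving rank 0 to the head node is an assumption (the text only requires ranks to be unique).

```python
from typing import List


def ray_start_commands(head_ip: str, num_nodes: int, port: int = 6379) -> List[str]:
    """Return one shell line per node; index i is meant for the node with rank i.

    Assumption: rank 0 is the head node. The guidance only says ranks must be
    unique in 0..N-1, so adjust if your cluster uses another convention.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    commands = []
    for rank in range(num_nodes):
        # The rank must be exported before `ray start` so Ray sees it.
        export = f"export RLINF_NODE_RANK={rank}"
        if rank == 0:
            start = f"ray start --head --port={port} --node-ip-address={head_ip}"
        else:
            start = f"ray start --address={head_ip}:{port}"
        commands.append(f"{export} && {start}")
    return commands


# Example: print the lines for a three-node cluster (the IP is a placeholder).
for line in ray_start_commands("10.0.0.1", 3):
    print(line)
```

Run each printed line on the matching node, then launch the entry script on the head only.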
- Placement and throughput: Configure cluster.component_placement (collocated vs disaggregated vs hybrid, node groups, hardware ranks). See placement tutorial and execution modes.
- OOM: Tune env (total_num_envs, group_size), rollout (batch/seq, gpu_memory_utilization, enable_offload), actor (micro_batch_size, global_batch_size, gradient_checkpointing, enable_offload). Example configs in examples/embodiment/config/. See FAQ for SGLang/memory issues.
- Multi-node and hetero: Set cluster.num_nodes; set RLINF_NODE_RANK (and optionally RLINF_COMM_NET_DEVICES) before ray start on each node, because Ray captures env at start time. Optional node_groups and component_placement in YAML; env_configs (e.g. env_vars, python_interpreter_path) are applied at worker allocation. See heterogeneous cluster and rlinf/scheduler/cluster/config.py.
- Metrics: Runners use MetricLogger; set runner.logger.logger_backends (e.g. tensorboard, wandb, swanlab). Namespaces include train/, eval/, env/, rollout/, time/. See logger tutorial.
- Checkpoints: Saved every runner.save_interval under .../checkpoints/global_step_<N>/. To resume, set runner.resume_dir to that path and relaunch; some runners support resume_dir: auto. See checkpoint resume tutorial.
- Evaluation: During training, runner.val_check_interval triggers validation. Standalone embodied: bash examples/embodiment/eval_embodiment.sh <config_name> with an eval config; see VLA evaluation. Reasoning/LLM: see LLM evaluation.
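When resuming by hand, it can help to locate the newest global_step_<N> directory before setting runner.resume_dir. The helper below is a sketch and not part of RLinf; the name latest_checkpoint is hypothetical, and it relies only on the directory naming described above.

```python
import re
from pathlib import Path
from typing import Optional

# Matches directory names such as "global_step_100".
_STEP_DIR = re.compile(r"^global_step_(\d+)$")


def latest_checkpoint(checkpoints_dir: str) -> Optional[Path]:
    """Return the global_step_<N> subdirectory with the largest N, or None."""
    root = Path(checkpoints_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _STEP_DIR.match(child.name)
        if match is None or not child.is_dir():
            continue
        step = int(match.group(1))  # compare numerically, not as strings
        if step > best_step:
            best_step, best_path = step, child
    return best_path
```

The numeric comparison matters: sorted as strings, global_step_20 would rank above global_step_100.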
For debugging (breakpoints, rendering/EGL, network, NCCL/CUDA, timeouts), see the FAQ in Further reading.
Config (rlinf/config.py): build_config / validate_cfg produce the full DictConfig. New model or env types go into SupportedModel / SupportedEnvType and validation.
Cluster and placement: ClusterConfig and strategies in rlinf/scheduler/placement/, rlinf/utils/placement.py. Placement controls where actor/rollout/env run (one node vs many, GPU vs CPU, heterogeneous).
Algorithms: Advantage and loss functions are registered in rlinf/algorithms/ (registry + decorators); rewards are registered in rlinf/algorithms/rewards/. Config keys algorithm.adv_type and algorithm.loss_type select them. See Extending RLinf: algorithms, models, envs for step-by-step instructions.
Models (embodied): Register in SupportedModel in config.py, implement under rlinf/models/embodiment/<name>/ (e.g. BasePolicy), wire in config and workers. Use the add-install-docker-ci-e2e skill for install/Docker/CI. Details in the extension section below.
Environments: Register in SupportedEnvType and get_env_cls() in rlinf/envs/__init__.py, implement under rlinf/envs/<name>/. Use the add-install-docker-ci-e2e and add-example-doc-model-env skills for install and docs. Details below.
Workers: Subclass Worker, implement initialize and your API, launch with create_group(...).launch(...). Use self.log_info / log_warning / log_error; no print.
Runners: They own the training loop. New task type = new runner + entry script that builds Cluster, placement, worker groups, and calls the runner.
Advantage function
- Implement a function that takes the same keyword args as existing ones (e.g. rewards, values, dones, gamma, loss_mask, …) and returns (advantages, returns). See rlinf/algorithms/advantages.py (e.g. compute_gae_advantages_and_returns) for signatures.
- Register it: from rlinf.algorithms.registry import register_advantage then @register_advantage("my_adv") on your function. The name is case-normalized to lowercase.
- In config YAML set algorithm.adv_type: my_adv. Actor workers call calculate_adv_and_returns(adv_type=...) which dispatches via get_adv_and_returns(name).
- For non-GAE styles (e.g. GRPO, Reinforce++), rlinf/algorithms/utils.py may need to compute scores first; check how adv_type is used in calculate_adv_and_returns and in the actor worker.
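The registration pattern above can be sketched without importing RLinf. Everything below is a stand-in: the dictionary registry only mimics what register_advantage and get_adv_and_returns are described as doing, the function compute_my_adv is hypothetical, and plain Python lists replace the tensors the real functions take. Check rlinf/algorithms/advantages.py for the actual signatures.

```python
from typing import Callable, Dict, List, Tuple

# Stand-in registry; the real one lives in rlinf.algorithms.registry.
_ADVANTAGES: Dict[str, Callable] = {}


def register_advantage(name: str) -> Callable:
    """Decorator that stores a function under a lowercase name."""
    def decorator(fn: Callable) -> Callable:
        _ADVANTAGES[name.lower()] = fn
        return fn
    return decorator


def get_adv_and_returns(name: str) -> Callable:
    """Look up a registered advantage function by (case-insensitive) name."""
    return _ADVANTAGES[name.lower()]


@register_advantage("My_Adv")
def compute_my_adv(
    rewards: List[float],
    values: List[float],
    dones: List[bool],
    gamma: float = 0.99,
    **kwargs,
) -> Tuple[List[float], List[float]]:
    """Discounted reward-to-go minus a value baseline (illustrative only)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # step t ends an episode; do not carry later rewards
        running = rewards[t] + gamma * running
        returns[t] = running
    advantages = [ret - val for ret, val in zip(returns, values)]
    return advantages, returns


# Dispatch by name, as algorithm.adv_type: my_adv would select it.
adv_fn = get_adv_and_returns("my_adv")
advantages, returns = adv_fn(
    rewards=[1.0, 1.0, 1.0], values=[0.0, 0.0, 0.0],
    dones=[False, False, True], gamma=0.5,
)
```

Accepting **kwargs lets the function ignore arguments (such as loss_mask) that the caller passes but this particular estimator does not use.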
Policy loss
- Implement a function that accepts the kwargs passed by the actor (e.g. logprobs, old_logprobs, advantages, clip_ratio_low, clip_ratio_high, loss_mask, …) and returns (loss_tensor, metrics_dict). See rlinf/algorithms/losses.py (e.g. compute_ppo_actor_loss, compute_ppo_actor_critic_loss, compute_grpo_actor_loss_fn).
- Register: from rlinf.algorithms.registry import register_policy_loss then @register_policy_loss("my_loss").
- In config set algorithm.loss_type: my_loss. For PPO-style actor+critic you need a critic and value loss; the unified entry is policy_loss(loss_type=..., **kwargs) in registry.py. Add validation in rlinf/config.py if your loss has special requirements (e.g. validate_cfg already checks loss_type == "actor_critic" for value head).
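To make the (loss, metrics) contract concrete, here is a minimal clipped-surrogate loss in the PPO style, written over plain Python lists so it runs without RLinf or a tensor library. It is a sketch under assumptions: the name compute_my_loss and the metric key clip_fraction are hypothetical, and the real functions in rlinf/algorithms/losses.py operate on tensors and may take more arguments.

```python
import math
from typing import Dict, List, Tuple


# In the repository this function would carry @register_policy_loss("my_loss").
def compute_my_loss(
    logprobs: List[float],
    old_logprobs: List[float],
    advantages: List[float],
    clip_ratio_low: float,
    clip_ratio_high: float,
    loss_mask: List[int],
    **kwargs,
) -> Tuple[float, Dict[str, float]]:
    """Masked mean of the clipped surrogate objective, negated to form a loss."""
    total, clipped, count = 0.0, 0, 0
    for lp, old_lp, adv, keep in zip(logprobs, old_logprobs, advantages, loss_mask):
        if not keep:
            continue  # masked-out tokens do not contribute
        ratio = math.exp(lp - old_lp)
        clipped_ratio = min(max(ratio, 1.0 - clip_ratio_low), 1.0 + clip_ratio_high)
        unclipped_term = ratio * adv
        clipped_term = clipped_ratio * adv
        # The objective takes the pessimistic (smaller) term; the loss negates it.
        total += -min(unclipped_term, clipped_term)
        clipped += int(clipped_term < unclipped_term)
        count += 1
    count = max(count, 1)  # avoid division by zero when everything is masked
    return total / count, {"clip_fraction": clipped / count}
```

Separate low and high clip ratios allow an asymmetric trust region, which is why the actor passes both rather than a single epsilon.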
Reward
- Add a reward class (e.g. under rlinf/algorithms/rewards/<domain>/) that matches the interface expected by the reward worker (e.g. callable or class with a clear contract for prompt/completions/ids).
- In rlinf/algorithms/rewards/__init__.py: import the class, then register_reward("my_reward", MyRewardClass). The registry is reward_registry; lookup via get_reward_class(name).
- Wire the reward name in config and in the runner/reward worker so the correct class is instantiated and used. For reasoning/agent tasks the config path may be under reward.path or similar.
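A reward registration might look roughly like the following. The registry functions here are local stand-ins named after the ones mentioned above, and the class interface (a get_reward method taking completions and reference answers) is an assumption for illustration; the real contract is whatever the reward worker expects.

```python
from typing import Dict, List, Optional, Type

# Stand-in for the registry in rlinf/algorithms/rewards/__init__.py.
reward_registry: Dict[str, Type] = {}


def register_reward(name: str, reward_cls: Type) -> None:
    """Store a reward class under the given name."""
    reward_registry[name] = reward_cls


def get_reward_class(name: str) -> Type:
    """Return the class registered under name; raises KeyError if missing."""
    return reward_registry[name]


class MyRewardClass:
    """Hypothetical reward: 1.0 when a completion equals its reference answer."""

    def __init__(self, config: Optional[dict] = None) -> None:
        self.config = config or {}

    def get_reward(self, completions: List[str], answers: List[str]) -> List[float]:
        # Compare after stripping surrounding whitespace.
        return [
            1.0 if completion.strip() == answer.strip() else 0.0
            for completion, answer in zip(completions, answers)
        ]


register_reward("my_reward", MyRewardClass)

# Lookup by the name that the config would carry.
reward = get_reward_class("my_reward")()
scores = reward.get_reward([" 42 ", "7"], ["42", "8"])
```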
- Registration: In rlinf/config.py, add a new value to the SupportedModel enum: MY_MODEL = ("my_model", "embodied"). Use get_supported_model(model_type) in validation so model.model_type: my_model is accepted.
- Implementation: Create a package under rlinf/models/embodiment/my_model/. For policies that fit the embodied actor interface, inherit from rlinf.models.embodiment.base_policy.BasePolicy and implement default_forward and predict_action_batch; add other forward types (e.g. sac_forward, crossq_forward) if the algorithm needs them. For HuggingFace-based VLAs, follow the pattern in the docs: register config and processor in rlinf/models/__init__.py (get_model_config_and_processor), then implement an action model that wraps generation and optional value head.
- Config and workers: Ensure build_config / default configs provide the right model.model_type, checkpoint paths, and any model-specific options. Actor and rollout workers already branch on cfg.actor.model.model_type / cfg.rollout.model.model_type; add branches or a factory so your model is instantiated and used. For FSDP+HuggingFace, see the new model (FSDP) tutorial; for Megatron there is a separate new model (Megatron) tutorial.
- Install and CI: If the model needs extra deps or a dedicated venv, add it to requirements/install.sh (e.g. SUPPORTED_MODELS, and an install_my_model() or branch in the model switch). For Docker and e2e: use the skill .cursor/skills/add-install-docker-ci-e2e (install script, Dockerfile stage, CI job, e2e config under tests/e2e_tests/embodied/).
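The registration step relies on an enum whose members carry a (name, category) tuple. The sketch below shows how such an enum and a lookup helper can work in plain Python. It is not the code from rlinf/config.py: the member EXISTING_MODEL is a placeholder, and the lookup logic is an assumption about what get_supported_model does.

```python
from enum import Enum


class SupportedModel(Enum):
    """Stand-in enum; each value is (model_type string, category)."""

    EXISTING_MODEL = ("existing_model", "embodied")  # placeholder member
    MY_MODEL = ("my_model", "embodied")  # the newly added model

    def __init__(self, model_type: str, category: str) -> None:
        # Enum unpacks the tuple value into these arguments.
        self.model_type = model_type
        self.category = category


def get_supported_model(model_type: str) -> SupportedModel:
    """Return the member whose model_type matches, or raise ValueError."""
    for member in SupportedModel:
        if member.model_type == model_type:
            return member
    supported = ", ".join(m.model_type for m in SupportedModel)
    raise ValueError(f"Unsupported model_type '{model_type}'. Supported: {supported}")


# A config with model.model_type: my_model would now validate.
model = get_supported_model("my_model")
```

Failing fast with the list of supported names makes a typo in the YAML easy to spot at validation time.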
- Registration: In rlinf/envs/__init__.py, add a member to SupportedEnvType: e.g. MY_ENV = "my_env". In get_env_cls(env_type, env_cfg=None, ...) add an elif env_type == SupportedEnvType.MY_ENV: branch that imports your env class and returns it (lazy import to avoid loading heavy deps at import time). If the env needs a task id (like IsaacLab), use env_cfg and document the expected shape.
- Implementation: Create rlinf/envs/my_env/ with at least one module defining a gym-style env (e.g. gymnasium.Env): reset, step, and the usual attributes (observation_space, action_space). Follow the new environment tutorial for the expected structure (e.g. vectorized num_envs, group_size, ret_device). If your env uses custom action formatting, add a branch in rlinf/envs/action_utils.py in prepare_actions(env_type, ...) so rollout/workers pass correctly shaped actions.
- Config: Set env.train.env_type and env.eval.env_type to the string value of your enum (e.g. my_env). Add any env-specific defaults or validation in rlinf/config.py (e.g. validate_cfg already has env-specific checks for ManiSkill, Behavior, etc.; add similar ones if needed).
- Install and docs: For install/Docker/CI, use .cursor/skills/add-install-docker-ci-e2e (add env to SUPPORTED_ENVS, install logic, e2e config). For example docs and RST, use .cursor/skills/add-example-doc-model-env.
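For orientation, the shape of a gym-style vectorized env can be sketched in plain Python. This toy class follows the Gymnasium convention (reset returns obs and info; step returns obs, reward, terminated, truncated, info) but deliberately does not import gymnasium or RLinf; a real env would subclass gymnasium.Env, define observation_space and action_space, and follow the new environment tutorial. The class name, the goal and max_steps parameters, and the dynamics are all hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


class MyEnv:
    """Toy vectorized env: each sub-env moves an integer position toward a goal."""

    def __init__(self, num_envs: int = 2, goal: int = 3, max_steps: int = 10) -> None:
        self.num_envs = num_envs
        self.goal = goal
        self.max_steps = max_steps
        self.positions: List[int] = [0] * num_envs
        self.steps = 0

    def reset(
        self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Tuple[List[int], Dict[str, Any]]:
        # seed and options are accepted for API compatibility; the toy is deterministic.
        self.positions = [0] * self.num_envs
        self.steps = 0
        return list(self.positions), {}

    def step(
        self, actions: List[int]
    ) -> Tuple[List[int], List[float], List[bool], List[bool], Dict[str, Any]]:
        if len(actions) != self.num_envs:
            raise ValueError(f"expected {self.num_envs} actions, got {len(actions)}")
        self.steps += 1
        # Each action is an integer displacement (for example +1 or -1).
        self.positions = [pos + act for pos, act in zip(self.positions, actions)]
        terminated = [pos == self.goal for pos in self.positions]
        rewards = [1.0 if done else 0.0 for done in terminated]
        truncated = [self.steps >= self.max_steps] * self.num_envs
        return list(self.positions), rewards, terminated, truncated, {}
```

Returning per-sub-env lists for reward, terminated, and truncated mirrors how vectorized envs report one value per parallel instance.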
Google Python style; Ruff for lint/format; docstrings and type hints on public APIs. Logging: rlinf.utils.logging.get_logger() or Workers’ self.log_*. Config YAML: static values only; no computed fields; don’t overwrite user-facing fields in code. Commits: Conventional Commits, ~72-char subject, imperative; every commit Signed-off-by: (e.g. git commit -s). PRs: same title format, fill template, link issues; for perf-sensitive changes include test results. New behavior needs tests (unit or e2e); if e2e needs GPUs/hardware, document and skip appropriately in CI. Full details: CONTRIBUTING.md.
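As a small illustration of the style rules, the function below uses type hints, a Google-style docstring, and a logger instead of print. It is a hypothetical helper, not code from the repository, and the stdlib logger stands in for rlinf.utils.logging.get_logger().

```python
import logging

# Stand-in for rlinf.utils.logging.get_logger(), which is not imported here.
logger = logging.getLogger(__name__)


def checkpoint_dir_name(global_step: int) -> str:
    """Build the checkpoint directory name for a training step.

    Args:
        global_step: Non-negative global training step.

    Returns:
        The directory name, for example ``global_step_100``.

    Raises:
        ValueError: If ``global_step`` is negative.
    """
    if global_step < 0:
        raise ValueError(f"global_step must be non-negative, got {global_step}")
    name = f"global_step_{global_step}"
    logger.debug("Resolved checkpoint directory name: %s", name)
    return name
```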
- Docs (EN) · 中文
- Installation · VLA quickstart
- Example gallery · configs in examples/embodiment/config/, examples/reasoning/, etc.
- Tutorials: placement / cluster / YAML, hybrid / disaggregated, heterogeneous cluster, extend (new env/model), RL algorithms, logger (metrics), checkpoint resume
- Evaluation: VLA evaluation · LLM evaluation
- APIs (actor, channel, cluster, placement, worker, env, data, …) · FAQ