FluxVLA Engine is a full-stack, end-to-end engineering platform for deploying embodied intelligence applications. Built on the core design principles of unified configuration, standardized interfaces, module decoupling, and deployability, it creates a complete engineering loop from data to real-device deployment. With the goal of providing a standardized industry–academia–research foundation, it significantly lowers the engineering barrier for VLA research and development.
| Codebase | Libero-Spatial | Libero-Object | Libero-Goal | Libero-Long | Libero-Average |
|---|---|---|---|---|---|
| FluxVLA(GR00T) | 96.2 | 96.8 | 93.4 | 89.4±1.5 | 93.95 |
| FluxVLA(Pi) | 98.6 | 99.0 | 97.8 | 96±1.0 | 97.85 |
| FluxVLA(Qwen3VL 0.6B+GR00T) | 98.6 | 99.6 | 95.6 | 92.2±1.8 | 96.50 |
| FluxVLA(DreamZero) | 96.8 | 97.4 | 90.8±1.5 | 93.6 | 94.65 |
| FluxVLA(SmolVLA) | 87.4 | 93.2 | 92.0 | 63.4 | 84.0 |
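The Libero-Average column is consistent with the unweighted mean of the four suite scores. The short sketch below recomputes it from the table values; the `libero_average` helper and the `SUITE_SCORES` name are illustrative and not part of FluxVLA.

```python
from statistics import mean

# Per-suite success rates (%) copied from the table above, in the order
# Spatial, Object, Goal, Long. The "±" spreads are left out here.
SUITE_SCORES = {
    "FluxVLA(GR00T)": [96.2, 96.8, 93.4, 89.4],
    "FluxVLA(Pi)": [98.6, 99.0, 97.8, 96.0],
    "FluxVLA(Qwen3VL 0.6B+GR00T)": [98.6, 99.6, 95.6, 92.2],
    "FluxVLA(DreamZero)": [96.8, 97.4, 90.8, 93.6],
    "FluxVLA(SmolVLA)": [87.4, 93.2, 92.0, 63.4],
}


def libero_average(scores):
    """Unweighted mean of the suite scores, rounded to two decimals."""
    return round(mean(scores), 2)


for name, scores in SUITE_SCORES.items():
    print(f"{name}: {libero_average(scores):.2f}")
```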
[2026/04/22] 🔥 ZMQ-based remote inference framework is now supported.
[2026/04/15] 🔥 DreamZero WAM is now supported.
[2026/04/08] 🔥 FluxVLA has been open-sourced.
The installation guide below uses NVCC 12.4 as an example. If your environment differs, adjust the CUDA version accordingly.
1. Create a conda environment
conda create -n fluxvla python=3.10 -y
conda activate fluxvla
2. Install PyTorch (CUDA version)
Important: Before running pip install -r requirements.txt, you must install PyTorch from the official CUDA index first. The default PyPI index is not guaranteed to provide a build that matches your CUDA version.
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
For other CUDA versions, replace cu124 with the corresponding value (e.g., cu118, cu121). See: https://pytorch.org/get-started/locally/.
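After this step it can be useful to confirm that the installed wheel is a CUDA build before moving on. The following is a minimal sketch; the `describe_torch` helper is a hypothetical name, and it only reports what PyTorch itself exposes.

```python
import importlib.util


def describe_torch():
    """Return basic facts about the installed PyTorch build, or None if absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check also works without PyTorch

    return {
        "version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only wheels
        "cuda_available": torch.cuda.is_available(),
    }


info = describe_torch()
if info is None:
    print("PyTorch is not installed in this environment.")
else:
    print(info)
    if info["cuda_build"] is None:
        print("This looks like a CPU-only build; reinstall from the CUDA index.")
```

If `cuda_build` is set but `cuda_available` is false, the wheel is fine and the problem is more likely the driver or the container runtime.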
3. Install flash-attention
Method 1: Install directly via pip:
pip install psutil ninja packaging
# MAX_JOBS controls the number of parallel build threads; tune it based on your machine resources
MAX_JOBS=8 pip install flash-attn==2.5.5 --no-build-isolation --find-links https://github.com/Dao-AILab/flash-attention/releases
Method 2: Build from source (recommended if method 1 fails):
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.5.5
# MAX_JOBS controls the number of parallel build threads; tune it based on your machine resources
MAX_JOBS=8 python setup.py install
4. Install av
conda install -c conda-forge av=14.4.0
5. Install fluxvla and other dependencies
pip install -r requirements.txt
pip install --no-build-isolation -e .
Note: requirements.txt pins torch==2.6.0 to prevent pip from accidentally replacing the CUDA-enabled PyTorch installed in step 2. If you need to use another torch version, update both the step-2 command and the torch version in requirements.txt.
Online evaluation environment (LIBERO / EGL)
If you want to evaluate LIBERO on devices that do not support ray tracing (e.g., A100), please refer to EGL Device GPU Rendering Configuration.
Install system dependencies
export MUJOCO_GL=egl
sudo apt install libegl-dev libgl1-mesa-dev libx11-dev libglew-dev libosmesa6-dev
Environment checks
Make sure /proc/1/environ contains the following environment variables:
- NVIDIA_DRIVER_CAPABILITIES=all
- NVARCH=x86_64
- NVIDIA_REQUIRE_CUDA=cuda>=12.4
- brand=tesla and driver>=470
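`/proc/1/environ` stores its entries as NUL-separated `KEY=VALUE` records, so a plain `cat` is hard to read. The sketch below parses that format and reports which of the variables named above are absent. The function names are illustrative, and it checks only for presence, since the exact values depend on your driver and container image.

```python
from pathlib import Path

# Variable names taken from the check above; values are not validated.
REQUIRED_VARS = ("NVIDIA_DRIVER_CAPABILITIES", "NVARCH", "NVIDIA_REQUIRE_CUDA")


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE records used by /proc/<pid>/environ."""
    env = {}
    for record in raw.split(b"\0"):
        if b"=" in record:
            # Split on the first "=" only; values such as "cuda>=12.4" contain more.
            key, _, value = record.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def missing_vars(env: dict, required=REQUIRED_VARS) -> list:
    """Return the required variable names that are not present in env."""
    return [name for name in required if name not in env]


def check_pid1_environ(path: str = "/proc/1/environ") -> list:
    """Read the file (this may require root) and return the missing names."""
    return missing_vars(parse_environ(Path(path).read_bytes()))
```

Calling `check_pid1_environ()` inside the container returns an empty list when all three variables are set.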
Create an EGL configuration file
Create file /usr/share/glvnd/egl_vendor.d/10_nvidia.json with the following content:
{
  "file_format_version": "1.0.0",
  "ICD": {
    "library_path": "libEGL_nvidia.so.0"
  }
}
Configure pre-commit hooks (optional but recommended)
To ensure code quality and consistency (especially for C++/CUDA code), install pre-commit hooks:
pip install pre-commit
pre-commit install
This will automatically check and format code before every commit.
Configure Weights & Biases (wandb)
Weights & Biases is used for experiment tracking and visualization. Configure it as follows:
- Install wandb (included in requirements.txt):
pip install wandb
- Log in to your wandb account:
wandb login
- Set environment variables:
export WANDB_PROJECT=fluxvla # project name (default: fluxvla)
export WANDB_ENTITY=your-team-name # team name or username (default: None)
export WANDB_MODE=online # online, offline, or disabled (default: online)
- If you want to disable wandb logging during training, set:
export WANDB_MODE=disabled
Note: all wandb configuration is read from environment variables; no additional settings are needed in config files.
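To make the defaults above concrete, here is a minimal sketch of how the three variables could be resolved from the environment. It is not FluxVLA's actual code: `resolve_wandb_settings` is a hypothetical helper, and it accepts only the three modes listed above.

```python
import os

# Modes listed in the section above.
VALID_MODES = {"online", "offline", "disabled"}


def resolve_wandb_settings(environ=None) -> dict:
    """Resolve project, entity, and mode using the documented defaults."""
    environ = os.environ if environ is None else environ
    mode = environ.get("WANDB_MODE", "online")
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported WANDB_MODE: {mode!r}")
    return {
        "project": environ.get("WANDB_PROJECT", "fluxvla"),
        "entity": environ.get("WANDB_ENTITY"),  # None when unset
        "mode": mode,
    }


# An empty mapping shows the defaults; pass nothing to read the real environment.
print(resolve_wandb_settings({}))
print(resolve_wandb_settings({"WANDB_MODE": "disabled"}))
```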
Configure TensorBoard (optional)
TensorBoard is supported as an optional logging backend for experiment metric visualization. Configure it as follows:
- Add 'tensorboard' to active_trackers in your config file:
metric=dict(
    type='VLAMetric',
    active_trackers=('jsonl', 'wandb', 'tensorboard'),
    ...
)
Alternatively, enable it via command line without modifying the config file:
--cfg-options 'runner.metric.active_trackers=[jsonl,wandb,tensorboard]'
- After training, launch TensorBoard to view metrics:
tensorboard --logdir work_dirs/tensorboard
Note: event files are saved to {work_dir}/tensorboard/{run_id}/ per run, enabling automatic comparison across experiments. If the TENSORBOARD_LOG_PATH environment variable is set, it will be used directly as the log directory.
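The log-directory rule in the note can be sketched as a small function. This is an illustration of the described behavior under stated assumptions, not the project's implementation; `resolve_tensorboard_dir` and its parameters are hypothetical names.

```python
import os
from pathlib import Path


def resolve_tensorboard_dir(work_dir, run_id, environ=None) -> Path:
    """Pick the event-file directory following the rule described above."""
    environ = os.environ if environ is None else environ
    override = environ.get("TENSORBOARD_LOG_PATH")
    if override:
        # The environment variable is used directly, with no run_id suffix.
        return Path(override)
    return Path(work_dir) / "tensorboard" / run_id


print(resolve_tensorboard_dir("work_dirs", "run_001", environ={}))
print(resolve_tensorboard_dir("work_dirs", "run_001",
                              environ={"TENSORBOARD_LOG_PATH": "/tmp/tb"}))
```

Because every run gets its own subdirectory by default, pointing `tensorboard --logdir` at the parent directory shows all runs side by side.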
Use the datasets we prepared directly
Download the required datasets and place them under ./datasets. Download only the datasets you need according to your configuration.
| Dataset | Download link |
|---|---|
| libero-object | limxdynamics/FluxVLAData/libero_object_no_noops_lerobotv2.1 |
| libero-spatial | limxdynamics/FluxVLAData/libero_spatial_no_noops_lerobotv2.1 |
| libero-10 | limxdynamics/FluxVLAData/libero_10_no_noops_lerobotv2.1 |
| libero-goal | limxdynamics/FluxVLAData/libero_goal_no_noops_lerobotv2.1 |
| modified_libero_rlds | openvla/modified_libero_rlds |
| RealRobot_AgileX_aloha | limxdynamics/FluxVLAData/RealRobot_AgileX_aloha_lerobot_v2 |
| RealRobot_UR3_Chem | limxdynamics/FluxVLAData/RealRobot_UR3_Chem_lerobot_v2 |
For example, download the libero-10 dataset:
huggingface-cli download limxdynamics/FluxVLAData --repo-type dataset --include "libero_10_no_noops_lerobotv2.1/*" --local-dir ./datasets
Replace libero_10_no_noops_lerobotv2.1 with the corresponding folder name of the dataset you want to download.
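The same download can be scripted from Python with `huggingface_hub.snapshot_download`, which is convenient when fetching several folders in a loop. This is a sketch that mirrors the CLI flags above; the two helper functions are illustrative names, and `huggingface_hub` is imported lazily so the argument builder can be inspected without it.

```python
def build_download_kwargs(folder: str, local_dir: str = "./datasets") -> dict:
    """Translate the CLI flags above into snapshot_download keyword arguments."""
    return {
        "repo_id": "limxdynamics/FluxVLAData",
        "repo_type": "dataset",            # same as --repo-type dataset
        "allow_patterns": [f"{folder}/*"],  # same as --include "<folder>/*"
        "local_dir": local_dir,            # same as --local-dir
    }


def download_dataset_folder(folder: str, local_dir: str = "./datasets") -> str:
    """Download one dataset folder and return the local path (needs network)."""
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_kwargs(folder, local_dir))


# Example call (not executed here because it needs network access):
# download_dataset_folder("libero_10_no_noops_lerobotv2.1")
```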
Private dataset directory structure
If you train with fluxvla on private datasets, you need to convert your raw data (e.g., HDF5 files collected by ALOHA robots) into the LeRobot Dataset v2.1 format. For a step-by-step conversion guide, see Data Conversion Guide.
The converted dataset should follow this directory structure:
├── data
│ └── chunk000
│ │ └── episode_000000.parquet
│ │ └── episode_000001.parquet
│ │ └── ... (more parquet files)
│ │ └── episode_00000N.parquet
│ └── chunk001
│ └── ... (more chunks)
│ └── chunk00N
├── meta
│ └── episodes.jsonl
│ └── episodes_stats.jsonl
│ └── info.json
│ └── tasks.jsonl
├── videos
│ └── chunk000
│ │ └── camera name 0
│ │ │ └── episode_000000.mp4
│ │ │ └── episode_000001.mp4
│ │ │ └── ...(more mp4 files)
│ │ │ └── episode_00000N.mp4
│ │ └── camera name 1
│ └── chunk001
│ └── ... (more chunks)
│ └── chunk00N
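Before launching a training run on a converted dataset, a quick structural check can catch missing files early. The sketch below tests only for the directories and metadata files shown in the tree above. It assumes all three top-level directories are required and does not validate file contents; `find_layout_problems` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Metadata files named in the directory tree above.
REQUIRED_META_FILES = ("episodes.jsonl", "episodes_stats.jsonl", "info.json", "tasks.jsonl")


def find_layout_problems(root) -> list:
    """Return human-readable problems; an empty list means the layout looks fine."""
    root = Path(root)
    problems = []
    for sub in ("data", "meta", "videos"):
        if not (root / sub).is_dir():
            problems.append(f"missing directory: {sub}/")
    for name in REQUIRED_META_FILES:
        if not (root / "meta" / name).is_file():
            problems.append(f"missing file: meta/{name}")
    data_dir = root / "data"
    if data_dir.is_dir() and not any(data_dir.glob("*/episode_*.parquet")):
        problems.append("no episode_*.parquet files found under data/<chunk>/")
    return problems


# Self-check on a throwaway directory that mimics the smallest valid layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    empty_problems = find_layout_problems(root)  # nothing created yet
    (root / "data" / "chunk000").mkdir(parents=True)
    (root / "data" / "chunk000" / "episode_000000.parquet").touch()
    (root / "meta").mkdir()
    for name in REQUIRED_META_FILES:
        (root / "meta" / name).touch()
    (root / "videos").mkdir()
    ok_problems = find_layout_problems(root)

print(len(empty_problems), "problems before,", len(ok_problems), "after")
```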
Download the required pretrained checkpoints and place them under ./checkpoints. Download only the checkpoints you need based on your configuration.
VLA models
| Model | Size | Download link |
|---|---|---|
| GR00T N1.5 | 3B | 🤗 Hugging Face |
| OpenVLA | 7B | 🤗 Hugging Face |
| PI0_base | 3B | 🤗 Hugging Face |
| PI05_base | 3B | 🤗 Hugging Face |
| PI05_libero | 3B | 🤗 Hugging Face |
| SmolVLA | 450M | 🤗 Hugging Face |
Vision-Language Models (VLM)
| Model | Size | Download link |
|---|---|---|
| Qwen2.5-VL | 3B | 🤗 Hugging Face |
| SmolVLM2 | 500M | 🤗 Hugging Face |
Large Language Models (LLM)
| Model | Size | Download link |
|---|---|---|
| Qwen 2.5 | 3B | 🤗 Hugging Face |
| Qwen 2.5 | 7B | 🤗 Hugging Face |
| Llama 2 | 7B | 🤗 Hugging Face |
Vision backbone networks
| Model | Download link |
|---|---|
| ViT-Large (DINOv2) | 🤗 Hugging Face |
| ViT-SO400M (SigLIP) | 🤗 Hugging Face |
| SigLIP2 | 🤗 Hugging Face |
| paligemma | 🤗 Hugging Face |
Tip: You can speed up downloads with huggingface-cli download <model-name> --local-dir ./checkpoints/<model-name>.
Trained models
You can also download models that have been trained with FluxVLA for inference or evaluation directly. Place them under ./work_dirs.
| Model | Download link |
|---|---|
| PI0.5 PaliGemma Libero-10 | 🤗 Hugging Face |
| GR00T Eagle 3B Libero-10 | 🤗 Hugging Face |
# Example: download the PI0.5 checkpoint from limxdynamics/FluxVLAEngine
huggingface-cli download limxdynamics/FluxVLAEngine --include "pi05_paligemma_libero_10_full_finetune_bs64/*" --local-dir ./checkpoints/pi05_paligemma_libero_10_full_finetune_bs64
All-in-one: One configuration file manages the full workflow
- Manage key parameters for data, models, training, evaluation, inference, and deployment through a single config file (easier to reproduce and deploy).
Supports different VLA models
- Supports OpenVLA, LlavaVLA, GR00T, Pi0, and Pi0.5.
Supports different modules
- Supports Llama, Gemma, and Qwen-family LLM backbones.
- Supports DINOv2 and SigLIP vision backbones.
- Supports PaliGemma and Qwen-VL VLM backbones.
Supports different training strategies
- Supports FSDP together with DDP, and supports LoRA training mode.
- Supports eval-after-train.
- Supports resuming training from checkpoints.
Data and weight formats
- Supports Parquet datasets and loading LeRobot-format data.
- Supports model weights in safetensors format.
Evaluation and inference capabilities
- Supports multi-GPU LIBERO evaluation on devices without ray tracing.
- Supports remote inference infrastructure with ZMQ-based server/client architecture, enabling GPU-offloaded inference for resource-constrained edge devices. See Remote Inference Serving.
- Supports RTC (Real-Time Chunking) to improve cross-chunk trajectory continuity.
- Supports accelerated inference for GR00T and PI0.5; see Inference Acceleration, including Triton fused kernels, CUDA Graph capture, and CUDA custom operators.
Local debugging
/root/miniconda3/envs/fluxvla/bin/torchrun --standalone --nnodes 1 --nproc-per-node [NUM_GPUS] scripts/train.py --config [CONFIG_PATH] --work-dir [WORK_DIR] --cfg-options train_dataloader.per_device_batch_size=[PER_DEVICE_BATCH_SIZE]
Example:
export WANDB_MODE=disabled
/root/miniconda3/envs/fluxvla/bin/torchrun --standalone --nnodes 1 --nproc-per-node 2 scripts/train.py --config configs/pi05/pi05_paligemma_libero_10_full_finetune.py --work-dir ./checkpoints/pi05_paligemma_libero_10_full_finetune --cfg-options train_dataloader.per_device_batch_size=2
Local evaluation
/root/miniconda3/envs/fluxvla/bin/torchrun --standalone --nnodes 1 --nproc-per-node [NUM_GPUS] scripts/eval.py --config [CONFIG_PATH] --ckpt-path [CKPT_PATH] --cfg-options [CFG_OPTIONS]
Example:
export WANDB_MODE=disabled
/root/miniconda3/envs/fluxvla/bin/torchrun --standalone --nnodes 1 --nproc-per-node 2 scripts/eval.py --config configs/pi05/pi05_paligemma_libero_10_full_finetune.py --ckpt-path checkpoints/pi05_paligemma_libero_10_full_finetune_bs64/checkpoints/step-028548-epoch-18-loss=0.0111.safetensors
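The checkpoint file name in the example encodes the step, epoch, and loss, which helps when choosing among several saved checkpoints. The sketch below parses names of that shape. The regular expression is inferred from this single example, so treat it as an assumption rather than a documented naming scheme.

```python
import re
from pathlib import Path

# Pattern inferred from "step-028548-epoch-18-loss=0.0111.safetensors".
CKPT_PATTERN = re.compile(
    r"step-(?P<step>\d+)-epoch-(?P<epoch>\d+)-loss=(?P<loss>\d+(?:\.\d+)?)\.safetensors$"
)


def parse_checkpoint_name(path):
    """Extract step, epoch, and loss from a checkpoint path, or return None."""
    match = CKPT_PATTERN.search(Path(path).name)
    if match is None:
        return None
    return {
        "step": int(match["step"]),
        "epoch": int(match["epoch"]),
        "loss": float(match["loss"]),
    }


example = (
    "checkpoints/pi05_paligemma_libero_10_full_finetune_bs64/"
    "checkpoints/step-028548-epoch-18-loss=0.0111.safetensors"
)
print(parse_checkpoint_name(example))
```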
Cluster training
export WANDB_MODE=disabled
bash scripts/train.sh [CONFIG] [WORK_DIR] --cfg-options train_dataloader.per_device_batch_size=[PER_DEVICE_BATCH_SIZE] train_dataloader.batch_size=[GLOBAL_BATCH_SIZE] runner.max_steps=[MAX_STEPS] runner.save_interval=[SAVE_INTERVAL] runner.max_keep_ckpts=[MAX_KEEP_CKPTS] --eval-after-train
Resume training from a checkpoint
To resume training from a checkpoint, use the --resume-from argument to specify the checkpoint file path. Training will continue from the saved global step, epoch, model state, and optimizer state.
Local training example:
export WANDB_MODE=disabled
/root/miniconda3/envs/fluxvla/bin/torchrun --standalone --nnodes 1 --nproc-per-node 2 scripts/train.py \
--config configs/pi05/pi05_paligemma_libero_10_full_finetune.py \
--work-dir ./work_dirs/pi05_paligemma_libero_10_full_finetune \
--resume-from ./work_dirs/pi05_paligemma_libero_10_full_finetune/checkpoints/checkpoint_epoch_5.pt \
--cfg-options train_dataloader.per_device_batch_size=2
Cluster training example:
export WANDB_MODE=disabled
bash scripts/train.sh [CONFIG] [WORK_DIR] \
--resume-from [CHECKPOINT_PATH] \
--cfg-options train_dataloader.per_device_batch_size=[PER_DEVICE_BATCH_SIZE] runner.max_steps=[MAX_STEPS]
Cluster evaluation
export WANDB_MODE=disabled
bash scripts/eval.sh [CONFIG] [CKPT_PATH] --cfg-options [CFG_OPTIONS]
Real-robot inference
When running inference on a real robot, first install the environment on the robot side, and then run:
python scripts/inference_real_robot.py --config [CONFIG] --ckpt-path [CKPT_PATH]
Q: Problems connecting to Hugging Face when downloading models or datasets.
A: If you encounter Hugging Face connectivity issues (e.g., slow downloads, timeouts, or connection refused), set the following environment variable before running the command and use hf-mirror:
export HF_ENDPOINT="https://hf-mirror.com"
Q: conda install av is very slow at resolving the environment.
A: You can use the libmamba solver to speed up dependency resolution:
conda install -c conda-forge av=14.4.0 --solver=libmamba
Q: GR00T evaluation on LIBERO is unstable.
A: This is expected. GR00T's performance on LIBERO is sensitive to random seeds, the hardware environment, and the number of training epochs. Small changes in these factors may cause noticeable fluctuations in evaluation results. It is recommended to run experiments with multiple random seeds and select the best checkpoint based on evaluation performance.
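The multi-seed recommendation can be sketched as follows: aggregate per-seed success rates for each checkpoint, then pick the checkpoint with the highest mean. The helper names are hypothetical, and the numbers are placeholders for illustration only, not measured results.

```python
from statistics import mean, stdev


def summarize_runs(success_rates):
    """Return (mean, sample standard deviation) over per-seed success rates."""
    spread = stdev(success_rates) if len(success_rates) > 1 else 0.0
    return mean(success_rates), spread


def best_checkpoint(results):
    """results maps a checkpoint name to its list of per-seed success rates."""
    return max(results, key=lambda name: mean(results[name]))


# Placeholder values, one entry per random seed; replace with your own runs.
example_results = {
    "ckpt_a": [88.0, 90.0, 92.0],
    "ckpt_b": [91.0, 93.0, 95.0],
}

for name, rates in example_results.items():
    avg, spread = summarize_runs(rates)
    print(f"{name}: {avg:.1f} ± {spread:.1f}")
print("selected:", best_checkpoint(example_results))
```

Reporting the spread alongside the mean makes it easier to tell a real improvement from seed-to-seed noise.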
Q: When running pip install -r requirements.txt, building egl_probe fails with RuntimeError: CMake must be installed.
A: egl_probe needs CMake to build. Install it via conda (recommended) or apt:
conda install -c conda-forge cmake
# or
sudo apt install cmake
Note: Do not use pip install cmake. The pip package is a Python wrapper and may fail because pip isolates the build environment.
Q: egl_probe build fails and reports Compatibility with CMake < 3.5 has been removed from CMake.
A: This is usually because your CMake version is too new for the egl_probe CMakeLists.txt. Set the following environment variable before installing:
CMAKE_POLICY_VERSION_MINIMUM=3.5 pip install -r requirements.txt
Q: After installation, I get NumPy version errors (e.g., RuntimeError: Numpy is not available or version incompatibility warnings).
A: During installation, some dependencies may overwrite the pinned NumPy version. Reinstall the correct version directly:
pip install numpy==1.26.4
Q: Inference fails on RTX 5090 (e.g., Triton kernel errors or CUDA compatibility issues).
A: RTX 5090 (Blackwell architecture) requires an updated Triton version. Upgrade to Triton 3.2.0 or higher:
pip install triton==3.2.0
Please see the contribution workflow and guidelines in docs/CONTRIBUTING.md.
Quick conventions:
- Discuss first: for new features/models or other large changes, please open a GitHub Issue to align on scope and design.
- Branch from upstream: create your branch from upstream/main and use prefixes like feat/, fix/, docs/, etc. (details in the contributing guide).
- Run checks before PR: make sure local pre-commit passes and CI is green.
- Commit messages: we recommend Conventional Commits (examples in the contributing guide).
If you encounter any issues while using this repository, feel free to contact us. You can reach us directly at mason@limxdynamics.com and wayne@limxdynamics.com, or open a GitHub issue for help.
If you use FluxVLA in your research or projects, please cite it as:
@software{FluxVLA2026,
author = {Li, Yinhao and Mao, Weixin and Lan, Zihan and Rong, Jikun and Zhu, Minzhao and Mao, Yiming and Shen, Bowen and Huang, Xu},
title = {{FluxVLA Engine: A One-Stop VLA Engineering Platform for Embodied Intelligence}},
year = {2026},
month = apr,
version = {1.0.0},
doi = {10.5281/zenodo.20049506},
url = {https://github.com/FluxVLA/FluxVLA},
license = {Apache-2.0},
}
Acknowledgements: This project benefits from the following open-source projects and community efforts. Thanks to: LeRobot, NVIDIA Isaac GR00T, OpenVLA, OpenPI (pi0), LLaVA, DeepSpeed, Qwen, Triton, RTC, Training RTC, and Realtime-VLA. If we missed your project or contribution, please open an issue or pull request so we can properly acknowledge it.
- Support more vision backbone networks.
- Support more VLM backbones.
- Support more VLA methods.
- Support training with VLM data or reasoning-chain-of-thought (CoT) data.
- RLDS datasets will be deprecated and replaced by Parquet datasets.
- Full implementation of the logger feature.
- Support Isaac Sim.
- Support SARM.