FluxVLA Engine: A One-Stop VLA Engineering Platform for Embodied Intelligence

FluxVLA Engine is a full-stack, end-to-end engineering platform for deploying embodied intelligence applications. Built on the core design principles of unified configuration, standardized interfaces, module decoupling, and deployability, it creates a complete engineering loop from data to real-device deployment. With the goal of providing a standardized industry–academia–research foundation, it significantly lowers the engineering barrier for VLA research and development.

Framework

Framework Architecture

Performance

Success rates (%) on the LIBERO suites:

| Codebase | Libero-Spatial | Libero-Object | Libero-Goal | Libero-Long | Libero-Average |
|---|---|---|---|---|---|
| FluxVLA(GR00T) | 96.2 | 96.8 | 93.4 | 89.4±1.5 | 93.95 |
| FluxVLA(Pi) | 98.6 | 99.0 | 97.8 | 96±1.0 | 97.85 |
| FluxVLA(Qwen3VL 0.6B+GR00T) | 98.6 | 99.6 | 95.6 | 92.2±1.8 | 96.50 |
| FluxVLA(DreamZero) | 96.8 | 97.4 | 90.8±1.5 | 93.6 | 94.65 |
| FluxVLA(SmolVLA) | 87.4 | 93.2 | 92.0 | 63.4 | 84.0 |

📢 Latest News

[2026/04/22] 🔥 ZMQ-based remote inference framework is now supported.

[2026/04/15] 🔥 DreamZero WAM is now supported.

[2026/04/08] 🔥 FluxVLA has been open-sourced.

🛠️ Installation

The installation guide below uses NVCC 12.4 as an example. If your environment differs, adjust the CUDA version accordingly.

1. Create a conda environment
conda create -n fluxvla python=3.10 -y
conda activate fluxvla
2. Install PyTorch (CUDA version)

Important: Before running pip install -r requirements.txt, you must first install PyTorch from the official CUDA index. The default PyPI index may not provide the CUDA build that matches your environment.

pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124

For other CUDA versions, replace cu124 with the corresponding value (for torch 2.6.0, e.g., cu118 or cu126). See: https://pytorch.org/get-started/locally/.

3. Install flash-attention

Method 1: Install directly via pip:

pip install psutil ninja packaging
# MAX_JOBS controls the number of parallel build threads; tune it based on your machine resources
MAX_JOBS=8 pip install flash-attn==2.5.5 --no-build-isolation --find-links https://github.com/Dao-AILab/flash-attention/releases

Method 2: Build from source (recommended if method 1 fails):

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.5.5
# MAX_JOBS controls the number of parallel build threads; tune it based on your machine resources
MAX_JOBS=8 python setup.py install
4. Install av
conda install -c conda-forge av=14.4.0
5. Install fluxvla and other dependencies
pip install -r requirements.txt
pip install --no-build-isolation -e .

Note: requirements.txt pins torch==2.6.0 to prevent pip from accidentally replacing the CUDA-enabled PyTorch installed in step 2. If you need to use another torch version, update both the step-2 command and the torch version in requirements.txt.
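To confirm that pip did not replace the CUDA-enabled PyTorch from step 2, you can run a quick check like the sketch below. The helper name check_torch_build is hypothetical. The expected values (2.6.0, CUDA 12.4) are assumptions taken from the commands above, so adjust them if you use a different CUDA version.

```python
def check_torch_build(torch_version, cuda_version,
                      expected_version="2.6.0", expected_cuda="12.4"):
    """Return a list of problems with the installed PyTorch build (empty list = OK).

    torch_version: value of torch.__version__, e.g. "2.6.0+cu124"
    cuda_version:  value of torch.version.cuda, e.g. "12.4" (None for CPU-only builds)
    """
    problems = []
    # Drop the local version suffix ("+cu124") before comparing release numbers.
    release = str(torch_version).split("+", 1)[0]
    if release != expected_version:
        problems.append(f"torch {release} installed, expected {expected_version}")
    if cuda_version is None:
        problems.append("CPU-only PyTorch build installed")
    elif cuda_version != expected_cuda:
        problems.append(f"PyTorch built for CUDA {cuda_version}, expected {expected_cuda}")
    return problems
```

For example, after import torch, call check_torch_build(torch.__version__, torch.version.cuda). An empty list means the build from step 2 is still in place.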

Online evaluation environment (LIBERO / EGL)

If you want to evaluate LIBERO on devices that do not support ray tracing (e.g., A100), please refer to EGL Device GPU Rendering Configuration.

Install system dependencies

export MUJOCO_GL=egl
sudo apt install libegl-dev libgl1-mesa-dev libx11-dev libglew-dev libosmesa6-dev

Environment checks

Make sure /proc/1/environ contains the following environment variables:

  • NVIDIA_DRIVER_CAPABILITIES=all
  • NVARCH=x86_64
  • NVIDIA_REQUIRE_CUDA=cuda>=12.4, typically followed by driver constraints such as brand=tesla,driver>=470
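/proc/1/environ stores the variables as NUL-separated KEY=VALUE entries. The sketch below parses that format and checks the values listed above. It makes several simplifying assumptions:

  • The function names are hypothetical.
  • The first two variables must match exactly.
  • For NVIDIA_REQUIRE_CUDA, it only looks for the literal cuda>=12.4 term.

Reading /proc/1/environ usually requires root.

```python
from pathlib import Path

# Values from the checklist above. For NVIDIA_REQUIRE_CUDA only the literal
# "cuda>=12.4" term is checked; driver/brand constraints may follow it.
EXPECTED_EXACT = {
    "NVIDIA_DRIVER_CAPABILITIES": "all",
    "NVARCH": "x86_64",
}
REQUIRED_CUDA_TERM = "cuda>=12.4"


def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated KEY=VALUE entries of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:  # skip empty or malformed entries
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def check_environ(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for key, expected in EXPECTED_EXACT.items():
        actual = env.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, found {actual!r}")
    terms = env.get("NVIDIA_REQUIRE_CUDA", "").split()
    if REQUIRED_CUDA_TERM not in terms:
        problems.append(f"NVIDIA_REQUIRE_CUDA does not contain {REQUIRED_CUDA_TERM!r}")
    return problems


def check_pid1(path: str = "/proc/1/environ") -> list[str]:
    """Read and check PID 1's environment (usually requires root)."""
    return check_environ(parse_environ(Path(path).read_bytes()))
```

Calling check_pid1() as root returns an empty list when the container environment matches the checklist.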

Create an EGL configuration file

Create file /usr/share/glvnd/egl_vendor.d/10_nvidia.json with the following content:

{
    "file_format_version": "1.0.0",
    "ICD": {
        "library_path": "libEGL_nvidia.so.0"
    }
}
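If you prefer to create or verify this file from Python (for example, in a container setup script), a minimal sketch could look like the following. The helper names are hypothetical. The JSON content is the one shown above.

```python
import json
from pathlib import Path

# Vendor file location and NVIDIA EGL library used in the step above.
EGL_VENDOR_FILE = Path("/usr/share/glvnd/egl_vendor.d/10_nvidia.json")
NVIDIA_EGL_LIBRARY = "libEGL_nvidia.so.0"


def render_egl_vendor_json() -> str:
    """Return the vendor file contents shown above as a JSON string."""
    config = {
        "file_format_version": "1.0.0",
        "ICD": {"library_path": NVIDIA_EGL_LIBRARY},
    }
    return json.dumps(config, indent=4) + "\n"


def is_valid_egl_vendor_json(text: str) -> bool:
    """Check that the text is JSON pointing the EGL ICD at the NVIDIA library."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    icd = data.get("ICD") if isinstance(data, dict) else None
    return isinstance(icd, dict) and icd.get("library_path") == NVIDIA_EGL_LIBRARY


def write_egl_vendor_file(path: Path = EGL_VENDOR_FILE) -> None:
    """Write the vendor file (needs write access to the directory, usually root)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_egl_vendor_json())
```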
Configure pre-commit hooks (optional but recommended)

To ensure code quality and consistency (especially for C++/CUDA code), install pre-commit hooks:

pip install pre-commit
pre-commit install

This will automatically check and format code before every commit.

Configure Weights & Biases (wandb)

Weights & Biases is used for experiment tracking and visualization. Configure it as follows:

  1. Install wandb (included in requirements.txt):
pip install wandb
  2. Log in to your wandb account:
wandb login
  3. Set environment variables:
export WANDB_PROJECT=fluxvla        # project name (default: fluxvla)
export WANDB_ENTITY=your-team-name  # team name or username (default: None)
export WANDB_MODE=online            # online, offline, or disabled (default: online)
  4. If you want to disable wandb logging during training, set:
export WANDB_MODE=disabled

Note: all wandb configuration is read from environment variables; no additional settings are needed in config files.
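The sketch below shows how the variables above resolve to their documented defaults. It illustrates those defaults and is not FluxVLA's own code. The name resolve_wandb_settings is hypothetical.

```python
import os
from typing import Mapping, Optional

VALID_MODES = {"online", "offline", "disabled"}


def resolve_wandb_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve wandb settings from environment variables using the defaults listed above."""
    env = os.environ if env is None else env
    mode = env.get("WANDB_MODE", "online")
    if mode not in VALID_MODES:
        raise ValueError(f"WANDB_MODE must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return {
        "project": env.get("WANDB_PROJECT", "fluxvla"),
        # An unset or empty WANDB_ENTITY becomes None (wandb then uses your default entity).
        "entity": env.get("WANDB_ENTITY") or None,
        "mode": mode,
    }
```

The keys match the project, entity, and mode arguments of wandb.init, so wandb.init(**resolve_wandb_settings()) would start a run with the same settings. wandb also reads these variables on its own; the helper only makes the defaults explicit.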

Configure TensorBoard (optional)

TensorBoard is supported as an optional logging backend for experiment metric visualization. Configure it as follows:

  1. Add 'tensorboard' to active_trackers in your config file:
metric=dict(
    type='VLAMetric',
    active_trackers=('jsonl', 'wandb', 'tensorboard'),
    ...
)

Alternatively, enable it from the command line without modifying the config file:

--cfg-options 'runner.metric.active_trackers=[jsonl,wandb,tensorboard]'
  2. After training, launch TensorBoard to view metrics:
tensorboard --logdir [WORK_DIR]/tensorboard

Note: event files are saved to {work_dir}/tensorboard/{run_id}/ per run, enabling automatic comparison across experiments. If the TENSORBOARD_LOG_PATH environment variable is set, it will be used directly as the log directory.
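The directory rule in the note can be sketched as follows. This is an illustration of the described behavior under the stated rule, not FluxVLA's implementation. The function name is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_tensorboard_dir(work_dir: str, run_id: str,
                            env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the event-file directory following the rule described above."""
    env = os.environ if env is None else env
    override = env.get("TENSORBOARD_LOG_PATH")
    if override:
        # An explicit path is used as-is; no run_id subdirectory is appended.
        return Path(override)
    return Path(work_dir) / "tensorboard" / run_id
```

The returned path could be passed as log_dir to torch.utils.tensorboard.SummaryWriter.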

Data Preparation

Use the datasets we prepared directly

Download the required datasets and place them under ./datasets. Download only the datasets you need according to your configuration.

| Dataset | Download link |
|---|---|
| libero-object | limxdynamics/FluxVLAData/libero_object_no_noops_lerobotv2.1 |
| libero-spatial | limxdynamics/FluxVLAData/libero_spatial_no_noops_lerobotv2.1 |
| libero-10 | limxdynamics/FluxVLAData/libero_10_no_noops_lerobotv2.1 |
| libero-goal | limxdynamics/FluxVLAData/libero_goal_no_noops_lerobotv2.1 |
| modified_libero_rlds | openvla/modified_libero_rlds |
| RealRobot_AgileX_aloha | limxdynamics/FluxVLAData/RealRobot_AgileX_aloha_lerobot_v2 |
| RealRobot_UR3_Chem | limxdynamics/FluxVLAData/RealRobot_UR3_Chem_lerobot_v2 |

For example, download the libero-10 dataset:

huggingface-cli download limxdynamics/FluxVLAData --repo-type dataset --include "libero_10_no_noops_lerobotv2.1/*" --local-dir ./datasets

Replace libero_10_no_noops_lerobotv2.1 with the corresponding folder name of the dataset you want to download.
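The same download can be scripted with the huggingface_hub Python API. Its snapshot_download function takes repo_type, allow_patterns, and local_dir arguments that correspond to the CLI flags used above. The helper names below are hypothetical; the repo id and folder names come from the table. The openvla/modified_libero_rlds entry lives in a different repository, so it would need its own repo_id.

```python
FLUXVLA_DATA_REPO = "limxdynamics/FluxVLAData"


def dataset_download_kwargs(folder: str, local_dir: str = "./datasets") -> dict:
    """Arguments equivalent to the huggingface-cli command above for one dataset folder."""
    return {
        "repo_id": FLUXVLA_DATA_REPO,
        "repo_type": "dataset",
        "allow_patterns": [f"{folder}/*"],  # same filter as --include
        "local_dir": local_dir,
    }


def download_dataset(folder: str, local_dir: str = "./datasets") -> str:
    """Download one dataset folder and return the local directory path."""
    # Imported lazily so dataset_download_kwargs works without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(**dataset_download_kwargs(folder, local_dir))
```

For example, download_dataset("libero_10_no_noops_lerobotv2.1") mirrors the command above.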

Private dataset directory structure

If you train with fluxvla on private datasets, you need to convert your raw data (e.g., HDF5 files collected by ALOHA robots) into the LeRobot Dataset v2.1 format. For a step-by-step conversion guide, see Data Conversion Guide.

The converted dataset should follow this directory structure:

├── data
│   ├── chunk-000
│   │   ├── episode_000000.parquet
│   │   ├── episode_000001.parquet
│   │   ├── ... (more parquet files)
│   │   └── episode_00000N.parquet
│   ├── chunk-001
│   ├── ... (more chunks)
│   └── chunk-00N
├── meta
│   ├── episodes.jsonl
│   ├── episodes_stats.jsonl
│   ├── info.json
│   └── tasks.jsonl
└── videos
    ├── chunk-000
    │   ├── camera name 0
    │   │   ├── episode_000000.mp4
    │   │   ├── episode_000001.mp4
    │   │   ├── ... (more mp4 files)
    │   │   └── episode_00000N.mp4
    │   └── camera name 1
    ├── chunk-001
    ├── ... (more chunks)
    └── chunk-00N
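To sanity-check a converted dataset, you can compute where each episode's files should be. The sketch below assumes the LeRobot v2.1 default path templates from meta/info.json: chunk-XXX directories holding 1000 episodes each. The helper names are hypothetical, and the templates in your own info.json take precedence.

```python
import json
from pathlib import Path

# Assumed LeRobot v2.1 defaults for the templates stored in meta/info.json.
DEFAULT_INFO = {
    "chunks_size": 1000,
    "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet",
    "video_path": "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4",
}


def load_info(root: Path) -> dict:
    """Read meta/info.json from a converted dataset root."""
    return json.loads((root / "meta" / "info.json").read_text())


def episode_files(episode_index: int, video_keys: list[str],
                  info: dict = DEFAULT_INFO) -> list[str]:
    """Relative paths of the parquet file and per-camera videos for one episode."""
    chunk = episode_index // info["chunks_size"]
    files = [info["data_path"].format(episode_chunk=chunk, episode_index=episode_index)]
    for key in video_keys:
        files.append(info["video_path"].format(
            episode_chunk=chunk, video_key=key, episode_index=episode_index))
    return files


def missing_files(root: Path, num_episodes: int, video_keys: list[str],
                  info: dict = DEFAULT_INFO) -> list[str]:
    """List expected files that are absent under root."""
    return [rel for ep in range(num_episodes)
            for rel in episode_files(ep, video_keys, info)
            if not (root / rel).exists()]
```

num_episodes can be taken from the total_episodes field of info.json. The video keys are the camera names used as directory names under each video chunk.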

🤗 Checkpoint Preparation

Download the required pretrained checkpoints and place them under ./checkpoints. Download only the checkpoints you need based on your configuration.

VLA models

| Model | Size | Download link |
|---|---|---|
| GR00T N1.5 | 3B | 🤗 Hugging Face |
| OpenVLA | 7B | 🤗 Hugging Face |
| PI0_base | 3B | 🤗 Hugging Face |
| PI05_base | 3B | 🤗 Hugging Face |
| PI05_libero | 3B | 🤗 Hugging Face |
| SmolVLA | 450M | 🤗 Hugging Face |

Vision-Language Models (VLM)

| Model | Size | Download link |
|---|---|---|
| Qwen2.5-VL | 3B | 🤗 Hugging Face |
| SmolVLM2 | 500M | 🤗 Hugging Face |

Large Language Models (LLM)

| Model | Size | Download link |
|---|---|---|
| Qwen 2.5 | 3B | 🤗 Hugging Face |
| Qwen 2.5 | 7B | 🤗 Hugging Face |
| Llama 2 | 7B | 🤗 Hugging Face |

Vision backbone networks

| Model | Download link |
|---|---|
| ViT-Large (DINOv2) | 🤗 Hugging Face |
| ViT-SO400M (SigLIP) | 🤗 Hugging Face |
| SigLIP2 | 🤗 Hugging Face |
| paligemma | 🤗 Hugging Face |

Tip: You can download each checkpoint from the command line with huggingface-cli download <repo-id> --local-dir ./checkpoints/<model-name>.

Trained models

You can also directly download models trained with FluxVLA for inference or evaluation. Place them under ./checkpoints, as in the example below.

| Model | Download link |
|---|---|
| PI0.5 PaliGemma Libero-10 | 🤗 Hugging Face |
| GR00T Eagle 3B Libero-10 | 🤗 Hugging Face |

# Example: download the PI0.5 checkpoint from limxdynamics/FluxVLAEngine
# --local-dir keeps the repo's folder layout, so this creates ./checkpoints/pi05_paligemma_libero_10_full_finetune_bs64/
huggingface-cli download limxdynamics/FluxVLAEngine --include "pi05_paligemma_libero_10_full_finetune_bs64/*" --local-dir ./checkpoints

🌟 Features

All-in-one: One configuration file manages the full workflow
  • Manage key parameters for data, models, training, evaluation, inference, and deployment through a single config file (easier to reproduce and deploy).
Supports different VLA models
  • Supports OpenVLA, LlavaVLA, GR00T, Pi0, and Pi0.5.
Supports different modules
  • Supports Llama, Gemma, and Qwen-family LLM backbones.
  • Supports DINOv2 and SigLIP vision backbones.
  • Supports PaliGemma and Qwen-VL VLM backbones.
Supports different training strategies
  • Supports FSDP and DDP training, as well as LoRA training mode.
  • Supports evaluation right after training (--eval-after-train).
  • Supports resuming training from checkpoints.
Data and weight formats
  • Supports Parquet datasets and loading LeRobot-format data.
  • Supports model weights in safetensors format.
Evaluation and inference capabilities
  • Supports multi-GPU LIBERO evaluation on devices without ray-tracing support.
  • Supports remote inference infrastructure with ZMQ-based server/client architecture, enabling GPU-offloaded inference for resource-constrained edge devices. See Remote Inference Serving.
  • Supports RTC (Real-Time Chunking) to improve cross-chunk trajectory continuity.
  • Supports accelerated inference for GR00T and PI0.5 through Triton fused kernels, CUDA Graph capture, and custom CUDA operators; see Inference Acceleration.

VLA Speedup

Usage

Local debugging
torchrun --standalone --nnodes 1 --nproc-per-node [NUM_GPUS] scripts/train.py --config [CONFIG_PATH] --work-dir [WORK_DIR] --cfg-options train_dataloader.per_device_batch_size=[PER_DEVICE_BATCH_SIZE]

Example:

export WANDB_MODE=disabled
torchrun --standalone --nnodes 1 --nproc-per-node 2 scripts/train.py --config configs/pi05/pi05_paligemma_libero_10_full_finetune.py --work-dir ./work_dirs/pi05_paligemma_libero_10_full_finetune --cfg-options train_dataloader.per_device_batch_size=2
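Each --cfg-options entry is a dotted KEY=VALUE override of a nested config field. The sketch below illustrates that idea in the style of MMEngine-like configs. It is a hypothetical helper, not FluxVLA's parser. Its value parsing (Python literals, with bracketed lists of bare words becoming string lists) is an assumption.

```python
import ast
from typing import Any


def _parse_value(text: str) -> Any:
    """'2' -> 2, '1e-4' -> 0.0001, '[jsonl,wandb]' -> ['jsonl', 'wandb'], else the raw string."""
    if text.startswith("[") and text.endswith("]"):
        inner = text[1:-1].strip()
        return [_parse_value(item.strip()) for item in inner.split(",")] if inner else []
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return text  # bare words such as "tensorboard" stay strings


def apply_cfg_options(cfg: dict, options: list[str]) -> dict:
    """Apply dotted KEY=VALUE overrides to a nested dict config in place."""
    for option in options:
        key, sep, raw = option.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {option!r}")
        *parents, leaf = key.split(".")
        node = cfg
        for part in parents:
            node = node.setdefault(part, {})  # create missing intermediate sections
        node[leaf] = _parse_value(raw)
    return cfg
```

For example, applying "train_dataloader.per_device_batch_size=2" sets cfg["train_dataloader"]["per_device_batch_size"] to the integer 2. Quoting list values, as in the TensorBoard example, keeps the shell from treating the brackets as a glob pattern.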
Local evaluation
torchrun --standalone --nnodes 1 --nproc-per-node [NUM_GPUS] scripts/eval.py --config [CONFIG_PATH] --ckpt-path [CKPT_PATH] --cfg-options [CFG_OPTIONS]

Example:

export WANDB_MODE=disabled
torchrun --standalone --nnodes 1 --nproc-per-node 2 scripts/eval.py --config configs/pi05/pi05_paligemma_libero_10_full_finetune.py --ckpt-path checkpoints/pi05_paligemma_libero_10_full_finetune_bs64/checkpoints/step-028548-epoch-18-loss=0.0111.safetensors
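The checkpoint in the example above is named step-XXXXXX-epoch-YY-loss=Z.safetensors. Assuming that pattern, a small helper can pick a file to pass to --ckpt-path, for example the latest step. The pattern is inferred from this one filename rather than documented behavior, and the helpers are hypothetical. Files with other names, such as the .pt file used for resuming below, are simply skipped.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional

# Pattern inferred from the example filename above; adjust if your files differ.
CKPT_RE = re.compile(r"step-(\d+)-epoch-(\d+)-loss=([0-9.eE+-]+)\.safetensors")


class CkptInfo(NamedTuple):
    step: int
    epoch: int
    loss: float
    path: str


def parse_ckpt(path: str) -> Optional[CkptInfo]:
    """Extract step, epoch, and loss from a checkpoint filename, or None if it doesn't match."""
    match = CKPT_RE.fullmatch(Path(path).name)
    if match is None:
        return None
    step, epoch, loss = match.groups()
    return CkptInfo(int(step), int(epoch), float(loss), path)


def latest_ckpt(paths: list[str]) -> Optional[str]:
    """Return the checkpoint with the highest step, or None if none match."""
    parsed = [info for info in map(parse_ckpt, paths) if info is not None]
    return max(parsed, key=lambda info: info.step).path if parsed else None
```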
Cluster training
export WANDB_MODE=disabled
bash scripts/train.sh [CONFIG] [WORK_DIR] --cfg-options train_dataloader.per_device_batch_size=[PER_DEVICE_BATCH_SIZE] train_dataloader.batch_size=[GLOBAL_BATCH_SIZE] runner.max_steps=[MAX_STEPS] runner.save_interval=[SAVE_INTERVAL] runner.max_keep_ckpts=[MAX_KEEP_CKPTS] --eval-after-train
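The cluster command sets both a per-device and a global batch size. In OpenVLA-style training code, a common convention is to reach the global batch through gradient accumulation: global = per_device × num_GPUs × accumulation_steps. Assuming FluxVLA follows that convention, the arithmetic looks like this. The helper is hypothetical.

```python
def grad_accumulation_steps(global_batch_size: int, per_device_batch_size: int,
                            world_size: int) -> int:
    """Number of micro-batches each GPU accumulates before an optimizer step."""
    samples_per_micro_step = per_device_batch_size * world_size
    if global_batch_size % samples_per_micro_step != 0:
        raise ValueError(
            f"global batch {global_batch_size} is not divisible by "
            f"{per_device_batch_size} per device x {world_size} GPUs")
    return global_batch_size // samples_per_micro_step
```

Under this convention, a global batch of 64 on 8 GPUs with 2 samples per device would mean 4 accumulation steps.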
Resume training from a checkpoint

To resume training from a checkpoint, use the --resume-from argument to specify the checkpoint file path. Training will continue from the saved global step, epoch, model state, and optimizer state.

Local training example:

export WANDB_MODE=disabled
torchrun --standalone --nnodes 1 --nproc-per-node 2 scripts/train.py \
  --config configs/pi05/pi05_paligemma_libero_10_full_finetune.py \
  --work-dir ./work_dirs/pi05_paligemma_libero_10_full_finetune \
  --resume-from ./work_dirs/pi05_paligemma_libero_10_full_finetune/checkpoints/checkpoint_epoch_5.pt \
  --cfg-options train_dataloader.per_device_batch_size=2

Cluster training example:

export WANDB_MODE=disabled
bash scripts/train.sh [CONFIG] [WORK_DIR] \
  --resume-from [CHECKPOINT_PATH] \
  --cfg-options train_dataloader.per_device_batch_size=[PER_DEVICE_BATCH_SIZE] runner.max_steps=[MAX_STEPS]
Cluster evaluation
export WANDB_MODE=disabled
bash scripts/eval.sh [CONFIG] [CKPT_PATH] --cfg-options [CFG_OPTIONS]
Real-robot inference

When running inference on a real robot, first install the environment on the robot side, and then run:

python scripts/inference_real_robot.py --config [CONFIG] --ckpt-path [CKPT_PATH]

FAQ

Q: Problems connecting to Hugging Face when downloading models or datasets.

A: If you encounter Hugging Face connectivity issues (e.g., slow downloads, timeouts, or connection refused), switch to the hf-mirror endpoint by setting the following environment variable before running the command:

export HF_ENDPOINT="https://hf-mirror.com"
Q: conda install av is very slow at resolving the environment.

A: You can use the libmamba solver to speed up dependency resolution:

conda install -c conda-forge av=14.4.0 --solver=libmamba
Q: GR00T evaluation on LIBERO is unstable.

A: This is expected. GR00T's performance on LIBERO is sensitive to random seeds, the hardware environment, and the number of training epochs. Small changes in these factors may cause noticeable fluctuations in evaluation results. It is recommended to run experiments with multiple random seeds and select the best checkpoint based on evaluation performance.
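To follow this advice systematically, evaluate each checkpoint under several seeds, compare the means, and report the spread as mean ± std. The sketch below shows this with hypothetical helpers that take per-seed success rates from your own evaluation runs.

```python
from statistics import mean, stdev


def summarize_seeds(success_rates: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of per-seed success rates (in %)."""
    if not success_rates:
        raise ValueError("need at least one seed")
    spread = stdev(success_rates) if len(success_rates) > 1 else 0.0
    return mean(success_rates), spread


def best_checkpoint(results: dict[str, list[float]]) -> str:
    """Checkpoint whose mean success rate across seeds is highest."""
    return max(results, key=lambda name: mean(results[name]))
```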

Q: When running pip install -r requirements.txt, building egl_probe fails with RuntimeError: CMake must be installed.

A: egl_probe needs CMake to build. Install it via conda (recommended) or apt:

conda install -c conda-forge cmake
# or
sudo apt install cmake

Note: Do not use pip install cmake. The pip package is a Python wrapper and may fail because pip isolates the build environment.

Q: egl_probe build fails and reports Compatibility with CMake < 3.5 has been removed from CMake.

A: This is usually because your CMake version is too new for the egl_probe CMakeLists.txt. Set the following environment variable before installing:

CMAKE_POLICY_VERSION_MINIMUM=3.5 pip install -r requirements.txt
Q: After installation, I get NumPy version errors (e.g., RuntimeError: Numpy is not available or version incompatibility warnings).

A: During installation, some dependencies may overwrite the pinned NumPy version. Reinstall the correct version directly:

pip install numpy==1.26.4
Q: Inference fails on RTX 5090 (e.g., Triton kernel errors or CUDA compatibility issues).

A: RTX 5090 (Blackwell architecture) requires an updated Triton version. Upgrade to Triton 3.2.0 or higher:

pip install triton==3.2.0
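Before upgrading, you can check whether this FAQ applies to your machine. The sketch below rests on two assumptions. Blackwell GPUs report CUDA compute capability 10.x or 12.x (the RTX 5090 reports 12.0). It also treats any capability of 10 or higher as affected. It uses the 3.2.0 minimum from the answer above, and the helper names are hypothetical.

```python
import re


def parse_version(text: str) -> tuple[int, int, int]:
    """'3.2.0' or '3.2.0+git4b3bb1f8' -> (3, 2, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def triton_upgrade_needed(compute_capability: tuple[int, int], triton_version: str,
                          minimum: str = "3.2.0") -> bool:
    """True if a Blackwell-class GPU is paired with a Triton older than `minimum`."""
    # Assumption: compute capability major >= 10 means Blackwell or newer.
    is_blackwell = compute_capability[0] >= 10
    return is_blackwell and parse_version(triton_version) < parse_version(minimum)
```

On a live system, the inputs would come from torch.cuda.get_device_capability() and triton.__version__.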

Contributing

Please see the contribution workflow and guidelines in docs/CONTRIBUTING.md.

Quick conventions:

  • Discuss first: for new features/models or other large changes, please open a GitHub Issue to align on scope and design.
  • Branch from upstream: create your branch from upstream/main and use prefixes like feat/, fix/, docs/, etc. (details in the contributing guide).
  • Run checks before PR: make sure local pre-commit passes and CI is green.
  • Commit messages: we recommend Conventional Commits (examples in the contributing guide).

Support

If you encounter any issues while using this repository, feel free to contact us. You can reach us directly at mason@limxdynamics.com and wayne@limxdynamics.com, or open a GitHub issue for help.

🙏 Citation & Acknowledgements

If you use FluxVLA in your research or projects, please cite it as:

@software{FluxVLA2026,
  author  = {Li, Yinhao and Mao, Weixin and Lan, Zihan and Rong, Jikun and Zhu, Minzhao and Mao, Yiming and Shen, Bowen and Huang, Xu},
  title   = {{FluxVLA Engine: A One-Stop VLA Engineering Platform for Embodied Intelligence}},
  year    = {2026},
  month   = apr,
  version = {1.0.0},
  doi     = {10.5281/zenodo.20049506},
  url     = {https://github.com/FluxVLA/FluxVLA},
  license = {Apache-2.0},
}

Acknowledgements: This project benefits from the following open-source projects and community efforts. Thanks to: LeRobot, NVIDIA Isaac GR00T, OpenVLA, OpenPI (pi0), LLaVA, DeepSpeed, Qwen, Triton, RTC, Training RTC, and Realtime-VLA. If we missed your project or contribution, please open an issue or pull request so we can properly acknowledge it.

Roadmap

  • Support more vision backbone networks.
  • Support more VLM backbones.
  • Support more VLA methods.
  • Support training with VLM data or chain-of-thought (CoT) reasoning data.
  • RLDS datasets will be deprecated and replaced by Parquet datasets.
  • Full implementation of the logger feature.
  • Support Isaac Sim.
  • Support SARM.

About

An all-in-one VLA engineering platform for embodied AI — from data to real-robot deployment.
