This is the official codebase for:
RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback,
Yufei Wang*, Zhanyi Sun*, Jesse Zhang, Zhou Xian, Erdem Bıyık, David Held†, Zackory Erickson†,
ICML 2024.
Website | ArXiv
Install the conda env via
conda env create -f conda_env.yml
conda activate rlvlmf
PLEASE ONLY USE THE METAWORLD ENVIRONMENTS FOR NOW. Docker is not needed for them.
First download the cached data below, then go to run.sh and select which MetaWorld env to run.
The MetaWorld envs run directly on the host (no Docker needed). Make sure your conda rlvlmf env is activated.
You can run `. ./activate_conda.sh` after modifying it (link your miniforge properly!). This also sources `prepare.sh` so that everything is linked correctly. Then run `. ./run.sh`.
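For example, a typical session on the host might look like this (the scripts are the ones mentioned above; only the miniforge path inside `activate_conda.sh` is machine-specific):

```bash
# Edit activate_conda.sh first so it points at your own miniforge install.
. ./activate_conda.sh   # activates the rlvlmf env and sources prepare.sh
. ./run.sh              # launches the MetaWorld experiment selected in run.sh
```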
- Get a Gemini API key: follow the instructions at https://aistudio.google.com/app/apikey
- We use GPT-4V for the cloth fold task, so you will also need an OpenAI API key for that task.
- Make sure you're in the rlvlmf env and run `conda env config vars set GEMINI_API_KEY="<ENTER API KEY>"` and `conda env config vars set OPENAI_API_KEY="<ENTER API KEY>"`.
- Reactivate the env: `conda deactivate && conda activate rlvlmf`
- Run `source prepare.sh` to prepare some environment variables.
- Then please see `run.sh` for running experiments with different environments. A combined example session is sketched below.
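Putting the setup steps above together, a typical one-time configuration plus launch might look like this (the API key values are placeholders):

```bash
# Store the API keys inside the conda env (run once).
conda env config vars set GEMINI_API_KEY="<ENTER API KEY>"
conda env config vars set OPENAI_API_KEY="<ENTER API KEY>"

# Reactivate so the new variables are picked up.
conda deactivate && conda activate rlvlmf

# Prepare the remaining environment variables, then launch an experiment from run.sh.
source prepare.sh
. ./run.sh
```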
- Because Gemini-Pro 1.0 has greatly reduced its free quota to only 1,500 requests per day (https://ai.google.dev/pricing), we provide some of the VLM preference labels we cached when running the experiments. We only stored them at an interval during training, e.g., every 25th VLM query, so the total number of cached preference labels is smaller than for a complete run. The labels are also not on-policy, i.e., they were not generated from the agent's online experience.
- Still, we find that using the cached preference labels gives roughly similar performance for Fold Cloth, Open Drawer, Soccer, CartPole, Straighten Rope, and Pass Water. The performance of Sweep Into with the cached labels is worse than the original results in the paper.
- The cached preference labels can be downloaded through this Google Drive link.
- After downloading, put it under `data` so it looks like `data/cached_labels/env_name/different_seed` (see the layout sketch after this list).
- The commands in `run.sh` will load the cached preference labels by default; use `cached_label_path=None` to skip the cached labels and query the VLM online during training.
- If you wish to fully reproduce the results in the paper, please train without the provided cached labels and generate the VLM preference labels online using the learning agent's online experience.
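For reference, the expected layout after extracting the download is roughly the following (the placeholders stand for the actual task and seed folder names in the archive):

```
data/
└── cached_labels/
    └── <env_name>/
        └── <different_seed>/
```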
If you want to test RL-VLM-F on a new task, add the environment build function in `utils.py`; see `make_metaworld_env` for an example (a hypothetical sketch is given below). If you want to run on more MetaWorld tasks, adjust the camera angle so that it focuses on the target object to manipulate; see `metaworld/envs/assets_v2/objects/assets/xyz_base_transparant.xml` for the camera parameters we used for the tasks in the paper.
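As a rough illustration, a new environment-build helper in `utils.py` could follow the pattern below. This is a hypothetical sketch: the MetaWorld class lookup, the `cfg.env_name`/`cfg.seed` fields, and the function name are assumptions, and the existing `make_metaworld_env` is the authoritative reference.

```python
# Hypothetical sketch of a new env-build helper for utils.py, modeled on
# make_metaworld_env. The import below follows the standard MetaWorld API
# and may differ slightly in the fork bundled with this repo.
from metaworld.envs.mujoco.env_dict import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE


def make_my_new_env(cfg):
    """Build a MetaWorld task from its name, e.g. 'drawer-open-v2-goal-observable'."""
    env_cls = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[cfg.env_name]
    env = env_cls(seed=cfg.seed)
    # For a new MetaWorld task, also adjust the camera in
    # metaworld/envs/assets_v2/objects/assets/xyz_base_transparant.xml
    # so that renders focus on the object being manipulated.
    return env
```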
- We thank the authors of PEBBLE for open-sourcing their code, which our code is built on: https://github.com/pokaxpoka/B_Pref
If you find this codebase / paper useful in your research, please consider citing:
@InProceedings{wang2024,
title = {RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback},
author = {Wang, Yufei and Sun, Zhanyi and Zhang, Jesse and Xian, Zhou and Biyik, Erdem and Held, David and Erickson, Zackory},
booktitle = {Proceedings of the 41st International Conference on Machine Learning},
year = {2024}
}