
Policy Learning from Few Expert Videos via Long-Short Term Optimal Transport Reward Coordination

This is the official PyTorch implementation of the DualOT algorithm from the paper "Policy Learning from Few Expert Videos via Long-Short Term Optimal Transport Reward Coordination".

Environment Setup

Build the experimental environment using Docker:

Before building the Docker image, make sure you are in the directory containing the Dockerfile.

docker build -t dualot:v1 .

After creating a container from this image, enter it and activate the Python environment with conda activate dualot. You can then run the algorithm.
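
For reference, a container can be created and entered along these lines (a minimal sketch; the --gpus flag and the mount path are assumptions that depend on your host setup):

docker run -it --gpus all -v $(pwd):/workspace dualot:v1 /bin/bash
# inside the container, activate the environment:
conda activate dualot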

Build the experimental environment using Conda:

  1. Follow this link to install MuJoCo.

  2. Install the following libraries:

sudo apt-get update
sudo apt-get install libosmesa6-dev libgl1-mesa-glx libglfw3
  3. Install other dependencies:
conda create -y -n dualot python=3.9.19
conda install -y pytorch=2.2.2 torchvision=0.17.2 torchaudio=2.2.2 -c pytorch -c nvidia
pip install -r Dockerfile/downloads/requirements.txt
pip install dm_control PyOpenGL-accelerate
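
As a quick sanity check (a minimal sketch assuming only the standard PyTorch and dm_control APIs), you can verify the installation from inside the new environment:

conda activate dualot
# check that PyTorch is installed and sees the GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# check that dm_control can load a task (requires a working MuJoCo install)
python -c "from dm_control import suite; suite.load('cartpole', 'swingup')"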

Prepare Expert Video Dataset

  1. For the Meta-World Benchmark, you can either download the expert demonstration dataset directly from this link, or generate a new expert demonstration dataset with metaworld_generate_expert/generate_demo.py. Place the dataset in the folder IL/expert_demos/metaworld/${metaworld_task_name}.

  2. For the DeepMind Control Suite (DMC) Benchmark, we use the DrQv2 algorithm to train an agent on the corresponding task, and then use the trained agent to collect 10 expert demonstrations. Place the dataset in the folder IL/expert_demos/dmc/${dmc_task_name} (see the layout sketch after this list).
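
For reference, a minimal sketch of the expected directory layout; the task names below are placeholders, and the file format inside each task folder is determined by the download or generation step above:

# e.g. metaworld_task_name=basketball, dmc_task_name=quadruped_run
mkdir -p IL/expert_demos/metaworld/${metaworld_task_name}
mkdir -p IL/expert_demos/dmc/${dmc_task_name}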

Training the Agent

For the Meta-World Benchmark, run the code with a command similar to the following:

# Make sure you run this from inside the "IL" folder.
python train.py \
    root_dir=DualOT/IL \
    seed=2 \
    suite=metaworld \
    suite/metaworld_task=basketball \
    obs_type=pixels \
    agent=dualot

For the DMC Benchmark, run the code with a command similar to the following:

# Make sure you run this from inside the "IL" folder.
python train.py \
    root_dir=DualOT/IL \
    seed=2 \
    suite=dmc \
    suite/dmc_task=quadruped_run \
    suite.num_train_frames=500000 \
    obs_type=pixels \
    agent=dualot
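
The override syntax above suggests a Hydra-style configuration interface; assuming that holds, multiple seeds can be swept with an ordinary shell loop (a sketch, not part of the official scripts):

# run three seeds of the DMC experiment back to back
for seed in 1 2 3; do
    python train.py \
        root_dir=DualOT/IL \
        seed=$seed \
        suite=dmc \
        suite/dmc_task=quadruped_run \
        suite.num_train_frames=500000 \
        obs_type=pixels \
        agent=dualot
done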

Acknowledgments
