Questions on RoboTwin #2

Hi Vidar team, thanks for releasing this great work!

I have a question about the RoboTwin fine-tuning setting. Since different tasks in the dataset have different episode lengths, I wonder how Vidar handles this during SFT. Does Vidar directly resample/compress all demonstrations into a fixed-length video, e.g., around 61 frames, and then train the video diffusion model with this fixed horizon?
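To make the question concrete, here is a minimal sketch of what I mean by "resample/compress": picking a fixed number of evenly spaced frames from an episode of arbitrary length. This is only my guess at one possible approach, not Vidar's actual pipeline. The function name `uniform_frame_indices` and the default of 61 frames are hypothetical and used purely for illustration.

```python
def uniform_frame_indices(episode_len: int, target_len: int = 61) -> list[int]:
    """Return target_len frame indices spread evenly over an episode.

    Hypothetical helper for illustration only; not taken from the Vidar code.
    Episodes shorter than target_len end up with repeated indices.
    """
    if episode_len <= 0 or target_len <= 0:
        raise ValueError("episode_len and target_len must be positive")
    if target_len == 1:
        return [0]
    last = episode_len - 1   # index of the final frame in the episode
    gaps = target_len - 1    # number of intervals between sampled frames
    # Integer arithmetic with round-to-nearest, so the first sampled index
    # is always 0 and the last is always episode_len - 1.
    return [(i * last + gaps // 2) // gaps for i in range(target_len)]


if __name__ == "__main__":
    long_episode = uniform_frame_indices(301)   # longer task: frames get skipped
    short_episode = uniform_frame_indices(31)   # shorter task: frames get repeated
    print(long_episode[:4], long_episode[-1])
    print(short_episode[:4], short_episode[-1])
```

With this kind of resampling, a long episode is played back much faster than a short one within the same 61 frames, which is the alignment issue I am asking about.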

In my experiments with 61 generated frames, some tasks can be completed, while longer-horizon tasks can only be partially completed. This leaves me unsure how to align the generated video with demonstrations of different lengths.

Could you please clarify how variable task durations are handled during training and evaluation? 

Thanks a lot!
