
Questions on RoboTwin fine-tuning #2

@chenkang455

Hi Vidar team, thanks for releasing this great work!

I have a question about the RoboTwin fine-tuning setting. Since different tasks in the dataset have different episode lengths, I wonder how Vidar handles this during SFT. Does Vidar directly resample/compress all demonstrations into a fixed-length video, e.g., around 61 frames, and then train the video diffusion model with this fixed horizon?
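To make the question concrete, here is a minimal sketch of what I mean by resampling a demonstration to a fixed length. It only illustrates uniform temporal subsampling with NumPy; the function name `resample_to_fixed_length`, the dummy frame shapes, and the 61-frame target are my own assumptions, not Vidar's confirmed pipeline.

```python
import numpy as np

def resample_to_fixed_length(frames: np.ndarray, target_len: int = 61) -> np.ndarray:
    """Uniformly subsample (or repeat) frames along axis 0 to exactly target_len.

    Hypothetical helper for illustration only; not taken from the Vidar code.
    """
    num_frames = frames.shape[0]
    if num_frames == 0:
        raise ValueError("episode has no frames")
    # Evenly spaced positions from the first to the last frame,
    # rounded to the nearest valid integer index.
    idx = np.linspace(0, num_frames - 1, target_len).round().astype(int)
    return frames[idx]

# Two dummy episodes with different lengths, shaped (T, H, W, C).
short_ep = np.zeros((45, 8, 8, 3), dtype=np.uint8)
long_ep = np.zeros((300, 8, 8, 3), dtype=np.uint8)

short_fixed = resample_to_fixed_length(short_ep)
long_fixed = resample_to_fixed_length(long_ep)
print(short_fixed.shape, long_fixed.shape)
```

Under this scheme a 300-frame episode keeps roughly every fifth frame, while a 45-frame episode has some frames repeated, so the effective playback speed differs per task. That speed mismatch is the part I am unsure how to handle.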

In my experiments with 61 generated frames, some tasks can be completed, while longer-horizon tasks are only partially completed. This leaves me unsure how to align the generated video with demonstrations of different lengths.

Could you please clarify how variable task durations are handled during training and evaluation?

Thanks a lot!
