
Create an inference workflow for modalities other than text. #4

@eihli


Text prediction is fairly simple, and there are demos of it in some of the notebooks in this repo. VQA is a bit more complex because you need to build the input sequence from scratch. The existing code is good for loading samples from a dataset, but nothing in it helps with sequencing for inference.
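As a rough illustration of what "building the sequence from scratch" could involve for VQA, here is a minimal sketch. It assumes a layout of image tokens, then question tokens, then a separator, followed by greedy decoding until an end token. `SEP`, `EOS`, `build_vqa_prompt`, and `greedy_answer` are hypothetical names, not part of this repo, and `model` is a placeholder for anything that maps a token sequence to its most likely next token id.

```python
from typing import Callable, List

SEP = 0  # hypothetical separator token that cues the start of the answer
EOS = 1  # hypothetical end-of-answer token


def build_vqa_prompt(image_tokens: List[int], question_tokens: List[int]) -> List[int]:
    # Assumed layout: image tokens, then question tokens, then a separator.
    return list(image_tokens) + list(question_tokens) + [SEP]


def greedy_answer(
    model: Callable[[List[int]], int],
    image_tokens: List[int],
    question_tokens: List[int],
    max_new_tokens: int = 16,
) -> List[int]:
    # Build a fresh sequence so the caller's token lists are never mutated.
    seq = build_vqa_prompt(image_tokens, question_tokens)
    answer: List[int] = []
    for _ in range(max_new_tokens):
        tok = model(seq)  # placeholder model returns the argmax next token
        if tok == EOS:
            break
        answer.append(tok)
        seq.append(tok)  # feed the prediction back in (autoregressive decoding)
    return answer
```

The real token layout and special tokens would have to match whatever the dataset loaders produce during training; the point is only that inference needs its own sequencing step separate from dataset loading.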

This is especially needed for robotics control prediction. Each environment has its own API, so it would be good to try a few and figure out what we want to use as a common integration point. For Minigrid/Gymnasium, I imagine something like a custom sequencer that takes the previous Timesteps and the model's predicted action, passes that action to the environment's step function, tokenizes and sequences the resulting observation, concatenates it with the previous timesteps, and drops any old timesteps that no longer fit in the context window.

That sounds like a lot when I spell it out. What should it look like from a user's perspective? Do you define, say, a FourRoomsInferencer that subclasses some base Inferencer and takes the model plus a couple of custom functions, like an initializer and a custom sequencer?
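Here is a minimal sketch of that loop, assuming a timestep is a flat list of observation tokens followed by one action token and that truncation drops whole timesteps from the front. `Inferencer`, `initialize`, `step`, `ToyCorridorEnv`, and `CorridorInferencer` are all hypothetical names (the `Timestep` alias here is not the repo's class). The toy env only mimics the Gymnasium return shapes (`reset()` gives `(obs, info)`, `step()` gives `(obs, reward, terminated, truncated, info)`); a FourRoomsInferencer would instead wrap a real Minigrid env and its own observation tokenizer.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Tuple

Token = int
Timestep = List[Token]  # hypothetical: observation tokens followed by one action token


class Inferencer(ABC):
    """Hypothetical base class: owns the model and a rolling window of timesteps."""

    def __init__(self, model: Callable[[List[Token]], Token], context_window: int):
        self.model = model                    # maps a token sequence to an action token
        self.context_window = context_window  # max tokens fed to the model per call
        self.timesteps: List[Timestep] = []

    @abstractmethod
    def initialize(self) -> Timestep:
        """Reset the environment and return the tokenized first observation."""

    @abstractmethod
    def step(self, action: Token) -> Tuple[Timestep, bool]:
        """Apply the action; return (tokenized next observation, done)."""

    def sequence(self) -> List[Token]:
        # Flatten the retained timesteps into one model input.
        return [tok for ts in self.timesteps for tok in ts]

    def truncate(self) -> None:
        # Drop whole timesteps from the front until the sequence fits.
        while len(self.timesteps) > 1 and len(self.sequence()) > self.context_window:
            self.timesteps.pop(0)

    def run(self, max_steps: int) -> List[Token]:
        self.timesteps = [self.initialize()]
        actions: List[Token] = []
        for _ in range(max_steps):
            self.truncate()
            action = self.model(self.sequence())
            self.timesteps[-1].append(action)  # complete the current timestep
            actions.append(action)
            obs_tokens, done = self.step(action)
            if done:
                break
            self.timesteps.append(obs_tokens)
        return actions


class ToyCorridorEnv:
    """Stand-in env with Gymnasium-shaped reset/step; action 1 moves right."""

    def __init__(self, length: int = 5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos, {}

    def step(self, action: int):
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        terminated = self.pos >= self.length
        return self.pos, float(terminated), terminated, False, {}


class CorridorInferencer(Inferencer):
    def __init__(self, model, context_window: int, env: ToyCorridorEnv):
        super().__init__(model, context_window)
        self.env = env

    def tokenize(self, obs: int) -> Timestep:
        return [100 + obs]  # hypothetical: offset observation ids away from action ids

    def initialize(self) -> Timestep:
        obs, _info = self.env.reset()
        return self.tokenize(obs)

    def step(self, action: Token) -> Tuple[Timestep, bool]:
        obs, _reward, terminated, truncated, _info = self.env.step(action)
        return self.tokenize(obs), terminated or truncated
```

In this shape the user-facing surface is just the two environment-specific hooks (`initialize` and `step`, which also does the tokenizing), while concatenation and truncation live once in the base class. Truncating by whole timesteps keeps each observation/action pair intact at the left edge of the window.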
