Create an inference workflow for modalities other than text.

Text prediction is pretty simple, and there are demos of it in a few notebooks in this repo. VQA is a bit more complex because you need to build the sequence from scratch. The existing code is nice for loading samples from a dataset, but nothing in it helps with sequencing for inference.

It's especially needed for robotics control prediction. Each environment has its own API, and it would be nice to play with a few and figure out what we want to use as a common integration point. For Minigrid/Gymnasium, I guess it would be something like a custom sequencer that takes the previous Timesteps and the action predicted by the model, passes that action to the environment's `step`, tokenizes and sequences the resulting observation, concatenates it with the previous timesteps, and truncates any old timesteps that don't fit in the context window. It sounds like a lot when I describe it. What would that look like from a user's perspective? Maybe you define something like a `FourRoomsInferencer` that subclasses a base `Inferencer` and takes the model plus a couple of custom functions, such as an initializer and a custom sequencer?
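To make the loop above concrete, here is a rough sketch of what a base `Inferencer` could look like. Everything in it is an assumption for discussion, not existing code in this repo: the `Inferencer` class, its `initialize`/`sequence`/`run` methods, the `tokenize_obs` callable, and `ToyEnv` are all hypothetical. The only real convention it relies on is the Gymnasium-style API, where `reset()` returns `(obs, info)` and `step(action)` returns `(obs, reward, terminated, truncated, info)`. The model is treated as a plain callable from a flat token list to one action.

```python
from collections import deque
from typing import Any, Callable, Deque, List

Tokens = List[int]


class Inferencer:
    """Hypothetical base class: rolls a model forward in a Gymnasium-style env."""

    def __init__(self, model: Callable[[Tokens], int],
                 tokenize_obs: Callable[[Any], Tokens],
                 context_window: int) -> None:
        self.model = model                # assumed: flat token list -> action id
        self.tokenize_obs = tokenize_obs  # assumed: raw observation -> tokens
        self.context_window = context_window

    def initialize(self, env: Any) -> Tokens:
        """Reset the environment and tokenize the first observation."""
        obs, _info = env.reset()
        return self.tokenize_obs(obs)

    def sequence(self, history: Deque[Tokens], current: Tokens) -> Tokens:
        """Concatenate past timesteps with the current observation.

        Drops the oldest whole timesteps until the result fits the window.
        If `current` alone is too long, it is passed through untruncated.
        """
        while history and sum(map(len, history)) + len(current) > self.context_window:
            history.popleft()
        return [tok for step in history for tok in step] + current

    def run(self, env: Any, max_steps: int) -> List[int]:
        history: Deque[Tokens] = deque()
        current = self.initialize(env)
        actions: List[int] = []
        for _ in range(max_steps):
            action = self.model(self.sequence(history, current))
            actions.append(action)
            obs, _reward, terminated, truncated, _info = env.step(action)
            # A finished timestep is the observation tokens plus the action taken.
            history.append(current + [action])
            current = self.tokenize_obs(obs)
            if terminated or truncated:
                break
        return actions


class ToyEnv:
    """Stand-in environment (not Minigrid): a counter that ends at 3."""

    def reset(self):
        self.pos = 0
        return self.pos, {}

    def step(self, action: int):
        self.pos += action
        return self.pos, 0.0, self.pos >= 3, False, {}


# Usage: a dummy "model" that always predicts action 1.
inferencer = Inferencer(model=lambda ctx: 1,
                        tokenize_obs=lambda obs: [obs],
                        context_window=4)
print(inferencer.run(ToyEnv(), max_steps=10))
```

Under this sketch, a `FourRoomsInferencer` would subclass `Inferencer` and override `initialize` and `sequence` for that environment's observation format. Whether truncation should drop whole timesteps (as here) or individual tokens is an open design question.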

Create an inference workflow for modalities other than text. #4