Dodero10/Visual-Question-Answering

Visual Question Answering (VQA)

Introduction

This Visual Question Answering (VQA) project provides two approaches to training and using VQA models: one built from a Bi-LSTM (for encoding questions) combined with a ResNet-50 (for encoding images), and one based on the pretrained transformer models ViT and RoBERTa.

Usage

Training the model using Bi-LSTM and ResNet-50

  1. Open the lstm_cnn.ipynb file.
  2. Run the cells in the notebook to train the VQA model using Bi-LSTM and ResNet-50 on your data.
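The notebook's exact architecture is not reproduced here, but the general Bi-LSTM + ResNet-50 design it trains can be sketched as follows. All layer sizes, the element-wise fusion, and the stand-in for the ResNet-50 feature extractor are illustrative assumptions, not the notebook's actual configuration:

```python
import torch
import torch.nn as nn

class VQABiLSTM(nn.Module):
    """Sketch of a VQA model: ResNet-50 image features fused with
    Bi-LSTM question features, then classified over candidate answers."""

    def __init__(self, vocab_size=10000, embed_dim=300,
                 hidden_dim=512, num_answers=1000):
        super().__init__()
        # Image branch: a ResNet-50 backbone (e.g. from torchvision) would
        # produce a 2048-d pooled feature; a linear projection stands in
        # for it here so the sketch stays self-contained.
        self.img_proj = nn.Linear(2048, hidden_dim)
        # Question branch: word embeddings fed to a bidirectional LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2,
                            batch_first=True, bidirectional=True)
        # Fused representation -> answer logits.
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, img_feats, question_ids):
        v = self.img_proj(img_feats)                  # (B, hidden_dim)
        q_emb = self.embed(question_ids)              # (B, T, embed_dim)
        _, (h, _) = self.lstm(q_emb)                  # h: (2, B, hidden_dim//2)
        q = torch.cat([h[0], h[1]], dim=1)            # (B, hidden_dim)
        fused = v * q                                 # element-wise fusion
        return self.classifier(fused)                 # (B, num_answers)

model = VQABiLSTM()
logits = model(torch.randn(2, 2048), torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 1000])
```

In practice the ResNet-50 is usually frozen (or lightly fine-tuned) and only the fusion and classifier layers are trained from scratch.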

Training the model using ViT and RoBERTa

  1. Open the vqa_transformer.ipynb file.
  2. Run the cells in the notebook to train the VQA model using ViT and RoBERTa on your data.
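The transformer variant follows the same two-branch pattern, with ViT encoding the image and RoBERTa encoding the question. The sketch below uses tiny randomly initialized configs so it runs offline; the notebook presumably loads pretrained checkpoints instead (e.g. `ViTModel.from_pretrained("google/vit-base-patch16-224")` and `RobertaModel.from_pretrained("roberta-base")`). The fusion by CLS-token concatenation and the 10-way answer head are illustrative assumptions:

```python
import torch
import torch.nn as nn
from transformers import ViTConfig, ViTModel, RobertaConfig, RobertaModel

# Tiny configs for an offline, self-contained sketch; swap in
# from_pretrained(...) checkpoints for real training.
vit = ViTModel(ViTConfig(hidden_size=64, num_hidden_layers=2,
                         num_attention_heads=4, intermediate_size=128))
roberta = RobertaModel(RobertaConfig(hidden_size=64, num_hidden_layers=2,
                                     num_attention_heads=4,
                                     intermediate_size=128, vocab_size=500))

# Answer classifier over the concatenated [CLS] embeddings (hypothetical
# 10-class answer vocabulary for illustration).
classifier = nn.Linear(64 * 2, 10)

pixel_values = torch.randn(2, 3, 224, 224)   # batch of 2 images
input_ids = torch.randint(0, 500, (2, 16))   # 2 tokenized questions

img_cls = vit(pixel_values=pixel_values).last_hidden_state[:, 0]
txt_cls = roberta(input_ids=input_ids).last_hidden_state[:, 0]
logits = classifier(torch.cat([img_cls, txt_cls], dim=1))
print(logits.shape)  # torch.Size([2, 10])
```

A common fine-tuning strategy is to freeze most encoder layers at first and train only the classifier head, then unfreeze gradually.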

Additional Comments

Using the pretrained models (ViT and RoBERTa) often yields higher accuracy than training from scratch, because their pretrained representations already capture complex visual patterns and semantic information that the Bi-LSTM/ResNet-50 model must learn from the training data alone.
