This Visual Question Answering (VQA) project offers two approaches to training and using VQA models, built on two different architectures: a Bi-LSTM combined with ResNet-50, and pretrained ViT and RoBERTa models.
- Open the `lstm_cnn.ipynb` file.
- Run the cells in the notebook to train the VQA model using Bi-LSTM and ResNet-50 on your data (a minimal architecture sketch follows this list).
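
The notebook contains the authoritative implementation; as a reference, here is a minimal PyTorch sketch of this kind of Bi-LSTM + ResNet-50 fusion model. The class and parameter names (`VQALstmCnn`, `vocab_size`, `num_answers`, `hidden_dim`) are illustrative assumptions, not taken from the notebook.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class VQALstmCnn(nn.Module):
    """Hypothetical Bi-LSTM + ResNet-50 VQA model (names are illustrative)."""

    def __init__(self, vocab_size, num_answers, embed_dim=300, hidden_dim=512):
        super().__init__()
        # Image encoder: ResNet-50 with the classification head removed.
        cnn = resnet50(weights="IMAGENET1K_V2")
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])  # -> (B, 2048, 1, 1)
        self.img_proj = nn.Linear(2048, hidden_dim)
        # Question encoder: word embeddings + bidirectional LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.txt_proj = nn.Linear(2 * hidden_dim, hidden_dim)
        # Answer classifier over the fused image-question representation.
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, images, questions):
        img = self.cnn(images).flatten(1)        # (B, 2048)
        img = torch.relu(self.img_proj(img))     # (B, H)
        emb = self.embed(questions)              # (B, T, E)
        _, (h, _) = self.lstm(emb)               # h: (2, B, H), one per direction
        txt = torch.cat([h[0], h[1]], dim=1)     # (B, 2H)
        txt = torch.relu(self.txt_proj(txt))     # (B, H)
        fused = img * txt                        # element-wise fusion (assumption)
        return self.classifier(fused)            # (B, num_answers) answer logits
```

Element-wise multiplication is one common way to fuse the image and question features; the notebook may instead use concatenation or attention.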
- Open the `vqa_transformer.ipynb` file.
- Run the cells in the notebook to train the VQA model using ViT and RoBERTa on your data (see the sketch after this list).
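
For comparison, below is a minimal sketch of a ViT + RoBERTa fusion model using the Hugging Face `transformers` library. The checkpoint names and the concatenation-based fusion are assumptions, not necessarily what the notebook uses.

```python
import torch
import torch.nn as nn
from transformers import ViTModel, RobertaModel

class VQATransformer(nn.Module):
    """Hypothetical ViT + RoBERTa VQA model (names are illustrative)."""

    def __init__(self, num_answers, hidden_dim=512):
        super().__init__()
        # Pretrained encoders: ViT for the image, RoBERTa for the question.
        self.vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
        self.roberta = RobertaModel.from_pretrained("roberta-base")
        fused_dim = self.vit.config.hidden_size + self.roberta.config.hidden_size
        # Answer classifier over the concatenated pooled features.
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, pixel_values, input_ids, attention_mask):
        # Use each encoder's [CLS]-position embedding as a pooled feature.
        img = self.vit(pixel_values=pixel_values).last_hidden_state[:, 0]
        txt = self.roberta(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        fused = torch.cat([img, txt], dim=1)     # concatenation fusion (assumption)
        return self.classifier(fused)            # (B, num_answers) answer logits
```

Inputs would be prepared with the matching image processor and tokenizer (e.g. `ViTImageProcessor` and `RobertaTokenizer`); freezing the pretrained encoders for the first few epochs is a common choice when the training set is small.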
Using pretrained models (ViT and RoBERTa) often yields higher accuracy than training from scratch, because these encoders already capture complex visual patterns and semantic information learned during large-scale pretraining.