This Visual Question Answering (VQA) project offers two approaches to training and using VQA models, built on two different architectures: a Bi-LSTM combined with ResNet-50, and pretrained ViT and RoBERTa models.
- Open the `lstm_cnn.ipynb` file.
- Run the cells in the notebook to train the VQA model using Bi-LSTM and ResNet-50 on your data (a minimal architecture sketch follows this list).
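
The notebook contains the authoritative implementation; as a reference, here is a minimal PyTorch sketch of this kind of Bi-LSTM + ResNet-50 fusion model. The class and parameter names (`VQALstmCnn`, `vocab_size`, `num_answers`, `hidden_dim`) are illustrative assumptions, not taken from the notebook.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class VQALstmCnn(nn.Module):
    """Hypothetical Bi-LSTM + ResNet-50 VQA model (names are illustrative)."""

    def __init__(self, vocab_size, num_answers, embed_dim=300, hidden_dim=512):
        super().__init__()
        # Image encoder: ResNet-50 with the classification head removed.
        cnn = resnet50(weights="IMAGENET1K_V2")
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])  # -> (B, 2048, 1, 1)
        self.img_proj = nn.Linear(2048, hidden_dim)
        # Question encoder: word embeddings + bidirectional LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.txt_proj = nn.Linear(2 * hidden_dim, hidden_dim)
        # Answer classifier over the fused image-question representation.
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, images, questions):
        img = self.cnn(images).flatten(1)        # (B, 2048)
        img = torch.relu(self.img_proj(img))     # (B, H)
        emb = self.embed(questions)              # (B, T, E)
        _, (h, _) = self.lstm(emb)               # h: (2, B, H), one per direction
        txt = torch.cat([h[0], h[1]], dim=1)     # (B, 2H)
        txt = torch.relu(self.txt_proj(txt))     # (B, H)
        fused = img * txt                        # element-wise fusion (assumption)
        return self.classifier(fused)            # (B, num_answers) answer logits
```

Element-wise multiplication is one common way to fuse the image and question features; the notebook may instead use concatenation or attention.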
- Open the `vqa_transformer.ipynb` file.
- Run the cells in the notebook to train the VQA model using ViT and RoBERTa on your data (see the sketch after this list).
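
For comparison, below is a minimal sketch of a ViT + RoBERTa fusion model using the Hugging Face `transformers` library. The checkpoint names and the concatenation-based fusion are assumptions, not necessarily what the notebook uses.

```python
import torch
import torch.nn as nn
from transformers import ViTModel, RobertaModel

class VQATransformer(nn.Module):
    """Hypothetical ViT + RoBERTa VQA model (names are illustrative)."""

    def __init__(self, num_answers, hidden_dim=512):
        super().__init__()
        # Pretrained encoders: ViT for the image, RoBERTa for the question.
        self.vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
        self.roberta = RobertaModel.from_pretrained("roberta-base")
        fused_dim = self.vit.config.hidden_size + self.roberta.config.hidden_size
        # Answer classifier over the concatenated pooled features.
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, pixel_values, input_ids, attention_mask):
        # Use each encoder's [CLS]-position embedding as a pooled feature.
        img = self.vit(pixel_values=pixel_values).last_hidden_state[:, 0]
        txt = self.roberta(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        fused = torch.cat([img, txt], dim=1)     # concatenation fusion (assumption)
        return self.classifier(fused)            # (B, num_answers) answer logits
```

Inputs would be prepared with the matching image processor and tokenizer (e.g. `ViTImageProcessor` and `RobertaTokenizer`); freezing the pretrained encoders for the first few epochs is a common choice when the training set is small.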
Using pretrained models (ViT and RoBERTa) often yields higher accuracy than training from scratch, because these encoders already capture complex visual patterns and semantic information learned during large-scale pretraining.