Skip to content

BASHLab/RAVEN

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation


🚀 Main Results

Comparison of RAVEN and prior MLLMs on exocentric open-ended video QA (MSVD-QA, MSRVTT-QA, ActivityNet-QA) and audio-visual QA (AVSD, MUSIC-QA) benchmarks. Best and second-best scores are in $\textbf{Bold}$ and $\underline{\text{underline}}$. $^*$ indicates scores reproduced by us.

Comparison of RAVEN with MLLMs on the EgoThink (Reasoning) and AVS-QA benchmarks. RAVEN outperforms across metrics and excels in reasoning. $\textbf{Bold}$ and $\underline{\text{underline}}$ indicate the best and second-best scores.

🛠️ Requirements and Installation

Basic Dependencies:

  • Python >= 3.8
  • Pytorch >= 2.2.0
  • CUDA Version >= 11.8
  • transformers == 4.40.0 (for reproducing paper results)
  • tokenizers == 0.19.1
cd RAVEN
pip install -r requirements.txt
pip install flash-attn==2.5.8 --no-build-isolation
pip install opencv-python==4.5.5.64
apt-get update && apt-get install ffmpeg libsm6 libxext6  -y

📁 AVS-QA Dataset

Train and test split of AVS-QA is provided here.
More details here.

🗝️ Training & Evaluation

Coming Soon!

🍀 Model Zoo

Model Name Modal Type
RAVEN-7B-AV AV
RAVEN-7B-AVS AVS

🤖 Inference

CUDA_VISIBLE_DEVICES=0 python inference.py --model-path=<MODEL PATH> --modal-type=<MODAL TYPE>

👍 Acknowledgement

The codebase of RAVEN is adapted from VideoLLaMA2. We are also grateful for their contribution.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages