GitHub - yhx30/VideoQA

Video-QA

Open-ended video question answering on the NExT-GQA dataset with the LLaVA-1.6 and GPT-4o mini models, using a sliding-window sampling method.

🔥 About The Project

🧐 Requirement

Install the environment:

Operating System: 
Conda Version:
Python Version: 
CUDA Version:

Main packages:

tqdm
moviepy
opencv-python
openai==1.14.0
torch==2.2.0
bitsandbytes==0.42.0
flash_attn==2.5.3
transformers==4.36.2
transformers-stream-generator==0.0.4
torchvision==0.17.0
pytorchvideo @ git+https://github.com/facebookresearch/pytorchvideo.git@28fe037d212663c6a24f373b94cc5d478c8c1a1d

Run the following command to install the required packages:

pip install -r requirements.txt

Configure the object tracking module:

Copy the files from the SAMTrack directory into your Python site-packages directory to enable object tracking.
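The copy step can also be scripted. The sketch below is only an illustration: the helper name `copy_into_site_packages` is hypothetical, and it assumes the first entry returned by `site.getsitepackages()` is the environment you intend to use.

```python
import shutil
import site
from pathlib import Path


def copy_into_site_packages(src_dir, dest_root=None):
    """Copy every entry of src_dir into dest_root.

    dest_root defaults to the first site-packages directory reported by
    the running interpreter (an assumption; check it matches your env).
    """
    src = Path(src_dir)
    dest = Path(dest_root) if dest_root else Path(site.getsitepackages()[0])
    copied = []
    for entry in sorted(src.iterdir()):
        target = dest / entry.name
        if entry.is_dir():
            # Merge into an existing directory instead of failing.
            shutil.copytree(entry, target, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, target)
        copied.append(target)
    return copied
```

For example, `copy_into_site_packages("./samtrack")` would copy the contents of a local `samtrack` directory; the directory name here is an assumption, so adjust it to wherever the SAMTrack files live in your checkout.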

🤗 Datasets

We use a large-scale video-question-answer dataset, which you can access and download from here.

🎯 Usage

Run the following command to evaluate without sliding-window sampling (frames are sampled uniformly across the entire video):

python eval_gpt4v_openended.py --path_qa_pair_csv ./data/open_ended_qa/Next_GQA.csv --path_video ./data/NextGQAvideo/%s.mp4 --path_result ./result_NextGQA_gpt4/ --api_key4 <your gpt4o-mini api key> --api_key3 <your gpt3 api key>
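As a rough illustration of uniform sampling across a whole video, the sketch below picks evenly spaced frame indices. It is not taken from `eval_gpt4v_openended.py`: the function names are hypothetical, the number of sampled frames is left as a parameter because this baseline's value is not stated here, and the way the `%s` in `--path_video` is filled with a video id is an assumption.

```python
def uniform_frame_indices(total_frames, num_samples):
    """Return num_samples frame indices spread evenly over [0, total_frames)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    # Never ask for more frames than the video contains.
    num_samples = min(num_samples, total_frames)
    # Split the video into equal segments and take the centre of each one.
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def resolve_video_path(template, video_id):
    """Fill the %s placeholder of a path template (assumed usage)."""
    return template % video_id
```

With `total_frames=60` and `num_samples=6`, the indices are the centres of six 10-frame segments.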

Run the following command to evaluate without video input:

python eval_gpt4v_openended_novideo.py --path_qa_pair_csv ./data/open_ended_qa/Next_GQA.csv --path_video ./data/NextGQAvideo/%s.mp4 --path_result ./result_NextGQA_gpt4_novideo/ --api_key4 <your gpt4o-mini api key> --api_key3 <your gpt3 api key>

Run the following command to evaluate without evidence segments (i.e., the segments containing the ground truth are removed from the video):

python eval_gpt4v_openended_woevidence_separate.py --path_qa_pair_csv ./data/open_ended_qa/Next_GQA.csv --path_video ./data/NextGQAvideo/%s.mp4 --path_result ./result_NextGQA_gpt4_woevidence/ --api_key4 <your gpt4o-mini api key> --api_key3 <your gpt3 api key>
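To make the "without evidence" condition concrete, the sketch below computes which parts of a video remain once the ground-truth segments are cut out. It is an illustration under assumptions, not the script's actual logic: the function name is hypothetical, and segments are assumed to be `(start, end)` pairs in seconds.

```python
def remaining_segments(duration, evidence):
    """Return the parts of [0, duration] not covered by any evidence segment.

    evidence: iterable of (start, end) pairs; overlaps are allowed.
    """
    kept = []
    cursor = 0.0  # end of the region handled so far
    for start, end in sorted(evidence):
        # Clamp each segment to the video bounds.
        start = min(max(0.0, start), duration)
        end = min(max(0.0, end), duration)
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        kept.append((cursor, duration))
    return kept
```

Frames would then be sampled only from the returned segments.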

Run the following command to evaluate the Ground setting (6 frames extracted, separate):

python eval_gpt4v_openended_separate_ground.py --path_qa_pair_csv ./data/open_ended_qa/Next_GQA.csv --path_video ./data/NextGQAvideo/%s.mp4 --path_result ./result_NextGQA_gpt4_separate_ground/ --api_key4 <your gpt4o-mini api key> --api_key3 <your gpt3 api key>

Run the following command to evaluate the sliding-window method with answers selected by perplexity (stride 15, window size 30, 6 frames extracted, separate):

python eval_gpt4v_openended_sliding_separate.py --path_qa_pair_csv ./data/open_ended_qa/Next_GQA.csv --path_video ./data/NextGQAvideo/%s.mp4 --path_result ./result_NextGQA_gpt4_separate/ --api_key4 <your gpt4o-mini api key> --api_key3 <your gpt3 api key>
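The sliding-window idea and perplexity-based selection can be sketched as follows. This is a minimal illustration, not the code in `eval_gpt4v_openended_sliding_separate.py`: all function names are hypothetical, the unit of the stride and window size (seconds or frames) is not stated above and is left abstract, and perplexity is assumed to be the exponential of the negative mean token log-probability.

```python
import math


def sliding_windows(total, window=30, stride=15):
    """Return (start, end) windows covering [0, total); the last is clipped."""
    if total <= 0:
        return []
    windows = []
    start = 0
    while True:
        end = min(start + window, total)
        windows.append((start, end))
        if end >= total:
            break
        start += stride
    return windows


def sample_in_window(start, end, num_frames=6):
    """Pick num_frames evenly spaced positions inside [start, end)."""
    step = (end - start) / num_frames
    return [start + step * (i + 0.5) for i in range(num_frames)]


def perplexity(token_logprobs):
    """exp of the negative mean token log-probability (lower is better)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_lowest_perplexity(candidates):
    """candidates: list of (window, answer, token_logprobs) tuples."""
    return min(candidates, key=lambda c: perplexity(c[2]))
```

Each window is answered on its own, and the answer whose tokens the model found least surprising is kept; its window can then serve as the predicted grounding segment.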

Run the following command to evaluate with object segmentation and tracking (SAMTrack) added, under ground-truth conditions:

python eval_gpt4v_openended_separate_ground_track.py --path_qa_pair_csv ./data/open_ended_qa/Next_GQA.csv --path_video ./data/NextGQAvideo/%s.mp4 --path_result ./result_NextGQA_gpt4_separate_ground_samtrack/ --api_key4 <your gpt4o-mini api key> --api_key3 <your gpt3 api key>

Run the following command to evaluate the sliding-window method with answers selected by confidence score (maximum score 1000; stride 15, window size 30, 6 frames extracted, separate). This is the current best configuration (QA-Acc: 39.80, IOP: 27.12, GQA: 13.2):

python eval_gpt4v_openended_sliding_separate_confidence.py --path_qa_pair_csv ./data/open_ended_qa/Next_GQA.csv --path_video ./data/NextGQAvideo/%s.mp4 --path_result ./result_NextGQA_gpt4_separate_confidence/ --api_key4 <your gpt4o-mini api key> --api_key3 <your gpt3 api key>
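A minimal sketch of confidence-based selection is shown below, together with an intersection-over-prediction helper. Both are illustrations under assumptions: the function names and the candidate dictionary layout are hypothetical, and reading "IOP" as intersection over prediction (overlap length divided by the predicted segment length) follows the NExT-GQA benchmark's usage rather than anything stated in this README.

```python
def pick_most_confident(candidates, max_score=1000):
    """Return the candidate with the highest in-range confidence.

    candidates: list of dicts with keys 'window', 'answer', 'confidence'.
    Scores outside [0, max_score] are treated as invalid and skipped.
    """
    valid = [c for c in candidates if 0 <= c["confidence"] <= max_score]
    if not valid:
        return None
    return max(valid, key=lambda c: c["confidence"])


def iop(pred, gt):
    """Intersection over prediction for two (start, end) segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    length = pred[1] - pred[0]
    return inter / length if length > 0 else 0.0
```

The window of the selected candidate can be scored against the annotated segment with `iop`, for instance `iop((15, 45), (30, 40))` compares a 30-unit prediction with a 10-unit ground-truth segment.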

🚨 Results

To be added ...

🤓 Contributing

Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

1. Fork the Project
2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
3. Commit your Changes (git commit -m 'Add some AmazingFeature')
4. Push to the Branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

😋 License

Distributed under the Unlicense. See LICENSE for more information.

📝 Cite

To be added ...
