【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 3,472 249 Updated Dec 3, 2024
Jupyter Notebook 21 6 Updated Feb 27, 2017

Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers.

Python 54 6 Updated Mar 30, 2022

Comparatively fine-tuning pretrained BERT models on downstream, text classification tasks with different architectural configurations in PyTorch.

Python 126 28 Updated Jul 2, 2020

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

Python 7,343 1,294 Updated Mar 16, 2026

Code for our CVPR 2018 paper "Learning Latent Super-Events to Detect Multiple Activities in Videos"

Python 123 34 Updated Oct 4, 2018

We ranked 1st in the DSTC8 Audio-Visual Scene-Aware Dialog competition. This is the source code for our IEEE/ACM TASLP (AAAI2020-DSTC8-AVSD) paper "Bridging Text and Video: A Universal Multimodal Tra…

Python 56 14 Updated Jun 12, 2023

A repository of common methods, datasets, and tasks for video research

Python 539 89 Updated Jun 17, 2019

Official PyTorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain

Python 514 59 Updated Oct 31, 2020

Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

Python 542 101 Updated May 1, 2023

Implementation for "Large-scale Pretraining for Visual Dialog" https://arxiv.org/abs/1912.02379

Python 97 19 Updated Mar 31, 2020

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 159,588 32,918 Updated Apr 18, 2026

Starter code in PyTorch for the Visual Dialog challenge

Python 189 38 Updated Mar 24, 2023

Activity Recognition Algorithms for the Charades Dataset

Lua 207 59 Updated Dec 31, 2018