Organizations

@VITA-MLLM @MME-Benchmarks

A curated paper list and taxonomy of efficient Vision-Language-Action (VLA) models for embodied manipulation.

60 1 Updated Nov 10, 2025

QeRL enables RL for 32B LLMs on a single H100 GPU.

Python 469 45 Updated Nov 27, 2025

The official implementation of VITA, VITA15, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.

Python 135 2 Updated Oct 28, 2025

LongLive: Real-time Interactive Long Video Generation

Python 917 63 Updated Dec 4, 2025

Think Beyond Images

Python 537 34 Updated Sep 23, 2025

GitHub repository for the ACL 2025 paper: Recent Advances in Speech Language Models: A Survey.

159 6 Updated Jun 17, 2025

CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms

25 Updated Dec 21, 2025

✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

Python 272 22 Updated May 9, 2025

✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

Python 667 60 Updated May 24, 2025

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models

Python 41 4 Updated Apr 10, 2025

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

Python 211 9 Updated Sep 26, 2025

Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation

Jupyter Notebook 31 1 Updated Mar 28, 2025

✨✨[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"

Python 366 35 Updated Oct 28, 2025

✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension"

Python 73 2 Updated Apr 28, 2025

Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment

Python 1,465 94 Updated Sep 11, 2025

The Next Step Forward in Multimodal LLM Alignment

Python 192 8 Updated May 1, 2025

MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency

Python 135 5 Updated Aug 5, 2025

LUCY: Linguistic Understanding and Control Yielding Early Stage of Her

Python 56 3 Updated Apr 14, 2025

✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Python 307 29 Updated May 14, 2025

Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"

48 Updated Sep 3, 2025

[MM 2025] A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning

Python 40 4 Updated Oct 14, 2025

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Python 358 24 Updated May 27, 2025

Eagle: Frontier Vision-Language Models with Data-Centric Strategies

Python 912 48 Updated Oct 25, 2025

✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

Python 148 11 Updated Oct 21, 2025

✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,463 180 Updated Mar 28, 2025

RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024

Python 86 9 Updated Sep 30, 2024

✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

Python 163 7 Updated Dec 26, 2024

Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

209 8 Updated Apr 3, 2025

Simple PyTorch implementation of "Libra: Building Decoupled Vision System on Large Language Models" (accepted by ICML 2024)

Python 163 Updated Nov 29, 2024

✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

699 25 Updated Dec 8, 2025