Stars
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
[ICCV 2025] Official code of "ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation"
The official implementation of [Quality over Quantity: Boosting Data Efficiency Through Ensembled Multimodal Data Curation] (AAAI 2025).
Closed-loop evaluation for end-to-end VLM autonomous driving agent
OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.
HE-Drive: Human-Like End-to-End Driving with Vision Language Models
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[CVPR 2024] LMDrive: Closed-Loop End-to-End Driving with Large Language Models
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
[ACL 2024 (Findings)] ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation
This is an official PyTorch implementation of TCA-Net: Triplet Concatenated-Attentional Network for Multimodal Engagement Estimation.
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
A Framework of Small-scale Large Multimodal Models
Daming-W / LLaVA_SU
Forked from haotian-liu/LLaVA
Scenario Understanding with Visual Question Answering, based on Visual Instruction Tuning (LLaVA), built towards GPT-4V level capabilities and beyond.
This research project has constructed a two-stage clustering-based retrieval framework, as well as a deep learning-based retrieval algorithm using the CLIP model, which demonstrates zero-shot abili…
LAVIS - A One-stop Library for Language-Vision Intelligence
Daming-W / CS-Notes
Forked from CyC2018/CS-Notes
📚 Essential fundamentals for technical interviews: LeetCode, operating systems, computer networks, system design
This research presents probabilistic machine learning methods to optimize basic image ranking models and make them suitable for visual navigation and exploration tasks in complex real-world environments.
Practicing algorithms is all about patterns, and labuladong is all you need! English version supported! Crack LeetCode, not only how, but also why.
Introductory deep learning tutorial and excellent articles (Deep Learning Tutorial)