LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,103 217 Updated May 19, 2025

LLaVA-VL / LLaVA-NeXT

Python 4,461 435 Updated Sep 14, 2025

moranyanuka / icc_code

[ACL 2024 (Findings)] ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation

Python 5 Updated Aug 24, 2024

Daming-W / TCA-Net-Multimodal-Engagement-Estimation

This is the official PyTorch implementation of TCA-Net: Triplet Concatenated-Attentional Network for Multimodal Engagement Estimation.

Python 4 Updated Jun 23, 2024

TempleX98 / MoVA

[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Python 168 4 Updated Sep 25, 2024

TinyLLaVA / TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models

Python 938 95 Updated Apr 26, 2025

Daming-W / LLaVA_SU

Forked from haotian-liu/LLaVA

Scenario Understanding with Visual Question Answering, based on Visual Instruction Tuning (LLaVA), built towards GPT-4V level capabilities and beyond.

Python 1 Updated May 24, 2024

1 Updated Feb 23, 2022

DamingW Daming-W

Stars

HW-whistleblower / True-Story-of-Pangu

hiyouga / EasyR1

xiaomi-mlab / Orion

haomo-ai / EcoDatum

ProjectVaquitai / Vaquitai

hzjian123 / VLArena

taco-group / OpenEMMA

jmwang0117 / HE-Drive

hiyouga / LLaMA-Factory

NVlabs / OmniDrive

opendilab / LMDrive

ictnlp / LLaMA-Omni

LLaVA-VL / LLaVA-NeXT

moranyanuka / icc_code

Daming-W / TCA-Net-Multimodal-Engagement-Estimation

TempleX98 / MoVA

TinyLLaVA / TinyLLaVA_Factory

Daming-W / LLaVA_SU

Daming-W / Image-Retrieval-From-Text

salesforce / LAVIS

aiiu-lab / CLIPCAM

MulongXie / UI-Captioning

zixian2021 / AI-interview-cards

Daming-W / CS-Notes

Daming-W / Hide-and-Seek

labuladong / fucking-algorithm

Mikoto10032 / DeepLearning