Stars
This repository contains the code and pre-trained models for our paper
Fully open reproduction of DeepSeek-R1
[NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs.
You can easily calculate FVD, PSNR, SSIM, LPIPS for evaluating the quality of generated or predicted videos.
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Train transformer language models with reinforcement learning.
Based on models such as SparkTTS and OrpheusTTS, provides high-quality Chinese speech synthesis and voice cloning services.
The implementation of our paper accepted by ACL 2023: Facilitating Multi-turn Emotional Support Conversation with Positive Emotion Elicitation: A Reinforcement Learning Approach
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing
[AAAI 2024] Style2Talker - Official PyTorch Implementation
A large-scale 7B pretrained language model developed by BaiChuan-Inc.
✨✨Latest Advances on Multimodal Large Language Models
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Reading list for research topics in multimodal machine learning
[ACMMM'2025] UniTalker: Conversational Speech-Visual Synthesis
📖 A curated list of resources dedicated to talking face.
Implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"
Out of time: automated lip sync in the wild
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation