ztw1123

ztw ztw1123

Stars

wangkevin02 / USP

This repository contains the code and pre-trained models for our paper

Python 22 5 Updated Jun 29, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 25,976 2,411 Updated Apr 2, 2026

aim-uofa / Omni-R1

[NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

Python 119 5 Updated Dec 3, 2025

HumanMLLM / R1-Omni

Python 1,011 72 Updated Mar 24, 2025

butterfliesss / EmpRL

Python 12 1 Updated Mar 2, 2025

Rudrabha / Wav2Lip

This repository contains the codes of "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For HD commercial model, please try out Sync Labs

Python 12,918 2,793 Updated Jun 22, 2025

JunyaoHu / common_metrics_on_video_quality

You can easily calculate FVD, PSNR, SSIM, LPIPS for evaluating the quality of generated or predicted videos.

Python 566 25 Updated Jan 17, 2026

microsoft / DNS-Challenge

This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.

Python 1,401 452 Updated Jul 25, 2024

huggingface / trl

Train transformer language models with reinforcement learning.

Python 17,998 2,632 Updated Apr 11, 2026

HuiResearch / FlashTTS

Provides high-quality Chinese speech synthesis and voice cloning services based on models such as SparkTTS and OrpheusTTS.

Python 599 77 Updated May 18, 2025

jfzhouyoo / Supporter

The implementation of our paper accepted by ACL 2023: Facilitating Multi-turn Emotional Support Conversation with Positive Emotion Elicitation: A Reinforcement Learning Approach

Python 23 2 Updated Jul 16, 2023

Vchitect / VBench

[CVPR2024 Highlight] VBench - We Evaluate Video Generation

Python 1,573 110 Updated Mar 23, 2026

PaperDebugger / paperdebugger

A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing

TypeScript 1,436 71 Updated Apr 10, 2026

tanshuai0219 / style2talker

[AAAI 2024] style2talker - Official PyTorch Implementation

Python 52 9 Updated Aug 6, 2025

baichuan-inc / Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.

Python 5,672 506 Updated Jul 18, 2024

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

17,595 1,123 Updated Apr 9, 2026

ByteDance-Seed / VeOmni

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,814 178 Updated Apr 9, 2026

BytedanceSpeech / seed-tts-eval

Python 1,551 143 Updated Jun 14, 2024

OpenBMB / MiniCPM-o

A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone

Python 24,344 1,901 Updated Apr 1, 2026

NVlabs / OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

Python 652 51 Updated Feb 26, 2026

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 20,499 2,350 Updated Mar 16, 2026

mseitzer / pytorch-fid

Compute FID scores with PyTorch.

Python 3,838 527 Updated Jul 3, 2024

pliang279 / awesome-multimodal-ml

Reading list for research topics in multimodal machine learning

6,857 898 Updated Aug 20, 2024

ttslr / awesome-multimodal-ml

Forked from pliang279/awesome-multimodal-ml

Reading list for research topics in multimodal machine learning

3 Updated Jun 22, 2023

AI-S2-Lab / UniTalker

[ACMMM'2025] UniTalker: Conversational Speech-Visual Synthesis

5 Updated Jul 5, 2025

JosephPai / Awesome-Talking-Face

📖 A curated list of resources dedicated to talking face.

1,540 122 Updated Dec 23, 2024

lochenchou / MOSNet

Implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"

Python 379 67 Updated Jul 21, 2024

joonson / syncnet_python

Out of time: automated lip sync in the wild

Python 881 192 Updated Apr 11, 2026

fudan-generative-vision / hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Python 8,642 1,120 Updated Sep 14, 2024

fishaudio / fish-speech

SOTA Open Source TTS

Python 29,235 2,461 Updated Apr 6, 2026