This repository contains the code and pre-trained models for our paper

Python 22 5 Updated Jun 29, 2025

Fully open reproduction of DeepSeek-R1

Python 25,976 2,411 Updated Apr 2, 2026

[NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

Python 119 5 Updated Dec 3, 2025
Python 1,011 72 Updated Mar 24, 2025
Python 12 1 Updated Mar 2, 2025

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, try Sync Labs.

Python 12,918 2,793 Updated Jun 22, 2025

Easily calculate FVD, PSNR, SSIM, and LPIPS to evaluate the quality of generated or predicted videos.

Python 566 25 Updated Jan 17, 2026

This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.

Python 1,401 452 Updated Jul 25, 2024

Train transformer language models with reinforcement learning.

Python 17,998 2,632 Updated Apr 11, 2026

High-quality Chinese speech synthesis and voice cloning services, built on models such as SparkTTS and OrpheusTTS.

Python 599 77 Updated May 18, 2025

The implementation of our paper accepted by ACL 2023: Facilitating Multi-turn Emotional Support Conversation with Positive Emotion Elicitation: A Reinforcement Learning Approach

Python 23 2 Updated Jul 16, 2023

[CVPR2024 Highlight] VBench - We Evaluate Video Generation

Python 1,573 110 Updated Mar 23, 2026

A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing

TypeScript 1,436 71 Updated Apr 10, 2026

[AAAI 2024] Style2Talker - Official PyTorch Implementation

Python 52 9 Updated Aug 6, 2025

A large-scale 7B pretraining language model developed by BaiChuan-Inc.

Python 5,672 506 Updated Jul 18, 2024

✨✨Latest Advances on Multimodal Large Language Models

17,595 1,123 Updated Apr 9, 2026

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,814 178 Updated Apr 9, 2026

A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone

Python 24,344 1,901 Updated Apr 1, 2026

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

Python 652 51 Updated Feb 26, 2026

A multilingual large voice generation model with full-stack support for inference, training, and deployment.

Python 20,499 2,350 Updated Mar 16, 2026

Compute FID scores with PyTorch.

Python 3,838 527 Updated Jul 3, 2024

Reading list for research topics in multimodal machine learning

6,857 898 Updated Aug 20, 2024

Reading list for research topics in multimodal machine learning

3 Updated Jun 22, 2023

[ACMMM'2025] UniTalker: Conversational Speech-Visual Synthesis

5 Updated Jul 5, 2025

📖 A curated list of resources dedicated to talking face.

1,540 122 Updated Dec 23, 2024

Implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"

Python 379 67 Updated Jul 21, 2024

Out of time: automated lip sync in the wild

Python 881 192 Updated Apr 11, 2026

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Python 8,642 1,120 Updated Sep 14, 2024

SOTA Open Source TTS

Python 29,235 2,461 Updated Apr 6, 2026