UESTC PhD, TJU Master's
Starred repositories
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
An open source implementation of CLIP.
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, try Sync Labs.
An open-source NLP research library, built on PyTorch.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
PyTorch package for the discrete VAE used for DALL·E.
Python bindings for FFmpeg - with complex filtering support
🐍 Geometric Computer Vision Library for Spatial AI
Speech recognition module for Python, supporting several engines and APIs, online and offline.
ImageBind: One Embedding Space to Bind Them All
Implementation of Imagen, Google's Text-to-Image Neural Network, in PyTorch
A Deep-Learning-Based Chinese Speech Recognition System
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Chinese version of GPT2 training code, using BERT tokenizer.
A Collection of Variational Autoencoders (VAE) in PyTorch.
🔥 2D and 3D face alignment library built using PyTorch
Multilingual Voice Understanding Model
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
Pretrained PyTorch face detection (MTCNN) and facial recognition (InceptionResnet) models
Production First and Production Ready End-to-End Speech Recognition Toolkit
Denoising Diffusion Probabilistic Models
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark