The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen)
- https://www.zhangxueyao.com/
Stars
Qwen2.5 is the large language model series developed by the Qwen team at Alibaba Cloud.
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" (a flow-matching sketch follows this list).
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
Train transformer language models with reinforcement learning.
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention); a DPO loss sketch follows this list.
State-of-the-art zero-shot voice conversion & singing voice conversion with in-context learning.
A project page template for academic papers. Demo at https://eliahuhorwitz.github.io/Academic-project-page-template/
The Emotional Voices Database: Towards Controlling the Emotional Expressiveness in Voice Generation Systems
Code for the ICML 2020 paper "CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information" (an estimator sketch follows this list).
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Inference and training library for high-quality TTS models.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Multilingual large voice generation model providing full-stack inference, training, and deployment capabilities.
The official GitHub page for the survey paper "Foundation Models for Music: A Survey".
A library for speech data augmentation in the time domain.
Diffusion Model for Voice Conversion
PolySinger: Singing-Voice to Singing-Voice Translation From English to Japanese
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI foley artist that adds vivid, synchronized sound effects to your silent videos 😝
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…
Code for "Adam-mini: Use Fewer Learning Rates To Gain More" (https://arxiv.org/abs/2406.16793).
This is the GitHub page for publicly available emotional speech data.
Public Code for Neural Codec Language Models for Disentangled and Textless Voice Conversion (Interspeech 2024)
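The F5-TTS entry above is built on flow matching. Purely as an illustration of the underlying objective, and not code from that repository, the sketch below trains a velocity network on a linear path between noise and data; `VelocityNet`, the hidden size, and the mel-spectrogram shapes are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Hypothetical velocity-field model v_theta(x_t, t, cond)."""
    def __init__(self, dim: int, cond_dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + cond_dim + 1, hidden), nn.SiLU(), nn.Linear(hidden, dim)
        )

    def forward(self, x_t, t, cond):
        # Broadcast the scalar timestep to one extra feature per frame.
        t_feat = t.expand(x_t.shape[:-1] + (1,))
        return self.net(torch.cat([x_t, cond, t_feat], dim=-1))

def flow_matching_loss(model, x1, cond):
    """Conditional flow matching on a linear path x_t = (1 - t) * x0 + t * x1."""
    x0 = torch.randn_like(x1)                             # noise endpoint
    t = torch.rand(x1.shape[0], 1, 1, device=x1.device)   # one timestep per batch item
    x_t = (1 - t) * x0 + t * x1
    target_velocity = x1 - x0                             # d x_t / d t along the path
    pred_velocity = model(x_t, t, cond)
    return ((pred_velocity - target_velocity) ** 2).mean()

# Hypothetical usage on mel-spectrograms of shape [batch, frames, n_mels]:
# model = VelocityNet(dim=n_mels, cond_dim=text_embedding_dim)
# loss = flow_matching_loss(model, mel, text_cond)
```

At inference, speech would be generated by integrating the learned velocity field from noise to data with an ODE solver; the actual repositories add their own text/duration conditioning and architectures.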
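The RLHF entries above (TRL, OpenRLHF) both support DPO. As an illustration of the underlying objective only, and not those libraries' APIs, here is the standard DPO loss computed from per-sequence log-probabilities; the argument names are placeholders, and the frameworks handle log-prob extraction, batching, and the frozen reference model themselves.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss.

    Each input is a [batch]-shaped tensor of summed token log-probs for the
    preferred (chosen) or dispreferred (rejected) response, under either the
    policy being trained or the frozen reference model.
    """
    chosen_margin = policy_chosen_logps - ref_chosen_logps        # implicit reward of chosen
    rejected_margin = policy_rejected_logps - ref_rejected_logps  # implicit reward of rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```

Pairwise preference data such as the Anthropic helpful/harmless dataset listed above supplies the chosen/rejected responses; in iterative DPO, new preference pairs are roughly collected from the updated policy between training rounds.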
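The CLUB entry above defines a mutual-information upper bound, I_CLUB = E_{p(x,y)}[log q(y|x)] − E_{p(x)}E_{p(y)}[log q(y|x)], using a variational approximation q_θ(y|x). A minimal sketch, assuming a diagonal-Gaussian q_θ parameterized by small MLPs (layer sizes and names are illustrative, not the official implementation):

```python
import torch
import torch.nn as nn

class CLUBEstimator(nn.Module):
    """Sampled CLUB upper bound on I(X; Y) with a Gaussian q_theta(y | x)."""

    def __init__(self, x_dim: int, y_dim: int, hidden: int = 256):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim))
        self.logvar = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim))

    def _log_prob(self, mu, logvar, y):
        # Gaussian log-density up to an additive constant (constants cancel in the bound).
        return -0.5 * (((y - mu) ** 2) / logvar.exp() + logvar).sum(dim=-1)

    def forward(self, x, y):
        # Upper-bound estimate: matched pairs minus the average over all cross pairs.
        mu, logvar = self.mu(x), self.logvar(x)
        positive = self._log_prob(mu, logvar, y)                    # [N]
        negative = self._log_prob(mu.unsqueeze(1),                  # [N, N]: log q(y_j | x_i)
                                  logvar.unsqueeze(1),
                                  y.unsqueeze(0))
        return (positive - negative.mean(dim=1)).mean()

    def learning_loss(self, x, y):
        # q_theta itself is fit by maximum likelihood on the matched pairs.
        mu, logvar = self.mu(x), self.logvar(x)
        return -self._log_prob(mu, logvar, y).mean()
```

When CLUB is used for disentanglement (e.g., separating speaker from content representations), q_θ is updated with learning_loss while the main encoders are trained to minimize the forward() estimate.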