- Carnegie Mellon University
- Pittsburgh, PA, USA
- cyhuang-tw.github.io
- https://orcid.org/0000-0003-4927-1293
- @cyhuang_tw
Stars
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
Your faithful, impartial partner for audio evaluation: authentic evaluation to know yourself and know your rivals.
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
jctian98 / espnet
Forked from espnet/espnet. End-to-End Speech Processing Toolkit
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
Code for DeSTA2.5-Audio, general-purpose LALM
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models
Real-time face swap and one-click video deepfake with only a single image
Programmer's guide to cooking at home (Simplified Chinese only).
VoiceStar: Robust, Duration-controllable TTS that can Extrapolate
SlamKit is an open-source toolkit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day"
A low-bitrate single-codebook 16 / 24 kHz speech codec based on focal modulation
Unified automatic quality assessment for speech, music, and sound.
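Summary of the world's best LLM resources (multimodal generation, agents, coding assistance, AI paper review, data processing, model training, model inference, o1 models, MCP, small language models, vision-language models).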
Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"
Code for NeurIPS 2024 paper - The GAN is dead; long live the GAN! A Modern Baseline GAN - by Huang et al.
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models
A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.
Python 3.8+ toolbox for submitting jobs to Slurm
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation