
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

MATLAB 14,539 1,365 Updated Jan 31, 2026

Your faithful, impartial partner for audio evaluation: genuine assessment, know yourself, know your rivals.

Python 269 15 Updated Feb 3, 2026

PyTorch implementation of JiT https://arxiv.org/abs/2511.13720

Python 2,067 136 Updated Dec 8, 2025

End-to-End Speech Processing Toolkit

Python 4 3 Updated Feb 2, 2026

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,329 95 Updated Sep 22, 2025
Python 2 Updated Mar 22, 2025

Code for DeSTA2.5-Audio, general-purpose LALM

Python 128 7 Updated Jan 23, 2026

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Python 26,449 1,869 Updated Jan 9, 2026

Versatile Evaluation of Speech and Audio

Python 383 46 Updated Dec 9, 2025

A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models

Python 123 4 Updated Sep 21, 2025

Real-time face swap and one-click video deepfake with only a single image

Python 79,226 11,549 Updated Dec 15, 2025

A programmer's guide to cooking at home (Simplified Chinese only).

Dockerfile 97,660 10,801 Updated Jan 19, 2026

VoiceStar: Robust, Duration-controllable TTS that can Extrapolate

Python 307 27 Updated May 31, 2025

Audio Large Language Models

Python 860 43 Updated Jul 5, 2025

SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day"

Python 228 14 Updated May 18, 2025
Python 4,609 373 Updated Jan 30, 2026

Paragraph-by-paragraph close reading of classic and new deep learning papers

32,521 2,775 Updated Mar 22, 2025

End-to-End Speech Processing Toolkit

Python 9,715 2,379 Updated Feb 4, 2026

A low-bitrate single-codebook 16 / 24 kHz speech codec based on focal modulation

Jupyter Notebook 140 14 Updated Nov 30, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 670 49 Updated Jun 5, 2025

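🧑‍🚀 Summary of the world's best LLM resources (multimodal generation, agents, coding assistance, AI paper review, data processing, model training, model inference, o1 models, MCP, small language models, vision-language models).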

7,473 723 Updated Feb 3, 2026

Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"

HTML 120 10 Updated Jul 15, 2025

Code for NeurIPS 2024 paper - The GAN is dead; long live the GAN! A Modern Baseline GAN - by Huang et al.

Python 852 45 Updated Jan 23, 2025

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 349 48 Updated Jul 21, 2025

Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models

Python 239 13 Updated Dec 18, 2025

A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.

Python 415 29 Updated Sep 15, 2025

Python 3.8+ toolbox for submitting jobs to Slurm

Python 1,573 146 Updated Jan 14, 2026

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Python 400 35 Updated Sep 11, 2023

🙌 OpenHands: AI-Driven Development

Python 67,452 8,395 Updated Feb 3, 2026