
Python 7 Updated Dec 19, 2025
Python 81 13 Updated Dec 18, 2025

GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning

Python 742 89 Updated Dec 17, 2025

Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"

Python 72 1 Updated Dec 5, 2025
Jupyter Notebook 229 35 Updated Oct 13, 2025

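⚡ Clash for Lab is a proxy tool for circumventing network restrictions, designed for lab environments: no sudo privileges required, with an elegant one-click script installation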

Shell 244 12 Updated Dec 11, 2025

A collection of awesome papers on thinking with videos.

73 2 Updated Dec 1, 2025

SpeechPlus: Small LLM-Based Text-to-Speech Library 🚀

Python 17 1 Updated May 20, 2025

5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs

Python 55 9 Updated Nov 19, 2025

A pioneering unified platform designed to systematize and accelerate deep learning research in spectroscopy.

Python 25 4 Updated Nov 13, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,488 213 Updated Dec 16, 2025

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

Python 1,896 367 Updated Dec 19, 2025
Jupyter Notebook 28 6 Updated Oct 28, 2025

An Open-Source Project to Unify Audio Processing and Generation

HTML 71 7 Updated Dec 9, 2025
Python 78 Updated Oct 18, 2025

Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"

Jupyter Notebook 165 4 Updated Dec 17, 2025

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

Python 407 28 Updated Nov 27, 2025

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models

Python 107 5 Updated Oct 30, 2025
Python 34 1 Updated Nov 4, 2025

This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…

Python 196 13 Updated Sep 21, 2025

Official PyTorch Implementation of "Diffusion Autoencoders are Scalable Image Tokenizers"

Python 160 5 Updated Jan 31, 2025

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 2,742 337 Updated Dec 11, 2025

Official Repository of UltraVoice

JavaScript 52 1 Updated Oct 28, 2025

Finetune Sesame AI's conversational speech model on new languages and voices. Blog post: https://blog.speechmatics.com/sesame-finetune

Python 93 10 Updated Sep 27, 2025

A curated list of vibe coding references, collaborating with AI to write code.

2,028 230 Updated Dec 18, 2025

Code for the blog "Neural audio codecs: how to get audio into LLMs"

Python 139 3 Updated Oct 20, 2025

Training, inference, and testing of the SAC speech codec model.

Python 92 6 Updated Nov 1, 2025

📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.

768 41 Updated Oct 10, 2025