- Shanghai Jiao Tong University & Shanghai Innovation Institute
- Shanghai
- UTC +08:00
- https://zhikangniu.github.io/
Lists (28)
ASR
Awesome List
Bench
Chinese LLM
Codec
CV
Dataset/Tools/Course
Diffusion
emotion
Framework
front
LLM
Music Generation
nano
nlp
other
pipeline
Podcast
PyTorch
RLHF
s2st
speaker diarization
T2V
TTS
tutorial
unify
V2A
Vocoder
Stars
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"
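⚡ Clash for Lab is a tool for circumventing network restrictions, designed for lab environments; it requires no sudo privileges and installs cleanly with a one-click script.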
A collection of awesome papers on thinking with videos.
SpeechPlus: Small LLM-Based Text-to-Speech Library 🚀
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
A pioneering unified platform designed to systematize and accelerate deep learning research in spectroscopy.
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search
An Open-Source Project to Unify Audio Processing and Generation
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…
Official PyTorch Implementation of "Diffusion Autoencoders are Scalable Image Tokenizers"
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
Finetune Sesame AI's conversational speech model on new languages and voices. Blog post: https://blog.speechmatics.com/sesame-finetune
A curated list of vibe coding references, collaborating with AI to write code.
kyutai-labs / nanoGPTaudio
Forked from karpathy/nanoGPT
Code for the blog "Neural audio codecs: how to get audio into LLMs"
Training, inference, and testing of the SAC speech codec model.
📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.