Stars
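A direct-link resolution and download service that aggregates multiple mainstream cloud drives, with one-click resolve and download. Currently supports Quark Drive, UC Drive, Lanzou Cloud, Lanzou Youxiang, Xiaofeiji Drive, 123 Cloud Drive, China Mobile Cloud, China Unicom Cloud, Tianyi Cloud, WPS, and more. Supports resolving shared folders. Demo: https://189.qaiu.top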
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
⏰ Agentically track worldwide conference deadlines (Website, Python CLI, WeChat Applet)
The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
A web-based collaborative LaTeX editor
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Extract phoneme-level timestamps from speech audio.
Collection of pretrained models for the Montreal Forced Aligner
Command line utility for forced alignment using Kaldi
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
Collection of diffusion model papers categorized by subarea
Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Official implementation of paper: Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
Official implementation of "MoMask: Generative Masked Modeling of 3D Human Motions (CVPR2024)"
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models