Stars
DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
A lightweight, AI-native training framework for large language models. Designed for fast iteration, reproducible experiments, and modular configuration across SFT, RLVR, and evaluation workflows.
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
Community-contributed instructions, agents, skills, and configurations to help you make the most of GitHub Copilot.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis
A framework for efficient model inference with omni-modality models
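🎬 Kaka Subtitle Assistant | VideoCaptioner - An LLM-based intelligent subtitle assistant covering the full workflow of video subtitle generation, sentence segmentation, correction, and subtitle translation. An LLM-powered tool for easy and efficient video subtitling.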
Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
Edit Banana: A framework for converting statistical formats into editable ones.
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
[TACL'26] VoiceBench: Benchmarking LLM-Based Voice Assistants
Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)
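Undergraduate degree from one of the "Hua Wu" (top five East China) universities, formerly a PhD student at a QS top-50 university in the US, associate professor at a sister institution, female manager of the campus guard booth. This is a backup platform in case the main account gets banned, and it really is me.
Firefly: a large-model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models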
Reference-aware automatic speech evaluation toolkit
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
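Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)
⚡ Clash for Lab is a proxy tool for unrestricted internet access, designed for lab environments; it requires no sudo privileges and installs cleanly with a one-click script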
Multilingual large voice generation model, providing full-stack capabilities for inference, training, and deployment.