Stars
Long-form streaming TTS system for multi-speaker dialogue generation
Multilingual large voice generation model with full-stack inference, training, and deployment capabilities.
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
NeurIPS2023 - A generic biosignal learning framework. Large EEG pre-trained models.
Official codebase for "Brain-JEPA: Brain Dynamics Foundation Model with Gradient Positioning and Spatiotemporal Masking" (NeurIPS 2024, Spotlight).
[ICLR 2025] CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding
[ICLR 2024 spotlight] Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI
Deep learning software to decode EEG, ECG or MEG signals
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
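💯 2025 Software Designer (intermediate-level Software Proficiency Exam) exam-prep resource library, plus companion free practice-question software. https://ruankaodaren.com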
Simple software for downloading podcasts
VibeVoice: Expressive, longform conversational speech synthesis. (Community fork)
A script to download all audio, or a specified range of it, from a chosen Bilibili uploader; supports several collection types.
A feature-rich command-line audio/video downloader
👾 Fast and simple video download library and CLI tool written in Go
AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation
A generative speech model for daily dialogue.
Text-audio foundation model from Boson AI