Lists (32)
academic
acoustic echo cancellation
AIGC
audio codec
audio codecs
audio separation
audio tools
bandwidth extension
beamforming
computer vision
deep learning
diffusion
entertainments
hearing aid
LLM
microphone array
music tools
noise reduction
packet loss compensation
programming related
simulation tools
singing voice tools
sound source localization
spatial audio
speaker recognition
speech dereverberation
speech diarization
speech frontend
speech recognition
speech separation
speech signal processing
speech voice tools
Starred repositories
FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
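微舆: a multi-agent public opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the true picture of public opinion, predicts future trends, and supports decision-making. Built from scratch, with no dependency on any framework.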
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
[ACM MM 2025] AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
A Python Library for Full Reference Binaural Fidelity Testing, Visualization & Feature Generation
Official Repository for "Efficient Vocal Source Separation Through Windowed RoFormer"
Transcription, forced alignment, and audio indexing with OpenAI's Whisper
[INTERSPEECH 2025 Oral] Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"
"AI-Trader: Can AI Beat the Market?" Live Trading Bench: https://ai4trade.ai
Zotero MCP: Connects your Zotero research library with Claude and other AI assistants via the Model Context Protocol to discuss papers, get summaries, analyze citations, and more.
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
kyutai-labs / nanoGPTaudio
Forked from karpathy/nanoGPT
Code for the blog "Neural audio codecs: how to get audio into LLMs"
Speech-To-Text forced-alignment
Speech processing Universal PERformance Benchmark
An interface library for RL post-training with environments.
An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs
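The China Market Analysis Script is a powerful Python tool designed to give users in-depth analysis of China's A-share market. It uses the Akshare library to fetch real-time and historical stock data from multiple sources and computes key financial indicators to help investors make informed decisions.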
OpenAI compatible TTS for Sesame CSM:1b & dia:1.6b - Voice Cloning from File/YT
Vogent Turn: fast, open-source turn-detection for Voice AI applications