Stars
A Python API for reading and writing SOFA files (https://www.sofaconventions.org/)
A Python library aimed at acousticians.
Officially recommended ChatTTS resource collection project, compiling related resources from across the web and frequently asked questions
Official repository of the work "Speaker Distance Estimation in Enclosures from Single-Channel Audio" published to IEEE/ACM Transactions on Audio, Speech, and Language Processing.
Official implementation of a reverberant-speech-to-room-impulse-response estimator
Code for the paper "RIR-in-a-Box: Estimating Room Acoustics from 3D Mesh Data through Shoebox Approximation" presented at Interspeech 2024.
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
Model for selecting perceptually relevant early reflections for parametric spatial sound rendering
Impulse response generation based on state-of-the-art geometric sound propagation engine.
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
A collection of projects showcasing RAG, agents, workflows, and other AI use cases
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Production-ready platform for agentic workflow development.
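Real-time STT connected to the OpenAI API / Zhipu AI (streaming LLM) and GPT-SOVITS / Edge-TTS, with services called across networks through a web page to achieve real-time conversation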
SuperSonic is the next-generation AI+BI platform that unifies Chat BI (powered by LLM) and Headless BI (powered by semantic layer) paradigms.
A reproduction of Google's SoundStorm
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
A high-throughput and memory-efficient inference and serving engine for LLMs
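🚀 Next-generation AI one-stop internationalization solution. 🚀 Next-generation one-stop AI solution for B-side and C-side use, supporting OpenAI, Midjourney, Claude, iFlytek Spark, Stable Diffusion, DALL·E, ChatGLM, Tongyi Qianwen, Tencent Hunyuan, 360 Zhinao, Baichuan AI, Volcano Ark, New Bing, Gemini, Moonshot …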
The official implementation of GTCRN, an ultra-lightweight SE model.
Efficient Multimodal Large Language Models: A Survey
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
A generative speech model for daily dialogue.
Control adaptive filters with neural networks.