Stars
Automatic Depression Detection: a GRU/BiLSTM-based Model and An Emotional Audio-Textual Corpus
We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through contextual perception and chain-of-thought (CoT) reasoning.
M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database. ACL 2022
Sparse Adapter Fusion for Continual Learning in NLP - EACL 2026
ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing - ACL Findings 2026
Prototype Conditioned Generative Replay for Continual Learning in NLP - NAACL 2025
A curated list of papers and resources based on the survey "Agentic Reasoning for Large Language Models"
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis
A repository of implementations of neural speech editing algorithms.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
A feature-rich command-line audio/video downloader
PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models
Dialogue model that produces empathetic responses when trained on the EmpatheticDialogues dataset.
Simple text-to-phones converter for multiple languages
Vector (and Scalar) Quantization, in PyTorch
Uni-MoE: Lychee's Large Multimodal Model Family.
🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
GitHub repository for the ACL 2025 paper: Recent Advances in Speech Language Models: A Survey.
[ACL 2025 Main] ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
MiniMax-M2, a model built for Max coding & agentic workflows.