We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through contextual perception and chain-of-thought (CoT) reasoning.

Python 49 3 Updated Mar 3, 2025

xieyuankun / All-Type-ADD

This is the repository for our work titled “Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception”.

Python 33 1 Updated Mar 31, 2026

ddlBoJack / MMAR

[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Python 212 5 Updated Feb 25, 2026

zlin0 / wedefense

WeDefense: A Toolkit to Defend Against Fake Audio

Python 30 2 Updated Feb 20, 2026

Liu-Tianchi / Nes2Net_ASVspoof_ITW

Python 58 7 Updated Apr 4, 2026

Lightricks / LTX-2

Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.

Python 7,306 1,182 Updated May 28, 2026

hiyouga / LlamaFactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 72,206 8,837 Updated Jun 16, 2026

OpenMOSS / AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Python 882 76 Updated Aug 27, 2024

ajd12342 / paraspeechcaps

Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'

Python 161 11 Updated Mar 26, 2026

ga642381 / speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

1,231 75 Updated Jun 1, 2026

narcotic-sh / senko

Very fast, accurate speaker diarization

Python 261 29 Updated Jun 11, 2026

QwenLM / Qwen3-TTS

Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…

Python 11,974 1,555 Updated Mar 17, 2026

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 2,915 293 Updated Jan 30, 2026

pandarialTJU / MOLEx-ORLoss

The repo for INTERSPEECH 2025 MOLEx and Orthogonal loss

Python 4 Updated Dec 1, 2025

OpenBMB / VoxCPM

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

Python 29,869 3,381 Updated Jun 10, 2026

skit-ai / SpeechLLM

This repository contains the training, inference, and evaluation code for SpeechLLM models, along with details about the model releases on Hugging Face.

Python 137 14 Updated Jun 25, 2024

pguso / ai-agents-from-scratch

Demystify AI agents by building them yourself. Local LLMs, no black boxes, real understanding of function calling, memory, and ReAct patterns.

JavaScript 4,275 620 Updated May 31, 2026

verl-project / verl

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python 21,993 4,082 Updated Jun 16, 2026

Taewoo Kim rlataewoo

Stars

cwx-worst-one / WavTTS

Plachtaa / FAcodec

Plachtaa / seed-vc

ucas-hao / qwen_audio_for_add

Jessegator / Audio_robustness_evaluation

nateraw / download-musiccaps-dataset

xieyuankun / AT-ADD-Baseline

adefossez / demucs

zxzhao0 / C2SER