Korea Electronics Technology Institute (KETI)
- Seongnam, South Korea
Stars
WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling
Training code for FAcodec presented in NaturalSpeech3
Zero-shot voice conversion & singing voice conversion, with real-time support
[ACMMM2025] Official released code for ALLM4ADD
Download the MusicCaps dataset for music captioning
adefossez / demucs
Forked from facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through contextual perception and Chain of Thought (CoT).
This is the repo of our work titled “Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception”
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
WeDefense: A Toolkit to Defend Against Fake Audio
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'
Awesome speech/audio LLMs, representation learning, and codec models
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.
The repo for INTERSPEECH 2025 MOLEx and Orthogonal loss
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
This repository contains the training, inference, and evaluation code for SpeechLLM models and details about the model releases on Hugging Face.
Demystify AI agents by building them yourself. Local LLMs, no black boxes, real understanding of function calling, memory, and ReAct patterns.
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework