  • Korea Electronics Technology Institute (KETI)
  • Seongnam, South Korea

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

Python 181 6 Updated Jun 6, 2026

Training code for FAcodec presented in NaturalSpeech3

Python 245 21 Updated Aug 26, 2024

zero-shot voice conversion & singing voice conversion, with real-time support

Python 3,813 495 Updated Apr 20, 2025

[ACMMM2025] Official released code for ALLM4ADD

Python 42 3 Updated Oct 30, 2025

Download the MusicCaps dataset for music captioning

Jupyter Notebook 115 11 Updated May 19, 2026
Python 19 1 Updated Apr 10, 2026

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python 2,815 265 Updated May 28, 2026

We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through contextual perception and Chain of Thought (CoT).

Python 49 3 Updated Mar 3, 2025

This is the repo of our work titled “Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception”

Python 33 1 Updated Mar 31, 2026

[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Python 212 5 Updated Feb 25, 2026

WeDefense: A Toolkit to Defend Against Fake Audio

Python 30 2 Updated Feb 20, 2026

Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.

Python 7,306 1,182 Updated May 28, 2026
Python 503 36 Updated Oct 16, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 72,206 8,837 Updated Jun 16, 2026
Python 11 Updated Jan 29, 2025

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Python 882 76 Updated Aug 27, 2024
Python 273 28 Updated May 19, 2025

Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'

Python 161 11 Updated Mar 26, 2026

Awesome speech/audio LLMs, representation learning, and codec models

1,231 75 Updated Jun 1, 2026

Very fast, accurate speaker diarization

Python 261 29 Updated Jun 11, 2026

Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…

Python 11,974 1,555 Updated Mar 17, 2026

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 2,915 293 Updated Jan 30, 2026

The repo for INTERSPEECH 2025 MOLEx and Orthogonal loss

Python 4 Updated Dec 1, 2025

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

Python 29,869 3,381 Updated Jun 10, 2026

This repository contains the training, inference, and evaluation code for SpeechLLM models and details about the model releases on Hugging Face.

Python 137 14 Updated Jun 25, 2024

Demystify AI agents by building them yourself. Local LLMs, no black boxes, real understanding of function calling, memory, and ReAct patterns.

JavaScript 4,275 620 Updated May 31, 2026

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python 21,993 4,082 Updated Jun 16, 2026