  • Korea Electronics Technology Institute (KETI)
  • Seongnam, South Korea


[NeurIPS 2025] 4KAgent: Agentic Any Image to 4K Super-Resolution. An intelligent computer vision agent that can magically restore any image to perfect-4K!

Python 701 36 Updated Sep 24, 2025

Localized watermarking for AI-generated speech audios, with SOTA on robustness and very fast detector

Python 661 86 Updated Dec 19, 2025

AI-based Audio Watermarking Tool

Python 299 41 Updated Jan 7, 2024

LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.

Go 10,191 1,095 Updated Dec 25, 2025
Python 73 9 Updated Nov 3, 2025

A library for efficient similarity search and clustering of dense vectors.

C++ 38,531 4,163 Updated Dec 23, 2025

Mixture of Lora Experts

Python 8 Updated Apr 7, 2024
Python 5 Updated Jul 25, 2024

Official code for EnvSDD (Environmental Sound Deepfake Detection)

Python 29 3 Updated Dec 13, 2025
Jupyter Notebook 29 2 Updated Nov 18, 2025

PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support

Python 215 37 Updated Dec 24, 2025

FastLongSpeech is a novel framework designed to extend the capabilities of Large Speech-Language Models for efficient long-speech processing without necessitating dedicated long-speech training data.

Python 14 1 Updated Jul 22, 2025

SpeechGPT Series: Speech Large Language Models

Python 1,397 95 Updated Jul 22, 2024

Collection of step-by-step playbooks for setting up AI/ML workloads on NVIDIA DGX Spark devices with Blackwell architecture.

TypeScript 272 86 Updated Dec 23, 2025

Text-audio foundation model from Boson AI

Python 7,772 578 Updated Sep 15, 2025

Wav2vec 2.0 Self-Supervised Pretraining

Python 59 9 Updated Feb 6, 2025

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

71,401 8,170 Updated Dec 22, 2025

Contexts Optical Compression

Python 21,573 1,929 Updated Oct 25, 2025

Korean Streaming ASR (with Denoiser and Conformer CTC)

Python 35 7 Updated Apr 28, 2024
Python 48 7 Updated Jul 16, 2025
Jupyter Notebook 102 13 Updated Oct 13, 2025

Official implementation of paper: Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis

Python 47 7 Updated Sep 20, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,164 193 Updated Oct 9, 2025

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 910 87 Updated Sep 20, 2025

A list of Korean speech recognition (STT) APIs, with performance benchmarks for each.

471 29 Updated Aug 15, 2025

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 17,036 2,052 Updated Dec 2, 2025
Python 5 Updated Apr 30, 2025
Python 6 Updated Apr 23, 2025
Python 28 1 Updated Dec 24, 2025

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 641 51 Updated Apr 8, 2025