- Korea Electronics Technology Institute (KETI)
- Seongnam, South Korea
- UTC +09:00
Stars
[NeurIPS 2025] 4KAgent: Agentic Any Image to 4K Super-Resolution. An intelligent computer vision agent that can magically restore any image to perfect-4K!
Localized watermarking for AI-generated speech audio, with SOTA robustness and a very fast detector
LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using the RAG paradigm.
A library for efficient similarity search and clustering of dense vectors.
Official code for EnvSDD (Environmental Sound Deepfake Detection)
PyTorch Distributed native training library for LLMs/VLMs with out-of-the-box Hugging Face support
FastLongSpeech is a novel framework designed to extend the capabilities of Large Speech-Language Models for efficient long-speech processing without necessitating dedicated long-speech training data.
SpeechGPT Series: Speech Large Language Models
Collection of step-by-step playbooks for setting up AI/ML workloads on NVIDIA DGX Spark devices with Blackwell architecture.
Text-audio foundation model from Boson AI
Wav2vec 2.0 Self-Supervised Pretraining
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Korean streaming ASR (with denoiser and Conformer CTC)
Official implementation of the paper: Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
MiMo-Audio: Audio Language Models are Few-Shot Learners
List of Korean speech recognition (STT) APIs, with performance benchmarks for each.
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis