gmltmd789

Heeseung gmltmd789

Ph.D. candidate at Seoul National University, Republic of Korea. Interested in spoken language models, speech synthesis, and generative models.

22 followers · 14 following

Seoul National University
Seoul, Republic of Korea
gmltmd789.github.io
https://scholar.google.com/citations?user=4ojbJpoAAAAJ&hl=ko
in/gmltmd789

Stars

Tencent / VITA

The official implementation of VITA, VITA15, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.

Python 135 2 Updated Oct 28, 2025

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 19,297 2,059 Updated Oct 21, 2025

ictnlp / LLaMA-Omni2

Python 250 26 Updated May 19, 2025

OpenMOSS / MOSS-Speech

MOSS-Speech is a true speech-to-speech large language model without text guidance.

Python 112 5 Updated Dec 4, 2025

microsoft / VibeVoice

Open-Source Frontier Voice AI

Python 18,944 2,096 Updated Dec 17, 2025

runamu / compositional-conservatism

An official implementation of "Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning".

Python 7 Updated Apr 30, 2024

VITA-MLLM / VITA-Audio

✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

Python 669 60 Updated May 24, 2025

neuphonic / neucodec

A package for NeuCodec: a 50 Hz, 0.8 kbps, 24 kHz audio codec.

Python 135 17 Updated Oct 7, 2025

maitrix-org / Voila

Python 481 43 Updated May 6, 2025

dwsmart32 / arxiv2notion

Python 6 2 Updated Dec 8, 2025

12kimih / HiCUPID

[ACL 2025] Exploring the Potential of LLMs as Personalized Assistants: Dataset, Evaluation, and Analysis

Python 12 Updated Jun 3, 2025

dllm-reasoning / d1

Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning"

Python 389 48 Updated Dec 20, 2025

MoonshotAI / Kimi-Audio

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,401 321 Updated Jun 21, 2025

kanishkg / cognitive-behaviors

Python 218 12 Updated Mar 26, 2025

kyutai-labs / moshi-finetune

Python 344 43 Updated Oct 3, 2025

jonflynng / qwen2-audio-finetune

Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.

Jupyter Notebook 22 1 Updated Nov 23, 2024

QwenLM / Qwen2.5-Omni

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,854 303 Updated Jun 12, 2025

mbzuai-oryx / LLMVoX

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Python 291 38 Updated May 16, 2025

chaehunshin / DiptychPrompting

Python 57 2 Updated Mar 22, 2025

ajd12342 / paraspeechcaps

Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'

Python 148 9 Updated Mar 24, 2025

ML-GSAI / LLaDA

Official PyTorch implementation for "Large Language Diffusion Models"

Python 3,425 230 Updated Nov 12, 2025

baichuan-inc / Baichuan-Audio

Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction

Python 215 12 Updated Feb 28, 2025

VITA-MLLM / LUCY

LUCY: Linguistic Understanding and Control Yielding Early Stage of Her

Python 57 3 Updated Apr 14, 2025

L0SG / L0SG.github.io

HTML 3 Updated Jan 5, 2025

OpenMOSS / SpeechGPT-2.0-preview

GPT-4o-level, real-time spoken dialogue system.

Python 363 29 Updated Jan 27, 2025

12kimih / self-refine

Re-implementation of Self-Refine

Python 1 Updated Aug 19, 2024

deepseek-ai / DeepSeek-R1

91,593 11,773 Updated Jun 27, 2025

zhenye234 / X-Codec-2.0

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 336 45 Updated Jul 21, 2025

VideoVerses / VideoVAEPlus

[ICCV 2025] VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE

Python 376 12 Updated Jan 19, 2025

vivian556123 / NeurIPS2024-CoVoMix

Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Python 62 3 Updated Jan 16, 2025
