The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 2,374 174 Updated Dec 23, 2025

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,557 772 Updated May 27, 2025

Official repository for the 1st DAFx Parameter Estimation Challenge

Python 29 1 Updated Nov 10, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,154 193 Updated Oct 9, 2025

Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.

Jupyter Notebook 17,336 1,450 Updated Nov 28, 2025

HTML 2 Updated Oct 5, 2025

Python 7 Updated Oct 14, 2025

Dataset and evaluation code for ISDrama (ACM-MM 2025): Immersive Spatial Drama Generation through Multimodal Prompting

Python 236 Updated Aug 20, 2025

[Python3] Octave-band and fractional octave-band filter for time-domain signals.

Python 83 19 Updated Jul 6, 2023

Implementation of the paper "Can Large Language Models Predict Audio Effects Parameters from Natural Language?"

Python 23 2 Updated May 27, 2025

A PyTorch implementation of the paper "ZigMa: A DiT-Style Mamba-based Diffusion Model" (ECCV 2024)

Python 341 23 Updated Mar 17, 2025

[NeurIPS 2025 Spotlight] Seeing Sound, Hearing Sight: Uncovering Modality Bias and Conflict of AI models in Sound Localization

C# 4 1 Updated Nov 7, 2025

A toolbox for skeleton-based action recognition.

Python 1,186 213 Updated Mar 17, 2025

[ICCV 2021] Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis.

Python 87 9 Updated Oct 12, 2021

Code and datasets for 'Few-Shot Audio-Visual Learning of Environment Acoustics' (NeurIPS 2022)

Python 23 5 Updated Nov 20, 2023

[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 1,772 105 Updated Nov 4, 2025

The code for "Graph Diffusion Transformer for Multi-Conditional Molecular Generation"

Python 100 8 Updated Jun 3, 2025

A MUSHRA-compliant, Web Audio API-based experiment software

JavaScript 406 162 Updated Nov 21, 2025

Differentiable audio signal processors in PyTorch

Python 274 8 Updated Dec 4, 2023

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & TIS & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python 8,643 838 Updated Dec 18, 2025

[ISMIR 2025] A curated list of vision-to-music generation: methods, datasets, evaluation and challenges.

113 3 Updated Aug 9, 2025

misuka: A differentiable room acoustic renderer

C++ 34 1 Updated Dec 4, 2025

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 81,505 12,189 Updated Dec 21, 2025

Code for the Novel View Acoustic Synthesis paper

Python 51 1 Updated Aug 14, 2023

Reference implementation for DPO (Direct Preference Optimization)

Python 2,816 233 Updated Aug 11, 2024

A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.

Python 182 22 Updated May 29, 2024

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Python 2,194 334 Updated Sep 10, 2025

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Python 2,126 608 Updated Oct 27, 2023

The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Python 56 7 Updated Jul 2, 2025