232 results for source starred repositories

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 2,399 177 Updated Dec 23, 2025

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,558 772 Updated May 27, 2025

Official repository for the 1st DAFx Parameter Estimation Challenge

Python 29 1 Updated Nov 10, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,157 193 Updated Oct 9, 2025

Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.

Jupyter Notebook 17,346 1,450 Updated Nov 28, 2025
HTML 2 Updated Oct 5, 2025
Python 7 Updated Oct 14, 2025

Dataset and evaluation code for ISDrama (ACM-MM 2025): Immersive Spatial Drama Generation through Multimodal Prompting

Python 236 Updated Aug 20, 2025

[Python3] Octave-band and fractional octave-band filters for time-domain signals.

Python 83 19 Updated Jul 6, 2023

Implementation of the paper "Can Large Language Models Predict Audio Effects Parameters from Natural Language?"

Python 23 2 Updated May 27, 2025

A PyTorch implementation of the paper "ZigMa: A DiT-Style Mamba-based Diffusion Model" (ECCV 2024)

Python 341 23 Updated Mar 17, 2025

[NeurIPS 2025 Spotlight] Seeing Sound, Hearing Sight: Uncovering Modality Bias and Conflict of AI models in Sound Localization

C# 4 1 Updated Nov 7, 2025

A toolbox for skeleton-based action recognition.

Python 1,185 213 Updated Mar 17, 2025

[ICCV 2021] Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis.

Python 87 9 Updated Oct 12, 2021

Code and datasets for 'Few-Shot Audio-Visual Learning of Environment Acoustics' (NeurIPS 2022)

Python 23 5 Updated Nov 20, 2023

[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 1,773 105 Updated Nov 4, 2025

The code for "Graph Diffusion Transformer for Multi-Conditional Molecular Generation"

Python 100 8 Updated Jun 3, 2025

MUSHRA-compliant experiment software based on the Web Audio API

JavaScript 406 162 Updated Nov 21, 2025

Differentiable audio signal processors in PyTorch

Python 274 8 Updated Dec 4, 2023

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & TIS & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python 8,647 838 Updated Dec 18, 2025

[ISMIR 2025] A curated list of vision-to-music generation: methods, datasets, evaluation and challenges.

113 3 Updated Aug 9, 2025

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 81,513 12,192 Updated Dec 21, 2025

Reference implementation for DPO (Direct Preference Optimization)

Python 2,817 233 Updated Aug 11, 2024

A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.

Python 182 22 Updated May 29, 2024

PyTorch implementation of VALL-E (zero-shot text-to-speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Python 2,194 334 Updated Sep 10, 2025

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Python 2,126 608 Updated Oct 27, 2023

The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Python 56 7 Updated Jul 2, 2025

A list of datasets made available by members of the Aalto Acoustics Lab

29 1 Updated Sep 6, 2024

Open repository of simulated Room Impulse Responses (RIR) accompanying the paper "Hearing Anywhere in Any Environment"

67 1 Updated Aug 11, 2025