- AIRIS Lab, KAIST
- Daejeon, South Korea
- https://www.kirak.kim
- @_kirak_kim
Stars
Twitch VOD/Clip Downloader - Chat Download/Render/Replay
Talk to any LLM with hands-free voice interaction, voice interruption, and a Live2D talking face, running locally across platforms
VLA-GSE: Boosting Parameter Efficient Finetuning in VLA with Generalized and Specialized Experts
Official repository of LIBERO-plus, a generalized benchmark for in-depth robustness analysis of vision-language-action models.
Zxy-MLlab / LIBERO-PRO
Forked from Lifelong-Robot-Learning/LIBERO
LIBERO-PRO is the official repository of LIBERO-PRO, an evaluation extension of the original LIBERO benchmark
A first-of-its-kind acoustic simulation platform for audio-visual embodied AI research. It supports training and evaluating multiple tasks and applications.
A single CLAUDE.md file to improve Claude Code behavior, derived from Andrej Karpathy's observations on LLM coding pitfalls.
Isaac-GR00T for RoboCasa Benchmark
Public release of the Sound Effect Foundation model by Sony AI.
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
OmniCodec: Low Frame Rate Universal Audio Codec with Semantic–Acoustic Disentanglement
[CVPR 2026] FLAC: Few-Shot Acoustic Synthesis with Flow Matching. FLAC enables RIR generation in novel scenes using only one-shot acoustic observation. The repository also provides AGREE, a joint e…
NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards
moojink / openvla-oft
Forked from openvla/openvla
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
DreamGen: Nvidia GEAR Lab's initiative to solve the robotics data problem using world models
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
A structured reading list on Vision-Language-Action (VLA) models — from diffusion/flow matching foundations through state-of-the-art robot foundation model architectures to data scaling, RL fine-tu…
Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
MichalZawalski / embodied-CoT
Forked from openvla/openvla
Embodied Chain of Thought: A robotic policy that reasons to solve the task.
ICASSP 2024 - Generative De-Quantization for Neural Speech Codec via Latent Diffusion.
A comprehensive list of papers about dual-system VLA models, including papers, codes, and related websites.
OpenHelix: An Open-source Dual-System VLA Model for Robotic Manipulation
PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation
NVIDIA Isaac GR00T N1.7 - A Foundation Model for Generalist Robots.