
[ICLR’26] Official PyTorch Implementation of “HDR-NSFF: High Dynamic Range Neural Scene Flow Fields”

Python 36 Updated Apr 19, 2026

VGGSounder, a multi-label audio-visual classification dataset with modality annotations.

Jupyter Notebook 16 Updated Jun 3, 2026

Open-source video dataset of dynamic scenes with camera-movement annotations.

Python 94 1 Updated Apr 24, 2025

[ICCV 2025] A simple training-free approach adapting DUSt3R for dynamic scenes.

Python 530 26 Updated Apr 1, 2025

PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models

1,144 96 Updated Dec 15, 2025

Official Repo For Pixel-LLM Codebase: Sa2VA (Arxiv-25), SAMTok (CVPR-26), VRT, SaSaSa2VA (1st-place solution for LSVOS)

Python 1,614 118 Updated Jun 15, 2026

✨✨Latest Advances on Multimodal Large Language Models

17,899 1,128 Updated Jun 18, 2026

[NeurIPS 2024] Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba

Python 183 10 Updated Feb 1, 2026

Simulation platform for general-purpose robotics & embodied AI learning.

Python 29,372 2,786 Updated Jun 17, 2026

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 2,210 259 Updated Feb 23, 2026

[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning

Python 2,137 82 Updated Dec 12, 2025

Code for the project "MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos"

Python 1,318 79 Updated Jan 5, 2026

Official repository for "MMM: Generative Masked Motion Model" (CVPR 2024 -- Highlight)

Jupyter Notebook 132 14 Updated Jul 5, 2025

[ICLR2024] The official implementation of the paper "VDT: General-purpose Video Diffusion Transformers via Mask Modeling", by Haoyu Lu, Guoxing Yang, Nanyi Fei, Yuqi Huo, Zhiwu Lu, Ping Luo, Mingyu Ding.

Jupyter Notebook 255 15 Updated May 5, 2024

[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Python 48 4 Updated Sep 6, 2024

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 1 Updated May 6, 2024

Code for the paper Physics-as-Inverse-Graphics: Joint Unsupervised Learning of Objects and Physics from Video

Python 41 11 Updated May 22, 2023

A data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.

Jupyter Notebook 2,757 275 Updated May 21, 2026

Paper list on multimodal and large language models, used only to record the papers I read on arXiv each day, for personal use.

760 43 Updated May 21, 2026

SAiD: Blendshape-based Audio-Driven Speech Animation with Diffusion

Python 135 21 Updated Jan 25, 2024

📖 A curated list of resources dedicated to talking face.

1,541 121 Updated Dec 23, 2024

Implementation of Korean FastSpeech2

Python 215 51 Updated Jan 29, 2023

The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.

Python 117,489 13,738 Updated Jun 18, 2026

My collection of machine learning papers

299 21 Updated Aug 10, 2023

Summary of publicly available resources such as code, datasets, and scientific papers for the FLAME 3D head model

678 34 Updated Mar 3, 2026

A curated list of audio-visual learning methods and datasets.

289 19 Updated Dec 3, 2024

The repo for studying and sharing diffusion models.

427 34 Updated Aug 8, 2023

[ICCV 2023] Understanding 3D Object Interaction from a Single Image

Python 47 4 Updated Feb 29, 2024

Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.

Python 6,964 512 Updated Dec 13, 2025