View jihoojung0106's full-sized avatar
🏠
Working from home

Highlights

  • Pro

[INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.

HTML 62 1 Updated Jan 24, 2024

Official implementation for the paper "How Can Objects Help Video-Language Understanding"

Python 8 Updated Mar 23, 2026

MISP-Meeting Dataset & Code

Python 3 2 Updated Jan 11, 2026

The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the *cocktail party effect* from an augmented-reality (AR)-motivated multi-sensor egocentric world view.

136 10 Updated Dec 4, 2023

[ICMI 2024] SEMPI: A Database for Understanding Social Engagement in Video-Mediated Multiparty Interaction

Python 7 2 Updated Dec 27, 2024

D-ORCA: Dialogue-Centric Optimization for Robust Audio-Visual Captioning

Python 10 Updated Feb 11, 2026
HTML 5 Updated Jul 22, 2025

A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.

2,909 127 Updated Apr 4, 2026
Python 76 5 Updated Jul 28, 2025
2 Updated Mar 14, 2026

[CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga

Python 146 6 Updated Jan 19, 2026

Official implementation (PyTorch) of "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning", AAAI 2025

Python 25 1 Updated Jan 26, 2025

[AAAI 2025 Oral] Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning

Python 8 Updated May 9, 2025

[CVPR 2025] 🔥 Official impl. of "Audio-Visual Instance Segmentation".

Python 48 5 Updated Jun 5, 2025

[ICLR 2026 Oral] SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents

Python 9 4 Updated Apr 8, 2026

https://avocado-captioner.github.io/

Python 32 1 Updated Oct 16, 2025

Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.

Jupyter Notebook 647 56 Updated Mar 17, 2026

This repository contains low-bit quantization papers from 2020 to 2025 published at top conferences.

146 5 Updated Mar 5, 2026
Python 14 1 Updated Jul 19, 2025
Python 4 Updated Mar 26, 2026

[WACV 2026] LASER: Lip Landmark Assisted Speaker Detection for Robustness (official implementation)

Python 25 3 Updated Feb 26, 2026

ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'

Python 463 100 Updated Oct 23, 2023

Identifying "who speaks when" using visual speech input and a pretrained lip-sync expert

Python 15 Updated Jul 1, 2023

Code repository for LoCoNet: Long-Short Context Network for Active Speaker Detection

Python 53 5 Updated May 1, 2023

The repository for Springer IJCV 2025 (LR-ASD: Lightweight and Robust Network for Active Speaker Detection)

Python 102 23 Updated Mar 23, 2025

The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection)

Python 170 21 Updated Mar 23, 2025

EMER, OV-MER (ICML25), AffectGPT (ICML25, Oral), EmoPrefer (ICLR26)

Python 372 35 Updated Feb 24, 2026

A carefully curated collection of high-quality libraries, projects, tutorials, research papers, and other essential resources focused on Mechanistic Interpretability, a growing subfield in machine …

JavaScript 70 4 Updated Apr 10, 2026

[ICLR 2026] Official implementation of the paper "Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs"

Python 21 Updated Mar 3, 2026