A collection of datasets for the purpose of emotion recognition/detection in speech.

HTML 409 49 Updated Sep 30, 2024

Code for LiFT (Linearized Feature Trajectories) video embedding

Python 22 Updated Dec 4, 2025

Official PyTorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 2026].

Python 33 3 Updated Mar 10, 2026

Command line utility for forced alignment using Kaldi

Python 1,782 287 Updated Mar 31, 2026

Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.

Python 114 10 Updated Mar 3, 2026

AI notepad for meetings

Rust 8,134 574 Updated Apr 4, 2026
Python 1,679 192 Updated Nov 15, 2025

Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)

Python 80 9 Updated Feb 27, 2025
Python 23 3 Updated Oct 1, 2025

Foundation Models and Data for Human-Human and Human-AI interactions.

Python 369 29 Updated Dec 13, 2025

The best OSS video generation models, created by Genmo

Python 3,634 478 Updated Nov 14, 2025

AVES: Animal Vocalization Encoder based on Self-Supervision

Python 140 7 Updated Feb 4, 2026

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 10,515 1,747 Updated Apr 3, 2026

Acoustic impulse response generation using diffusion models

Jupyter Notebook 76 2 Updated Oct 3, 2023

Code implementation for the paper "Large-scale Pre-training for Grounded Video Caption Generation" (ICCV 2025)

Python 30 1 Updated Jan 18, 2026

[CVPR2025] Official code for Lost in Translation Found in Context

Python 23 Updated Jan 14, 2026

[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box

Python 6,214 1,102 Updated Jun 19, 2024

Real-time face recognition using SCRFD, ArcFace, ByteTrack, and a similarity measure

Python 193 55 Updated Oct 24, 2024

Audio Large Language Models

Python 900 46 Updated Jul 5, 2025

[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Python 202 4 Updated Feb 25, 2026

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 12,778 1,406 Updated Mar 3, 2026

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 2,231 151 Updated Mar 12, 2026

Official code for the paper "Scaling Multilingual Visual Speech Recognition"

Python 20 1 Updated Aug 15, 2025

Official code for the paper "Understanding Co-speech Gestures in-the-wild"

Python 22 Updated Oct 31, 2025

Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation

Jupyter Notebook 204 16 Updated Jul 29, 2025
Python 44 4 Updated Feb 5, 2025

Code for ACCV 2024 paper: "3D-Aware Instance Segmentation and Tracking in Egocentric Videos"

Python 12 Updated Jan 28, 2025

👩‍💻👨‍💻 AI engineer technical interview study group (⭐️ 2k+)

2,295 508 Updated Mar 17, 2026

Multimodal language model benchmark, featuring challenging examples

Python 187 11 Updated Dec 18, 2024