A collection of datasets for the purpose of emotion recognition/detection in speech.

HTML 392 50 Updated Sep 30, 2024

Code for LiFT (Linearized Feature Trajectories) video embedding

Python 11 Updated Dec 4, 2025

Official PyTorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models".

Python 27 2 Updated Nov 13, 2025

Command line utility for forced alignment using Kaldi

Python 1,698 273 Updated Nov 15, 2025

Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.

Python 91 5 Updated Oct 15, 2025

Local-first AI Notepad for Private Meetings

TypeScript 7,219 448 Updated Dec 21, 2025
Python 1,456 152 Updated Nov 15, 2025

Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)

Python 77 7 Updated Feb 27, 2025
Python 12 2 Updated Oct 1, 2025

Foundation Models and Data for Human-Human and Human-AI interactions.

Python 324 21 Updated Dec 13, 2025

The best OSS video generation models, created by Genmo

Python 3,538 468 Updated Nov 14, 2025

AVES: Animal Vocalization Encoder based on Self-Supervision

Python 132 7 Updated Apr 11, 2025

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 10,150 1,695 Updated Dec 20, 2025

Acoustic impulse response generation using diffusion models

Jupyter Notebook 74 2 Updated Oct 3, 2023

Code implementation for the paper "Large-scale Pre-training for Grounded Video Caption Generation" (ICCV 2025)

Python 26 Updated Nov 9, 2025

[CVPR2025] Official code for Lost in Translation Found in Context

Python 21 Updated Jun 13, 2025

[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box

Python 5,911 1,070 Updated Jun 19, 2024

Real-Time Face Recognition using SCRFD, ArcFace, ByteTrack and Similarity Measure

Python 185 52 Updated Oct 24, 2024

Audio Large Language Models

Python 828 42 Updated Jul 5, 2025

[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Python 185 4 Updated Dec 13, 2025

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 12,024 1,271 Updated Oct 11, 2025

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 1,938 127 Updated Dec 18, 2025

Official code for the paper "Scaling Multilingual Visual Speech Recognition"

Python 15 1 Updated Aug 15, 2025

Official code for the paper "Understanding Co-speech Gestures in-the-wild"

Python 19 1 Updated Oct 31, 2025

Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation

Jupyter Notebook 197 14 Updated Jul 29, 2025
Python 44 4 Updated Feb 5, 2025

Code for ACCV 2024 paper: "3D-Aware Instance Segmentation and Tracking in Egocentric Videos"

Python 12 Updated Jan 28, 2025

👩‍💻👨‍💻 AI engineer technical interview study group (⭐️ 2k+)

2,226 502 Updated Aug 5, 2025

Multimodal language model benchmark, featuring challenging examples

Python 181 11 Updated Dec 18, 2024

Code for the paper "The Sound of Water: Inferring Physical Properties from Pouring Liquids".

Jupyter Notebook 12 Updated Jan 13, 2025