JaesungHuh

🎹

RØDE microphones, Prev @ VGG group

57 followers · 39 following

Stars

BUTSpeechFIT / mt-asr-data-prep

Shell 21 1 Updated Feb 26, 2026

SuperKogito / SER-datasets

A collection of datasets for the purpose of emotion recognition/detection in speech.

HTML 410 49 Updated Sep 30, 2024

bpiyush / LiFT

Code for LiFT (Linearized Feature Trajectories) video embedding

Python 23 Updated Dec 4, 2025

umbertocappellazzo / Omni-AVSR

Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 2026].

Python 34 3 Updated Mar 10, 2026

MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi

Python 1,790 288 Updated Mar 31, 2026

NVIDIA / audio-intelligence

Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.

Python 114 10 Updated Mar 3, 2026

fastrepl / char

AI notepad for meetings

Rust 8,199 580 Updated Apr 13, 2026

character-ai / Ovi

Python 1,687 193 Updated Nov 15, 2025

ahaliassos / raven

Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)

Python 80 9 Updated Feb 27, 2025

MarshallT-99 / VALLR

Python 25 3 Updated Oct 1, 2025

facebookresearch / seamless_interaction

Foundation Models and Data for Human-Human and Human-AI interactions.

Python 369 29 Updated Dec 13, 2025

genmoai / mochi

The best OSS video generation models, created by Genmo

Python 3,634 478 Updated Nov 14, 2025

earthspecies / aves

AVES: Animal Vocalization Encoder based on Self-Supervision

Python 141 7 Updated Feb 4, 2026

triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 10,548 1,753 Updated Apr 11, 2026

egrinstein / roomfuser

Acoustic impulse response generation using diffusion models

Jupyter Notebook 76 2 Updated Oct 3, 2023

ekazakos / grove

Code implementation for the paper "Large-scale Pre-training for Grounded Video Caption Generation" (ICCV 2025)

Python 30 1 Updated Jan 18, 2026

art-jang / LiTFiC

[CVPR2025] Official code for Lost in Translation Found in Context

Python 23 Updated Jan 14, 2026

FoundationVision / ByteTrack

[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box

Python 6,235 1,107 Updated Jun 19, 2024

vectornguyen76 / face-recognition

Real-Time Face Recognition using SCRFD, ArcFace, ByteTrack and Similarity Measure

Python 193 54 Updated Oct 24, 2024

AudioLLMs / Awesome-Audio-LLM

Audio Large Language Models

Python 902 47 Updated Jul 5, 2025

ddlBoJack / MMAR

[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Python 202 4 Updated Feb 25, 2026

facebookresearch / vggt

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 12,833 1,417 Updated Mar 3, 2026

facebookresearch / perception_models

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 2,245 153 Updated Mar 12, 2026

Sindhu-Hegde / multivsr

Official code for the paper "Scaling Multilingual Visual Speech Recognition"

Python 20 1 Updated Aug 15, 2025

Sindhu-Hegde / jegal

Official code for the paper "Understanding Co-speech Gestures in-the-wild"

Python 23 Updated Oct 31, 2025

roudimit / whisper-flamingo

Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation

Jupyter Notebook 205 16 Updated Jul 29, 2025

karazijal / lrtl

Python 44 4 Updated Feb 5, 2025

yashbhalgat / egoseg3d

Code for ACCV 2024 paper: "3D-Aware Instance Segmentation and Tracking in Egocentric Videos"

Python 12 Updated Jan 28, 2025

boost-devs / ai-tech-interview

👩‍💻👨‍💻 AI engineer technical interview study group (⭐️ 2k+)

2,302 510 Updated Mar 17, 2026

reka-ai / reka-vibe-eval

Multimodal language model benchmark, featuring challenging examples

Python 187 11 Updated Dec 18, 2024
