JaesungHuh: 109 stars written in Python

Robust Speech Recognition via Large-Scale Weak Supervision

Python 96,777 11,937 Updated Mar 27, 2026

Animation engine for explanatory math videos

Python 85,595 7,187 Updated Mar 26, 2026

Inference code for Llama models

Python 59,273 9,826 Updated Jan 26, 2025

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 20,951 2,204 Updated Mar 25, 2026

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 12,721 1,399 Updated Mar 3, 2026

Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.

Python 12,717 1,701 Updated Apr 7, 2025

A PyTorch-based Speech Toolkit

Python 11,382 1,675 Updated Mar 27, 2026

PyTorch package for the discrete VAE used for DALL·E.

Python 10,872 1,887 Updated Jan 31, 2024

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 10,476 1,738 Updated Mar 27, 2026

ImageBind: One Embedding Space to Bind Them All

Python 9,003 845 Updated Nov 21, 2025

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Python 7,842 1,387 Updated Dec 6, 2023

🔥 2D and 3D face alignment library built using PyTorch

Python 7,508 1,382 Updated Aug 30, 2024

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO

Python 7,496 1,030 Updated Jul 3, 2024

Google AI 2018 BERT pytorch implementation

Python 6,520 1,326 Updated Sep 15, 2023

[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box

Python 6,193 1,095 Updated Jun 19, 2024

The official PyTorch implementation of Google's Gemma models

Python 5,610 581 Updated May 30, 2025

Official DeiT repository

Python 4,329 589 Updated Mar 15, 2024

An open-source framework for training large multimodal models.

Python 4,083 317 Updated Aug 31, 2024

The best OSS video generation models, created by Genmo

Python 3,630 477 Updated Nov 14, 2025

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Python 3,030 233 Updated Feb 9, 2026

A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.

Python 2,245 212 Updated Dec 27, 2025

Contrastive Language-Audio Pretraining

Python 2,082 206 Updated May 15, 2025

NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024

Python 1,827 76 Updated Nov 27, 2025

Command line utility for forced alignment using Kaldi

Python 1,777 287 Updated Feb 24, 2026
Python 1,673 189 Updated Nov 15, 2025

VGGSfM: Visual Geometry Grounded Deep Structure From Motion

Python 1,374 112 Updated Mar 11, 2025

Cinemagoer is a Python package useful to retrieve and manage the data of the IMDb (to which we are not affiliated in any way) movie database about movies, people, characters and companies

Python 1,308 379 Updated Dec 31, 2025

Audio Large Language Models

Python 896 45 Updated Jul 5, 2025

Out of time: automated lip sync in the wild

Python 879 192 Updated Jan 23, 2024