- RØDE microphones
- https://www.robots.ox.ac.uk/~jaesung/
- @huh_jaesung
Stars
Robust Speech Recognition via Large-Scale Weak Supervision
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
PyTorch package for the discrete VAE used for DALL·E.
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
ImageBind: One Embedding Space to Bind Them All
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
🔥 2D and 3D face alignment library built using PyTorch
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Google AI 2018 BERT PyTorch implementation
[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box
The official PyTorch implementation of Google's Gemma models
An open-source framework for training large multimodal models.
The best OSS video generation models, created by Genmo
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.
NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024
Command line utility for forced alignment using Kaldi
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Cinemagoer is a Python package for retrieving and managing data from the IMDb movie database (with which we are not affiliated in any way) about movies, people, characters and companies
Out of time: automated lip sync in the wild