- RØDE microphones
- https://www.robots.ox.ac.uk/~jaesung/
- @huh_jaesung
Stars
A collection of (mostly) technical things every software developer should know about
Robust Speech Recognition via Large-Scale Weak Supervision
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
A markup-based typesetting system that is powerful and easy to learn.
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
A playbook for systematically maximizing the performance of deep learning models.
One second to read GitHub code with VS Code.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Development repository for the Triton language and compiler
A multi-voice TTS system trained with an emphasis on quality
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
PyTorch package for the discrete VAE used for DALL·E.
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
ImageBind: One Embedding Space to Bind Them All
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
🔥 2D and 3D face alignment library built using PyTorch
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
PyTorch implementation of Google AI's 2018 BERT
[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box
The official PyTorch implementation of Google's Gemma models
CoTracker is a model for tracking any point (pixel) on a video.
An open-source framework for training large multimodal models.