Stars
Stable Diffusion web UI
Robust Speech Recognition via Large-Scale Weak Supervision
Models and examples built with TensorFlow
Real-time face swap and one-click video deepfake with only a single image
A high-throughput and memory-efficient inference and serving engine for LLMs
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
Clone a voice in 5 seconds to generate arbitrary speech in real-time
CLI platform to experiment with codegen. Precursor to: https://lovable.dev
The simplest, fastest repository for training/finetuning medium-sized GPTs.
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
You like pytorch? You like micrograd? You love tinygrad! ❤️
Deezer source separation library including pretrained models.
A generative world for general-purpose robotics & embodied AI learning.
Industry leading face manipulation platform
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Faster Whisper transcription with CTranslate2
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
State-of-the-Art Text Embeddings
Stable Diffusion with Core ML on Apple Silicon
SQL databases in Python, designed for simplicity, compatibility, and robustness.
Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
End-to-End Object Detection with Transformers
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Wan: Open and Advanced Large-Scale Video Generative Models
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
An open source implementation of CLIP.