Stars
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.
Robust Speech Recognition via Large-Scale Weak Supervision
🏡 Open source home automation that puts local control and privacy first.
Models and examples built with TensorFlow
scikit-learn: machine learning in Python
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
The world's simplest facial recognition API for Python and the command line
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
🚀🚀 [Large Models] 🌏 Train a small 26M-parameter GPT completely from scratch in just 2 hours!
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Transfer learning / domain adaptation / domain generalization / multi-task learning, etc.: papers, code, datasets, applications, tutorials.
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, try Sync Labs.
StyleGAN2 - Official TensorFlow Implementation
[CVPR 2024] Official repository for "MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model"
ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.
Stable Diffusion built into Blender
BoxMOT: Pluggable SOTA multi-object tracking modules with support for axis-aligned and oriented bounding boxes
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
🚀 [Large Models] 🌏 Train a 26M-parameter vision multimodal VLM from scratch in just 1 hour!
NanoDet-Plus ⚡ Super fast and lightweight anchor-free object detection model. 🔥 Only 980 KB (int8) / 1.8 MB (fp16) and runs at 97 FPS on a cellphone 🔥
LightGlue: Local Feature Matching at Light Speed (ICCV 2023)