Lists (32)
audio
awesome
ControlNet
diffusion
diffution
ios
LLMs
matting
mediapipe
motion prediction
SMPL-Related
virtual try on
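Human 3D effects
Human body completion
Human body reconstruction
Segmentation
Image restoration
Scene reconstruction
Multimodal
Pose estimation
Android
Tutorials
Data annotation
Datasets
Detection
Generative networks
Mobile development and deployment
Video encoding and decoding
Tracking
Key focus
Color
Efficient neural networks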
Stars
Machine Learning Engineering Open Book
A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like RF-DETR, YOLO11, SAM …
Use DINOv3’s powerful, self-supervised visual features + YOLOv12’s blazing-fast detection, all in one repo. Whether you have only a few hundred labeled images or a medium-sized dataset, DINOV3-YOLO…
Minimal code and examples for inferencing Sapiens foundation human models in PyTorch
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
Computer Vision Annotation Tool (CVAT) is a leading platform for building high-quality visual datasets for vision AI. It offers open-source, cloud, and enterprise products, as well as labeling serv…
This package contains the original 2012 AlexNet code.
A minimal programming example for a chat server
Turn any computer or edge device into a command center for your computer vision projects.
Collection of tutorials on diffusion models, step-by-step implementation guide, scripts for generating images with AI, prompt engineering guide, and resources for further learning.
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"
A PyTorch library and evaluation platform for end-to-end compression research
Contains the "pycocotools" package published on PyPI, with packaging changes made to the official cocoapi.
[ICML2025] An 8-step inversion and 8-step editing process works effectively with the FLUX-dev model. (3x speedup with results that are comparable or even superior to baseline methods)
Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
A detailed diagram laying out the full Flux.1 [dev] architecture as shared by Black Forest Labs at https://github.com/black-forest-labs/flux.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
A modern, C++-native, test framework for unit-tests, TDD and BDD - using C++14, C++17 and later (C++11 support is in v2.x branch, and C++03 on the Catch1.x branch)
An application example using the same C++ code on both an Android project and an iPhone project.
Deep Learning & Applied AI: Tutorials
I'm compiling comprehensive coding tutorials for many different languages and frameworks! 🐲