Starred repositories
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Instant voice cloning by MIT and MyShell. Audio foundation model.
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
End-to-End Object Detection with Transformers
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Image augmentation for machine learning experiments.
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
Generate 3D objects conditioned on text or images
YOLOv3 in PyTorch > ONNX > CoreML > TFLite
YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Streamlined interface for generating images with AI in Krita. Inpaint and outpaint with optional text prompt, no tweaking required.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Real-Time and Accurate Full-Body Multi-Person Pose Estimation & Tracking System
PyTorch implementation of convolutional neural network visualization techniques
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
BoxMOT: Pluggable SOTA multi-object tracking modules for segmentation, object detection and pose estimation models
Minimal PyTorch implementation of YOLOv3
A Collection of Variational Autoencoders (VAE) in PyTorch.
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO