Starred repositories
Streamlined interface for generating images with AI in Krita. Inpaint and outpaint with an optional text prompt, no tweaking required.
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
Simulation of spiking neural networks (SNNs) using PyTorch.
Cross-platform, customizable ML solutions for live and streaming media.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
An official implementation for "OneOcc: Semantic Occupancy Prediction for Legged Robots with a Single Panoramic Camera"
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.
Event-based Vision Resources. Community effort to collect knowledge on event-based vision technology (papers, workshops, datasets, code, videos, etc.).
Official repository for "Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos"
Reference PyTorch implementation and models for DINOv3
YOLOv3 in PyTorch > ONNX > CoreML > TFLite
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
[CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation
BoxMOT: Pluggable SOTA multi-object tracking modules for segmentation, object detection and pose estimation models
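《Hello 算法》 (Hello Algo): a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart. Simplified and Traditional Chinese editions are updated in sync; English version in translation.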
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
The code for PixelRefer & VideoRefer
A deep learning library for video understanding research.
The simplest, fastest repository for training/finetuning small-sized VLMs.
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
A curated list of awesome papers on dataset distillation and related applications.