Stars
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
[ICML 2026] Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning
📚 "Building Agents from Scratch": a from-scratch tutorial on the principles and practice of agents
Using FLUX.1 Kontext for creating segmentation masks for objects absent from images, enabling workflows in inpainting and virtual try-ons.
The ultimate training toolkit for finetuning diffusion models
Reference PyTorch implementation and models for DINOv3
Diffusion Models in Medical Imaging (Published in Medical Image Analysis Journal)
Clean, scalable, and easy-to-use ResNet implementation in PyTorch
Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Using Low-rank adaptation to quickly fine-tune diffusion models.
Official code repo for the paper "CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers"
vfph55 / mindall-e
Forked from kakaobrain/mindall-e
PyTorch implementation of a 1.3B text-to-image generation model trained on 14 million image-text pairs
vfph55 / glide-text2im
Forked from openai/glide-text2im
GLIDE: a diffusion-based text-conditional image synthesis model
Stable Diffusion implemented from scratch in PyTorch
vfph55 / pytorch-paligemma
Forked from hkproj/pytorch-paligemma
Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation: https://www.youtube.com/watch?v=vAmKB7iPkWw