Stars
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.
Awesome curated collection of images and prompts generated by gemini-2.5-flash-image (aka Nano Banana) state-of-the-art image generation and editing model. Explore AI generated visuals created with…
PyTorch implementation for MeanFlow
[CVPR 2025] h-Edit: Effective and Flexible Diffusion-Based Editing via Doob’s h-Transform
A simple UI for SAM 2. Give an input path to a directory of video frames, and the script will let you look through the frames, plot points, then feed the points into SAM 2.
clintonjwang / ControlNet
Forked from lllyasviel/ControlNet. Generate videos that interpolate between two given images
A summary of related works on flow matching and stochastic interpolants
[ISBI 2024] An implementation of SAM3D which adapts Segment Anything Model for Volumetric Medical Image Segmentation
[WACV 2024] Decoding Radiologists’ Intense Focus for Accurate CXR Diagnoses: A Controllable & Interpretable AI System
[Remote Sensing] AerialFormer: Multi-resolution Transformer for Aerial Image Segmentation
[ICRA 2024 Oral] Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation
Refine high-quality datasets and visual AI models
[CVPR 2023] This is the official PyTorch implementation for "Dynamic Focus-aware Positional Queries for Semantic Segmentation".
Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]
CEDNet: A Cascade Encoder-Decoder Network for Dense Prediction (Pattern Recognition 2024)
AIOZ-GDANCE: a large-scale dataset & baseline for music-driven group dance generation. (CVPR 2023)
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
An example of the DINO detector using C++ and the Libtorch library
[Asilomar 2022] Contextual Explainable Video Representation: Human Perception-based Understanding
[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
Qualia is a deep learning framework deeply integrated with automatic differentiation and dynamic graphing with CUDA acceleration. Qualia was built from scratch.