Starred repositories
An open-source, GPU-accelerated physics simulation engine built upon NVIDIA Warp, specifically targeting roboticists and simulation researchers.
[CVPR 2026] The Missing Point in Vision Transformers for Universal Image Segmentation
This repository is an official implementation of the paper "LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection".
A lightweight, local-first, and free experiment tracking library from Hugging Face 🤗
Official implementation of RT-DETRv4: Painlessly Furthering Real-Time Object Detection with Vision Foundation Models
[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. It can be efficiently fine-tuned on downstream datasets.
CLIP+MLP Aesthetic Score Predictor
A linear estimator on top of CLIP to predict the aesthetic quality of pictures
SigLIP-based Aesthetic Score Predictor
Code for Distributions as Actions: A Unified Framework for Diverse Action Spaces.
Official repository of In-Context LoRA for Diffusion Transformers
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
A Python framework for self-hosted LLM tool-calling and multi-step agentic workflows
Multi-Scene Camera Pose Regression with Transformers
FDFO: Finite Difference Flow Optimization
FASHN VTON v1.5: Efficient Maskless Virtual Try-On in Pixel Space
A curated list of papers and selected technical blogs on Loop Models.
Portable file server with accelerated resumable uploads, dedup, WebDAV, SFTP, FTP, TFTP, zeroconf, media indexer, thumbnails++ all in one file
[CVPR-2024] Official PyTorch implementation of "Misalignment-Robust Frequency Distribution Loss for Image Transformation"
[AAAI 2026] VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
BBDM: Image-to-image Translation with Brownian Bridge Diffusion Models
[ECCV 2026] Official code of "Representation Alignment for Just Image Transformers is not Easier than You Think"
DeepStream SDK Python bindings and sample applications
A feed-forward 3D foundation model for reconstructing scenes from streaming data
Efficient Universal Perception Encoder: a single on-device vision encoder with versatile representations that match or exceed specialized experts across multiple task domains.
newspaper3k: news, full-text, and article metadata extraction in Python 3. Advanced docs:
ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution 🧬