Lists (31)
3D
Agent
AGI
Artificial General Intelligence
AIGC
Architecture
ChatGPT
CLIP
Data
DiffusionModel
Docs
Face
Face related tasks
General Tasks
Generative AI
Human
Image Editing
LLM+LMM
Low Level Vision
MultiModality
NeuralRender
NeuralStyle
NLP
Others
Paper Collection
RemoteSense
RL
Robots
StyleGAN
stylegan related works
Tools
Video
Visual Quality Assessment
VQGAN
Stars
Let Skills Evolve Collectively with Agentic Evolver
Ideogram 4: Open image model at the forefront of design
Official code for the paper "Edit the Bits, Diff the Codes: Bitwise Residual Editing for Visual Autoregressive Models"
Official repo for "Let ViT Speak: Generative Language-Image Pre-training"
Code release for "i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models"
Repo for Qwen Image Finetune
[CVPR 2026 DataCV Workshop] 4KLSDB: A Large-Scale Native-4K Dataset and Benchmark for Image Restoration and Generation.
A Foundation Model for Generalist Gaming Agents
[CVPR 2026 Oral] Official implementation for ChordEdit: One-Step Low-Energy Transport for Image Editing
Official implementation of paper "VLM³: Vision Language Models Are Native 3D Learners".
Bernini is a unified framework for video generation and editing that combines an MLLM-based semantic planner with a DiT-based renderer.
Tiny AutoEncoder for Hunyuan Video (and other video models)
GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration
A Minimal and Elegant Framework & Tutorial for Real-Time Interactive World Models
A comprehensive list of papers investigating physical cognition in video generation, including papers, codes, and related websites.
SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.
PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion
A 3B-active-parameter native unified multimodal model for image and video understanding, generation, and editing.
AI generates a real, editable PowerPoint from any document — native shapes & animations, speaker notes voiced as audio narration, and the option to follow your own .pptx template, not slide images …
🎙️ [Large Model] A 0.1B omni-modal Omni model trained from scratch, capable of listening, speaking, and seeing!
[ICML 2026 Spotlight] UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
[SIGGRAPH 2026] Pixal3D: Pixel-Aligned 3D Generation from Images
[ICML 2026] Official codebase for "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation" & Causal Forcing++
[AAAI 2026] Official code release of our paper "Fine-grained Image Quality Assessment for Perceptual Image Restoration". The first fine-grained IQA dataset for image restoration, FGRestore, plus the method FGResQ.
InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation
Taste-Skill - gives your AI good taste. stops the AI from generating boring, generic slop