Stars
[ICML 2026] ZwZ model family: SOTA fine-grained perception performance; ZoomBench: a new challenging perception benchmark
PhotoFlow: Agentic 3D Virtual Photography Missions
SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.
SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation
Manga/Comics Translation Helper Tool
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
Bridging the gap between image generation and real-world design: a benchmark for structured, multi-constraint commercial visual content generation.
Code repo for "EvoTok: A Unified Image Tokenizer via Residual Latent Evolution for Visual Understanding and Generation"
The official repo of CrossEarth-SAR, a SAR-centric, billion-scale geospatial foundation model for cross-domain semantic segmentation
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation
GRADE: Grounded Reasoning Assessment for Discipline-informed Editing
InternVL-U is a 4B-parameter unified multimodal model (UMM) that brings multimodal understanding, reasoning, image generation, and image editing into a single framework.
Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports
[ICML 2026 Oral] Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence
[CVPR 2026] Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
The official repository of the first version of the ACE-Brain foundation model.
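CRS: a self-hosted Claude Code mirror and one-stop open-source relay service that provides unified access to Claude, OpenAI, Gemini, and Droid subscriptions, supports shared ("carpool") use to split costs more efficiently, and works seamlessly with native tools.
Turning Every Citation into Explainable Impact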
FireRed-Image-Edit is a powerful image editing foundation model achieving open-source state-of-the-art performance with precise instruction following, high-fidelity generation, superior identity co…
A brand new web Visual Novel engine
PaperBanana: Automating Academic Illustration For AI Scientists