- Shanghai Jiao Tong University
- Shanghai
- www.wzk.plus
- https://scholar.google.com/citations?user=W0zVf-oAAAAJ
Starred repositories
OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871
Official repository for the paper "Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning"
JoyAI-Image is a unified multimodal foundation model for image understanding, text-to-image generation, and instruction-guided image editing.
Note: this repo exists only because the webpage itself constitutes the entire source code. The original author declared no license, so this repo likewise includes none; use it at your own discretion and do not cause trouble for the original author. Original author: Bilibili @蛆肉儿串儿
CORAL is a robust, lightweight infrastructure for multi-agent autonomous self-evolution, built for autoresearch.
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
ScaleEdit-12M is the largest open-source image editing dataset to date, spanning 23 task families across diverse real and synthetic domains.
The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Built in Rust using oh-my-codex. Join Discord: https://discord.gg/5TUQKqFWd
Bridging the gap between image generation and real-world design: a benchmark for structured, multi-constraint commercial visual content generation.
Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection
FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae, Traycer AI…
GRADE: Grounded Reasoning Assessment for Discipline-informed Editing
We provide TextEdit, a high-quality, multi-scenario text editing benchmark for generation models.
The first unified, efficient, and extensible evaluation toolkit for evaluating image generation and editing models across multiple benchmarks.
InternVL-U is a 4B-parameter unified multimodal model (UMM) that brings multimodal understanding, reasoning, image generation, and image editing into a single framework.
A repository collecting awesome any2any work.
Memento-Skills: Let Agents Design Agents
RISE-Video: Can Video Generators Decode Implicit World Rules?
PaperBanana: Automating Academic Illustration For AI Scientists
Build, evaluate, and integrate long-term memory for self-evolving agents.
🏆 Add dynamically generated GitHub Stat Trophies on your readme
ViLoMem: Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
Official Code of Memento: Fine-tuning LLM Agents without Fine-tuning LLMs