- Shanghai Jiao Tong University
- Shanghai
- www.wzk.plus
- https://scholar.google.com/citations?user=W0zVf-oAAAAJ
Starred repositories
[CVPR2020] A Dataset for SPAtial REasoning on Three-View Line Drawings
[ArXiv 2025] Co-Training Vision Language Models for Remote Sensing Multi-task Learning
Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation
A review of remote sensing vision-language models
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning [NeurIPS 2025 Poster]
ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding (书生 · 妙析, a multimodal large model for aesthetic understanding)
Processed / Cleaned Data for Paper Copilot
Echos is a headless, API-driven DAW engine. It’s the backend for building AI tools that automate the entire music production lifecycle.
Native Multimodal Models are World Learners
Best Papers of Top Venues like CVPR, NeurIPS, ICLR, ICML, ICCV, ECCV, ...
A visual interpretation tool for Deformable DETR
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
Full-stack AI booking platform with RAG retrieval, Multi-Agent collaboration & smart pricing engine
Official Repository of paper MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
Factuality Matters: When Image Generation and Editing Meet Structured Visuals
[AAAI 2026] Open-Source LLM-Based Data Analysis Agents
A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Language Models (LLMs).
The SAIL-VL2 model series developed by the BytedanceDouyinContent Group