- University of Washington
- Seattle, WA
- https://jieyuz2.github.io/
Starred repositories
Allen Institute for AI: WildDet3D: Scaling Promptable 3D Detection in the Wild
Inference repo for the Falcon-Perception and Falcon-OCR models: early-fusion, natively multimodal, dense autoregressive Transformer models.
This is the repository for VFig: Vectorizing Complex Figures with Vision-Language Models
Model for 3D bounding box detection projects based on 3D MooD
THEORY OF SPACE: a benchmark for evaluating whether foundation models can actively explore under partial observability efficiently to build, update, and exploit globally consistent spatial beliefs.
Sparking "Thinking with Videos" via Reinforcement Learning
A data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.
Codebase for the ICCV 2025 paper "One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory"
Official Repo for CVPR 2025 Paper -- DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos
All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.
Fast, Flexible and Portable Structured Generation
Official Implementation for the paper "Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base"
[ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative foundation models.
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
[ICLR 2026] Scene Graph Driven Data Synthesis for Visual Generation Training
An instruction data generation system for multimodal language models.
(ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.
AG2 (formerly AutoGen): The Open-Source AgentOS. Join us at: https://discord.gg/sNGSwQME3x
autogenhub / autogen
Forked from microsoft/autogen. A programming framework for agentic AI. Discord: https://discord.gg/pAbnFJrkgZ
LAVIS - A One-stop Library for Language-Vision Intelligence
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.5, DeepSeek, gpt-oss locally.
Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs
Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion