JieyuZ2
🎯
Focusing

Highlights

  • Pro

Organizations

@RAIVNLab

Starred repositories

Allen Institute for AI: WildDet3D: Scaling Promptable 3D Detection in the Wild

Python 350 25 Updated Apr 18, 2026

Inference repo for the Falcon-Perception and Falcon-OCR models: early-fusion, natively multimodal, dense autoregressive Transformer models.

Python 525 45 Updated Apr 14, 2026

This is the repository for VFig: Vectorizing Complex Figures with Vision-Language Models

Python 12 Updated Mar 30, 2026

Code for the Molmo2 Vision-Language Model

Python 501 35 Updated Mar 18, 2026

Model for 3D bounding box detection projects based on 3D MooD

Python 3 Updated Mar 30, 2026

THEORY OF SPACE: a benchmark for evaluating whether foundation models can actively explore under partial observability efficiently to build, update, and exploit globally consistent spatial beliefs.

Python 72 8 Updated Feb 27, 2026

Sparking "Thinking with Videos" via Reinforcement Learning

Python 156 6 Updated Oct 30, 2025

A data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.

Jupyter Notebook 2,706 272 Updated May 6, 2025

Codebase for the ICCV 2025 paper "One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory"

Python 84 3 Updated Aug 13, 2025

Official Repo for CVPR 2025 Paper -- DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos

Python 16 1 Updated Mar 16, 2026

All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.

Python 1,412 72 Updated Apr 17, 2026
Python 7 1 Updated Aug 2, 2025

Official implementation of BLIP3o-Series

Python 1,648 78 Updated Nov 29, 2025
Python 226 12 Updated Jun 2, 2025

Fast, Flexible and Portable Structured Generation

C++ 1,634 141 Updated Apr 17, 2026

Official Implementation for the paper "Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base"

Jupyter Notebook 27 1 Updated Sep 2, 2025

[ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative foundation models.

Python 129 10 Updated Aug 22, 2025

An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.

Python 1,817 210 Updated Apr 10, 2026

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

Python 194 8 Updated Mar 29, 2025
Python 68 3 Updated Sep 15, 2025

DataComp for Language Models

HTML 1,435 131 Updated Sep 9, 2025

[ICLR 2026] Scene Graph Driven Data Synthesis for Visual Generation Training

Python 83 3 Updated Feb 1, 2026

An instruction data generation system for multimodal language models.

Jupyter Notebook 37 1 Updated Jan 31, 2025

(ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.

Python 31 2 Updated Aug 7, 2025

AG2 (formerly AutoGen): The Open-Source AgentOS. Join us at: https://discord.gg/sNGSwQME3x

Python 4,417 591 Updated Apr 17, 2026

A programming framework for agentic AI. Discord: https://discord.gg/pAbnFJrkgZ

Jupyter Notebook 139 26 Updated Feb 5, 2025

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 11,200 1,105 Updated Nov 18, 2024

Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.5, DeepSeek, gpt-oss locally.

Python 62,083 5,396 Updated Apr 18, 2026

Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs

Python 6 Updated Jan 24, 2025

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

Python 46 3 Updated Aug 1, 2024