xichenpan

Highlights

  • Pro

Organizations

@SJTU-CSE @opencsapp


[ICLR 2026 🔥 ] Official implementation of "UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing"

Python 149 5 Updated Jan 26, 2026

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python 1,894 83 Updated Feb 25, 2026
Python 8 Updated Jan 13, 2026

A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu…

22,777 2,360 Updated Dec 12, 2025

Official Implementation of Paper Transfer between Modalities with MetaQueries

Python 319 14 Updated Oct 12, 2025

[CVPR 2025] Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

Python 134 5 Updated May 16, 2025

Official implementation of BLIP3o-Series

Python 1,653 78 Updated Nov 29, 2025

Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025)

Jupyter Notebook 56 3 Updated May 8, 2025

Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]

Python 24 1 Updated Aug 13, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,998 137 Updated Nov 7, 2025

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

Jupyter Notebook 4,320 362 Updated Dec 4, 2025

Open Overleaf/ShareLaTex projects in vscode, with full collaboration support.

TypeScript 1,551 62 Updated May 11, 2026

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 4,143 589 Updated May 20, 2026

Fast and memory-efficient exact attention

Python 23,846 2,747 Updated May 19, 2026

[ICLR 2025 Oral] On Scaling Up 3D Gaussian Splatting Training

Python 667 43 Updated Sep 24, 2025

Unofficial implementation of "SODA: Bottleneck Diffusion Models for Representation Learning"

Jupyter Notebook 97 4 Updated Mar 21, 2024

Your image is almost there!

Python 7,621 438 Updated Jul 26, 2024

PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"

Python 704 43 Updated Jan 7, 2024

[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python 579 29 Updated Jan 4, 2025

PDM-based Purifier

Python 23 Updated Nov 5, 2024

Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Python 75 4 Updated May 25, 2024

A conda-forge distribution.

Shell 9,788 502 Updated May 14, 2026

Infinite Photorealistic Worlds using Procedural Generation

Python 6,966 588 Updated May 19, 2026

Code for the paper "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models"

Jupyter Notebook 965 99 Updated May 3, 2026

Official implementation of AnimateDiff.

Python 12,119 1,075 Updated Jul 31, 2024

Grok open release

Python 51,671 8,482 Updated Aug 30, 2024

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Python 4,107 594 Updated Apr 24, 2024

TripoSR: Fast 3D Object Reconstruction from a Single Image

Python 6,509 823 Updated Aug 16, 2024

DUSt3R: Geometric 3D Vision Made Easy

Python 7,146 753 Updated Sep 24, 2025

Efficient, check-pointed data loading for deep learning with massive data sets.

Python 211 17 Updated Jun 12, 2023