- Johns Hopkins University <- Tsinghua
- Baltimore, United States
- https://caiyuanhao1998.github.io/
Stars
A simple pip-installable Python tool to generate your own HTML citation world map from your Google Scholar ID.
Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction (ICCV 2025)
Official code repo of the CVPR 2025 paper PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
[CVPR'25] A vision question answering (VQA) benchmark for 6D spatial reasoning.
EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory
This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Ba…
Official repo for paper "EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning"
[ICML2025] Code and data for the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation"
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Enjoy the magic of Diffusion models!
Wan: Open and Advanced Large-Scale Video Generative Models
[NeurIPS 2025] LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
[ICCV 2025] Official implementation of X2-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
[NeurIPS 2025] Completeness-Aware Reconstruction Enhancement
OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions (NeurIPS 2025)
[3DV 2026] VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment
Official implementation of X-Field. Code coming soon.
A toolbox for feedforward sparse-view CT reconstruction
SAH-SCI: Self-Supervised Adapter for Efficient Hyperspectral Snapshot Compressive Imaging
A curated list of recent diffusion models for video generation, editing, and various other applications.
Official implementation of “LucidFusion: Reconstructing 3D Gaussians with Arbitrary Unposed Images”
A curated list of instruction-prompted visual translation papers
[ECCV22] Unbiased Multi-Modality Guidance for Image Inpainting