- The University of Hong Kong
- https://xywu.me
Stars
Official Implementation of "MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives"
[NeurIPS 2025] How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need?
[NeurIPS'25] Official repository of Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
LongLive: Real-time Interactive Long Video Generation
Pointcept: Perceive the world with sparse points, a codebase for point cloud perception research. Latest works: Concerto (NeurIPS'25), Sonata (CVPR'25 Highlight), PTv3 (CVPR'24 Oral)
Open source repo for Locate 3D Model, 3D-JEPA and Locate 3D Dataset
[ICCV 2025 Highlight] No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views
Reference PyTorch implementation and models for DINOv3
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
Make your wildest 3D ConvNet dream architectures come true
PyTorch code and models for the DINOv2 self-supervised learning method.
[TCSVT'24] SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[CVPR'25 Highlight] Official repository of Sonata: Self-Supervised Learning of Reliable Point Representations
Everything you need to build a stellar documentation website. Fast, accessible, and easy to use.
A project page template for academic papers. Demo at https://eliahuhorwitz.github.io/Academic-project-page-template/
(NeurIPS 2024) LiT: Unifying LiDAR "Languages" with LiDAR Translator
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
This is the official release for the paper "EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models" (https://arxiv.org/abs/2406.10224).
GIF encoder based on libimagequant (pngquant). Squeezes maximum possible quality from the awful GIF format.
This is the official code for the paper Tailor3D.