- Nanyang Technological University
- Singapore
- https://liuziwei7.github.io/
- @liuziwei7
Stars
Code of "Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion"
SpatialBench: Is Your Spatial Foundation Model an All-Round Player?
PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects
🌐 Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future
🎥 [Awesome] Egocentric / First-Person Video Datasets 📚 Papers, Benchmarks & Resources for Ego Vision
[CVPR 2026] Scaling Spatial Intelligence with Multimodal Foundation Models
[CVPR 2026 Highlight] U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences
[Roadmap] Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
Modular SenseNova skills for building AI-powered office assistants and productivity workflows
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
SenseNova-U series: Native Unified Paradigm with NEO-unify from the First Principles
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
Your behavior is the signal. Not your words. — Behavioral intelligence for AI agents, built into your MacBook notch.
FileGram: Grounding Agent Personalization in File-System Behavioral Traces
[ICLR 2026] 🦅 FALCON: an effective vision-language-action model that injects rich 3D spatial tokens into the action head, enabling robust spatial understanding and SOTA performance across diverse manipu…
A simple video streaming baseline that outperforms SOTAs.
A benchmark for evaluating contextual agents on realistic multimodal personal-computer environments with profiling and factual-retention tasks.
Implementation for Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer.
The official implementation of “MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction”
Official Implementation of "Kinema4D: Kinematic4D World Modeling for Spatiotemporal Embodied Simulation"
An inference-time, plug-and-play method for temporal control in multi-event generation
Toy-scale unified multimodal model experiments — encoder-free understanding & generation with Mixture-of-Transformers on MLX/Apple Silicon
[ICML 2026 Oral] Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence
[ArXiv 26] The official repository of "ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors".
Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition