Stars
A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone
Official code for "3D Question Answering via only 2D Vision-Language Models"
Supercharge Your LLM with the Fastest KV Cache Layer
[ICCV 2025 Oral] SceneSplat - Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
Code for 3D-LLM: Injecting the 3D World into Large Language Models