- Shatin, N.T., HKSAR
- https://lixin4ever.github.io/
- @lixin4ever
Stars
MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence
Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu…
[arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
Code for [AAAI 2026] AffordDex: Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors
A framework aiming to bridge fast robot prototyping, predefined motion primitives, heterogeneous teleoperation, data collection, and flexible deployment across diverse robot platforms.
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
Native Multimodal Models are World Learners
MiniMax-M2, a model built for Max coding & agentic workflows.
A simple, unified multimodal model training engine. Lean, flexible, and built for hacking at scale.
[Lumina Embodied AI] Embodied AI Technical Guide (Embodied-AI-Guide)
Code for "High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting"
VideoNSA: Native Sparse Attention Scales Video Understanding
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Fully Open Framework for Democratized Multimodal Training
MiroThinker is a series of open-source agentic models trained for deep research and complex tool use scenarios.
MiroMind Research Agent: Fully Open-Source Deep Research Agent with Reproducible State-of-the-Art Performance on FutureX, GAIA, HLE, BrowseComp and xBench.
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations