🍉
I may be slow to respond before the due date of ACL.

Organizations

@dmlc @textmine

16 Updated Dec 17, 2025

MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence

Python 36 Updated Dec 16, 2025

Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.

Python 1,481 241 Updated Jul 31, 2024
Python 611 56 Updated Dec 19, 2025

A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu…

18,877 1,959 Updated Dec 12, 2025

[arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

Python 67 Updated Dec 18, 2025

Code for [AAAI 2026] AffordDex: Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors

Python 12 Updated Nov 20, 2025

A framework aiming to bridge fast robot prototyping, predefined motion primitives, heterogeneous teleoperation, data collection, and flexible deployment across diverse robot platforms.

C++ 15 Updated Dec 13, 2025

SAM 3D Objects

Python 4,983 459 Updated Dec 16, 2025

The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…

Python 6,199 717 Updated Dec 11, 2025

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python 443 24 Updated Dec 15, 2025

Native Multimodal Models are World Learners

Python 1,367 52 Updated Nov 28, 2025
Python 97 10 Updated Oct 27, 2025

MiniMax-M2, a model built for Max coding & agentic workflows.

2,043 156 Updated Nov 13, 2025

A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.

Python 680 27 Updated Dec 17, 2025

[Lumina Embodied AI] Embodied Intelligence Technical Guide (Embodied-AI-Guide)

10,046 679 Updated Dec 3, 2025

Contexts Optical Compression

Python 21,501 1,923 Updated Oct 25, 2025

Code for "High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting"

Python 45 Updated Oct 27, 2025

VideoNSA: Native Sparse Attention Scales Video Understanding

Python 73 1 Updated Nov 16, 2025

Official code of RDT 2

Python 605 28 Updated Dec 3, 2025
Python 481 28 Updated Nov 29, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,137 191 Updated Oct 9, 2025
Python 94 5 Updated Sep 19, 2024

Fully Open Framework for Democratized Multimodal Training

Python 659 50 Updated Dec 15, 2025

MiroThinker is a series of open-source agentic models trained for deep research and complex tool use scenarios.

Python 1,339 92 Updated Dec 19, 2025

MiroMind Research Agent: Fully Open-Source Deep Research Agent with Reproducible State-of-the-Art Performance on FutureX, GAIA, HLE, BrowseComp and xBench.

Python 1,567 172 Updated Nov 30, 2025

MapAnything: Universal Feed-Forward Metric 3D Reconstruction

Python 2,499 154 Updated Dec 18, 2025

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Python 1,126 62 Updated Oct 13, 2025

SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

Python 448 14 Updated Dec 15, 2025