🍉
I may be slow to respond before the due date of ACL.

Organizations

@dmlc @textmine

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 1,943 121 Updated Dec 25, 2025
17 Updated Dec 17, 2025

MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence

Python 40 Updated Dec 23, 2025

Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.

Python 1,485 242 Updated Jul 31, 2024
Python 627 60 Updated Dec 25, 2025

A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu…

19,148 1,998 Updated Dec 12, 2025

[arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

Python 67 Updated Dec 23, 2025

Code for [AAAI 2026] AffordDex: Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors

Python 12 Updated Nov 20, 2025

A framework aiming to bridge fast robot prototyping, predefined motion primitives, heterogeneous teleoperation, data collection, and flexible deployment across diverse robot platforms.

C++ 15 Updated Dec 21, 2025

SAM 3D Objects

Python 5,101 479 Updated Dec 16, 2025

The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…

Python 6,467 754 Updated Dec 21, 2025

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python 447 25 Updated Dec 15, 2025

Native Multimodal Models are World Learners

Python 1,372 52 Updated Nov 28, 2025
Python 100 11 Updated Oct 27, 2025

MiniMax-M2, a model built for Max coding & agentic workflows.

2,125 161 Updated Nov 13, 2025

A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.

Python 681 27 Updated Dec 23, 2025

[Lumina Embodied AI] Embodied AI Technical Guide (Embodied-AI-Guide)

10,215 695 Updated Dec 3, 2025

Contexts Optical Compression

Python 21,573 1,929 Updated Oct 25, 2025

Code for "High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting"

Python 45 Updated Oct 27, 2025

VideoNSA: Native Sparse Attention Scales Video Understanding

Python 75 1 Updated Nov 16, 2025

Official code of RDT 2

Python 606 30 Updated Dec 3, 2025
Python 486 28 Updated Nov 29, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,164 193 Updated Oct 9, 2025
Python 95 5 Updated Sep 19, 2024

Fully Open Framework for Democratized Multimodal Training

Python 663 53 Updated Dec 15, 2025

MiroThinker is a series of open-source agentic models trained for deep research and complex tool use scenarios.

Python 1,369 95 Updated Dec 23, 2025

MiroMind Research Agent: Fully Open-Source Deep Research Agent with Reproducible State-of-the-Art Performance on FutureX, GAIA, HLE, BrowserComp and xBench.

Python 1,623 175 Updated Nov 30, 2025

MapAnything: Universal Feed-Forward Metric 3D Reconstruction

Python 2,577 161 Updated Dec 18, 2025

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Python 1,147 63 Updated Oct 13, 2025