[ICML 2026] The official implementation of the paper "Unified Multimodal Autoregressive Modeling with Shared Context—Visual Tokenizer is Key to Unification"

Python 33 Updated Jun 23, 2026

bytedance / Bernini

Bernini is a unified framework for video generation and editing that combines an MLLM-based semantic planner with a DiT-based renderer.

Python 933 74 Updated Jun 22, 2026

jd-opensource / JoyAI-VL-Interaction

Python 473 25 Updated Jun 22, 2026

THUDM / slime

slime is an LLM post-training framework for RL Scaling.

Python 6,705 969 Updated Jun 23, 2026

zlab-princeton / i1

Code release for "i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models"

Python 163 10 Updated Jun 11, 2026

ByteDance-Seed / Cola-DLM

The codebase of Cola DLM

Python 237 13 Updated Jun 11, 2026

AliothChen / CineDance

34 Updated Jun 11, 2026

Tencent-Hunyuan / HY-WU

HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing

Python 294 13 Updated Mar 18, 2026

tsinghua-fib-lab / AutoSOTA

Jupyter Notebook 548 42 Updated Jun 10, 2026

microsoft / DeepVideoDiscovery

**Deep Video Discovery (DVD)** is a deep-research style question answering agent designed for understanding extra-long videos.

Python 399 17 Updated Nov 3, 2025

marinero4972 / Awesome-HumanView-VideoUnderstanding

[survey] Watch, Remember, Reason: Human-View Video Understanding with MLLMs

24 9 Updated Jun 13, 2026

Tangent0308 / OMTG

Official open-source code for the paper "Towards One-to-Many Temporal Grounding".

Python 2 Updated Jun 20, 2026

openvla / openvla

Forked from TRI-ML/prismatic-vlms

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python 6,469 765 Updated Mar 23, 2025

Xiangtai Li lxtGH
