Tsinghua University · Beijing · robertluo1.github.io
Stars
Towards Scalable Pre-training of Visual Tokenizers for Generation
WorldPlay: Interactive World Modeling with Real-Time Latency and Geometric Consistency
An AI-native PPT generation app built on nano banana pro🍌, moving toward a true "Vibe PPT": upload any template image; upload any assets with intelligent parsing; auto-generate a PPT from a single sentence, an outline, or per-page descriptions; revise specified regions by voice; one-click export
A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, a Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the community.
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
Ovis-Image is a 7B text-to-image model specifically optimized for high-quality text rendering, designed to operate efficiently under stringent computational constraints.
ENACT is a benchmark that evaluates embodied cognition through world modeling from egocentric interaction. It is designed to be simple, with a scalable dataset.
Official implementation code of the paper "AnyText: Multilingual Visual Text Generation And Editing"
Adapting Self-Supervised Representations as a Latent Space for Efficient Generation
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
Official repo of the paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the massive zero-shot potential in unified multimodal models through self-supervised learning.
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
A PyTorch Implementation of Image Style Transfer Using Convolutional Neural Networks
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
Native Multimodal Models are World Learners
🐻 Uniform Discrete Diffusion with Metric Path for Video Generation
[NeurIPS 2025] Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy