ARM: An AutoRegressive Large Multimodal Model with Discrete Representations

39 Updated Jun 10, 2026

UniRL is a Framework for Unified Multimodal Model Reinforcement Learning

Python 525 28 Updated Jun 11, 2026

Ideogram 4: Open image model at the forefront of design

Python 1,950 189 Updated Jun 4, 2026

Bernini is a unified framework for video generation and editing that combines an MLLM-based semantic planner with a DiT-based renderer.

Python 709 53 Updated Jun 11, 2026

Are attention sinks necessary in diffusion transformers? Code for dynamic sink detection and causal suppression experiments in SD3/SDXL.

Python 5 Updated May 12, 2026

Our inference and training framework to run on the Cosmos Models

Python 224 29 Updated Jun 11, 2026

ERNIE-Image is an open text-to-image generation model developed by the ERNIE-Image team at Baidu. It is built on a single-stream Diffusion Transformer (DiT); with only 8B DiT parameters, it reaches…

Python 476 34 Updated Apr 17, 2026

[CVPR 2026 Oral] VGGT Omega

Python 2,921 117 Updated May 18, 2026

PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

Python 719 36 Updated Jun 3, 2026

Lens is a 3.8B-parameter text-to-image diffusion model that achieves quality competitive with, and in several cases surpassing, models like FLUX and SD3, while requiring significantly less training c…

Python 237 16 Updated May 25, 2026

[CVPR 2026] Official repository for "Reviving ConvNeXt for Efficient Convolutional Diffusion Models"

Python 69 3 Updated Mar 26, 2026

HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency

Python 1,521 139 Updated Jun 10, 2026

A 3B-active-parameter native unified multimodal model for image and video understanding, generation, and editing.

Python 1,186 79 Updated Jun 3, 2026

A Minimal and Elegant Framework & Tutorial for Real-Time Interactive World Models

Python 576 9 Updated Jun 10, 2026
Python 64 1 Updated May 14, 2026

Official Implementation for RAEv2: Improved Baselines with Representation Autoencoders

Python 258 10 Updated May 21, 2026

awesome-native-multimodal-models

Python 13 Updated Mar 6, 2026
Python 124 2 Updated Mar 24, 2026

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

JavaScript 213,599 32,833 Updated Jun 11, 2026

An agentic skills framework & software development methodology that works.

Shell 224,968 19,998 Updated Jun 11, 2026

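Prompt as Code | GPT-Image2 industrial-grade prompt engine and template library: 470+ reverse-engineered cases, 20+ industrial-grade templates, distilled into Skills, continuously updated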

JavaScript 7,223 960 Updated Jun 10, 2026

GPT-Image-2 PPT Generator Skill for Creating Image-Based PowerPoint Presentations in Codex and Other Skill-Compatible Agents

Python 1,465 83 Updated Jun 7, 2026

RL training framework for diffusion and omni-modality models

Python 347 50 Updated Jun 9, 2026

Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation

Python 709 28 Updated Jun 9, 2026

Official Codebase for "DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos" (ICML 2026)

Python 929 59 Updated Mar 21, 2026

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

Python 2,207 185 Updated May 27, 2026

This repository is a collection of world model papers

245 4 Updated Apr 8, 2026

A unified multimodal model toolkit

Python 124 8 Updated May 18, 2026