kobeshegu

🍉

Focusing

Mengping Yang kobeshegu

Ph.D. in Computer Science and Technology, focusing on multi-modal learning and generative models.

111 followers · 505 following

FDU & SAIS
Shanghai
10:52 (UTC +08:00)
kobeshegu.github.io
@kobeshegu
https://www.zhihu.com/people/ke-ke-ke-ke-ke-da-xia

Achievements

Lists (8)

Awesome ARs

10 repositories

Awesome-Diffusions

30 repositories

Awesome-LLMs

14 repositories

Awesome-MLLMs

Awesome Omni papers

14 repositories

Awesome RL/HF Inference-scaling

Awesome-Tokenizers

20 repositories

Awesome Video Gen

32 repositories

Stars

wdrink / ARM

ARM: An AutoRegressive Large Multimodal Model with Discrete Representations

39 Updated Jun 10, 2026

Tencent-Hunyuan / UniRL

UniRL is a Framework for Unified Multimodal Model Reinforcement Learning

Python 525 28 Updated Jun 11, 2026

ideogram-oss / ideogram4

Ideogram 4: Open image model at the forefront of design

Python 1,950 189 Updated Jun 4, 2026

bytedance / Bernini

Bernini is a unified framework for video generation and editing that combines an MLLM-based semantic planner with a DiT-based renderer.

Python 709 53 Updated Jun 11, 2026

wfz666 / ICML26-attention-sink

Are attention sinks necessary in diffusion transformers? Code for dynamic sink detection and causal suppression experiments in SD3/SDXL.

Python 5 Updated May 12, 2026

NVIDIA / cosmos-framework

Our inference and training framework to run on the Cosmos Models

Python 224 29 Updated Jun 11, 2026

HiDream-ai / HiDream-O1-Image

Python 501 31 Updated May 20, 2026

baidu / ERNIE-Image

ERNIE-Image is an open text-to-image generation model developed by the ERNIE-Image team at Baidu. Built on a single-stream Diffusion Transformer (DiT) with only 8B DiT parameters, it reaches…

Python 476 34 Updated Apr 17, 2026

facebookresearch / vggt-omega

[CVPR 2026 Oral] VGGT Omega

Python 2,921 117 Updated May 18, 2026

nv-tlabs / PiD

PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

Python 719 36 Updated Jun 3, 2026

microsoft / Lens

Lens is a 3.8B-parameter text-to-image diffusion model that achieves quality competitive with, and in several cases surpassing, models like FLUX and SD3, while requiring significantly less training c…

Python 237 16 Updated May 25, 2026

star-kwon / FCDM

[CVPR 2026] Official repository for "Reviving ConvNeXt for Efficient Convolutional Diffusion Models"

Python 69 3 Updated Mar 26, 2026

Tencent-Hunyuan / HY-WorldPlay

HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency

Python 1,521 139 Updated Jun 10, 2026

bytedance / Lance

A 3B-active-parameter native unified multimodal model for image and video understanding, generation, and editing.

Python 1,186 79 Updated Jun 3, 2026

shengshu-ai / minWM

A Minimal and Elegant Framework & Tutorial for Real-Time Interactive World Models

Python 576 9 Updated Jun 10, 2026

alibaba / OmniDoc-TokenBench

Python 64 1 Updated May 14, 2026

nanovisionx / RAEv2

Official Implementation for RAEv2: Improved Baselines with Representation Autoencoders

Python 258 10 Updated May 21, 2026

isLinXu / awesome-native-multimodal-models

awesome-native-multimodal-models

Python 13 Updated Mar 6, 2026

AlanBaade / LatentForcing

Python 124 2 Updated Mar 24, 2026

affaan-m / ECC

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

JavaScript 213,599 32,833 Updated Jun 11, 2026

obra / superpowers

An agentic skills framework & software development methodology that works.

Shell 224,968 19,998 Updated Jun 11, 2026

freestylefly / awesome-gpt-image-2

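Prompt as Code | An industrial-grade prompt engine and template library for GPT-Image2: 470+ reverse-engineered cases, 20+ industrial-grade templates, distilled into Skills, continuously updated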

JavaScript 7,223 960 Updated Jun 10, 2026

ningzimu / codex-ppt-skill

GPT-Image-2 PPT Generator Skill for Creating Image-Based PowerPoint Presentations in Codex and Other Skill-Compatible Agents

Python 1,465 83 Updated Jun 7, 2026

verl-project / verl-omni

RL training framework for diffusion and omni-modality models

Python 347 50 Updated Jun 9, 2026

facebookresearch / tuna-2

Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation

Python 709 28 Updated Jun 9, 2026

NVIDIA / DreamDojo

Official Codebase for "DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos" (ICML 2026)

Python 929 59 Updated Mar 21, 2026

Tencent-Hunyuan / HY-World-2.0

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

Python 2,207 185 Updated May 27, 2026

Bowen12137 / Awesome-World-Models

This repository is a collection of world model papers

245 4 Updated Apr 8, 2026

unitreerobotics / unifolm-vla

Python 483 47 Updated Jan 29, 2026

AIFrontierLab / TorchUMM

A unified multimodal model toolkit

Python 124 8 Updated May 18, 2026