Step3-VL-10B: A compact yet frontier multimodal model achieving SOTA performance at the 10B scale, matching open-source models 10-20x its size.

388 26 Updated Jan 21, 2026

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,604 142 Updated Feb 3, 2026

MiMo-VL

622 29 Updated Aug 21, 2025

My learning notes for ML SYS.

Python 5,257 341 Updated Jan 30, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 3,293 405 Updated Jan 19, 2026

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing [ICLR 2026]

Python 116 3 Updated Jan 27, 2026

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

Python 634 53 Updated Oct 29, 2025

🚀🚀 "Large Model": Train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏

Python 38,642 4,643 Updated Jan 30, 2026

Native Multimodal Models are World Learners

Python 1,445 54 Updated Dec 30, 2025

UniVid: The Open-Source Unified Video Model

Python 31 Updated Oct 13, 2025

my commonly-used tools

Jupyter Notebook 64 5 Updated Jan 7, 2025

Official repository of paper "LOVE-R1: Advancing Long Video Understanding with Adaptive Zoom-in Mechanism via Multi-Step Reasoning"

Python 20 Updated Nov 1, 2025

Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in PyTorch

Python 514 15 Updated Jan 18, 2026

Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation"

Jupyter Notebook 134 2 Updated Oct 17, 2025

[NeurIPS 2025 D&B🔥] ImgEdit: A Unified Image Editing Dataset and Benchmark

Python 275 6 Updated Nov 5, 2025

(ICLR 2026) An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"

Python 183 6 Updated Jan 26, 2026

code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Python 1,150 63 Updated Nov 9, 2025

Inference script for Oasis 500M

Python 2,042 172 Updated Nov 8, 2024
Python 11 1 Updated Jul 31, 2025

[ICLR 2026] "VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?", Yuanxin Liu, Kun Ouyang, Haoning Wu, Yi Liu, Lin Sui, Xinhao Li, Yan Zhong, Y. Charles, Xinyu Zhou, Xu Sun

Python 35 1 Updated Jan 30, 2026

Open-source unified multimodal model

Python 5,626 497 Updated Oct 27, 2025

Implementation of Denoising Diffusion Probabilistic Model in PyTorch

Python 10,438 1,265 Updated Aug 4, 2025

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 8,332 751 Updated May 31, 2024

Structured Video Comprehension of Real-World Shorts

Python 230 7 Updated Sep 21, 2025

[ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark

Python 135 4 Updated Jul 9, 2025

Official repo and evaluation implementation of VSI-Bench

Python 667 43 Updated Aug 5, 2025

[Practical resources] The most comprehensive collection of PyTorch learning resources to date

Python 4,674 836 Updated Aug 14, 2019

The Next Step Forward in Multimodal LLM Alignment

Python 196 8 Updated May 1, 2025

Official code for MotionBench (CVPR 2025)

Python 63 2 Updated Mar 3, 2025