OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

Python 607 51 Updated Oct 29, 2025

🚀🚀 [Large models] Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏

Python 35,902 4,235 Updated Dec 14, 2025

Native Multimodal Models are World Learners

Python 1,367 52 Updated Nov 28, 2025

UniVid: The Open-Source Unified Video Model

Python 29 Updated Oct 13, 2025

my commonly-used tools

Jupyter Notebook 63 5 Updated Jan 7, 2025

Official repository of paper "LOVE-R1: Advancing Long Video Understanding with Adaptive Zoom-in Mechanism via Multi-Step Reasoning"

Python 18 Updated Nov 1, 2025

Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in Pytorch

Python 493 13 Updated Jan 12, 2025

Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation"

Jupyter Notebook 132 2 Updated Oct 17, 2025

[NeurIPS 2025 D&B🔥] ImgEdit: A Unified Image Editing Dataset and Benchmark

Python 251 3 Updated Nov 5, 2025

An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"

Python 153 7 Updated Nov 5, 2025

code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Python 1,113 60 Updated Nov 9, 2025

Inference script for Oasis 500M

Python 1,999 171 Updated Nov 8, 2024
Python 9 Updated Jul 31, 2025

A benchmark for evaluating vision-centric, complex video reasoning.

Python 35 1 Updated Aug 26, 2025

Open-source unified multimodal model

Python 5,491 480 Updated Oct 27, 2025

Implementation of Denoising Diffusion Probabilistic Model in Pytorch

Python 10,312 1,252 Updated Aug 4, 2025

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 8,178 736 Updated May 31, 2024

Structured Video Comprehension of Real-World Shorts

Python 227 7 Updated Sep 21, 2025

[ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark

Python 131 4 Updated Jul 9, 2025

Official repo and evaluation implementation of VSI-Bench

Python 656 39 Updated Aug 5, 2025

[Practical resources] The most comprehensive collection of PyTorch learning resources to date

Python 4,651 835 Updated Aug 14, 2019

The Next Step Forward in Multimodal LLM Alignment

Python 192 8 Updated May 1, 2025

Official code for MotionBench (CVPR 2025)

Python 61 2 Updated Mar 3, 2025

Ultra-high-performance, secure, all-in-one acceleration engine for developer resources

JavaScript 7,116 963 Updated Dec 16, 2025

Building an MoE large model on your own: a complete walkthrough from pre-training to DPO

Python 2,084 156 Updated Dec 16, 2025

Tracking the latest and greatest research papers on video generation.

97 7 Updated Dec 5, 2025

Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)

Python 676 25 Updated Sep 24, 2025
Python 54 1 Updated Dec 7, 2025

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Python 1,049 75 Updated Nov 25, 2025