wusize

Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better

Python · 184 stars · 18 forks · Updated Jun 14, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 17,662 stars · 2,861 forks · Updated Dec 21, 2025

A PyTorch native platform for training generative AI models

Python · 4,861 stars · 645 forks · Updated Dec 21, 2025

Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference

Python · 1,227 stars · 41 forks · Updated Oct 26, 2025

Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment.

Python · 84 stars · 4 forks · Updated May 4, 2025

Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.

Python · 335 stars · 11 forks · Updated Dec 16, 2025

DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support

Python · 718 stars · 62 forks · Updated Mar 22, 2024

Unified Multimodal Model for image generation/editing/understanding

Python · 818 stars · 38 forks · Updated Sep 8, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python · 19,445 stars · 1,995 forks · Updated Nov 1, 2025

GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset

Python · 238 stars · 5 forks · Updated Aug 15, 2025

Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"

Jupyter Notebook · 165 stars · 4 forks · Updated Dec 17, 2025

Official code for ICCV 2025 paper, X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation

Python · 88 stars · 3 forks · Updated Jun 26, 2025

A framework that lets you apply sparse autoencoders to any model

Python · 48 stars · 2 forks · Updated Jul 11, 2025

SigLIP-based Aesthetic Score Predictor

Python · 365 stars · 8 forks · Updated Dec 18, 2024

Open protocol for communication between AI agents, applications, and humans.

Python · 907 stars · 111 forks · Updated Aug 25, 2025

Official implementation of the paper "Transfer between Modalities with MetaQueries"

Python · 280 stars · 9 forks · Updated Oct 12, 2025

DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance

Python · 26 stars · Updated Sep 7, 2025

Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)

Python · 2,982 stars · 219 forks · Updated Sep 12, 2025

Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning

Python · 233 stars · 7 forks · Updated May 30, 2025

Repo for SeedVR2 & SeedVR (CVPR 2025 Highlight)

Python · 842 stars · 50 forks · Updated Jul 2, 2025

Python · 167 stars · 8 forks · Updated Jun 27, 2025

Open-source unified multimodal model

Python · 5,491 stars · 480 forks · Updated Oct 27, 2025

[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python · 1,763 stars · 105 forks · Updated Nov 4, 2025

PyTorch implementation of the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"

Python · 422 stars · 22 forks · Updated Jun 20, 2025

[ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?"

146 stars · 1 fork · Updated Jun 13, 2024

This repository has 1426 stars; try it if you don't believe me

Python · 1,435 stars · 37 forks · Updated Sep 13, 2022

[ICCV 2025] Code release for "Harmonizing Visual Representations for Unified Multimodal Understanding and Generation"

Python · 183 stars · 5 forks · Updated May 21, 2025

[CVPR 2025] EgoLife: Towards Egocentric Life Assistant

Python · 365 stars · 19 forks · Updated Mar 19, 2025