✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,487 182 Updated Mar 28, 2025

Official implementation of Continuous 3D Perception Model with Persistent State

Python 1,335 74 Updated Aug 27, 2025

[ICLR2026] The official code of "Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance"

Python 27 1 Updated Feb 9, 2026

Official Code Repo for UniVA: Universal Video Agents

TypeScript 349 52 Updated Jan 27, 2026

Accepted as [NeurIPS 2024] Spotlight Presentation Paper

Jupyter Notebook 6,383 651 Updated Sep 26, 2024

(ICCV 2025) Official repository of the paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"

Python 45 2 Updated Jul 1, 2025

The official implementation of the paper "Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation"

13 Updated Jan 30, 2026

Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.

Python 7,277 426 Updated Feb 10, 2026

Automatic Metric for Evaluating Generated Videos

Python 32 1 Updated Dec 8, 2025

UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

Python 88 Updated Feb 5, 2026

Official Implementation of VideoDPO

Python 160 2 Updated Jun 1, 2025

Official implementation of VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning (COLM 2024)

178 8 Updated Aug 7, 2024

MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.

Python 952 38 Updated Mar 19, 2025

[ICCV 2025 Workshop Outstanding Paper Award] VChain: Chain-of-Visual-Thought for Reasoning in Video Generation

116 1 Updated Oct 7, 2025

Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO

Python 92 Updated Dec 1, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 69,999 13,355 Updated Feb 10, 2026

[ICLR 2026 Oral] DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Python 633 23 Updated Feb 10, 2026

[NeurIPS 2025] Improving Video Generation with Human Feedback

Python 420 11 Updated Sep 24, 2025

BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models

Python 40 Updated Oct 30, 2025

EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling

Python 211 6 Updated Feb 3, 2026
7 Updated Nov 11, 2025

A curated list of papers on reinforcement learning for video generation

339 Updated Feb 3, 2026

Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.

Python 209 8 Updated Oct 12, 2025

Enjoy the magic of Diffusion models!

Python 11,769 1,137 Updated Feb 10, 2026

An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation

Python 1,514 74 Updated Oct 16, 2025

[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 1,990 127 Updated Nov 4, 2025

[NeurIPS 2025] Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Python 563 26 Updated Jan 5, 2026

This is a repository dedicated to high quality figures from EMNLP 2025 long papers.

50 6 Updated Dec 15, 2025

This is a repository dedicated to high quality figures from ACL 2025 long papers.

135 7 Updated Dec 15, 2025