Junke Wang wdrink

I'm a final-year Ph.D. student at Fudan University, working on multimodal general intelligence.

126 followers · 32 following

Fudan University
Shanghai
https://wdrink.github.io/

Stars

Andrew0613 / PICABench

PICABench: How Far Are We from Physically Realistic Image Editing?

Python 27 Updated Nov 5, 2025

apple / pico-banana-400k

Python 1,509 65 Updated Oct 28, 2025

Osilly / Interleaving-Reasoning-Generation

An early exploration that introduces Interleaving Reasoning to the text-to-image generation field and achieves SoTA benchmark performance. It also significantly improves the quality, fine-grain…

Python 70 Updated Sep 14, 2025

EzioBy / Ditto

[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset

Python 442 36 Updated Oct 29, 2025

ModelTC / Qwen-Image-Lightning

Qwen-Image-Lightning: Speed up Qwen-Image model with distillation

Python 907 36 Updated Oct 14, 2025

PRIME-RL / TTRL

[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning

Python 885 65 Updated Sep 26, 2025

dc-ai-projects / DC-Gen

DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space

Python 274 8 Updated Oct 5, 2025

thinking-machines-lab / tinker-cookbook

Post-training with Tinker

Python 1,435 112 Updated Nov 4, 2025

guandeh17 / Self-Forcing

Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)

Python 2,788 197 Updated Sep 12, 2025

yejy53 / Nano-banana-150k

Nano-consistent-150k

Jupyter Notebook 228 8 Updated Oct 20, 2025

NVlabs / LongLive

LongLive: Real-time Interactive Long Video Generation

Python 789 49 Updated Nov 3, 2025

Tencent-Hunyuan / HunyuanImage-3.0

HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation

Python 2,372 103 Updated Oct 31, 2025

MengLcool / DeepStack-VL

[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs".

Python 63 3 Updated Jun 17, 2024

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python 15,124 2,425 Updated Nov 5, 2025

EvolvingLMMs-Lab / LLaVA-OneVision-1.5

Fully Open Framework for Democratized Multimodal Training

Python 602 41 Updated Nov 2, 2025

yangzhou24 / OmniWorld

OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

Python 386 6 Updated Oct 15, 2025

YBYBZhang / Tool-R1

Official pytorch implementation of "Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use"

15 Updated Sep 16, 2025

Tencent-Hunyuan / HunyuanWorld-Voyager

Voyager is an interactive RGBD video generation model conditioned on camera input, and supports real-time 3D reconstruction.

Python 1,320 121 Updated Oct 22, 2025

TrajectoryCrafter / TrajectoryCrafter

[ICCV 2025, Oral] TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models

Python 784 39 Updated Aug 8, 2025

KwaiVGI / GameFactory

[ICCV 2025] GameFactory: Creating New Games with Generative Interactive Videos

Python 431 15 Updated Mar 22, 2025

KwaiVGI / ReCamMaster

[ICCV'25 Best Paper Finalist] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

Python 1,585 76 Updated Oct 23, 2025

zai-org / GLM-V

GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Python 1,731 102 Updated Oct 28, 2025

Tencent-Hunyuan / Hunyuan-GameCraft-1.0

Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition

Python 614 68 Updated Oct 16, 2025

runjiali-rl / vmem

[ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Python 389 14 Updated Jul 25, 2025

stepfun-ai / NextStep-1

Python 566 16 Updated Oct 20, 2025

InternRobotics / Aether

[ICCV 2025 & ICCV 2025 RIWM Outstanding Paper] Aether: Geometric-Aware Unified World Modeling

Python 520 5 Updated Oct 26, 2025

facebookresearch / dinov3

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 8,114 546 Updated Nov 3, 2025

SOTAMak1r / DeepVerse

DeepVerse: 4D Autoregressive Video Generation as a World Model

Python 188 8 Updated Aug 11, 2025

openai / gpt-oss

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,105 1,902 Updated Nov 1, 2025

FoundationVision / Waver

Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.

691 63 Updated Aug 27, 2025