ViBe: Ultra-High-Resolution Video Synthesis Born from Pure Images
Y. Wu, H. Cheng, Z. He, S. Liu
paper /
code
Naively fine-tuning a single LoRA on high-resolution images introduces noise and artifacts. We introduce Relay-LoRA, a two-stage fine-tuning method that reduces noise and enhances visual detail.
|
|
FreeSwim: Revisiting Sliding-Window Attention Mechanisms for Training-Free Ultra-High-Resolution Video Generation
Y. Wu, J. Song, Z. Tan, Z. He, S. Liu
paper /
code
We identify the root cause of quality degradation at high resolutions and propose an efficient interpolation window-masking mechanism, built on Flex-Attention, for seamless 4K video generation.
|
|
MitPose: Multi-Granularity Guided Vision Transformer for Human Pose Estimation
Y. Wu, Q. Gao, Y. Liu, J. Sun, Z. Li, Y. Jin, Y. Yue, X. Zhu
INDIN, 2025
paper /
code
We introduce a complementary mechanism that combines over-parameterized convolution with global attention for multi-granularity feature representation, achieving state-of-the-art performance on the COCO and MPII benchmarks.
|
|
Alibaba Group
Research Intern | Supervised by Xiangxiang Chu
Research Direction: World Model
|
|
Shanghai Jiao Tong University
Research Intern | Supervised by Songhua Liu
Research Direction: Video Generation
|
|
Xi'an Jiaotong-Liverpool University
Research Assistant | Supervised by Yong Yue
Research Direction: Human Pose Estimation
|
Feel free to steal this website's source code. Do not scrape the HTML from this page itself, as it includes analytics tags that you do not want on your own website; use the GitHub code instead. Also, consider using Leonid Keselman's Jekyll fork of this page.