Mike Z. SHOU
National U. of Singapore; Facebook AI; Columbia University
Verified email at columbia.edu
Title
Cited by
Year
Ego4d: Around the world in 3,000 hours of egocentric video
K Grauman, A Westbury, E Byrne, Z Chavis, A Furnari, R Girdhar, ...
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022
Cited by: 2003, Year: 2022
Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation
JZ Wu, Y Ge, X Wang, SW Lei, Y Gu, Y Shi, W Hsu, Y Shan, X Qie, ...
Proceedings of the IEEE/CVF international conference on computer vision …, 2023
Cited by: 1379, Year: 2023
Temporal action localization in untrimmed videos via multi-stage cnns
Z Shou, D Wang, SF Chang
Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2016
Cited by: 1259, Year: 2016
Cdc: Convolutional-de-convolutional networks for precise temporal action localization in untrimmed videos
Z Shou, J Chan, A Zareian, K Miyazawa, SF Chang
Proceedings of the IEEE conference on computer vision and pattern …, 2017
Cited by: 723, Year: 2017
Show-o: One single transformer to unify multimodal understanding and generation
J Xie, W Mao, Z Bai, DJ Zhang, W Wang, KQ Lin, Y Gu, Z Chen, Z Yang, ...
International Conference on Learning Representations 2025, 28240-28264, 2025
Cited by: 675, Year: 2025
Single shot temporal action detection
T Lin, X Zhao, Z Shou
Proceedings of the 25th ACM international conference on Multimedia, 988-996, 2017
Cited by: 577, Year: 2017
Convnet architecture search for spatiotemporal feature learning
D Tran, J Ray, Z Shou, SF Chang, M Paluri
arXiv preprint arXiv:1708.05038, 2017
Cited by: 569, Year: 2017
Hallucination of multimodal large language models: A survey
Z Bai, P Wang, T Xiao, T He, Z Han, Z Zhang, MZ Shou
arXiv preprint arXiv:2404.18930, 2024
Cited by: 554, Year: 2024
Ego-exo4d: Understanding skilled human activity from first-and third-person perspectives
K Grauman, A Westbury, L Torresani, K Kitani, J Malik, T Afouras, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by: 544, Year: 2024
Channel augmented joint learning for visible-infrared recognition
M Ye, W Ruan, B Du, MZ Shou
Proceedings of the IEEE/CVF international conference on computer vision …, 2021
Cited by: 479, Year: 2021
Magicanimate: Temporally consistent human image animation using diffusion model
Z Xu, J Zhang, JH Liew, H Yan, JW Liu, C Zhang, J Feng, MZ Shou
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by: 455, Year: 2024
Boxdiff: Text-to-image synthesis with training-free box-constrained diffusion
J Xie, Y Li, Y Huang, H Liu, W Zhang, Y Zheng, MZ Shou
Proceedings of the IEEE/CVF international conference on computer vision …, 2023
Cited by: 378, Year: 2023
Show-1: Marrying pixel and latent diffusion models for text-to-video generation
DJ Zhang, JZ Wu, JW Liu, R Zhao, L Ran, Y Gu, D Gao, MZ Shou
International Journal of Computer Vision 133 (4), 1879-1893, 2025
Cited by: 363, Year: 2025
Autoloc: Weakly-supervised temporal action localization in untrimmed videos
Z Shou, H Gao, L Zhang, K Miyazawa, SF Chang
Proceedings of the european conference on computer vision (ECCV), 154-171, 2018
Cited by: 362, Year: 2018
Egocentric video-language pretraining
KQ Lin, J Wang, M Soldan, M Wray, R Yan, EZ Xu, D Gao, RC Tu, W Zhao, ...
Advances in Neural Information Processing Systems 35, 7575-7586, 2022
Cited by: 332, Year: 2022
Diffumask: Synthesizing images with pixel-level annotations for semantic segmentation using diffusion models
W Wu, Y Zhao, MZ Shou, H Zhou, C Shen
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by: 329, Year: 2023
Mix-of-show: Decentralized low-rank adaptation for multi-concept customization of diffusion models
Y Gu, X Wang, JZ Wu, Y Shi, Y Chen, Z Fan, W Xiao, R Zhao, S Chang, ...
Advances in Neural Information Processing Systems 36, 15890-15902, 2023
Cited by: 324, Year: 2023
Univtg: Towards unified video-language temporal grounding
KQ Lin, P Zhang, J Chen, S Pramanick, D Gao, AJ Wang, R Yan, MZ Shou
Proceedings of the IEEE/CVF international conference on computer vision …, 2023
Cited by: 304, Year: 2023
All in one: Exploring unified video-language pre-training
J Wang, Y Ge, R Yan, Y Ge, KQ Lin, S Tsutsui, X Lin, G Cai, J Wu, Y Shan, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023
Cited by: 304, Year: 2023
Is someone speaking? exploring long-term temporal features for audio-visual active speaker detection
R Tao, Z Pan, RK Das, X Qian, MZ Shou, H Li
Proceedings of the 29th ACM international conference on multimedia, 3927-3935, 2021
Cited by: 286, Year: 2021