Mike Z. SHOU

Cited by

	All	Since 2021
Citations	20434	18914
h-index	58	57
i10-index	164	164

Citations per year
Year	2017	2018	2019	2020	2021	2022	2023	2024	2025	2026
Citations	90	318	534	555	839	1013	1923	4109	7522	3479

Public access

86 articles available
4 articles not available

Based on funding mandates

Co-authors

Shih-Fu Chang, Professor of Electrical Engineering and Computer Science, Columbia University. Verified email at columbia.edu

Mike Z. SHOU

National U. of Singapore; Facebook AI; Columbia University

Verified email at columbia.edu

Computer Vision, AR/VR, Multimedia


Title	Authors	Venue	Cited by	Year
Ego4d: Around the world in 3,000 hours of egocentric video	K Grauman, A Westbury, E Byrne, Z Chavis, A Furnari, R Girdhar, ...	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022	2003	2022
Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation	JZ Wu, Y Ge, X Wang, SW Lei, Y Gu, Y Shi, W Hsu, Y Shan, X Qie, ...	Proceedings of the IEEE/CVF international conference on computer vision …, 2023	1379	2023
Temporal action localization in untrimmed videos via multi-stage cnns	Z Shou, D Wang, SF Chang	Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2016	1259	2016
Cdc: Convolutional-de-convolutional networks for precise temporal action localization in untrimmed videos	Z Shou, J Chan, A Zareian, K Miyazawa, SF Chang	Proceedings of the IEEE conference on computer vision and pattern …, 2017	723	2017
Show-o: One single transformer to unify multimodal understanding and generation	J Xie, W Mao, Z Bai, DJ Zhang, W Wang, KQ Lin, Y Gu, Z Chen, Z Yang, ...	International Conference on Learning Representations 2025, 28240-28264, 2025	675	2025
Single shot temporal action detection	T Lin, X Zhao, Z Shou	Proceedings of the 25th ACM international conference on Multimedia, 988-996, 2017	577	2017
Convnet architecture search for spatiotemporal feature learning	D Tran, J Ray, Z Shou, SF Chang, M Paluri	arXiv preprint arXiv:1708.05038, 2017	569	2017
Hallucination of multimodal large language models: A survey	Z Bai, P Wang, T Xiao, T He, Z Han, Z Zhang, MZ Shou	arXiv preprint arXiv:2404.18930, 2024	554	2024
Ego-exo4d: Understanding skilled human activity from first-and third-person perspectives	K Grauman, A Westbury, L Torresani, K Kitani, J Malik, T Afouras, ...	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024	544	2024
Channel augmented joint learning for visible-infrared recognition	M Ye, W Ruan, B Du, MZ Shou	Proceedings of the IEEE/CVF international conference on computer vision …, 2021	479	2021
Magicanimate: Temporally consistent human image animation using diffusion model	Z Xu, J Zhang, JH Liew, H Yan, JW Liu, C Zhang, J Feng, MZ Shou	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024	455	2024
Boxdiff: Text-to-image synthesis with training-free box-constrained diffusion	J Xie, Y Li, Y Huang, H Liu, W Zhang, Y Zheng, MZ Shou	Proceedings of the IEEE/CVF international conference on computer vision …, 2023	378	2023
Show-1: Marrying pixel and latent diffusion models for text-to-video generation	DJ Zhang, JZ Wu, JW Liu, R Zhao, L Ran, Y Gu, D Gao, MZ Shou	International Journal of Computer Vision 133 (4), 1879-1893, 2025	363	2025
Autoloc: Weakly-supervised temporal action localization in untrimmed videos	Z Shou, H Gao, L Zhang, K Miyazawa, SF Chang	Proceedings of the european conference on computer vision (ECCV), 154-171, 2018	362	2018
Egocentric video-language pretraining	KQ Lin, J Wang, M Soldan, M Wray, R Yan, EZ Xu, D Gao, RC Tu, W Zhao, ...	Advances in Neural Information Processing Systems 35, 7575-7586, 2022	332	2022
Diffumask: Synthesizing images with pixel-level annotations for semantic segmentation using diffusion models	W Wu, Y Zhao, MZ Shou, H Zhou, C Shen	Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	329	2023
Mix-of-show: Decentralized low-rank adaptation for multi-concept customization of diffusion models	Y Gu, X Wang, JZ Wu, Y Shi, Y Chen, Z Fan, W Xiao, R Zhao, S Chang, ...	Advances in Neural Information Processing Systems 36, 15890-15902, 2023	324	2023
Univtg: Towards unified video-language temporal grounding	KQ Lin, P Zhang, J Chen, S Pramanick, D Gao, AJ Wang, R Yan, MZ Shou	Proceedings of the IEEE/CVF international conference on computer vision …, 2023	304	2023
All in one: Exploring unified video-language pre-training	J Wang, Y Ge, R Yan, Y Ge, KQ Lin, S Tsutsui, X Lin, G Cai, J Wu, Y Shan, ...	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023	304	2023
Is someone speaking? exploring long-term temporal features for audio-visual active speaker detection	R Tao, Z Pan, RK Das, X Qian, MZ Shou, H Li	Proceedings of the 29th ACM international conference on multimedia, 3927-3935, 2021	286	2021


