| Title, authors, and venue | Cited by | Year |
| --- | --- | --- |
| Attention is all you need A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... Advances in neural information processing systems 30, 2017 | 254458 | 2017 |
| Exploring the limits of transfer learning with a unified text-to-text transformer C Raffel, N Shazeer, A Roberts, K Lee, S Narang, M Matena, Y Zhou, W Li, ... Journal of machine learning research 21 (140), 1-67, 2020 | 32115 | 2020 |
| PaLM: Scaling language modeling with pathways A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... Journal of machine learning research 24 (240), 1-113, 2023 | 9318 | 2023 |
| Outrageously large neural networks: The sparsely-gated mixture-of-experts layer N Shazeer, A Mirhoseini, K Maziarz, A Davis, Q Le, G Hinton, J Dean arXiv preprint arXiv:1701.06538, 2017 | 5659 | 2017 |
| Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity W Fedus, B Zoph, N Shazeer Journal of Machine Learning Research 23 (120), 1-39, 2022 | 4707 | 2022 |
| Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities G Comanici, E Bieber, M Schaekermann, I Pasupat, N Sachdeva, I Dhillon, ... arXiv preprint arXiv:2507.06261, 2025 | 2920 | 2025 |
| Scheduled sampling for sequence prediction with recurrent neural networks S Bengio, O Vinyals, N Jaitly, N Shazeer Advances in neural information processing systems 28, 2015 | 2862 | 2015 |
| Image transformer N Parmar, A Vaswani, J Uszkoreit, L Kaiser, N Shazeer, A Ku, D Tran International conference on machine learning, 4055-4064, 2018 | 2621 | 2018 |
| GShard: Scaling giant models with conditional computation and automatic sharding D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat, Y Huang, M Krikun, N Shazeer, ... arXiv preprint arXiv:2006.16668, 2020 | 2363 | 2020 |
| LaMDA: Language models for dialog applications R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ... arXiv preprint arXiv:2201.08239, 2022 | 2290 | 2022 |
| GLU variants improve transformer N Shazeer arXiv preprint arXiv:2002.05202, 2020 | 2034 | 2020 |
| Exploring the limits of language modeling R Jozefowicz, O Vinyals, M Schuster, N Shazeer, Y Wu arXiv preprint arXiv:1602.02410, 2016 | 1586 | 2016 |
| Adafactor: Adaptive learning rates with sublinear memory cost N Shazeer, M Stern International conference on machine learning, 4596-4604, 2018 | 1414 | 2018 |
| Music transformer CZA Huang, A Vaswani, J Uszkoreit, N Shazeer, I Simon, C Hawthorne, ... arXiv preprint arXiv:1809.04281, 2018 | 1392 | 2018 |
| How much knowledge can you pack into the parameters of a language model? A Roberts, C Raffel, N Shazeer Proceedings of the 2020 conference on empirical methods in natural language …, 2020 | 1227 | 2020 |
| Generating wikipedia by summarizing long sequences PJ Liu, M Saleh, E Pot, B Goodrich, R Sepassi, L Kaiser, N Shazeer arXiv preprint arXiv:1801.10198, 2018 | 1202 | 2018 |
| Attention is all you need A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, Ł Kaiser, I Polosukhin Advances in neural information processing systems 30, 2017 | 992 | 2017 |
| Attention is all you need (Advances in neural information processing systems 30) A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... Curran Associates Inc, 2017 | 923 | 2017 |
| End-to-end text-dependent speaker verification G Heigold, I Moreno, S Bengio, N Shazeer 2016 IEEE international conference on acoustics, speech and signal …, 2016 | 858 | 2016 |
| Fast transformer decoding: One write-head is all you need N Shazeer arXiv preprint arXiv:1911.02150, 2019 | 798 | 2019 |