PhD student at the Chinese University of Hong Kong, Shenzhen, China
https://zyushun.github.io/
Adam-mini Public
Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793
hessian-spectrum Public
Code for the paper: Why Transformers Need Adam: A Hessian Perspective
zotero-arxiv-daily Public
Forked from TideDra/zotero-arxiv-daily. Recommends new arXiv papers matching your interests daily, based on your Zotero library.
Python · GNU Affero General Public License v3.0 · Updated Dec 31, 2024
iclr-blog-track.github.io Public
Forked from iclr-blog-track/iclr-blog-track.github.io. ICLR 2022 Blog Track: Does Adam Converge and When?
HTML · Other license · Updated Apr 15, 2022