Provable Adaptivity in Adam

B Wang, Y Zhang, H Zhang, Q Meng, ZM Ma… - arXiv preprint arXiv …, 2022 - arxiv.org
… the convergence for Adam with practical hyperparameters. Specifically, we argue that Adam
can adapt to the local smoothness condition, justifying the adaptivity of Adam. In contrast, …
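
For readers scanning these results: the "local smoothness condition" in this line of work is typically the (L0, L1)-smoothness assumption of Zhang et al. (2020). A minimal LaTeX statement of that condition is given below as a reference point; it is an assumed standard form, not necessarily the paper's exact assumption.

```latex
% (L_0, L_1)-smoothness: the local smoothness may grow affinely with the
% gradient norm, rather than being bounded by a single constant L.
\[
  \|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| .
\]
% Hessian-free form (for once-differentiable f), valid for \|x - y\| \le 1/L_1:
\[
  \|\nabla f(x) - \nabla f(y)\| \le \bigl(L_0 + L_1 \|\nabla f(x)\|\bigr)\,\|x - y\| .
\]
```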

Provable Benefit of Adaptivity in Adam

B Wang, Y Zhang, H Zhang, Q Meng, R Sun, ZM Ma… - openreview.net
… Adam (RR Adam), which is the major version of Adam adopted in deep learning. We present
the first convergence analysis of RR Adam … Adam, we believe it is important to study Adam …

Convergence of Adam under Relaxed Assumptions

H Li, A Rakhlin, A Jadbabaie - Advances in Neural …, 2023 - proceedings.neurips.cc
… the Adaptive Moment Estimation (Adam) algorithm for a wide class of optimization objectives.
Despite the popularity and efficiency of the Adam … In this paper, we show that Adam provably

MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence

IV Modoranu, M Safaryan, G Malinovsky… - arXiv preprint arXiv …, 2024 - arxiv.org
… In this setup, we compare MICROADAM with Adam and Adam-8bit in terms of evaluation
accuracy and memory usage. In Table 2 we show our results for 3 training epochs, global batch …

Adam Can Converge Without Any Modification on Update Rules

Y Zhang, C Chen, N Shi, R Sun… - Advances in neural …, 2022 - proceedings.neurips.cc
… Can Adam provably converge without any modification on its update rules? … of adaptive
gradient methods because √v no longer dominates in the choice of stepsize. In this case, Adam
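
As a reminder of where √v enters, Adam's per-coordinate update and effective stepsize are restated below; the specific hyperparameter regime the paper studies (where √v no longer dominates the stepsize) is not reproduced here.

```latex
% Standard Adam update; eta_t is the base stepsize, epsilon the stability constant.
\[
  x_{t+1} = x_t - \frac{\eta_t}{\sqrt{\hat v_t} + \epsilon}\,\hat m_t ,
  \qquad
  \text{effective per-coordinate stepsize} = \frac{\eta_t}{\sqrt{\hat v_t} + \epsilon} .
\]
% If sqrt(v_hat) >> epsilon, the adaptive term governs the stepsize; if
% sqrt(v_hat) is comparable to or below epsilon, the update behaves like
% momentum SGD with stepsize roughly eta_t / epsilon.
```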

Proximal Adam: Robust Adaptive Update Scheme for Constrained Optimization

P Melchior, R Joseph, F Moolekamp - arXiv preprint arXiv:1910.10094, 2019 - arxiv.org
We implement the adaptive step size scheme from the optimization methods AdaGrad and
Adam in a novel variant of the Proximal Gradient Method (PGM). Our algorithm, dubbed …
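
The snippet describes plugging Adam's adaptive step size into a proximal gradient method. Below is a minimal Python sketch of that general idea (an Adam-preconditioned gradient step followed by a soft-thresholding proximal step for an assumed L1 penalty); it is an illustration only, not the authors' exact update scheme, which treats the scaled proximal operator more carefully.

```python
import numpy as np

def soft_threshold(x, thresh):
    """Proximal operator of thresh * ||x||_1 (an example penalty for illustration)."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def adam_prox_step(x, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, l1=1e-2):
    """One Adam-preconditioned proximal-gradient step (illustrative sketch)."""
    m = b1 * m + (1 - b1) * grad                 # EMA of gradients
    v = b2 * v + (1 - b2) * grad**2              # EMA of squared gradients
    m_hat = m / (1 - b1**t)                      # bias corrections
    v_hat = v / (1 - b2**t)
    step = lr / (np.sqrt(v_hat) + eps)           # per-coordinate adaptive stepsize
    x_half = x - step * m_hat                    # adaptive gradient step
    x_new = soft_threshold(x_half, step * l1)    # proximal step for the L1 penalty
    return x_new, m, v
```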

On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions

Y Hong, J Lin - arXiv preprint arXiv:2402.03982, 2024 - arxiv.org
… Generally speaking, Adam absorbs some key ideas from previous adaptive methods [13,
43] while adding more unique structures. It combines the exponential moving average …
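
The "exponential moving average" structure mentioned in this snippet is Adam's pair of EMAs over the gradient and its elementwise square, combined with bias correction. A minimal NumPy sketch of the standard update (Kingma & Ba, 2015), for reference:

```python
import numpy as np

def adam_update(x, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One step of the standard Adam update (Kingma & Ba, 2015)."""
    m = b1 * m + (1 - b1) * grad            # first moment: EMA of gradients
    v = b2 * v + (1 - b2) * grad**2         # second moment: EMA of squared gradients
    m_hat = m / (1 - b1**t)                 # bias corrections for zero initialization
    v_hat = v / (1 - b2**t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# usage: minimize f(x) = ||x||^2 / 2, whose gradient at x is simply x
x = np.array([1.0, -2.0])
m = np.zeros_like(x)
v = np.zeros_like(x)
for t in range(1, 1001):
    x, m, v = adam_update(x, x.copy(), m, v, t, lr=1e-2)
```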

Adam+: A Stochastic Method with Adaptive Variance Reduction

M Liu, W Zhang, F Orabona, T Yang - arXiv preprint arXiv:2011.11985, 2020 - arxiv.org
… Variants of Adam have been proposed with provable convergence guarantee, but they …
adaptive convergence. We also propose a more general variant of Adam+ with different adaptive

Smoothness and Adaptivity in Nonlinear Optimization for Machine Learning Applications

H Li - 2024 - dspace.mit.edu
… In this thesis, we show that Adam provably converges to … cannot explain why adaptive
methods like Adam significantly … convergence rate we have obtained for Adam is not faster than that of …

Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance

Q Zhang, Y Zhou, S Zou - arXiv preprint arXiv:2404.01436, 2024 - arxiv.org
… , we study two adaptive optimizers: RMSProp and Adam with … the original Adam, we make
a minor change in the adaptive … This minor change does not influence the adaptivity of the …
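
"Affine noise variance" in this line of work typically refers to the relaxed assumption that the stochastic-gradient noise is bounded by an affine function of the squared gradient norm. A minimal statement of the standard form is given below; it is an assumed convention, not quoted from the paper.

```latex
% Affine-variance noise assumption on the stochastic gradient g_t at iterate x_t:
\[
  \mathbb{E}\bigl[\|g_t - \nabla f(x_t)\|^2 \mid x_t\bigr]
  \le \sigma_0^2 + \sigma_1^2 \|\nabla f(x_t)\|^2 ,
\]
% which reduces to the classical bounded-variance assumption when \sigma_1 = 0.
```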