Stochastic Anderson Mixing for Nonconvex Stochastic Optimization

Fuchao Wei (Department of Computer Science and Technology, Tsinghua University); Chenglong Bao (Yau Mathematical Sciences Center, Tsinghua University; Yanqi Lake Beijing Institute of Mathematical Sciences and Applications); Yang Liu (Department of Computer Science and Technology, Tsinghua University; Institute for AI Industry Research, Tsinghua University)

mathscidoc:2206.43012

NeurIPS, 2021.
Anderson mixing (AM) is an acceleration method for fixed-point iterations. Despite its success and wide usage in scientific computing, the convergence theory of AM remains unclear, and its applications to machine learning problems are not well explored. In this paper, by introducing damped projection and adaptive regularization into the classical AM, we propose a Stochastic Anderson Mixing (SAM) scheme to solve nonconvex stochastic optimization problems. Under mild assumptions, we establish the convergence theory of SAM, including the almost sure convergence to stationary points and the worst-case iteration complexity. Moreover, the complexity bound can be improved when randomly choosing an iterate as the output. To further accelerate the convergence, we incorporate a variance reduction technique into the proposed SAM. We also propose a preconditioned mixing strategy for SAM, which can empirically achieve faster convergence or better generalization ability. Finally, we apply the SAM method to train various neural networks, including a vanilla CNN, ResNets, WideResNet, ResNeXt, DenseNet, and an LSTM. Experimental results on image classification and language modeling demonstrate the advantages of our method.
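For readers unfamiliar with the classical method the paper builds on, the following is a minimal sketch of (Type-II) Anderson mixing for a fixed-point problem x = g(x); the history window `m` and damping parameter `beta` are standard choices, but this illustrative implementation is not the paper's SAM scheme (which adds damped projection, adaptive regularization, and stochastic gradients on top of this idea).

```python
import numpy as np

def anderson_mixing(g, x0, m=5, beta=1.0, tol=1e-10, max_iter=100):
    """Classical (Type-II) Anderson mixing for the fixed-point problem x = g(x).

    g: the fixed-point map; x0: initial iterate; m: history window size;
    beta: mixing (damping) parameter.
    """
    x = np.asarray(x0, dtype=float)
    X, F = [], []                      # histories of iterate / residual differences
    x_prev, f_prev = None, None
    for _ in range(max_iter):
        f = g(x) - x                   # current residual
        if np.linalg.norm(f) < tol:
            break
        if f_prev is not None:
            X.append(x - x_prev)
            F.append(f - f_prev)
            if len(X) > m:             # keep only the m most recent differences
                X.pop(0); F.pop(0)
        x_prev, f_prev = x.copy(), f.copy()
        if F:
            Fk = np.stack(F, axis=1)   # n x k matrix of residual differences
            Xk = np.stack(X, axis=1)   # n x k matrix of iterate differences
            # least-squares extrapolation coefficients: min_gamma ||f - Fk gamma||
            gamma, *_ = np.linalg.lstsq(Fk, f, rcond=None)
            x = x + beta * f - (Xk + beta * Fk) @ gamma
        else:
            x = x + beta * f           # plain (Picard) fixed-point step
    return x
```

As a usage example, `anderson_mixing(np.cos, np.array([0.0]))` converges to the fixed point of cos, x ≈ 0.739085, in far fewer iterations than the plain Picard iteration.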
@inproceedings{fuchao2021stochastic,
  title={Stochastic Anderson Mixing for Nonconvex Stochastic Optimization},
  author={Wei, Fuchao and Bao, Chenglong and Liu, Yang},
  url={http://archive.ymsc.tsinghua.edu.cn/pacm_paperurl/20220615214151300540368},
  booktitle={NeurIPS},
  year={2021},
}
Fuchao Wei, Chenglong Bao, and Yang Liu. Stochastic Anderson Mixing for Nonconvex Stochastic Optimization. In NeurIPS, 2021. http://archive.ymsc.tsinghua.edu.cn/pacm_paperurl/20220615214151300540368.