Hessian spectral analysis for adaptive optimizers of neural networks

Bibliographic Information

Other Title
  • ニューラルネットワークの適応的最適化手法におけるヘッセ行列のスペクトル解析

Description

When training neural networks, adaptive optimization methods such as Adam are widely used because of their fast convergence. On the other hand, it has been pointed out that the parameters obtained by these adaptive methods do not generalize as well as those obtained by SGD. The mechanism behind this difference is still not fully understood. We analyzed the convergence points reached by adaptive and non-adaptive methods using the Hessian spectrum of the loss function with respect to the parameters. Experiments showed that SGD tends to converge to flatter locations than adaptive optimizers do.
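A standard way to carry out the kind of Hessian spectral analysis the abstract describes is to estimate extreme eigenvalues without ever materializing the Hessian, using Hessian-vector products. The sketch below is not the authors' code; it is a minimal illustration, assuming a PyTorch model, of estimating the top Hessian eigenvalue of a loss by power iteration, where a small top eigenvalue corresponds to a flatter convergence point.

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=50, tol=1e-4):
    """Estimate the largest Hessian eigenvalue of `loss` w.r.t. `params`
    by power iteration on Hessian-vector products (double backward)."""
    # First-order gradients built with create_graph=True so that we can
    # differentiate through them again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]  # random probe vector
    eig = 0.0
    for _ in range(iters):
        # Normalize the probe vector across all parameter tensors.
        norm = torch.sqrt(sum((u * u).sum() for u in v))
        v = [u / norm for u in v]
        # Hessian-vector product: differentiate (grad(L) . v) w.r.t. params.
        hv = torch.autograd.grad(grads, params, grad_outputs=v,
                                 retain_graph=True)
        # Rayleigh quotient v . Hv (v is unit-norm) estimates the eigenvalue.
        new_eig = sum((h * u).sum() for h, u in zip(hv, v)).item()
        if abs(new_eig - eig) < tol * (abs(eig) + 1e-12):
            return new_eig
        eig = new_eig
        v = [h.detach() for h in hv]
    return eig

# Hypothetical usage with a toy model and data:
model = torch.nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
print(top_hessian_eigenvalue(loss, list(model.parameters())))
```

The same Hessian-vector-product primitive also underlies fuller spectral estimates (e.g., Lanczos-based methods), so comparing top eigenvalues at the convergence points of SGD and Adam is a natural first measurement of the flatness difference the abstract reports.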

Details

  • CRID
    1390566775143008128
  • NII Article ID
    130007857215
  • DOI
    10.11517/pjsai.jsai2020.0_4b3gs105
  • Text Lang
    ja
  • Data Source
    • JaLC
    • CiNii Articles
  • Abstract License Flag
    Disallowed
