Hessian spectral analysis for adaptive optimizers of neural networks

- MOTOKAWA Tetsuya (University of Tsukuba)
- TEZUKA Taro (University of Tsukuba)
Bibliographic Information
- Other Title
- ニューラルネットワークの適応的最適化手法におけるヘッセ行列のスペクトル解析 (Spectral analysis of the Hessian matrix in adaptive optimization methods for neural networks)
Description
When training neural networks, adaptive optimization methods such as Adam are widely used because of their fast convergence. On the other hand, it has been pointed out that the parameters obtained by these adaptive methods do not generalize as well as those obtained by SGD. The mechanism behind this difference is still not fully understood. We analyzed the convergence points reached by adaptive and non-adaptive methods using the Hessian spectrum of the loss function with respect to the parameters. Experiments showed that SGD tends to converge to flatter locations than adaptive optimizers do.
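The analysis described above rests on the Hessian spectrum of the training loss, which for networks of realistic size is estimated matrix-free from Hessian-vector products rather than by forming the Hessian explicitly. Below is a minimal sketch of that technique, not the authors' code: PyTorch is assumed, and `hvp`, `top_eigenvalue`, and `loss_fn` are illustrative names. It uses power iteration to estimate the largest-magnitude Hessian eigenvalue, a common proxy for sharpness at a convergence point.

```python
import torch

def hvp(loss_fn, params, vec):
    # Hessian-vector product H v via double backprop:
    # differentiate (grad(loss) . v) with respect to the parameters.
    loss = loss_fn(params)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad((flat_grad * vec).sum(), params)
    return torch.cat([h.reshape(-1) for h in hv])

def top_eigenvalue(loss_fn, params, iters=100):
    # Power iteration: converges to the Hessian eigenvalue of largest
    # magnitude; a larger value indicates a sharper convergence point.
    n = sum(p.numel() for p in params)
    v = torch.randn(n)
    v = v / v.norm()
    for _ in range(iters):
        hv = hvp(loss_fn, params, v)
        v = hv / hv.norm()
    # Rayleigh quotient v^T H v at the converged direction.
    return torch.dot(v, hvp(loss_fn, params, v)).item()

# Toy check on a quadratic loss 0.5 * w^T A w, whose Hessian is A.
A = torch.diag(torch.tensor([3.0, 1.0, 0.5]))
w = torch.randn(3, requires_grad=True)
loss_fn = lambda ps: 0.5 * ps[0] @ A @ ps[0]
print(top_eigenvalue(loss_fn, [w]))  # ~3.0, the largest eigenvalue of A
```

Spectral-analysis studies in this area often go further, using Lanczos iteration or stochastic trace estimators over mini-batches to approximate the full eigenvalue density; power iteration on a fixed batch is the simplest variant and already suffices to compare the sharpness of solutions found by SGD and by adaptive optimizers.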
Journal

- Proceedings of the Annual Conference of JSAI, JSAI2020 (0), 4B3GS105-4B3GS105, 2020
- The Japanese Society for Artificial Intelligence
Details
- CRID: 1390566775143008128
- NII Article ID: 130007857215
- Text Lang: ja
- Data Source: JaLC, CiNii Articles
- Abstract License Flag: Disallowed