AdaTerm: Adaptive T-distribution estimated robust moments for Noise-Robust stochastic gradient optimization
Bibliographic Information
- Publication Date: 2023-11
- Resource Type: journal article
- Rights Information:
  - https://www.elsevier.com/tdm/userlicense/1.0/
  - https://www.elsevier.com/legal/tdmrep-license
  - http://www.elsevier.com/open-access/userlicense/1.0/
  - https://doi.org/10.15223/policy-017
  - https://doi.org/10.15223/policy-037
  - https://doi.org/10.15223/policy-012
  - https://doi.org/10.15223/policy-029
  - https://doi.org/10.15223/policy-004
- DOI:
  - 10.1016/j.neucom.2023.126692
  - 10.48550/arxiv.2201.06714
- Publisher: Elsevier BV
Description
With the increasing practicality of deep learning applications, practitioners are inevitably faced with datasets corrupted by noise from various sources such as measurement errors, mislabeling, and estimated surrogate inputs/outputs that can adversely impact the optimization results. It is common practice to improve the optimization algorithm's robustness to noise, since this algorithm is ultimately in charge of updating the network parameters. Previous studies revealed that the first-order moment used in Adam-like stochastic gradient descent optimizers can be modified based on the Student's t-distribution. While this modification led to noise-resistant updates, the other associated statistics remained unchanged, resulting in inconsistencies in the assumed models. In this paper, we propose AdaTerm, a novel approach that incorporates the Student's t-distribution to derive not only the first-order moment but also all the associated statistics. This provides a unified treatment of the optimization process, offering a comprehensive framework under the statistical model of the t-distribution for the first time. The proposed approach offers several advantages over previously proposed approaches, including reduced hyperparameters and improved robustness and adaptability. This noise-adaptive behavior contributes to AdaTerm's exceptional learning performance, as demonstrated through various optimization problems with different and/or unknown noise ratios. Furthermore, we introduce a new technique for deriving a theoretical regret bound without relying on AMSGrad, providing a valuable contribution to the field.
27 pages; final version accepted by Neurocomputing (Elsevier), August 2023 (https://doi.org/10.1016/j.neucom.2023.126692)
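The core mechanism the abstract describes, down-weighting gradients that are unlikely under a Student's t-distribution before they enter the first-order moment, can be sketched as follows. This is a minimal illustrative sketch in the spirit of the t-distribution-based predecessors the abstract mentions, not the AdaTerm derivation itself: the function name, the fixed degrees of freedom `nu`, and the Adam-style second moment are assumptions made for brevity, whereas AdaTerm derives all the associated statistics (including the degrees of freedom) from the same t-model.

```python
import numpy as np

def t_robust_moment_step(m, v, W, grad, nu=1.0, beta=0.9, beta2=0.999, eps=1e-8):
    """One illustrative t-distribution-weighted first-moment update.

    Gradients that are unlikely under a Student's-t model centered at the
    running mean m (with scale v) receive a small weight w, so noisy or
    outlier gradients barely move the first moment.
    """
    d = grad.size
    # Mahalanobis-like squared distance of the new gradient from the mean
    D2 = np.sum((grad - m) ** 2 / (v + eps))
    # Student's-t weight: inliers get w near (nu + d) / nu, outliers near 0
    w = (nu + d) / (nu + D2)
    # Weighted running mean: the robust first-order moment
    m = (W * m + w * grad) / (W + w)
    # Decay the accumulated weight so the estimate keeps adapting
    W = beta * (W + w)
    # Second moment kept Adam-style here for brevity; AdaTerm instead
    # derives it (and nu) consistently from the same t-distribution model
    v = beta2 * v + (1 - beta2) * grad ** 2
    return m, v, W
```

As `nu` grows, the weight `w` tends to 1 for every gradient and the update collapses to an ordinary exponential moving average with rate `beta`, so the heavy tails of the t-distribution are what provide the noise robustness; the abstract's point is that AdaTerm applies this reasoning consistently to every statistic rather than to the first moment alone.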
Published in
- Neurocomputing, Vol. 557, Article 126692, 2023-11
- Elsevier BV
Details
- CRID: 1360021391858844544
- ISSN: 0925-2312
- Material Type: journal article
- Data Sources:
  - Crossref
  - KAKEN
  - OpenAIRE

