AdaTerm: Adaptive T-distribution estimated robust moments for Noise-Robust stochastic gradient optimization
Bibliographic Information
- Publication Date: 2023-11
- Resource Type: journal article
- Rights Information:
  - https://www.elsevier.com/tdm/userlicense/1.0/
  - https://www.elsevier.com/legal/tdmrep-license
  - http://www.elsevier.com/open-access/userlicense/1.0/
  - https://doi.org/10.15223/policy-017
  - https://doi.org/10.15223/policy-037
  - https://doi.org/10.15223/policy-012
  - https://doi.org/10.15223/policy-029
  - https://doi.org/10.15223/policy-004
- DOI:
  - 10.1016/j.neucom.2023.126692
  - 10.48550/arxiv.2201.06714
- Publisher: Elsevier BV
Description
With the increasing practicality of deep learning applications, practitioners are inevitably faced with datasets corrupted by noise from various sources such as measurement errors, mislabeling, and estimated surrogate inputs/outputs that can adversely impact the optimization results. It is common practice to improve the optimization algorithm's robustness to noise, since this algorithm is ultimately in charge of updating the network parameters. Previous studies revealed that the first-order moment used in Adam-like stochastic gradient descent optimizers can be modified based on the Student's t-distribution. While this modification led to noise-resistant updates, the other associated statistics remained unchanged, resulting in inconsistencies in the assumed models. In this paper, we propose AdaTerm, a novel approach that incorporates the Student's t-distribution to derive not only the first-order moment but also all the associated statistics. This provides a unified treatment of the optimization process, offering a comprehensive framework under the statistical model of the t-distribution for the first time. The proposed approach offers several advantages over previously proposed approaches, including reduced hyperparameters and improved robustness and adaptability. This noise-adaptive behavior contributes to AdaTerm's exceptional learning performance, as demonstrated through various optimization problems with different and/or unknown noise ratios. Furthermore, we introduce a new technique for deriving a theoretical regret bound without relying on AMSGrad, providing a valuable contribution to the field.
27 pages; final version accepted by Neurocomputing (Elsevier), August 2023 (https://doi.org/10.1016/j.neucom.2023.126692)
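The core mechanism the abstract describes, down-weighting gradients that are unlikely under a Student's t-distribution before they enter the first-order moment, can be sketched as follows. This is a minimal illustrative sketch in the spirit of the t-distribution-based predecessors the abstract mentions, not the AdaTerm derivation itself: the function name, the fixed degrees of freedom `nu`, and the Adam-style second moment are assumptions made for brevity, whereas AdaTerm derives all the associated statistics (including the degrees of freedom) from the same t-model.

```python
import numpy as np

def t_robust_moment_step(m, v, W, grad, nu=1.0, beta=0.9, beta2=0.999, eps=1e-8):
    """One illustrative t-distribution-weighted first-moment update.

    Gradients that are unlikely under a Student's-t model centered at the
    running mean m (with scale v) receive a small weight w, so noisy or
    outlier gradients barely move the first moment.
    """
    d = grad.size
    # Mahalanobis-like squared distance of the new gradient from the mean
    D2 = np.sum((grad - m) ** 2 / (v + eps))
    # Student's-t weight: inliers get w near (nu + d) / nu, outliers near 0
    w = (nu + d) / (nu + D2)
    # Weighted running mean: the robust first-order moment
    m = (W * m + w * grad) / (W + w)
    # Decay the accumulated weight so the estimate keeps adapting
    W = beta * (W + w)
    # Second moment kept Adam-style here for brevity; AdaTerm instead
    # derives it (and nu) consistently from the same t-distribution model
    v = beta2 * v + (1 - beta2) * grad ** 2
    return m, v, W
```

As `nu` grows, the weight `w` tends to 1 for every gradient and the update collapses to an ordinary exponential moving average with rate `beta`, so the heavy tails of the t-distribution are what provide the noise robustness; the abstract's point is that AdaTerm applies this reasoning consistently to every statistic rather than to the first moment alone.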
Published in
- Neurocomputing, Vol. 557, Article 126692, 2023-11
- Elsevier BV
Details
- CRID: 1360021391858844544
- ISSN: 0925-2312
- Material Type: journal article
- Data Sources:
  - Crossref
  - KAKEN
  - OpenAIRE

