Is overfitting really a problem?
- Kaneko Hiromasa, Department of Chemical System Engineering, School of Engineering, The University of Tokyo
- Funatsu Kimito, Department of Chemical System Engineering, School of Engineering, The University of Tokyo
Bibliographic Information
- Other Title: オーバーフィッティングは本当に問題か? (Japanese title; English: "Is overfitting really a problem?")
Description
In this presentation, we discuss the accuracy and applicability domains (ADs) of regression models. Generally, a regression model is constructed so that it avoids overfitting the training data and predicts well for diverse compounds. However, an overfitted model should still have high predictive ability within its AD, even though that AD is narrowly limited. In this study, we analyzed an aqueous solubility data set and compared the performance of regression models while taking their ADs into account. Support vector regression (SVR) was used as the regression method, and its hyperparameters were varied. The ADs were set based on data density. Two types of SVR models emerged. The first type consisted of well-constructed models that could predict solubility values for diverse compounds. The second type consisted of overfitted models that appeared to predict poorly overall but, for compounds within their ADs, gave better predictions than the well-constructed models. These results confirm that overfitting is not a problem in itself: an overfitted model can be used in practice as long as its AD is set appropriately.
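The abstract says only that the ADs were set "based on data density" and does not specify the measure. A common choice, assumed here rather than taken from the paper, is the mean Euclidean distance from a compound's descriptor vector to its k nearest training compounds, with a threshold chosen so that a fixed fraction of training compounds falls inside the AD. The sketch below (standard library only) illustrates that idea; the function names, `k`, and `coverage` are hypothetical, and the SVR model itself is omitted. In use, an SVR prediction would be trusted only for compounds flagged as inside the AD.

```python
import math
from statistics import mean

# Hypothetical density-based AD: the paper does not state its exact density measure.
# Here density is approximated by the mean distance to the k nearest training samples
# (smaller mean distance = denser neighbourhood).


def knn_mean_distance(point, train, k, exclude_self=False):
    """Mean Euclidean distance from `point` to its k nearest training samples."""
    dists = sorted(math.dist(point, t) for t in train)
    if exclude_self:
        dists = dists[1:]  # drop the zero distance of a training sample to itself
    return mean(dists[:k])


def fit_density_ad(train, k=3, coverage=0.95):
    """Return a distance threshold so that `coverage` of training samples lie inside the AD."""
    scores = sorted(knn_mean_distance(x, train, k, exclude_self=True) for x in train)
    # Largest training score that still falls within the requested coverage.
    idx = max(0, math.ceil(coverage * len(scores)) - 1)
    return scores[idx]


def inside_ad(points, train, threshold, k=3):
    """True for each point whose neighbourhood is at least as dense as the threshold allows."""
    return [knn_mean_distance(p, train, k) <= threshold for p in points]


if __name__ == "__main__":
    # Toy 2-D "descriptor" data, not the solubility data set from the paper.
    train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
    thr = fit_density_ad(train, k=3, coverage=1.0)
    print(inside_ad([(0.05, 0.0), (5.0, 5.0)], train, thr, k=3))  # [True, False]
```

A narrower AD (smaller `coverage` or smaller threshold) excludes more query compounds, which is the trade-off the abstract describes for overfitted models: fewer compounds are predicted, but those that are predicted may be predicted more accurately.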
Journal
- Proceedings of the Symposium on Chemoinformatics
Proceedings of the Symposium on Chemoinformatics 2015 (0), 28-31, 2015
Division of Chemical Information and Computer Sciences, The Chemical Society of Japan
Details
- CRID: 1390001205736916096
- NII Article ID: 130005146334
- Text Lang: ja
- Data Source: JaLC, CiNii Articles
- Abstract License Flag: Disallowed