- 【Updated on May 12, 2025】 Integration of CiNii Dissertations and CiNii Books into CiNii Research
- Trial version of CiNii Research Knowledge Graph Search feature is available on CiNii Labs
- 【Updated on June 30, 2025】Suspension and deletion of data provided by Nikkei BP
- Regarding the recording of “Research Data” and “Evidence Data”
Decisive Roles of Sequence Distributions in the Generalizability of<i>de novo</i>Deep Learning Models for RNA Secondary Structure Prediction
Description
<jats:title>ABSTRACT</jats:title><jats:p>Taking sequences as the only inputs, the class of<jats:italic>de novo</jats:italic>deep learning (DL) models for RNA secondary structure prediction has achieved far superior performances than traditional algorithms. However, key questions remain over the statistical underpinning of such models that make no use of physical laws or co-evolutionary information. We present a quantitative study of the capacity and generalizability of a series of<jats:italic>de novo</jats:italic>DL models, with a minimal two-module architecture and no post-processing, under varied distributions of the seen and unseen sequences. Our DL models outperform existing methods on commonly used benchmark datasets and demonstrate excellent learning capacities under all sequence distributions. These DL models generalize well over non-identical unseen sequences, but the generalizability degrades rapidly as the sequence distributions of the seen and unseen datasets become dissimilar. Examinations of RNA family-specific behaviors manifest not only disparate familydependent performances but substantial generalization gaps within the same family. We further determine how model generalization decreases with the decrease of sequence similarity via pairwise sequence alignment, providing quantitative insights into the limitations of statistical learning. Model generalizability thus poses a major hurdle for practical uses of<jats:italic>de novo</jats:italic>DL models and several tenable avenues for future advances are discussed.</jats:p>
- Tweet
Details 詳細情報について
-
- CRID
- 1360865821460749312
-
- Article Type
- preprint
-
- Data Source
-
- Crossref