Predicting the pathogenicity of missense variants using features derived from AlphaFold2

抄録

<jats:title>ABSTRACT</jats:title><jats:p>Each individual genome harbors multiple missense variants, which can be systematically identified via genome or exome sequencing. This class of genetic variation can alter the functional properties of the respective protein, and thereby lead to clinically relevant phenotypes, such as cancer or Mendelian diseases. Despite advances in computational prediction scores, the classification of missense variants as clinically significant or benign remains a major challenge. Recently, the structure of the human proteome was derived with unprecedented accuracy using the artificial intelligence system AlphaFold2. However, the question of whether AlphaFold2 structures can improve the accuracy of computational pathogenicity prediction for missense variants remains unclear. To address this, we first engineered a set of features for each amino acid from these structures. We then trained a random forest to distinguish between proxy-benign and proxy-pathogenic missense variants derived from gnomAD. This yielded a novel AlphaFold2-based pathogenicity prediction score, termed AlphScore. Important feature classes used by AlphScore are solvent accessibility, amino acid network related features, features describing the physicochemical environment, and AlphaFold2’s quality parameter (pLDDT). AlphScore alone showed lower performance than existing scores, such as CADD or REVEL. However, when AlphScore was added to those scores, the performance always increased, as measured by the approximation of deep mutational scan data, as well as the prediction of expert-curated missense variants from the ClinVar database. Overall, our data indicate that the integration of AlphaFold2 predicted structures can improve pathogenicity prediction of missense variants.</jats:p>

被引用文献 (1)*注記

もっと見る

詳細情報 詳細情報について

問題の指摘

ページトップへ