Detection of protease digestion site using fluorescence labeled peptide array and construction of machine-learning prediction model

  • Mizutani Ryota
    Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University
  • Mori Yoko
    Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University
  • Ogawa Shodai
    Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University
  • Tazoe Kaho
    Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University
  • Akiyama Hirokazu
    Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University
  • Shimizu Kazunori
    Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University
  • Honda Hiroyuki
    Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University

Bibliographic Information

Other Title
  • 蛍光標識ペプチドアレイを用いたプロテアーゼ切断検出と機械学習を用いた切断部位予測
  • ケイコウ ヒョウシキ ペプチドアレイ オ モチイタ プロテアーゼ セツダン ケンシュツ ト キカイ ガクシュウ オ モチイタ セツダン ブイ ヨソク

Abstract

<p>To predict protease digestion sites, a library of 1990 tetramer peptides was designed, each fluorescence-labeled at the N-terminus. The decrease in fluorescence intensity of each peptide was determined quantitatively and used as training data for Random Forest (RF) modeling. The 1990 tetramers covered all dipeptide bonds that can serve as protease substrates. As explanatory variables for model construction, 532 parameters were prepared; these included not only the appearance of amino acid residues and dipeptides but also positioning parameters (PP) and global parameters (GP) describing peptide properties. Biochemistry-grade trypsin was used for hydrolysis. After digestion at pH 8, the histogram of digestion ratios showed that, as expected, tetramers containing an arginine (R) or lysine (K) residue were digested at higher ratios than tetramers without R or K. The constructed pH 8-RF model showed a prediction accuracy of 77%. The ten most important explanatory parameters in the pH 8-RF model were the appearance of K and nine GPs. These GPs included four isoelectric parameters and one polarity parameter, for which K and R residues have relatively large values. Digestion sites of α-lactalbumin were predicted using the pH 8-RF model, and the protein was then actually digested with trypsin. LC-MS/MS analysis of the hydrolysate identified 42 peptides. Fourteen of the 19 predicted digestion sites coincided with sites identified experimentally. Many peptide fragments cleaved at the 69Y-70G or 50F-51H site were detected, strongly suggesting digestion by chymotrypsin present as a contaminating protease. These sites were predicted fairly well by the constructed RF model. In addition, trypsin digestion at pH 5 was carried out to investigate the effect of lowering the pH on trypsin digestion. The prediction accuracy of the pH 5-RF model was only 59%.
The ten most important parameters in the pH 5-RF model included four GPs, the same four isoelectric parameters found in the pH 8 model. Twenty digestion sites of α-lactalbumin were predicted by the pH 5-RF model. α-Lactalbumin was digested with trypsin at pH 5, and 64 peptides were identified. The predicted sites were compared with the identified fragments in the region from 48T to 77K, where many fragments were detected. It was concluded that the constructed model could predict some of the newly identified sites fairly well and could roughly capture the slight shift in digestion sites caused by the pH change. The proposed methodology, which combined a library of 1990 tetramer peptides, 532 explanatory parameters, and an RF model, was effective for predicting protease digestion sites.</p>
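
The appearance-type explanatory variables mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "appearance" is encoded as a 0/1 flag per amino acid residue and per dipeptide, and that a protein is scanned with a sliding four-residue window. The names `encode_tetramer` and `tetramer_windows` are hypothetical. The positioning parameters (PP) and global parameters (GP) are omitted because the abstract does not define them.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order so feature columns are stable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def encode_tetramer(peptide):
    """Return a 0/1 appearance vector for one tetramer.

    Assumed layout (not taken from the paper): 20 residue flags
    followed by 400 dipeptide flags, 420 features in total.
    """
    if len(peptide) != 4 or any(aa not in AMINO_ACIDS for aa in peptide):
        raise ValueError("expected four standard one-letter residues")
    residue_flags = [int(aa in peptide) for aa in AMINO_ACIDS]
    # A tetramer contains three overlapping dipeptides.
    present = {peptide[i:i + 2] for i in range(3)}
    dipeptide_flags = [int(dp in present) for dp in DIPEPTIDES]
    return residue_flags + dipeptide_flags


def tetramer_windows(sequence):
    """Yield (start_index, tetramer) for each four-residue window."""
    for start in range(len(sequence) - 3):
        yield start, sequence[start:start + 4]


# Usage on a toy sequence (hypothetical, not alpha-lactalbumin).
features = [encode_tetramer(t) for _, t in tetramer_windows("GKAYRL")]
```

Vectors of this kind, extended with the PP and GP descriptors, could serve as input rows for a Random Forest trained on the measured digestion ratios.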

Journal

  • Seibutsu-kogaku Kaishi

    Seibutsu-kogaku Kaishi 100 (10), 528-540, 2022-10-25

    The Society for Biotechnology, Japan
