Minimizing the overlap problem in protein NMR: a computational framework for precision amino acid labeling

  • Michael J. Sweredoski
  • Kevin J. Donovan
  • Bao D. Nguyen
  • A.J. Shaka
  • Pierre Baldi

    1 Department of Computer Science, 2 Department of Chemistry and 3 Institute for Genomics and Bioinformatics, University of California, Irvine, USA

Abstract

Motivation: Recent advances in cell-free protein expression systems allow specific labeling of proteins with amino acids containing stable isotopes (15N, 13C and 2H), an important feature for protein structure determination by nuclear magnetic resonance (NMR) spectroscopy. Given this labeling ability, we present a mathematical optimization framework for designing a set of protein isotopomers, or labeling schedules, to reduce the congestion in the NMR spectra. The labeling schedules, which are derived by the optimization of a cost function, are tailored to a specific protein and NMR experiment.

Results: For 2D 15N-1H HSQC experiments, we can produce an exact solution using a dynamic programming algorithm in under 2 h on a standard desktop machine. Applying the method to a standard benchmark protein, calmodulin, we are able to reduce the number of overlaps in the 500 MHz HSQC spectrum from 10 to 1 using four samples with a true cost function, and from 10 to 4 if the cost function is derived from statistical estimates. On a set of 448 curated proteins from the BMRB database, we are able to reduce the relative percent congestion by 84.9% in their HSQC spectra using only four samples. Our method can be applied in a high-throughput manner on a proteomic scale using the server we developed. On a 100-node cluster, optimal schedules can be computed for every protein coded for in the human genome in less than a month.

Availability: A server for creating labeling schedules for 15N-1H HSQC experiments, as well as results for each of the 448 individual proteins used in the test set, is available at http://nmr.proteomics.ics.uci.edu.

Contact: pfbaldi@ics.uci.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
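To make the idea of a labeling schedule scored by a cost function concrete, the following is a minimal sketch under simplifying assumptions that are not taken from the paper: each amino acid type is labeled in exactly one sample, two HSQC peaks count as overlapping when both chemical-shift differences fall below fixed tolerances, and the best schedule is found by exhaustive search rather than by the dynamic programming algorithm the abstract mentions. The function names, the tolerance values and the toy peak list are all hypothetical.

```python
from itertools import combinations, product

# Each peak is (amino_acid_type, 1H shift in ppm, 15N shift in ppm).
# The values below are made-up toy data, not measured shifts.
PEAKS = [
    ("A", 8.10, 120.0),
    ("G", 8.11, 120.1),
    ("L", 7.50, 118.0),
    ("K", 7.51, 118.1),
    ("A", 9.00, 125.0),
]


def overlaps(p, q, tol_h=0.02, tol_n=0.2):
    """Assumed overlap rule: both shift differences are below a tolerance."""
    return abs(p[1] - q[1]) < tol_h and abs(p[2] - q[2]) < tol_n


def count_overlaps(peaks, schedule, tol_h=0.02, tol_n=0.2):
    """Cost of a schedule: overlapping peak pairs that share a sample.

    `schedule` maps each amino acid type to the index of the sample
    in which that type is isotopically labeled.
    """
    total = 0
    for p, q in combinations(peaks, 2):
        same_sample = schedule[p[0]] == schedule[q[0]]
        if same_sample and overlaps(p, q, tol_h, tol_n):
            total += 1
    return total


def best_schedule(peaks, n_samples):
    """Exhaustive search over all assignments of types to samples.

    Only feasible for a handful of amino acid types; shown for
    illustration, not as the method used by the authors.
    """
    types = sorted({p[0] for p in peaks})
    best = None
    for assignment in product(range(n_samples), repeat=len(types)):
        schedule = dict(zip(types, assignment))
        cost = count_overlaps(peaks, schedule)
        if best is None or cost < best[0]:
            best = (cost, schedule)
    return best


if __name__ == "__main__":
    uniform = {t: 0 for t in "AGKL"}  # everything labeled in one sample
    print("overlaps with one sample:", count_overlaps(PEAKS, uniform))
    cost, schedule = best_schedule(PEAKS, n_samples=2)
    print("best cost with two samples:", cost, schedule)
```

In this toy case the two overlapping pairs can be separated by placing the members of each pair in different samples, which is the effect the labeling schedules are meant to achieve on real spectra.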

Journal

  • Bioinformatics

    Bioinformatics 23 (21), 2829-2835, 2007-09-25

    Oxford University Press (OUP)
