MINIMUM REDUNDANCY FEATURE SELECTION FROM MICROARRAY GENE EXPRESSION DATA

  • CHRIS DING
    Computational Research Division, Lawrence Berkeley National Laboratory, University of California, Berkeley, CA, 94720, USA
  • HANCHUAN PENG
    Life Sciences/Genomics Division, Lawrence Berkeley National Laboratory, University of California, Berkeley, CA, 94720, USA

Bibliographic Information

Publication Date
2005-04
DOI
  • 10.1142/s0219720005001004
Publisher
World Scientific Pub Co Pte Lt

Description

Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expression among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained have certain redundancy and study methods to minimize it. We propose a minimum redundancy-maximum relevance (MRMR) feature selection framework. Genes selected via MRMR provide a more balanced coverage of the space and capture broader characteristics of phenotypes. They lead to significantly improved class predictions in extensive experiments on 6 gene expression data sets: NCI, Lymphoma, Lung, Child Leukemia, Leukemia, and Colon. Improvements are observed consistently among 4 classification methods: naïve Bayes, linear discriminant analysis, logistic regression, and support vector machines.

Supplementary: The top 60 MRMR genes for each of the datasets are listed in . More information related to MRMR methods can be found at .
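The description above does not spell out the selection criterion, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes discretized expression values, mutual information as the measure of both relevance (gene vs. class label) and redundancy (gene vs. already-selected genes), and a greedy search that maximizes relevance minus mean redundancy. The function names `mutual_information` and `mrmr_select`, and the toy data, are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedily pick k feature indices (assumed criterion: relevance minus mean redundancy).

    `features` is a list of discrete columns, one per gene.
    """
    relevance = [mutual_information(f, labels) for f in features]
    # Start from the single most relevant feature.
    selected = [max(range(len(features)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(features)):
        def score(i):
            redundancy = sum(
                mutual_information(features[i], features[j]) for j in selected
            ) / len(selected)
            return relevance[i] - redundancy

        candidates = [i for i in range(len(features)) if i not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data (made up for illustration): gene 1 is an exact copy of gene 0,
# gene 2 is individually less relevant but less redundant with gene 0.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1, 1],
]
print(mrmr_select(genes, labels, 2))  # prints [0, 2]
```

On this toy input, ranking by relevance alone would pick genes 0 and 1, which carry identical information; the redundancy penalty makes the greedy search prefer gene 2 as the second choice instead.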

Cited by (13)