The Q-Q Plot of p-values for Predicting Outcomes with the Gene Expression Data

  • Ito Yoichi M.
    Department of Biostatistics/Epidemiology and Preventive Health Sciences, School of Health Sciences and Nursing, University of Tokyo
  • Fujiwara Yasuhiro
    Breast and Medical Oncology Division, Department of Medical Oncology, National Cancer Center Hospital
  • Ohashi Yasuo
    Department of Biostatistics/Epidemiology and Preventive Health Sciences, School of Health Sciences and Nursing, University of Tokyo


Abstract

Michiels et al. (2005) showed that lists of genes identified as prognostic predictors via a single, non-repeated training-validation split are unstable, and advocated validation by repeated random sampling. They regarded genes that ranked among the top 50 in more than half of their jackknife samples as stable predictors. However, there is no rationale for the choice of either the gene-list length or the stability threshold. Because a stability assessment essentially requires evaluating how low p-values accumulate across the repeated random samples, it is preferable to compare each gene's observed p-value distribution directly with the distribution of p-values under the null hypothesis. In this study, we propose a quantile-quantile (Q-Q) plot of p-values with a null reference for this purpose. We applied the proposed method to clinical data on primary breast cancer. The Q-Q plot approach can reveal that genes with similar p-values in the ordinary analysis have different p-value distributions under repeated random sampling, and genes that accumulate low p-values under repeated random sampling can be evaluated against the reference lines in the Q-Q plot.
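The abstract does not state which test produces the p-values or how the reference lines are built, so the following is only a rough sketch under explicit assumptions: a normal-approximation (Welch-type) two-sample test on one gene, random training subsets containing 50% of the patients, and a permutation null obtained by repeating the same resampling after shuffling the outcome labels. The function names (`two_sample_p`, `resampled_p_values`, `qq_points`, `gene_qq`) and their parameters are hypothetical, not the authors' code. It shows the core idea: sort the gene's resampled p-values and pair them, quantile by quantile, with the null reference on the -log10 scale.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def two_sample_p(x, y):
    """Two-sided p-value for a difference in group means, using a normal
    approximation to Welch's statistic (an assumed test, not the paper's)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = (fmean(x) - fmean(y)) / se
    return 2.0 * NormalDist().cdf(-abs(z))


def resampled_p_values(expr, outcome, n_resamples, train_frac, rng):
    """Hypothetical helper: one p-value per random training subset
    (drawn without replacement) for a single gene's expression values."""
    idx = list(range(len(expr)))
    n_train = int(train_frac * len(idx))
    pvals = []
    for _ in range(n_resamples):
        train = rng.sample(idx, n_train)
        x = [expr[i] for i in train if outcome[i] == 1]
        y = [expr[i] for i in train if outcome[i] == 0]
        pvals.append(two_sample_p(x, y))
    return pvals


def neg_log10(p):
    """-log10(p), with a floor so that p == 0 does not raise."""
    return -math.log10(max(p, 1e-300))


def qq_points(observed, reference):
    """Pair sorted reference and observed p-values (equal sizes) as
    (x = reference, y = observed) on the -log10 scale. Points above the
    diagonal mean the gene's p-values are smaller than the null reference."""
    if len(observed) != len(reference):
        raise ValueError("observed and reference must have the same length")
    return [(neg_log10(r), neg_log10(o))
            for r, o in zip(sorted(reference), sorted(observed))]


def gene_qq(expr, outcome, n_resamples=200, train_frac=0.5, seed=0):
    """Q-Q points for one gene against a permutation null: the same
    resampling repeated after shuffling the outcome labels (an assumption)."""
    rng = random.Random(seed)
    observed = resampled_p_values(expr, outcome, n_resamples, train_frac, rng)
    shuffled = list(outcome)
    rng.shuffle(shuffled)
    reference = resampled_p_values(expr, shuffled, n_resamples, train_frac, rng)
    return qq_points(observed, reference)


if __name__ == "__main__":
    # Simulated gene whose mean expression shifts with a binary outcome.
    data_rng = random.Random(1)
    outcome = [i % 2 for i in range(100)]
    gene = [data_rng.gauss(0.8 * y, 1.0) for y in outcome]
    pts = gene_qq(gene, outcome)
    above = sum(obs > ref for ref, obs in pts)
    print(f"{above} of {len(pts)} Q-Q points lie above the diagonal")
```

A permutation reference is used here instead of plain Uniform(0,1) quantiles because p-values from overlapping subsamples of the same data are correlated and, even for a null gene, cluster around the full-sample p-value rather than spreading uniformly. A single permutation still gives a noisy reference, so averaging over, or drawing an envelope from, several permutations would be a natural refinement.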

Published in

  • 計量生物学 (Japanese Journal of Biometrics)

    Vol. 28, No. 1, pp. 37-46, 2007

    日本計量生物学会 (The Biometric Society of Japan)
