Sample-to-sample p-value variability and its implications for multivariate analysis

Wei Wang, Wilson Wen Bin Goh*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Statistical feature selection is used to identify relevant genes from biological data, with implications for biomarker and drug development. Recent work demonstrates that the t-test p-value exhibits high sample-to-sample variability, accompanied by an exaggeration of effect size, in the univariate scenario. To deepen understanding, we further examined p-value and effect size variability across a variety of alternative scenarios. We find that with increased sample sizes, the estimated effect size converges towards the true effect size. However, even with greater power (a stronger effect size or a larger sample size), p-value variability does not fully converge, suggesting that p-values are a poor indicator of estimated effect size. The t-test is resilient and surprisingly effective, even in test scenarios where its non-parametric counterpart, the Wilcoxon rank-sum test, is expected to perform better. Since p-values are variable and poorly predict effect size, ranking individual gene or protein features by p-value is ill-advised: we demonstrate that restricting analysis to the top 500 features (ranked by p-value) in real protein expression data from 12 normal and 12 renal cancer patients worsens instability. The use of stability indicators such as the bootstrap, estimated effect size and confidence intervals alongside the p-value is required to make meaningful and statistically valid interpretations.
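The sample-to-sample variability described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure: it assumes two normal populations with unit variance, a hypothetical true standardized effect size of 0.8, and 12 samples per group (matching the group size mentioned above). To stay within the Python standard library it uses a permutation test on the difference in means rather than the t-test. All function names are illustrative.

```python
import random
import statistics


def cohens_d(a, b):
    """Standardized mean difference (b minus a) using the pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(b) - statistics.mean(a)) / pooled_var ** 0.5


def permutation_p_value(a, b, rng, n_perm=500):
    """Two-sided permutation p-value for the difference in group means."""
    observed = abs(statistics.fmean(b) - statistics.fmean(a))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled observations
        diff = abs(
            statistics.fmean(pooled[len(a):]) - statistics.fmean(pooled[:len(a)])
        )
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def simulate(n_per_group, true_d, n_repeats, seed=0):
    """Redraw both groups repeatedly; return (p-value, estimated d) per draw."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_repeats):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        results.append((permutation_p_value(a, b, rng), cohens_d(a, b)))
    return results


# Assumed settings: 12 per group, true effect size 0.8 (hypothetical).
results = simulate(n_per_group=12, true_d=0.8, n_repeats=100)
p_values = sorted(p for p, _ in results)
all_d = [d for _, d in results]
significant_d = [d for p, d in results if p < 0.05]

print("p-value range:", p_values[0], "to", p_values[-1])
print("mean estimated d, all repeats:", statistics.mean(all_d))
if significant_d:
    print("mean estimated d, p < 0.05 only:", statistics.mean(significant_d))
```

Comparing the last two printed values shows how conditioning on a significant p-value affects the average estimated effect size, while the printed range shows how widely the p-value itself moves between draws from the same populations. Increasing `n_per_group` lets one check how the effect size estimates behave as the sample size grows.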

Original language: English
Pages (from-to): 235-254
Number of pages: 20
Journal: International Journal of Bioinformatics Research and Applications
Volume: 14
Issue number: 3
Publication status: Published - 2018
Externally published: Yes

Bibliographical note

Publisher Copyright:
Copyright © 2018 Inderscience Enterprises Ltd.

ASJC Scopus Subject Areas

  • Biomedical Engineering
  • Health Informatics
  • Clinical Biochemistry
  • Health Information Management

Keywords

  • P-value
  • Statistical feature selection
  • T-test
  • Variability
  • Wilcoxon rank-sum test
