Abstract
Statistical feature selection is used for identification of relevant genes from biological data, with implications for biomarker and drug development. Recent work demonstrates that the t-test p-value exhibits high sample-to-sample variability accompanied by an exaggeration of effect size in the univariate scenario. To deepen understanding, we further examined p-value and effect size variability issues across a variety of alternative scenarios. We find that with increased sample sizes, there is convergence towards the true effect size. Moreover, with greater power (larger effect size or sample size), p-value variability does not quite converge, suggesting that p-values are a terrible indicator of estimated effect sizes. The t-test is resilient, and surprisingly effective even in test scenarios where its non-parametric counterpart, the Wilcoxon rank-sum test, is expected to perform better. Since p-values are variable and poorly predict effect size, ranking individual gene or protein features based on p-values is a terrible idea, and we demonstrate that restriction to the top 500 features (ranked based on p-values) in real protein expression data comprising 12 normal and 12 renal cancer patients worsens instability. The use of stability indicators such as the bootstrap, estimated effect size and confidence intervals alongside the p-value is required to make meaningful and statistically valid interpretations.
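The effect size exaggeration and the bootstrap-based stability check described in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the authors' protocol: the function names, the true effect size of 0.5, the unit-variance normal populations, and the use of a pooled-variance Student's t statistic with a percentile bootstrap are all assumptions made for illustration. The group size of 12 echoes the renal cancer data set, and the critical value 2.074 is the two-sided 5% threshold for 22 degrees of freedom, so it is only valid for 12 samples per group.

```python
import math
import random
import statistics


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def t_statistic(a, b):
    """Student's two-sample t statistic (equal variances assumed)."""
    return cohens_d(a, b) / math.sqrt(1 / len(a) + 1 / len(b))


def simulate(true_d=0.5, n=12, reps=2000, t_crit=2.074, seed=0):
    """Draw `reps` independent two-group samples from the same populations.

    Returns the estimated effect sizes of all samples and of the subset
    whose t statistic exceeds `t_crit` (assumed: alpha = 0.05, df = 22).
    """
    rng = random.Random(seed)
    all_d, sig_d = [], []
    for _ in range(reps):
        a = [rng.gauss(true_d, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        d = cohens_d(a, b)
        all_d.append(d)
        if abs(t_statistic(a, b)) > t_crit:
            sig_d.append(d)
    return all_d, sig_d


def bootstrap_ci(a, b, rng, n_boot=1000, level=0.95):
    """Percentile bootstrap interval for Cohen's d, resampling within groups."""
    stats = []
    for _ in range(n_boot):
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        try:
            stats.append(cohens_d(ra, rb))
        except (ZeroDivisionError, statistics.StatisticsError):
            continue  # degenerate resample with zero pooled variance
    stats.sort()
    lo = stats[int((1 - level) / 2 * len(stats))]
    hi = stats[max(int((1 + level) / 2 * len(stats)) - 1, 0)]
    return lo, hi


if __name__ == "__main__":
    all_d, sig_d = simulate()
    print("mean estimated d, all samples:        ", statistics.mean(all_d))
    print("mean estimated d, significant samples:", statistics.mean(sig_d))
    print("fraction significant:                 ", len(sig_d) / len(all_d))

    rng = random.Random(1)
    a = [rng.gauss(0.5, 1.0) for _ in range(12)]
    b = [rng.gauss(0.0, 1.0) for _ in range(12)]
    print("bootstrap 95% interval for d:", bootstrap_ci(a, b, rng))
```

With equal group sizes the pooled t statistic equals the estimated effect size times sqrt(n/2), so a sample of 12 per group can only reach significance when its estimated |d| exceeds roughly 0.85. Under the assumed true effect of 0.5, every significant sample therefore overstates the effect, which is one way to see why filtering or ranking on p-values alone inflates apparent effect sizes.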
Original language | English |
---|---|
Pages (from-to) | 235-254 |
Number of pages | 20 |
Journal | International Journal of Bioinformatics Research and Applications |
Volume | 14 |
Issue number | 3 |
Publication status | Published - 2018 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2018 Inderscience Enterprises Ltd.
ASJC Scopus Subject Areas
- Biomedical Engineering
- Health Informatics
- Clinical Biochemistry
- Health Information Management
Keywords
- P-value
- Statistical feature selection
- T-test
- Variability
- Wilcoxon rank-sum test