Sample-to-sample p-value variability and its implications for multivariate analysis

Wei Wang; Wilson Wen Bin Goh

doi:10.1504/IJBRA.2018.092691

Corresponding author: Wilson Wen Bin Goh

Tianjin University

Research output: Contribution to journal, Article, peer-reviewed

2 Citations (Scopus)

Abstract

Statistical feature selection is used to identify relevant genes in biological data, with implications for biomarker and drug development. Recent work demonstrates that the t-test p-value exhibits high sample-to-sample variability, accompanied by an exaggeration of effect size, in the univariate scenario. To deepen understanding, we further examined p-value and effect-size variability across a variety of alternative scenarios. We find that with increased sample sizes, estimated effect sizes converge towards the true effect size. However, with greater power (a stronger effect size or a larger sample size), p-value variability does not quite converge, suggesting that p-values are a terrible indicator of estimated effect sizes. The t-test is resilient and surprisingly effective, even in test scenarios where its non-parametric counterpart, the Wilcoxon rank-sum test, is expected to perform better. Since p-values are variable and poorly predict effect size, ranking individual gene or protein features by p-value is a terrible idea; in real protein expression data comprising 12 normal and 12 renal cancer patients, we demonstrate that restricting the analysis to the top 500 features (ranked by p-value) worsens instability. Stability indicators such as the bootstrap, estimated effect sizes and confidence intervals must be used alongside the p-value to make meaningful and statistically valid interpretations.
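The abstract's central claims (p-values that swing widely between replicate samples, effect sizes inflated when only significant results are kept, and bootstrap confidence intervals as a stability check) can be illustrated with a small simulation. The sketch below is not the authors' code or simulation design. It assumes normally distributed data, a true standardized effect (Cohen's d) of 0.8 and 12 samples per group (chosen only to echo the 12 vs 12 cohort size), and its function names (`simulate`, `t_test_with_d`, `bootstrap_ci_d`) are hypothetical. To stay within the Python standard library, the two-sided Student's t-test p-value is computed from the regularized incomplete beta function rather than from a statistics package.

```python
import math
import random
import statistics


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    # Continued fraction for the regularized incomplete beta (modified Lentz method).
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last update factor ~1: fraction has converged
            break
    return h


def betainc_reg(a, b, x):
    # Regularized incomplete beta function I_x(a, b).
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def _mean_var(v):
    m = math.fsum(v) / len(v)
    return m, math.fsum((u - m) ** 2 for u in v) / (len(v) - 1)


def t_test_with_d(x, y):
    # Two-sided Student's t-test (pooled variance) plus Cohen's d; returns (t, p, d).
    (mx, vx), (my, vy) = _mean_var(x), _mean_var(y)
    df = len(x) + len(y) - 2
    sp = math.sqrt(((len(x) - 1) * vx + (len(y) - 1) * vy) / df)
    t = (mx - my) / (sp * math.sqrt(1 / len(x) + 1 / len(y)))
    # P(|T| >= |t|) for T ~ t(df) equals I_{df/(df+t^2)}(df/2, 1/2).
    p = betainc_reg(df / 2.0, 0.5, df / (df + t * t))
    return t, p, (mx - my) / sp


def simulate(true_d=0.8, n=12, reps=1000, alpha=0.05, seed=1):
    # Rerun the same two-group experiment many times under hypothetical settings.
    rng = random.Random(seed)
    pvals, d_all, d_sig = [], [], []
    for _ in range(reps):
        x = [rng.gauss(true_d, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        _, p, d = t_test_with_d(x, y)
        pvals.append(p)
        d_all.append(d)
        if p < alpha:
            d_sig.append(d)  # effect sizes that pass the significance filter
    deciles = statistics.quantiles(pvals, n=10)
    return {
        "p_10th": deciles[0],
        "p_median": statistics.median(pvals),
        "p_90th": deciles[-1],
        "power": len(d_sig) / reps,
        "mean_d_all": statistics.fmean(d_all),
        "mean_d_significant": statistics.fmean(d_sig),
    }


def bootstrap_ci_d(x, y, b=2000, level=0.95, seed=2):
    # Percentile bootstrap CI for Cohen's d, resampling each group separately.
    rng = random.Random(seed)
    boot = sorted(
        t_test_with_d(rng.choices(x, k=len(x)), rng.choices(y, k=len(y)))[2]
        for _ in range(b)
    )
    return boot[int((1 - level) / 2 * b)], boot[int((1 + level) / 2 * b) - 1]


if __name__ == "__main__":
    # Demo with hypothetical parameters, not the paper's settings.
    for n in (12, 100):
        print(n, {k: round(v, 3) for k, v in simulate(n=n).items()})
    rng = random.Random(3)
    x = [rng.gauss(0.8, 1.0) for _ in range(12)]
    y = [rng.gauss(0.0, 1.0) for _ in range(12)]
    print(t_test_with_d(x, y), bootstrap_ci_d(x, y))
```

Running the script prints, for n = 12 and n = 100 per group, the 10th, 50th and 90th percentiles of the p-value across replicate experiments, the fraction reaching p < 0.05, and the mean observed d across all replicates versus only the significant ones; under these assumptions the significant-only mean should sit well above 0.8 at n = 12 and close to it at n = 100. In real analyses a library routine such as `scipy.stats.ttest_ind` would normally replace the hand-written p-value, and the percentile bootstrap used here is only one of several interval methods.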

Original language	English
Pages (from-to)	235-254
Number of pages	20
Journal	International Journal of Bioinformatics Research and Applications
Volume	14
Issue number	3
DOIs	https://doi.org/10.1504/IJBRA.2018.092691
Publication status	Published - 2018
Externally published	Yes

Bibliographical note

Publisher Copyright:
Copyright © 2018 Inderscience Enterprises Ltd.

ASJC Scopus Subject Areas

Biomedical Engineering
Health Informatics
Clinical Biochemistry
Health Information Management

Keywords

P-value
Statistical feature selection
T-test
Variability
Wilcoxon rank-sum test

Cite this

@article{0fe42713e2b24b0ea882132f73fc9be3,

title = "Sample-to-sample p-value variability and its implications for multivariate analysis",

abstract = "Statistical feature selection is used to identify relevant genes in biological data, with implications for biomarker and drug development. Recent work demonstrates that the t-test p-value exhibits high sample-to-sample variability, accompanied by an exaggeration of effect size, in the univariate scenario. To deepen understanding, we further examined p-value and effect-size variability across a variety of alternative scenarios. We find that with increased sample sizes, estimated effect sizes converge towards the true effect size. However, with greater power (a stronger effect size or a larger sample size), p-value variability does not quite converge, suggesting that p-values are a terrible indicator of estimated effect sizes. The t-test is resilient and surprisingly effective, even in test scenarios where its non-parametric counterpart, the Wilcoxon rank-sum test, is expected to perform better. Since p-values are variable and poorly predict effect size, ranking individual gene or protein features by p-value is a terrible idea; in real protein expression data comprising 12 normal and 12 renal cancer patients, we demonstrate that restricting the analysis to the top 500 features (ranked by p-value) worsens instability. Stability indicators such as the bootstrap, estimated effect sizes and confidence intervals must be used alongside the p-value to make meaningful and statistically valid interpretations.",

keywords = "P-value, Statistical feature selection, T-test, Variability, Wilcoxon rank-sum test",

author = "Wang, Wei and Goh, {Wilson Wen Bin}",

note = "Publisher Copyright: Copyright {\textcopyright} 2018 Inderscience Enterprises Ltd.",

year = "2018",

doi = "10.1504/IJBRA.2018.092691",

language = "English",

volume = "14",

pages = "235--254",

journal = "International Journal of Bioinformatics Research and Applications",

issn = "1744-5485",

publisher = "Inderscience Enterprises Ltd",

number = "3",

}

TY  - JOUR
T1  - Sample-to-sample p-value variability and its implications for multivariate analysis
AU  - Wang, Wei
AU  - Goh, Wilson Wen Bin
PY  - 2018
Y1  - 2018
N2  - Statistical feature selection is used to identify relevant genes in biological data, with implications for biomarker and drug development. Recent work demonstrates that the t-test p-value exhibits high sample-to-sample variability, accompanied by an exaggeration of effect size, in the univariate scenario. To deepen understanding, we further examined p-value and effect-size variability across a variety of alternative scenarios. We find that with increased sample sizes, estimated effect sizes converge towards the true effect size. However, with greater power (a stronger effect size or a larger sample size), p-value variability does not quite converge, suggesting that p-values are a terrible indicator of estimated effect sizes. The t-test is resilient and surprisingly effective, even in test scenarios where its non-parametric counterpart, the Wilcoxon rank-sum test, is expected to perform better. Since p-values are variable and poorly predict effect size, ranking individual gene or protein features by p-value is a terrible idea; in real protein expression data comprising 12 normal and 12 renal cancer patients, we demonstrate that restricting the analysis to the top 500 features (ranked by p-value) worsens instability. Stability indicators such as the bootstrap, estimated effect sizes and confidence intervals must be used alongside the p-value to make meaningful and statistically valid interpretations.
AB  - Statistical feature selection is used to identify relevant genes in biological data, with implications for biomarker and drug development. Recent work demonstrates that the t-test p-value exhibits high sample-to-sample variability, accompanied by an exaggeration of effect size, in the univariate scenario. To deepen understanding, we further examined p-value and effect-size variability across a variety of alternative scenarios. We find that with increased sample sizes, estimated effect sizes converge towards the true effect size. However, with greater power (a stronger effect size or a larger sample size), p-value variability does not quite converge, suggesting that p-values are a terrible indicator of estimated effect sizes. The t-test is resilient and surprisingly effective, even in test scenarios where its non-parametric counterpart, the Wilcoxon rank-sum test, is expected to perform better. Since p-values are variable and poorly predict effect size, ranking individual gene or protein features by p-value is a terrible idea; in real protein expression data comprising 12 normal and 12 renal cancer patients, we demonstrate that restricting the analysis to the top 500 features (ranked by p-value) worsens instability. Stability indicators such as the bootstrap, estimated effect sizes and confidence intervals must be used alongside the p-value to make meaningful and statistically valid interpretations.
KW  - P-value
KW  - Statistical feature selection
KW  - T-test
KW  - Variability
KW  - Wilcoxon rank-sum test
UR  - http://www.scopus.com/inward/record.url?scp=85049578255&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85049578255&partnerID=8YFLogxK
U2  - 10.1504/IJBRA.2018.092691
DO  - 10.1504/IJBRA.2018.092691
M3  - Article
AN  - SCOPUS:85049578255
SN  - 1744-5485
VL  - 14
SP  - 235
EP  - 254
JO  - International Journal of Bioinformatics Research and Applications
JF  - International Journal of Bioinformatics Research and Applications
IS  - 3
ER  - 
