Feature selection in clinical proteomics: with great power comes great reproducibility

Wei Wang; Andrew C.H. Sue; Wilson W.B. Goh

doi:10.1016/j.drudis.2016.12.006

Wei Wang, Andrew C.H. Sue, Wilson W.B. Goh^*

^*Corresponding author for this work

Research output: Contribution to journal › Review article › peer-review

29 Citations (Scopus)

Abstract

In clinical proteomics, reproducible feature selection is unattainable given the standard statistical hypothesis-testing framework. This leads to irreproducible signatures with no diagnostic power. Instability stems from high P-value variability (p_var), which is inevitable and insolvable. The impact of p_var can be reduced via power increment, for example increasing sample size and measurement accuracy. However, these are not realistic solutions in practice. Instead, workarounds using existing data such as signal boosting transformation techniques and network-based statistical testing are more practical. Furthermore, it is useful to consider other metrics alongside P-values including confidence intervals, effect sizes and cross-validation accuracies to make informed inferences.
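
The link the abstract draws between statistical power and P-value variability can be illustrated with a small simulation. The sketch below is not taken from the article. It assumes a single hypothetical feature measured in two groups with normally distributed values, a known common standard deviation of 1, a true mean difference of 1, and a two-sample z-test. The function names, the effect size, the sample sizes and the 0.05 threshold are all illustrative assumptions.

```python
import math
import random
import statistics


def two_sample_z_pvalue(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known common SD."""
    se = sigma * math.sqrt(1.0 / len(a) + 1.0 / len(b))
    z = (statistics.fmean(b) - statistics.fmean(a)) / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_pvalues(n, effect=1.0, reps=2000, seed=0):
    """Repeat the same two-group experiment `reps` times with n samples per group."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(reps):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        disease = [rng.gauss(effect, 1.0) for _ in range(n)]
        pvals.append(two_sample_z_pvalue(control, disease))
    return pvals


def summarize(pvals, alpha=0.05):
    """Quartiles of the p-values and the fraction of repeats that pass alpha."""
    q1, _, q3 = statistics.quantiles(pvals, n=4)
    selected = sum(p < alpha for p in pvals) / len(pvals)
    return {"q1": q1, "q3": q3, "selected_fraction": selected}


if __name__ == "__main__":
    # Hypothetical sample sizes: a small study versus a ten-fold larger one.
    for n in (5, 50):
        print(n, summarize(simulate_pvalues(n)))
```

Under these assumptions, the small-sample runs should select the same truly different feature in only a minority of repeats, with P-values spread over a wide range, while the larger-sample runs should select it almost every time. This is a toy model of one feature only; real proteomics data add multiple testing, missing values and unknown variances, none of which are modelled here.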

Original language	English
Pages (from-to)	912-918
Number of pages	7
Journal	Drug Discovery Today
Volume	22
Issue number	6
DOIs	https://doi.org/10.1016/j.drudis.2016.12.006
Publication status	Published - Jun 2017
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2016 Elsevier Ltd

ASJC Scopus Subject Areas

Pharmacology
Drug Discovery

Cite this

@article{ce47e494740448c9bffc73ffc8fa298f,

title = "Feature selection in clinical proteomics: with great power comes great reproducibility",

abstract = "In clinical proteomics, reproducible feature selection is unattainable given the standard statistical hypothesis-testing framework. This leads to irreproducible signatures with no diagnostic power. Instability stems from high P-value variability (p\_var), which is inevitable and insolvable. The impact of p\_var can be reduced via power increment, for example increasing sample size and measurement accuracy. However, these are not realistic solutions in practice. Instead, workarounds using existing data such as signal boosting transformation techniques and network-based statistical testing is more practical. Furthermore, it is useful to consider other metrics alongside P-values including confidence intervals, effect sizes and cross-validation accuracies to make informed inferences.",

author = "Wei Wang and Sue, {Andrew C.H.} and Goh, {Wilson W.B.}",

note = "Publisher Copyright: {\textcopyright} 2016 Elsevier Ltd",

year = "2017",

month = jun,

doi = "10.1016/j.drudis.2016.12.006",

language = "English",

volume = "22",

pages = "912--918",

journal = "Drug Discovery Today",

issn = "1359-6446",

publisher = "Elsevier Limited",

number = "6",

}

TY - JOUR

T1 - Feature selection in clinical proteomics

T2 - with great power comes great reproducibility

AU - Wang, Wei

AU - Sue, Andrew C.H.

AU - Goh, Wilson W.B.

PY - 2017/6

Y1 - 2017/6

N2 - In clinical proteomics, reproducible feature selection is unattainable given the standard statistical hypothesis-testing framework. This leads to irreproducible signatures with no diagnostic power. Instability stems from high P-value variability (p_var), which is inevitable and insolvable. The impact of p_var can be reduced via power increment, for example increasing sample size and measurement accuracy. However, these are not realistic solutions in practice. Instead, workarounds using existing data such as signal boosting transformation techniques and network-based statistical testing is more practical. Furthermore, it is useful to consider other metrics alongside P-values including confidence intervals, effect sizes and cross-validation accuracies to make informed inferences.

AB - In clinical proteomics, reproducible feature selection is unattainable given the standard statistical hypothesis-testing framework. This leads to irreproducible signatures with no diagnostic power. Instability stems from high P-value variability (p_var), which is inevitable and insolvable. The impact of p_var can be reduced via power increment, for example increasing sample size and measurement accuracy. However, these are not realistic solutions in practice. Instead, workarounds using existing data such as signal boosting transformation techniques and network-based statistical testing is more practical. Furthermore, it is useful to consider other metrics alongside P-values including confidence intervals, effect sizes and cross-validation accuracies to make informed inferences.

UR - http://www.scopus.com/inward/record.url?scp=85008497328&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85008497328&partnerID=8YFLogxK

U2 - 10.1016/j.drudis.2016.12.006

DO - 10.1016/j.drudis.2016.12.006

M3 - Review article

C2 - 27988358

AN - SCOPUS:85008497328

SN - 1359-6446

VL - 22

SP - 912

EP - 918

JO - Drug Discovery Today

JF - Drug Discovery Today

IS - 6

ER -
