Abstract
The Anna Karenina effect is a manifestation of the theory–practice gap that arises when theoretical statistics are applied to real-world data. When biological data are analyzed for differential features such as genes or proteins, the effect occurs when the null hypothesis is rejected for extraneous reasons (confounders) rather than because the alternative hypothesis is relevant to the disease phenotype. The mechanics of applying statistical tests must therefore address and resolve confounders; simply manipulating the P-value is inadequate. We discuss three mechanistic elements (hypothesis statement construction, null distribution appropriateness, and test-statistic construction) and suggest how they can be designed to foil the Anna Karenina effect and select phenotypically relevant biological features.
Original language | English |
---|---|
Pages (from-to) | 488-498 |
Number of pages | 11 |
Journal | Trends in Biotechnology |
Volume | 36 |
Issue number | 5 |
Publication status | Published - May 2018 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2018 Elsevier Ltd
ASJC Scopus Subject Areas
- Biotechnology
- Bioengineering
Keywords
- biomarker
- feature selection
- generalizability
- reproducibility
- statistics