Why Batch Effects Matter in Omics Data, and How to Avoid Them

Wilson Wen Bin Goh; Wei Wang; Limsoon Wong

doi:10.1016/j.tibtech.2017.02.012

Wilson Wen Bin Goh^*, Wei Wang, Limsoon Wong

^*Corresponding author for this work

Research output: Contribution to journal › Review article › peer-review

275 Citations (Scopus)

Abstract

Effective integration and analysis of new high-throughput data, especially gene-expression and proteomic-profiling data, are expected to deliver novel clinical insights and therapeutic options. Unfortunately, technical heterogeneity or batch effects (different experiment times, handlers, reagent lots, etc.) have proven challenging. Although batch effect-correction algorithms (BECAs) exist, we know little about effective batch-effect mitigation: even now, new batch effect-associated problems are emerging. These include false effects due to misapplying BECAs and positive bias during model evaluations. Depending on the choice of algorithm and experimental set-up, biological heterogeneity can be mistaken for batch effects and wrongfully removed. Here, we examine these emerging batch effect-associated problems, propose a series of best practices, and discuss some of the challenges that lie ahead.

Original language	English
Pages (from-to)	498-507
Number of pages	10
Journal	Trends in Biotechnology
Volume	35
Issue number	6
DOIs	https://doi.org/10.1016/j.tibtech.2017.02.012
Publication status	Published - Jun 2017
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2017 Elsevier Ltd

ASJC Scopus Subject Areas

Biotechnology
Bioengineering

Keywords

batch effect
cross-validation
data integration
heterogeneity
reproducibility

Cite this

@article{785544181c4649e5865a549af3051458,

title = "Why Batch Effects Matter in Omics Data, and How to Avoid Them",

abstract = "Effective integration and analysis of new high-throughput data, especially gene-expression and proteomic-profiling data, are expected to deliver novel clinical insights and therapeutic options. Unfortunately, technical heterogeneity or batch effects (different experiment times, handlers, reagent lots, etc.) have proven challenging. Although batch effect-correction algorithms (BECAs) exist, we know little about effective batch-effect mitigation: even now, new batch effect-associated problems are emerging. These include false effects due to misapplying BECAs and positive bias during model evaluations. Depending on the choice of algorithm and experimental set-up, biological heterogeneity can be mistaken for batch effects and wrongfully removed. Here, we examine these emerging batch effect-associated problems, propose a series of best practices, and discuss some of the challenges that lie ahead.",

keywords = "batch effect, cross-validation, data integration, heterogeneity, reproducibility",

author = "Goh, {Wilson Wen Bin} and Wei Wang and Limsoon Wong",

note = "Publisher Copyright: {\textcopyright} 2017 Elsevier Ltd",

year = "2017",

month = jun,

doi = "10.1016/j.tibtech.2017.02.012",

language = "English",

volume = "35",

pages = "498--507",

journal = "Trends in Biotechnology",

issn = "0167-7799",

publisher = "Elsevier Limited",

number = "6",

}

TY - JOUR

T1 - Why Batch Effects Matter in Omics Data, and How to Avoid Them

AU - Goh, Wilson Wen Bin

AU - Wang, Wei

AU - Wong, Limsoon

PY - 2017/6

Y1 - 2017/6

N2 - Effective integration and analysis of new high-throughput data, especially gene-expression and proteomic-profiling data, are expected to deliver novel clinical insights and therapeutic options. Unfortunately, technical heterogeneity or batch effects (different experiment times, handlers, reagent lots, etc.) have proven challenging. Although batch effect-correction algorithms (BECAs) exist, we know little about effective batch-effect mitigation: even now, new batch effect-associated problems are emerging. These include false effects due to misapplying BECAs and positive bias during model evaluations. Depending on the choice of algorithm and experimental set-up, biological heterogeneity can be mistaken for batch effects and wrongfully removed. Here, we examine these emerging batch effect-associated problems, propose a series of best practices, and discuss some of the challenges that lie ahead.

AB - Effective integration and analysis of new high-throughput data, especially gene-expression and proteomic-profiling data, are expected to deliver novel clinical insights and therapeutic options. Unfortunately, technical heterogeneity or batch effects (different experiment times, handlers, reagent lots, etc.) have proven challenging. Although batch effect-correction algorithms (BECAs) exist, we know little about effective batch-effect mitigation: even now, new batch effect-associated problems are emerging. These include false effects due to misapplying BECAs and positive bias during model evaluations. Depending on the choice of algorithm and experimental set-up, biological heterogeneity can be mistaken for batch effects and wrongfully removed. Here, we examine these emerging batch effect-associated problems, propose a series of best practices, and discuss some of the challenges that lie ahead.

KW - batch effect

KW - cross-validation

KW - data integration

KW - heterogeneity

KW - reproducibility

UR - http://www.scopus.com/inward/record.url?scp=85016060109&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85016060109&partnerID=8YFLogxK

U2 - 10.1016/j.tibtech.2017.02.012

DO - 10.1016/j.tibtech.2017.02.012

M3 - Review article

C2 - 28351613

AN - SCOPUS:85016060109

SN - 0167-7799

VL - 35

SP - 498

EP - 507

JO - Trends in Biotechnology

JF - Trends in Biotechnology

IS - 6

ER -
