How to do quantile normalization correctly for gene expression data analyses

Yaxing Zhao; Limsoon Wong; Wilson Wen Bin Goh

doi:10.1038/s41598-020-72664-6

Yaxing Zhao, Limsoon Wong, Wilson Wen Bin Goh^*

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

109 Citations (Scopus)

Abstract

Quantile normalization is an important normalization technique commonly used in high-dimensional data analysis. However, when applied blindly to whole datasets, it is susceptible to class-effect proportion (the proportion of class-correlated variables in a dataset) and batch effects (potentially confounding technical variation), resulting in higher false-positive and false-negative rates. We evaluate five strategies for performing quantile normalization and demonstrate that good performance in batch-effect correction and statistical feature selection can be readily achieved by first splitting the data by sample class label and then performing quantile normalization independently on each split (“Class-specific”). Via simulations with both real and simulated batch effects, we demonstrate that the “Class-specific” strategy (and others relying on similar principles) readily outperforms whole-data quantile normalization and is robust, preserving useful signals even during the combined analysis of separately normalized datasets. Quantile normalization is a commonly used procedure, but when carelessly applied to whole datasets without first considering class-effect proportion and batch effects, it can result in poor performance. If quantile normalization must be used, we recommend the “Class-specific” strategy.
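
The difference between whole-data and class-specific quantile normalization described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes a features-by-samples NumPy matrix and one class label per sample, the function names `quantile_normalize` and `class_specific_qn` are hypothetical, and ties are broken by sort order rather than averaged.

```python
import numpy as np


def quantile_normalize(x):
    """Whole-matrix quantile normalization of a features x samples array.

    Each column (sample) is forced onto the same reference distribution:
    the mean of the sorted columns. Ties are not averaged (assumption).
    """
    x = np.asarray(x, dtype=float)
    # Per-sample ordering of the features, smallest value first.
    order = np.argsort(x, axis=0, kind="stable")
    sorted_vals = np.take_along_axis(x, order, axis=0)
    # Reference distribution: mean across samples at each rank.
    reference = sorted_vals.mean(axis=1)
    out = np.empty_like(x)
    # Write the reference value for each rank back to the feature holding that rank.
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out


def class_specific_qn(x, labels):
    """Split samples by class label, then quantile-normalize each split independently."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    out = np.empty_like(x)
    for cls in np.unique(labels):
        cols = labels == cls  # boolean mask selecting the samples of this class
        out[:, cols] = quantile_normalize(x[:, cols])
    return out


# Toy data (illustrative only): 50 features, 3 samples per class,
# with class B shifted upward relative to class A.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 6))
x[:, 3:] += 2.0
labels = np.array(["A", "A", "A", "B", "B", "B"])

whole = quantile_normalize(x)              # all six samples share one distribution
per_class = class_specific_qn(x, labels)   # each class keeps its own distribution
```

In this toy setup, whole-data normalization gives every sample an identical value distribution, whereas the class-specific variant only equalizes samples within the same class, so the overall shift between classes A and B is retained.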

Original language	English
Article number	15534
Journal	Scientific Reports
Volume	10
Issue number	1
DOIs	https://doi.org/10.1038/s41598-020-72664-6
Publication status	Published - Dec 1 2020
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2020, The Author(s).

ASJC Scopus Subject Areas

General

Access to Document

10.1038/s41598-020-72664-6

Cite this

@article{1dd1e7dc30464f08935487ff2a93891a,

title = "How to do quantile normalization correctly for gene expression data analyses",

abstract = "Quantile normalization is an important normalization technique commonly used in high-dimensional data analysis. However, when applied blindly to whole datasets, it is susceptible to class-effect proportion (the proportion of class-correlated variables in a dataset) and batch effects (potentially confounding technical variation), resulting in higher false-positive and false-negative rates. We evaluate five strategies for performing quantile normalization and demonstrate that good performance in batch-effect correction and statistical feature selection can be readily achieved by first splitting the data by sample class label and then performing quantile normalization independently on each split (“Class-specific”). Via simulations with both real and simulated batch effects, we demonstrate that the “Class-specific” strategy (and others relying on similar principles) readily outperforms whole-data quantile normalization and is robust, preserving useful signals even during the combined analysis of separately normalized datasets. Quantile normalization is a commonly used procedure, but when carelessly applied to whole datasets without first considering class-effect proportion and batch effects, it can result in poor performance. If quantile normalization must be used, we recommend the “Class-specific” strategy.",

author = "Yaxing Zhao and Limsoon Wong and Goh, {Wilson Wen Bin}",

note = "Publisher Copyright: {\textcopyright} 2020, The Author(s).",

year = "2020",

month = dec,

day = "1",

doi = "10.1038/s41598-020-72664-6",

language = "English",

volume = "10",

journal = "Scientific Reports",

issn = "2045-2322",

publisher = "Nature Publishing Group",

number = "1",

}

TY - JOUR

T1 - How to do quantile normalization correctly for gene expression data analyses

AU - Zhao, Yaxing

AU - Wong, Limsoon

AU - Goh, Wilson Wen Bin

PY - 2020/12/1

Y1 - 2020/12/1

AB - Quantile normalization is an important normalization technique commonly used in high-dimensional data analysis. However, when applied blindly to whole datasets, it is susceptible to class-effect proportion (the proportion of class-correlated variables in a dataset) and batch effects (potentially confounding technical variation), resulting in higher false-positive and false-negative rates. We evaluate five strategies for performing quantile normalization and demonstrate that good performance in batch-effect correction and statistical feature selection can be readily achieved by first splitting the data by sample class label and then performing quantile normalization independently on each split (“Class-specific”). Via simulations with both real and simulated batch effects, we demonstrate that the “Class-specific” strategy (and others relying on similar principles) readily outperforms whole-data quantile normalization and is robust, preserving useful signals even during the combined analysis of separately normalized datasets. Quantile normalization is a commonly used procedure, but when carelessly applied to whole datasets without first considering class-effect proportion and batch effects, it can result in poor performance. If quantile normalization must be used, we recommend the “Class-specific” strategy.

UR - http://www.scopus.com/inward/record.url?scp=85091387599&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85091387599&partnerID=8YFLogxK

U2 - 10.1038/s41598-020-72664-6

DO - 10.1038/s41598-020-72664-6

M3 - Article

C2 - 32968196

AN - SCOPUS:85091387599

SN - 2045-2322

VL - 10

JO - Scientific Reports

JF - Scientific Reports

IS - 1

M1 - 15534

ER -
