Dealing with missing values in proteomics data

Weijia Kong; Harvard Wai Hann Hui; Hui Peng; Wilson Wen Bin Goh

doi:10.1002/pmic.202200092

Corresponding author: Wilson Wen Bin Goh

Research output: Contribution to journal › Review article › peer-review

49 Citations (Scopus)

Abstract

Proteomics data are often plagued by missingness. These missing values (MVs) threaten the integrity of subsequent statistical analyses by reducing statistical power, introducing bias, and failing to represent the true sample. Over the years, several categories of missing value imputation (MVI) methods have been developed and adapted for proteomics data. These MVI methods rest on different prior assumptions (e.g., that the data are normally or independently distributed) and operating principles (e.g., the algorithm is built to address random missingness only), resulting in varying levels of performance even on the same dataset. Thus, to achieve a satisfactory outcome, a suitable MVI method must be selected. To guide the choice of a suitable MVI method, we provide a decision chart that facilitates strategic considerations for datasets presenting different characteristics. We also bring attention to other issues that can impact proper MVI, such as the presence of confounders (e.g., batch effects), which can influence MVI performance. These, too, should be considered during or before MVI.
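
The point that MVI methods built on different assumptions can perform very differently on the same dataset can be illustrated with a small simulation. This is a minimal sketch, not the review's decision chart or any method it endorses. It assumes synthetic log2 intensities, a hard detection limit of 20 (a left-censored, missing-not-at-random mechanism), and a hypothetical 50% completeness filter. It then compares protein-mean imputation, which implicitly treats values as missing at random, with a down-shifted normal draw in the style of the Perseus defaults (shift 1.8 and width 0.3 sample standard deviations). All function names and parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_mnar(n_proteins=500, n_samples=12, detection_limit=20.0, seed=0):
    """Hypothetical simulation: log2 intensities with values below a detection limit censored."""
    rng = np.random.default_rng(seed)
    protein_means = rng.normal(22.0, 2.0, size=(n_proteins, 1))
    truth = protein_means + rng.normal(0.0, 0.5, size=(n_proteins, n_samples))
    observed = truth.copy()
    observed[truth < detection_limit] = np.nan  # left-censoring: low-abundance values go missing
    return truth, observed


def filter_rows(truth, observed, min_fraction=0.5):
    """Keep proteins observed in at least `min_fraction` of samples (assumed threshold)."""
    keep = np.mean(~np.isnan(observed), axis=1) >= min_fraction
    return truth[keep], observed[keep]


def impute_row_mean(observed):
    """Protein-mean fill: implicitly assumes values are missing at random."""
    filled = observed.copy()
    row_means = np.nanmean(observed, axis=1)
    rows, cols = np.where(np.isnan(observed))
    filled[rows, cols] = row_means[rows]
    return filled


def impute_downshift(observed, shift=1.8, width=0.3, seed=1):
    """Left-censored fill: per sample, draw MVs from a narrow normal below the observed values."""
    rng = np.random.default_rng(seed)
    filled = observed.copy()
    for j in range(observed.shape[1]):
        col = observed[:, j]
        missing = np.isnan(col)
        mu, sd = np.nanmean(col), np.nanstd(col)
        filled[missing, j] = rng.normal(mu - shift * sd, width * sd, size=missing.sum())
    return filled


def rmse_on_missing(truth, observed, filled):
    """Root-mean-square error computed only over the entries that were missing."""
    mask = np.isnan(observed)
    return float(np.sqrt(np.mean((filled[mask] - truth[mask]) ** 2)))


truth, observed = filter_rows(*simulate_mnar())
mean_filled = impute_row_mean(observed)
shift_filled = impute_downshift(observed)
print("row-mean RMSE:  ", round(rmse_on_missing(truth, observed, mean_filled), 2))
print("down-shift RMSE:", round(rmse_on_missing(truth, observed, shift_filled), 2))
```

Under this censoring mechanism the protein-mean fill can only return values at or above the detection limit, so every imputed value is biased upward, while the down-shifted draw places imputations below the observed range. If the values were instead missing completely at random, the down-shifted draw would impute values that are far too low, which is why the missingness mechanism should be assessed before a method is chosen.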

Original language	English
Article number	2200092
Journal	Proteomics
Volume	22
Issue number	23-24
DOIs	https://doi.org/10.1002/pmic.202200092
Publication status	Published - Dec 2022
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2022 Wiley-VCH GmbH.

ASJC Scopus Subject Areas

Biochemistry
Molecular Biology

Keywords

bioinformatics
computational biology
data analysis
missing value
missing value imputation
proteomics
statistics

Cite this

@article{89f3ede08c4145f2827dbdbf158da6fc,
  title = "Dealing with missing values in proteomics data",
  abstract = "Proteomics data are often plagued with missingness issues. These missing values (MVs) threaten the integrity of subsequent statistical analyses by reduction of statistical power, introduction of bias, and failure to represent the true sample. Over the years, several categories of missing value imputation (MVI) methods have been developed and adapted for proteomics data. These MVI methods perform their tasks based on different prior assumptions (e.g., data is normally or independently distributed) and operating principles (e.g., the algorithm is built to address random missingness only), resulting in varying levels of performance even when dealing with the same dataset. Thus, to achieve a satisfactory outcome, a suitable MVI method must be selected. To guide decision making on suitable MVI method, we provide a decision chart which facilitates strategic considerations on datasets presenting different characteristics. We also bring attention to other issues that can impact proper MVI such as the presence of confounders (e.g., batch effects) which can influence MVI performance. Thus, these too, should be considered during or before MVI.",
  keywords = "bioinformatics, computational biology, data analysis, missing value, missing value imputation, proteomics, statistics",
  author = "Kong, Weijia and Hui, {Harvard Wai Hann} and Peng, Hui and Goh, {Wilson Wen Bin}",
  note = "Publisher Copyright: {\textcopyright} 2022 Wiley-VCH GmbH.",
  year = "2022",
  month = dec,
  doi = "10.1002/pmic.202200092",
  language = "English",
  volume = "22",
  journal = "Proteomics",
  issn = "1615-9853",
  publisher = "Wiley-VCH Verlag",
  number = "23-24",
}

TY  - JOUR
T1  - Dealing with missing values in proteomics data
AU  - Kong, Weijia
AU  - Hui, Harvard Wai Hann
AU  - Peng, Hui
AU  - Goh, Wilson Wen Bin
PY  - 2022/12
Y1  - 2022/12
N2  - Proteomics data are often plagued with missingness issues. These missing values (MVs) threaten the integrity of subsequent statistical analyses by reduction of statistical power, introduction of bias, and failure to represent the true sample. Over the years, several categories of missing value imputation (MVI) methods have been developed and adapted for proteomics data. These MVI methods perform their tasks based on different prior assumptions (e.g., data is normally or independently distributed) and operating principles (e.g., the algorithm is built to address random missingness only), resulting in varying levels of performance even when dealing with the same dataset. Thus, to achieve a satisfactory outcome, a suitable MVI method must be selected. To guide decision making on suitable MVI method, we provide a decision chart which facilitates strategic considerations on datasets presenting different characteristics. We also bring attention to other issues that can impact proper MVI such as the presence of confounders (e.g., batch effects) which can influence MVI performance. Thus, these too, should be considered during or before MVI.
AB  - Proteomics data are often plagued with missingness issues. These missing values (MVs) threaten the integrity of subsequent statistical analyses by reduction of statistical power, introduction of bias, and failure to represent the true sample. Over the years, several categories of missing value imputation (MVI) methods have been developed and adapted for proteomics data. These MVI methods perform their tasks based on different prior assumptions (e.g., data is normally or independently distributed) and operating principles (e.g., the algorithm is built to address random missingness only), resulting in varying levels of performance even when dealing with the same dataset. Thus, to achieve a satisfactory outcome, a suitable MVI method must be selected. To guide decision making on suitable MVI method, we provide a decision chart which facilitates strategic considerations on datasets presenting different characteristics. We also bring attention to other issues that can impact proper MVI such as the presence of confounders (e.g., batch effects) which can influence MVI performance. Thus, these too, should be considered during or before MVI.
KW  - bioinformatics
KW  - computational biology
KW  - data analysis
KW  - missing value
KW  - missing value imputation
KW  - proteomics
KW  - statistics
UR  - http://www.scopus.com/inward/record.url?scp=85142235275&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85142235275&partnerID=8YFLogxK
U2  - 10.1002/pmic.202200092
DO  - 10.1002/pmic.202200092
M3  - Review article
C2  - 36349819
AN  - SCOPUS:85142235275
SN  - 1615-9853
VL  - 22
JO  - Proteomics
JF  - Proteomics
IS  - 23-24
M1  - 2200092
ER  - 