Abstract
Mass spectrometry (MS)-based proteomics has undergone rapid advancements in recent years, creating challenging problems for bioinformatics. We focus on four aspects where bioinformatics plays a crucial role (and proteomics is needed for clinical application): peptide-spectra matching (PSM) based on the new data-independent acquisition (DIA) paradigm, resolving missing proteins (MPs), dealing with biological and technical heterogeneity in data and statistical feature selection (SFS). DIA is a brute-force strategy that provides greater width and depth but, because it indiscriminately captures spectra such that signal from multiple peptides is mixed, getting good PSMs is difficult. We consider two strategies: simplification of DIA spectra to pseudo-data-dependent acquisition spectra or, alternatively, brute-force search of each DIA spectrum against known reference libraries. The MP problem arises when proteins are never (or inconsistently) detected by MS. When observed in at least one sample, imputation methods can be used to guess the approximate protein expression level. If never observed at all, network/protein complex-based contextualization provides an independent prediction platform. Data heterogeneity is a difficult problem with two dimensions: technical (batch effects), which should be removed, and biological (including demography and disease subpopulations), which should be retained. Simple normalization is seldom sufficient, while batch effect-correction algorithms may create errors. Batch effect-resistant normalization methods are a viable alternative. Finally, SFS is vital for practical applications. While many methods exist, there is no best method, and both upstream (e.g. normalization) and downstream processing (e.g. multiple-testing correction) are performance confounders. We also discuss signal detection when class effects are weak.
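The abstract names multiple-testing correction as a downstream step that can confound feature selection. As a rough illustration of what such a step does, the sketch below applies the Benjamini-Hochberg procedure, one common correction that the abstract does not itself single out. The function name `benjamini_hochberg` and the p-values are hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-protein p-values (illustrative only, not from the paper).
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)
# Proteins retained at a 5% false discovery rate.
significant = [q <= 0.05 for q in q_values]
```

In this toy input, two proteins with raw p-values below 0.05 are no longer selected after correction, which shows how the choice of downstream correction changes the selected feature set.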
Original language | English |
---|---|
Pages (from-to) | 347-355 |
Number of pages | 9 |
Journal | Briefings in Bioinformatics |
Volume | 20 |
Issue number | 1 |
DOIs | |
Publication status | Published - Jan 18 2019 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2017 The Author.
ASJC Scopus Subject Areas
- Information Systems
- Molecular Biology
Keywords
- bioinformatics
- biostatistics
- biotechnology
- networks
- proteomics