Comparison of classification methods that combine clinical data and high-dimensional mass spectrometry data

RESEARCH PRODUCT

Hervé Cardot, Aline Jeannin, Elise Mostacci, Caroline Truntzer, Patrick Ducoroy, Jean-Michel Petit

subject

Proteomics; Computer science; Predictive value; Context (language use); computer.software_genre; Mass spectrometry; Biochemistry; Data type; High-dimension; Lasso (statistics); Structural Biology; Humans; Molecular Biology; Selection (genetic algorithm); Applied Mathematics; Dimensionality reduction; Classification; Data science; Computer Science Applications; Fatty Liver; Identification (information); Sample Size; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization; Clinical data; Biomarker (medicine); Classification methods; Data mining; DNA microarray; computer; Algorithms; Biomarkers; Research Article

description

Background: The identification of new diagnostic or prognostic biomarkers is one of the main aims of clinical cancer research. Technologies such as mass spectrometry are now commonly used in proteomic research. Mass spectrometry signals show the proteomic profiles of the individuals under study at a given time. These profiles record a large number of proteins, far larger than the number of individuals. These variables supplement or complete classical clinical variables. The objective of this study is to evaluate and compare the predictive ability of new and existing models that combine mass spectrometry data with classical clinical variables. The study was conducted in the context of binary prediction.

Results: Simulated data and a real dataset dedicated to the selection of proteomic markers of steatosis were used to evaluate the methods. The proposed methods address the challenge of high-dimensional data and the selection of predictive markers by using penalization methods (Ridge, Lasso), dimension reduction techniques (PLS), and a combination of both strategies through sparse PLS, all in the context of binary class prediction. The methods were compared in terms of mean classification rate and their ability to select the true predictive values. These comparisons were made on clinical-only models, mass-spectrometry-only models and combined models.

Conclusions: Models that combine both types of data can be more efficient than models that use only clinical or only mass spectrometry data, provided the sample size of the dataset is large enough.
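One way to picture a combined model of the kind the abstract describes is a Lasso-penalized logistic regression in which a few clinical variables sit alongside many spectral variables. The sketch below is illustrative only and is not the authors' implementation: the function names (`fit_combined`, `simulate`, `accuracy`), the proximal-gradient solver, the penalty and step-size values, the toy data, and the choice to leave the clinical coefficients unpenalized are all assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def linear_score(intercept, w_clin, w_spec, clin_row, spec_row):
    """Linear predictor built from both blocks of variables."""
    score = intercept
    score += sum(w * x for w, x in zip(w_clin, clin_row))
    score += sum(w * x for w, x in zip(w_spec, spec_row))
    return score


def fit_combined(clinical, spectra, labels, penalty=0.05, step=0.2, n_iter=300):
    """L1-penalized logistic regression fitted by proximal gradient descent.

    Assumption for this sketch: the intercept and clinical coefficients are
    left unpenalized; only the spectral coefficients are shrunk.
    """
    n = len(labels)
    intercept = 0.0
    w_clin = [0.0] * len(clinical[0])
    w_spec = [0.0] * len(spectra[0])
    for _ in range(n_iter):
        # Residuals p_i - y_i drive every gradient component.
        resid = [
            sigmoid(linear_score(intercept, w_clin, w_spec, c, s)) - y
            for c, s, y in zip(clinical, spectra, labels)
        ]
        intercept -= step * sum(resid) / n
        for j in range(len(w_clin)):
            grad = sum(r * c[j] for r, c in zip(resid, clinical)) / n
            w_clin[j] -= step * grad
        for j in range(len(w_spec)):
            grad = sum(r * s[j] for r, s in zip(resid, spectra)) / n
            # Gradient step followed by soft-thresholding (the Lasso part).
            w_spec[j] = soft_threshold(w_spec[j] - step * grad, step * penalty)
    return intercept, w_clin, w_spec


def simulate(n=40, n_spec=50, seed=0):
    """Toy data: one clinical variable and two spectral variables carry signal."""
    rng = random.Random(seed)
    clinical, spectra, labels = [], [], []
    for _ in range(n):
        c = [rng.gauss(0.0, 1.0)]
        s = [rng.gauss(0.0, 1.0) for _ in range(n_spec)]
        score = 1.5 * c[0] + 1.5 * s[0] - 1.5 * s[1] + rng.gauss(0.0, 0.5)
        clinical.append(c)
        spectra.append(s)
        labels.append(1 if score > 0 else 0)
    return clinical, spectra, labels


def accuracy(model, clinical, spectra, labels):
    """Fraction of samples whose predicted class matches the label."""
    intercept, w_clin, w_spec = model
    hits = sum(
        (sigmoid(linear_score(intercept, w_clin, w_spec, c, s)) >= 0.5) == (y == 1)
        for c, s, y in zip(clinical, spectra, labels)
    )
    return hits / len(labels)


clinical, spectra, labels = simulate()
model = fit_combined(clinical, spectra, labels)
selected = [j for j, w in enumerate(model[2]) if w != 0.0]
print("training accuracy:", accuracy(model, clinical, spectra, labels))
print("selected spectral variables:", selected)
```

The accuracy printed here is computed on the training data, so it is optimistic; a comparison like the one in the study would estimate the classification rate on held-out samples. Raising `penalty` drives more spectral coefficients to exactly zero, which is how the Lasso performs variable selection.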

year	journal	country	edition	language
2013-06-10	BMC Bioinformatics

https://doi.org/10.1186/s12859-014-0385-z