Using Deep Learning to Extrapolate Protein Expression Measurements

RESEARCH PRODUCT

Lelde Lace, James C. Wright, Fatemeh Zamanzad Ghavidel, Jyoti S. Choudhary, Inge Jonassen, Juan Antonio Vizcaíno, Kārlis Čerāns, Mārtiņš Opmanis, Darta Rituma, Mitra Barzine, Kārlis Freivalds, Edgars Celms, Andrew F. Jarnuczak, Juris Viksna, Alvis Brazma

subject

Proteomics; In silico; Quantitative proteomics; Computational biology; Biology; Biochemistry; protein abundance prediction; Mass Spectrometry; Protein expression; Mice; 03 medical and health sciences; Deep Learning; Abundance (ecology); Animals; Molecular Biology; Gene; Research Articles; 030304 developmental biology; deep learning networks; 0303 health sciences; UniProt keywords; business.industry; Deep learning; 030302 biochemistry & molecular biology; Proteins; RNA; Molecular Sequence Annotation; Missing data; Gene Ontology; Artificial intelligence; business; Research Article

description

Mass spectrometry (MS)-based quantitative proteomics experiments typically assay a subset of up to 60% of the ≈20 000 human protein-coding genes. Computational methods for imputing the missing values using RNA expression data usually allow only for imputation of proteins measured in at least some of the samples. In silico methods for comprehensively estimating abundances across all proteins are still missing. Here, a novel method is proposed that uses deep learning to extrapolate the observed protein expression values in label-free MS experiments to all proteins, leveraging gene functional annotations and RNA measurements as key predictive attributes. The method is tested on four datasets, including human cell lines and human and mouse tissues. It predicts protein expression values with average R² scores between 0.46 and 0.54, which is significantly better than predictions based on correlations using the RNA expression data alone. Moreover, it is demonstrated that the derived models can be "transferred" across experiments and species. For instance, the model derived from human tissues gave an R² of 0.51 when applied to mouse tissue data. It is concluded that protein abundances generated in label-free MS experiments can be computationally predicted using functional annotation attributes, and that the predictions can be used to highlight aberrant protein abundance values.
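The R² scores quoted in the description compare predicted with observed protein abundances. As a minimal sketch, and assuming R² here means the coefficient of determination (the description does not say which definition the authors use), such a score could be computed as follows. The function name `r2_score` and the toy values are hypothetical and are not taken from the paper or its datasets.

```python
def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is the usual textbook definition; the source
    description does not state which R^2 variant was reported.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: how far predictions are from observations.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Toy, made-up abundance values for illustration only (not real data).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r2_score(observed, predicted)
```

Under this definition, a model that always predicts the mean observed abundance scores 0 and a perfect model scores 1, which gives some context for the reported range of 0.46 to 0.54.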

year	journal
2020-10-16	PROTEOMICS

https://doi.org/10.1002/pmic.202000009