RESEARCH PRODUCT
Disease–Genes Must Guide Data Source Integration in the Gene Prioritization Process
Marco Notaro, Giorgio Valentini, Miguel A. Andrade-Navarro, Marco Frasca, Dario Malchiodi, Marco Mesiti, Jean-Fred Fontaine
subject
0301 basic medicine; Class (computer programming); Boosting (machine learning); Computer science; Process (engineering); media_common.quotation_subject; Computational biology; Scarcity; 03 medical and health sciences; ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; Expression quantitative trait loci; Key (cryptography); Feature (machine learning); Gene prioritization; media_common
description
One of the main issues in detecting the genes involved in the etiology of human genetic diseases is the integration of the different types of available functional relationships between genes. Numerous approaches have exploited the complementary evidence encoded in heterogeneous data sources, such as functional profiles or expression quantitative trait loci, to prioritize disease-genes, but none of them, to our knowledge, has treated the scarcity of known disease-genes as a feature of its integration methodology. Yet in contexts where data are unbalanced, that is, where one class is largely under-represented, imbalance-unaware approaches may suffer a strong decrease in performance. We claim that imbalance-aware integration is a key requirement for boosting the performance of gene prioritization (GP) methods. To support our claim, we propose an imbalance-aware integration algorithm for the GP problem, and we compare it on benchmark data with other state-of-the-art integration methodologies.
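The description above does not spell out the proposed algorithm, so the following Python sketch only illustrates the general idea of letting the few known disease-genes guide integration. It is not the authors' method. The weighting criterion, the function names (`make_net`, `positive_focused_weight`, `integrate`) and the toy networks are all assumptions made for illustration: each gene network is weighted by how much more strongly it connects known disease-genes to each other than to the remaining genes, so the weight depends on the scarce positive class rather than being averaged away by the many unlabeled genes.

```python
from itertools import combinations


def make_net(n, default, overrides):
    """Build a symmetric n x n toy similarity matrix (hypothetical data)."""
    net = [[0.0 if i == j else default for j in range(n)] for i in range(n)]
    for (i, j), w in overrides.items():
        net[i][j] = net[j][i] = w
    return net


def positive_focused_weight(net, positives):
    """Illustrative criterion (an assumption, not the published one):
    mean similarity among known disease-genes minus mean similarity
    between disease-genes and all other genes, clipped at zero."""
    pos = sorted(positives)
    others = [g for g in range(len(net)) if g not in positives]
    pos_pairs = list(combinations(pos, 2))
    within = sum(net[i][j] for i, j in pos_pairs) / len(pos_pairs)
    across = sum(net[i][j] for i in pos for j in others) / (len(pos) * len(others))
    return max(0.0, within - across)


def integrate(nets, weights):
    """Weighted average of the networks; falls back to a plain average
    when every weight is zero."""
    total = sum(weights)
    if total == 0:
        weights = [1.0] * len(nets)
        total = float(len(nets))
    n = len(nets[0])
    return [
        [sum(w * net[i][j] for w, net in zip(weights, nets)) / total
         for j in range(n)]
        for i in range(n)
    ]


# Six genes, only two of which (0 and 1) are known disease-genes.
positives = {0, 1}
informative = make_net(6, 0.1, {(0, 1): 0.9})  # links the two disease-genes
uninformative = make_net(6, 0.5, {})           # same similarity everywhere

nets = [informative, uninformative]
weights = [positive_focused_weight(net, positives) for net in nets]
combined = integrate(nets, weights)
print(weights)
```

In this toy setting the uniform network receives zero weight because it does not separate the two disease-genes from the rest, so the combined network reduces to the informative one. A real method would also need to handle noise, differing network scales and overfitting to the few positives.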
year | journal | country | edition | language |
---|---|---|---|---|
2019-01-01 |