
RESEARCH PRODUCT

Discriminative pattern discovery for the characterization of different network populations

Fabio Fassetti, Simona E. Rombo, Cristina Serrao

subject

Statistics and Probability; pattern discovery; Computational Mathematics; Computational Theory and Mathematics; Settore INF/01 - Informatica (Computer Science); Molecular Biology; Biochemistry; network populations; Computer Science Applications

description

Abstract

Motivation: An interesting problem is to study how gene co-expression varies between two different populations, associated with healthy and unhealthy individuals, respectively. To this aim, two important aspects should be taken into account: (i) in some cases, pairs or groups of genes show collaborative behavior that emerges in the study of disorders and diseases; (ii) information coming from each single individual may be crucial to capture specific details underlying complex cellular mechanisms. Therefore, it is important to avoid missing potentially powerful information associated with single samples.

Results: Here, a novel approach is proposed in which two different input populations are considered and represented by two datasets of edge-labeled graphs. Each graph is associated with an individual, and each edge label is the co-expression value between the two genes associated with its nodes. Discriminative patterns among graphs belonging to different sample sets are searched for, based on a statistical notion of ‘relevance’ able to take into account important local similarities as well as collaborative effects involving the co-expression among multiple genes. Four different gene expression datasets, each associated with a different disease, have been analyzed by the proposed approach. An extensive set of experiments shows that the extracted patterns significantly characterize important differences between healthy and unhealthy samples, both in the cooperation and in the biological functionality of the involved genes/proteins. Moreover, the analysis confirms some results already presented in the literature on genes with a central role in the considered diseases, while also identifying novel and useful insights on this aspect.

Availability and implementation: The algorithm has been implemented in the Java programming language. The data underlying this article and the code are available at https://github.com/CriSe92/DiscriminativeSubgraphDiscovery.
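The abstract describes the input as two datasets of edge-labeled graphs, one graph per individual, with gene co-expression values on the edges. The Java sketch below illustrates one possible in-memory representation and a deliberately simple way to score a multi-gene pattern across the two populations. Everything in it is an assumption for illustration: the class name `DiscriminativePatternSketch`, the threshold-based notion of a graph "containing" a pattern, and the support-difference score are hypothetical and are not the statistical ‘relevance’ measure defined by the authors. The gene names and values are toy data.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not the authors' implementation): each individual is an
 * edge-labeled graph stored as a map from an undirected gene pair to its
 * co-expression value. A "pattern" is a set of gene pairs, each with a minimum
 * co-expression threshold that a graph must reach on that edge.
 */
class DiscriminativePatternSketch {

    // Canonical key for an undirected edge, so (A, B) and (B, A) share one entry.
    static String edgeKey(String g1, String g2) {
        return g1.compareTo(g2) <= 0 ? g1 + "|" + g2 : g2 + "|" + g1;
    }

    // Adds an undirected edge labeled with a co-expression value.
    static void addEdge(Map<String, Double> graph, String g1, String g2, double coexpr) {
        graph.put(edgeKey(g1, g2), coexpr);
    }

    // True if every pattern edge is present in the graph with a label >= its threshold.
    static boolean matches(Map<String, Double> graph, Map<String, Double> pattern) {
        for (Map.Entry<String, Double> e : pattern.entrySet()) {
            Double label = graph.get(e.getKey());
            if (label == null || label < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    // Fraction of graphs in a population that contain the pattern.
    static double support(List<Map<String, Double>> population, Map<String, Double> pattern) {
        if (population.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (Map<String, Double> g : population) {
            if (matches(g, pattern)) {
                hits++;
            }
        }
        return (double) hits / population.size();
    }

    // Stand-in discriminative score (assumption): difference of supports between
    // the unhealthy and healthy populations, in [-1, 1].
    static double supportDifference(List<Map<String, Double>> healthy,
                                    List<Map<String, Double>> unhealthy,
                                    Map<String, Double> pattern) {
        return support(unhealthy, pattern) - support(healthy, pattern);
    }

    public static void main(String[] args) {
        // Toy data with hypothetical gene names; values are illustrative only.
        Map<String, Double> h1 = new HashMap<>();
        addEdge(h1, "GENE_A", "GENE_B", 0.2);
        Map<String, Double> h2 = new HashMap<>();
        addEdge(h2, "GENE_A", "GENE_B", 0.3);
        Map<String, Double> u1 = new HashMap<>();
        addEdge(u1, "GENE_A", "GENE_B", 0.9);
        addEdge(u1, "GENE_B", "GENE_C", 0.8);
        Map<String, Double> u2 = new HashMap<>();
        addEdge(u2, "GENE_B", "GENE_A", 0.85); // same edge, reversed gene order

        // Pattern: GENE_A and GENE_B co-expressed at level >= 0.7.
        Map<String, Double> pattern = new HashMap<>();
        pattern.put(edgeKey("GENE_A", "GENE_B"), 0.7);

        System.out.println(supportDifference(List.of(h1, h2), List.of(u1, u2), pattern)); // 1.0
    }
}
```

On the toy data, the pattern holds in both unhealthy graphs and in neither healthy graph, so the score is 1.0. Support difference is used here only because it is easy to check by hand; it ignores the per-sample local similarities that the authors' relevance notion is said to capture, and their Java code at the repository linked above is the reference for the actual computation.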

DOI: 10.1093/bioinformatics/btad168
Handle: https://hdl.handle.net/10447/594956