6533b7d5fe1ef96bd1264702

RESEARCH PRODUCT

Approximate Matching over Biological RDF Graphs

Simona E. RomboRoberto De Virgilio

subject

Theoretical computer scienceGraph databaseComputer scienceSearch engine indexingcomputer.file_formatcomputer.software_genreGraphBioinformatics network analysisApproximate matchingIsomorphismRDFKEGGHeuristicscomputerBiological networkNetwork analysis

description

In the last few years, the amount of biological interaction data discovered and stored in public databases (e.g., KEGG [2]) considerably increased. To this aim, RDF is a powerful representation for interactions (or pathways), since they can be modeled as directed graphs, often referred to as biological networks, where nodes represent cellular components and the (labeled or unlabeled) edges correspond to interactions among components. Often for a given organism some components are known to be linked by well studied interactions. Such groups of components are called modules and they can be represented by sub-graphs in the corresponding biological network model. At today, one of the most important problems for biologists is that of querying the interaction dataset of an organism by a specific module exploited as the query. The aim is to discover if such a module is contained in the input interaction dataset. In this scenario biological variations (e.g. insertions and/or deletions of both nodes and edges) due to the evolution need to be considered in the search process, thus that approximate matching is more effective than exact matching. Typically the problem is reconducted to the search of all the subgraphs in the biological network (i.e. the input graph database) that are isomorphic to the given module (i.e. the query graph). Since such a test is known to be an NP-hard problem, many proposals introduce heuristics (i.e. similarity or distance measurements) and particular indexing structures to reduce the overall complexity (e.g., [3, 4]).

10.1145/2245276.2232000http://hdl.handle.net/10447/65708