Graph Rewriting Based Search for Molecular Structures: Definitions, Algorithms, Hardness
We define a graph rewriting system that is easily understandable by humans, but rich enough to allow very general queries to molecule databases. It is based on the substitution of a single node in a node- and edge-labeled graph by an arbitrary graph, explicitly assigning new endpoints to the edges incident to the replaced node. For these graph rewriting systems, we are interested in the subgraph-matching problem. We show that the problem is NP-complete, even on graphs that are stars. As a positive result, we give an algorithm which is polynomial if both rules and query graph have bounded degree and bounded cut size. We demonstrate that molecular graphs of practically relevant molecules in d…
Learning Molecular Classes from Small Numbers of Positive Examples Using Graph Grammars
We consider the following problem: A researcher identified a small number of molecules with a certain property of interest and now wants to find further molecules sharing this property in a database. This can be described as learning molecular classes from small numbers of positive examples. In this work, we propose a method that is based on learning a graph grammar for the molecular class. We consider the type of graph grammars proposed by Althaus et al. [2], as it can be easily interpreted and allows relatively efficient queries. We identify rules that are frequently encountered in the positive examples and use these to construct a graph grammar. We then classify a molecule as being conta…