
RESEARCH PRODUCT

Adapted Transfer of Distance Measures for Quantitative Structure-Activity Relationships and Data-Driven Selection of Source Datasets

Ulrich Rückert, Stefan Kramer, Tobias Girschick

subject

General Computer Science; business.industry; Computer science; Fingerprint (computing); Chemical similarity; computer.software_genre; Machine learning; Distance measures; Data-driven; Task (project management); Similarity (network science); Learning curve; Data mining; Artificial intelligence; business; Transfer of learning; computer

description

Quantitative structure–activity relationships (QSARs) are regression models relating chemical structure to biological activity. Such models make it possible to predict toxicologically relevant endpoints, which constitute the target outcomes of experiments. The task is often tackled by instance-based methods, all of which rely on a notion of chemical (dis-)similarity. Our starting point is the observation by Raymond and Willett that the two families of chemical distance measures, fingerprint-based and maximum common subgraph-based, provide orthogonal information about chemical similarity. This paper presents a novel method for finding suitable combinations of the two, called adapted transfer, which adapts a distance measure learned on another, related dataset to a given dataset. Adapted transfer thus combines distance learning and transfer learning in a novel manner. In our experiments, we visualize the performance of the methods with learning curves and present a quantitative comparison at 10% and 100% of the maximum training set size, showing that transfer exploiting source datasets is effective when training datasets are small. Additionally, we present an approach for selecting the source task in a data-driven manner. The relevant experiments include an example showing that the selection of a meaningful source task is a critical factor for transfer learning.
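The idea of learning a combined distance on a source dataset and then adapting it to a target dataset can be illustrated with a minimal sketch. It is not the algorithm from the paper, whose details the description does not give. It assumes a single mixing weight `w` between a fingerprint (Tanimoto) distance and a precomputed maximum common subgraph-based distance matrix, a nearest-neighbour regressor scored by leave-one-out error, and a hypothetical "adaptation" step that only refines `w` in a small window around the value found on the source data. All function names and parameters (`radius`, `steps`, the weight grid) are illustrative assumptions.

```python
from typing import FrozenSet, List, Sequence, Tuple

Matrix = List[List[float]]
Dataset = Tuple[Matrix, Matrix, Sequence[float]]  # (d_fp, d_mcs, activities)


def tanimoto_distance(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """1 - |a & b| / |a | b| for two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def fingerprint_matrix(fps: Sequence[FrozenSet[int]]) -> Matrix:
    """Pairwise Tanimoto distances between fingerprints."""
    return [[tanimoto_distance(a, b) for b in fps] for a in fps]


def combine(d_fp: Matrix, d_mcs: Matrix, w: float) -> Matrix:
    """Convex combination w * d_fp + (1 - w) * d_mcs (assumed form)."""
    n = len(d_fp)
    return [[w * d_fp[i][j] + (1 - w) * d_mcs[i][j] for j in range(n)]
            for i in range(n)]


def loo_knn_error(dist: Matrix, y: Sequence[float], k: int = 1) -> float:
    """Leave-one-out mean absolute error of k-nearest-neighbour regression."""
    n = len(y)
    total = 0.0
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist[i][j])[:k]
        pred = sum(y[j] for j in others) / len(others)
        total += abs(pred - y[i])
    return total / n


def best_weight(d_fp: Matrix, d_mcs: Matrix, y: Sequence[float],
                candidates: Sequence[float], k: int = 1) -> float:
    """Candidate weight with the lowest leave-one-out error."""
    return min(candidates,
               key=lambda w: loo_knn_error(combine(d_fp, d_mcs, w), y, k))


def adapted_weight(source: Dataset, target: Dataset, k: int = 1,
                   radius: float = 0.2, steps: int = 5) -> float:
    """Learn w on the source data, then refine it locally on the target data.

    The local window is a hypothetical stand-in for 'adaptation'.
    """
    grid = [i / 10 for i in range(11)]
    w_src = best_weight(*source, grid, k)
    lo, hi = max(0.0, w_src - radius), min(1.0, w_src + radius)
    local = [min(hi, lo + (hi - lo) * i / (steps - 1)) for i in range(steps)]
    return best_weight(*target, local, k)
```

Restricting the search on the target data to a window around the source weight reflects the intuition that a small target dataset cannot support a full search on its own, which matches the description's finding that transfer helps most with small training sets.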

https://doi.org/10.1093/comjnl/bxs092