
RESEARCH PRODUCT

Toward a direct and scalable identification of reduced models for categorical processes.

Susanne Gerber, Illia Horenko

subject

0301 basic medicine; Multidisciplinary; business.industry; Computer science; Dimensionality reduction; Bayesian inference; Machine learning; computer.software_genre; 01 natural sciences; Reduction (complexity); 010104 statistics & probability; 03 medical and health sciences; Identification (information); 030104 developmental biology; Physical information; Physical Sciences; A priori and a posteriori; Artificial intelligence; Data mining; 0101 mathematics; Cluster analysis; business; Categorical variable; computer

description

The applicability of many computational approaches depends on the identification of reduced models defined on a small set of collective variables (colvars). A methodology for scalable, probability-preserving identification of reduced models and colvars directly from the data is derived. It does not rely on the availability of the full relation matrices at any stage of the resulting algorithm, it allows for a robust quantification of reduced-model uncertainty, and it allows a priori available physical information to be imposed. We show two applications of the methodology: (i) obtaining a reduced dynamical model for the dynamics of a polypeptide in water and (ii) identifying diagnostic rules from a standard breast cancer dataset. For the first example, we show that the obtained reduced dynamical model can reproduce the full statistics of spatial molecular configurations, opening possibilities for robust dimension and model reduction in molecular dynamics. For the breast cancer data, the methodology identifies a very simple diagnostic rule that is free of any tuning parameters and matches the performance of the state-of-the-art machine-learning applications with multiple tuning parameters reported for this problem.
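The description stays at a high level, so the following is only an illustrative sketch of what a probability-preserving reduced model of a categorical relation can look like. It assumes a standard latent-class (PLSA-style) factorization, P(y | x) = sum_k A[y][k] * G[k][x], with both factors column-stochastic, fitted by expectation-maximization directly from (x, y) sample pairs so that the full relation matrix is never assembled. This is a swapped-in textbook technique, not the authors' algorithm; the function names `fit_reduced_model`, `conditional`, and `log_likelihood`, the EM scheme, and the 1e-12 pseudo-count are all hypothetical choices of this sketch.

```python
import math
import random

# Illustrative latent-class model (assumption), NOT the method from the paper.


def normalize_columns(mat):
    """Rescale each column of a list-of-lists matrix in place to sum to one."""
    rows, cols = len(mat), len(mat[0])
    for j in range(cols):
        s = sum(mat[i][j] for i in range(rows))
        for i in range(rows):
            mat[i][j] /= s
    return mat


def fit_reduced_model(pairs, n_x, n_y, n_latent, n_iter=200, seed=0):
    """EM fit of P(y|x) = sum_k A[y][k] * G[k][x] from raw (x, y) samples.

    A (n_y x n_latent) and G (n_latent x n_x) stay column-stochastic, so
    their product is a valid conditional-probability matrix. Only the sample
    list is read; the full n_y x n_x relation matrix is never formed.
    """
    rng = random.Random(seed)
    A = normalize_columns([[rng.random() + 0.1 for _ in range(n_latent)] for _ in range(n_y)])
    G = normalize_columns([[rng.random() + 0.1 for _ in range(n_x)] for _ in range(n_latent)])
    for _ in range(n_iter):
        # Tiny pseudo-counts (hypothetical choice) keep every entry positive.
        A_acc = [[1e-12] * n_latent for _ in range(n_y)]
        G_acc = [[1e-12] * n_x for _ in range(n_latent)]
        for x, y in pairs:
            # E-step: posterior weight of each latent category for this sample.
            w = [A[y][k] * G[k][x] for k in range(n_latent)]
            total = sum(w)
            for k in range(n_latent):
                r = w[k] / total
                A_acc[y][k] += r
                G_acc[k][x] += r
        # M-step: column-normalize the accumulated responsibilities.
        A, G = normalize_columns(A_acc), normalize_columns(G_acc)
    return A, G


def conditional(A, G, x, y):
    """Reduced-model probability P(y | x) = sum_k A[y][k] * G[k][x]."""
    return sum(A[y][k] * G[k][x] for k in range(len(G)))


def log_likelihood(pairs, A, G):
    """Log-likelihood of the sample pairs under the reduced model."""
    return sum(math.log(conditional(A, G, x, y)) for x, y in pairs)


if __name__ == "__main__":
    # Synthetic data: x in {0,1,2} emits y in {0,1}; x in {3,4,5} emits y in {2,3}.
    rng = random.Random(1)
    pairs = []
    for _ in range(2000):
        x = rng.randrange(6)
        y = rng.choice([0, 1]) if x < 3 else rng.choice([2, 3])
        pairs.append((x, y))
    A, G = fit_reduced_model(pairs, n_x=6, n_y=4, n_latent=2)
    print(round(log_likelihood(pairs, A, G), 1))
```

Because both factors are column-stochastic, each column of the product sums to one for every x, which is the sense in which such a reduced model preserves probabilities; memory grows with n_latent * (n_x + n_y) rather than with n_x * n_y.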

DOI: 10.1073/pnas.1612619114
https://pubmed.ncbi.nlm.nih.gov/28432182