
RESEARCH PRODUCT

Efficient Online Laplacian Eigenmap Computation for Dimensionality Reduction in Molecular Phylogeny via Optimisation on the Sphere

Christophe Guyeux, Stéphane Chrétien

subject

0303 health sciences; [STAT.AP]Statistics [stat]/Applications [stat.AP]; Computer science; Dimensionality reduction; Computation; Dimension (graph theory); Complete graph; Minimum spanning tree; Bayesian inference; Quantitative Biology::Genomics; 03 medical and health sciences; ComputingMethodologies_PATTERNRECOGNITION; 0302 clinical medicine; [STAT.ML]Statistics [stat]/Machine Learning [stat.ML]; Algorithm; 030217 neurology & neurosurgery; Eigenvalues and eigenvectors; Distance matrices in phylogeny; ComputingMilieux_MISCELLANEOUS; 030304 developmental biology

description

Reconstructing the phylogeny of large groups of large, divergent genomes remains a difficult problem, whatever the method considered. Distance-matrix methods are impeded because computing these matrices is infeasible in practice, while Bayesian inference and maximum likelihood methods presuppose a multiple alignment of the genomes, which is itself difficult to achieve when precision is required. In this paper, we propose to compute, over successive iterations, new distances for randomly selected pairs of species, and then to map the biological sequences into a low-dimensional space based on this partial knowledge of the genome similarity matrix. This mapping is then used to build a complete graph, from which a minimum spanning tree representing the phylogenetic links between species is extracted. Finally, this new online Newton method for computing eigenvectors, which solves the problem of constructing the Laplacian eigenmap for molecular phylogeny, is applied to a set of more than two thousand complete chloroplast genomes.
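The pipeline described above (partial random distances, Laplacian eigenmap, complete graph, minimum spanning tree) can be illustrated with the minimal pure-Python sketch below. It is a batch illustration under stated assumptions, not the authors' online Newton method: `distance` is a hypothetical callable standing in for the genome distance, the heat-kernel weights exp(-d²/σ²) and the unnormalised Laplacian L = D - W are assumed choices, and the eigenvectors come from a shifted power iteration renormalised onto the unit sphere at every step, a simpler stand-in for the paper's optimisation on the sphere. All function names are hypothetical.

```python
import math
import random


def sample_affinities(n, distance, n_pairs, sigma=1.0, seed=0):
    """Evaluate `distance` on `n_pairs` randomly chosen species pairs only.

    `distance(i, j)` is a hypothetical callable standing in for an expensive
    genome-to-genome distance. Sampled distances become heat-kernel weights
    exp(-d^2 / sigma^2) (an assumed choice); unsampled pairs keep weight 0.
    """
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    w = [[0.0] * n for _ in range(n)]
    for i, j in rng.sample(pairs, min(n_pairs, len(pairs))):
        w[i][j] = w[j][i] = math.exp(-distance(i, j) ** 2 / sigma ** 2)
    return w


def laplacian_matvec(w, x):
    """Compute L x for the unnormalised graph Laplacian L = D - W."""
    n = len(x)
    return [sum(w[i]) * x[i] - sum(w[i][j] * x[j] for j in range(n))
            for i in range(n)]


def smallest_eigenvectors(w, dim, iters=2000, seed=0):
    """Approximate the `dim` smallest non-trivial eigenvectors of L.

    Stand-in for the paper's online Newton method: power iteration on
    c*I - L (c bounds the largest eigenvalue of L), with deflation against
    the constant vector and earlier eigenvectors, then renormalisation
    back onto the unit sphere after every step.
    """
    n = len(w)
    if not 0 < dim < n:
        raise ValueError("need 0 < dim < number of species")
    rng = random.Random(seed)
    # Gershgorin: lambda_max(L) <= 2 * max degree, so c*I - L is positive definite.
    c = 2.0 * max(sum(row) for row in w) + 1.0
    basis = [[1.0 / math.sqrt(n)] * n]  # constant vector, eigenvalue 0
    for _ in range(dim):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            lx = laplacian_matvec(w, x)
            x = [c * xi - lxi for xi, lxi in zip(x, lx)]
            for b in basis:  # Gram-Schmidt deflation
                dot = sum(xi * bi for xi, bi in zip(x, b))
                x = [xi - dot * bi for xi, bi in zip(x, b)]
            norm = math.sqrt(sum(xi * xi for xi in x))
            x = [xi / norm for xi in x]  # project back onto the unit sphere
        basis.append(x)
    return basis[1:]


def embed(eigvecs):
    """Species i gets coordinates (v_1[i], ..., v_dim[i])."""
    return [tuple(v[i] for v in eigvecs) for i in range(len(eigvecs[0]))]


def prim_mst(points):
    """Minimum spanning tree of the complete Euclidean graph (Prim, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


if __name__ == "__main__":
    # Toy data: six "species" at 1-D positions forming two clusters.
    pos = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    w = sample_affinities(6, lambda i, j: abs(pos[i] - pos[j]), n_pairs=15)
    tree = prim_mst(embed(smallest_eigenvectors(w, dim=2)))
    print(tree)
```

On this toy data the first embedding coordinate (the Fiedler vector) separates the two clusters, and the spanning tree joins them through a single edge. Power iteration on cI - L is used only because it turns the smallest Laplacian eigenvalues into the largest ones with nothing beyond plain arithmetic; a Newton-type update on the sphere, as in the paper, typically needs far fewer iterations.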

10.1007/978-3-030-17938-0_39
https://hal.archives-ouvertes.fr/hal-02515902