AUTHOR

Stéphane Chrétien

Simulation-based estimation of branching models for LTR retrotransposons

Motivation: LTR retrotransposons are mobile elements that, like retroviruses, are able to copy themselves and move within eukaryotic genomes. In the present work, we propose a branching model for studying the propagation of LTR retrotransposons in these genomes. This model takes into account both the positions and the degradation level of LTR retrotransposon copies, and it allows the duplication rate to vary with the degradation level. Results: Various functions have been implemented to simulate the spread of these elements, and visualization tools are proposed. Based on these simulation tools, we have developed a first method to evaluate the parameters of this propagation …

research product

Efficient Online Laplacian Eigenmap Computation for Dimensionality Reduction in Molecular Phylogeny via Optimisation on the Sphere

Reconstructing the phylogeny of large groups of large, divergent genomes remains a difficult problem, whatever the method considered. Methods based on distance matrices are blocked because computing these matrices is infeasible in practice, while Bayesian inference and maximum likelihood methods presuppose a multiple alignment of the genomes, which is itself difficult to achieve if precision is required. In this paper, we propose to compute new distances for randomly selected pairs of species over iterations, and then to map the biological sequences into a low-dimensional space based on partial knowledge of this genome similarity matrix. This mapping is then used …

research product

Average Performance Analysis of the Stochastic Gradient Method for Online PCA

This paper studies the complexity of the stochastic gradient algorithm for PCA when the data are observed in a streaming setting. We also propose an online approach for selecting the learning rate. Simulation experiments confirm the practical relevance of the plain stochastic gradient approach and that drastic improvements can be achieved by learning the learning rate.
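As a minimal sketch of what a plain stochastic gradient iteration for streaming PCA can look like, the following uses an Oja-type update with renormalisation to track the leading principal direction. It is not the paper's algorithm or its learning-rate selection scheme: the function names, the step-size schedule `1 / (t + 10)`, and the synthetic data stream are all assumptions made for illustration.

```python
import math
import random


def online_pca_sgd(stream, dim, step=lambda t: 1.0 / (t + 10), seed=0):
    """Estimate the leading principal direction from a stream of vectors.

    Oja-type stochastic gradient step followed by projection onto the
    unit sphere. The step-size schedule is a hypothetical default.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for t, x in enumerate(stream):
        # Stochastic gradient of 0.5 * (x . w)^2 with respect to w.
        proj = sum(xi * wi for xi, wi in zip(x, w))
        eta = step(t)
        w = [wi + eta * proj * xi for wi, xi in zip(w, x)]
        # Renormalise so that the iterate stays on the unit sphere.
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


def synthetic_stream(n, seed=1):
    """Hypothetical centred 2-D data whose dominant axis is the first one."""
    rng = random.Random(seed)
    for _ in range(n):
        yield (3.0 * rng.gauss(0.0, 1.0), 0.5 * rng.gauss(0.0, 1.0))


w_hat = online_pca_sgd(synthetic_stream(5000), dim=2)
print(w_hat)
```

On this synthetic stream the estimate should align, up to sign, with the first coordinate axis. How quickly it does so depends on the step-size schedule, which is the quantity the abstract proposes to learn online.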

research product

SpCLUST: Towards a fast and reliable clustering for potentially divergent biological sequences

This paper presents SpCLUST, a new C++ package that takes a list of sequences as input, aligns them with MUSCLE, computes their similarity matrix in parallel and then performs the clustering. SpCLUST extends previously released software by integrating additional scoring matrices, which enable it to cluster amino-acid sequences as well. The similarity matrix is now computed in parallel according to the master/slave distributed architecture, using MPI. Performance analysis, carried out on two real datasets of 100 nucleotide sequences and 1049 amino-acid sequences, shows that the resulting library substantially outperforms the original Python package. The proposed pac…

research product

Online shortest paths with confidence intervals for routing in a time varying random network

The increase in the world's population and rising standards of living are leading to an ever-increasing number of vehicles on the roads, and with it ever-increasing difficulties in traffic management. Traffic management in transport networks can be substantially improved by using information and communication technologies, referred to as Intelligent Transport Systems (ITS). This management problem is usually reformulated as finding the shortest path in a time-varying random graph. In this article, an online shortest path computation using stochastic gradient descent is proposed. This routing algorithm for ITS traffic management is based on the online Frank-Wolfe approach.…
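To make the problem setting concrete, here is a small sketch of routing on a graph whose edge travel times are random and observed one at a time. It keeps a running mean per edge (which is a stochastic gradient step on a squared loss with step size 1/n) and runs Dijkstra's algorithm on the current estimates. This is a simplified stand-in and not the online Frank-Wolfe method of the article, and it does not compute confidence intervals. The graph, the noise model, and all function names are hypothetical.

```python
import heapq
import random


def update_mean(means, counts, edge, observed):
    """Running-mean update: an SGD step on 0.5 * (observed - m)^2 with step 1/n."""
    counts[edge] = counts.get(edge, 0) + 1
    m = means.get(edge, 0.0)
    means[edge] = m + (observed - m) / counts[edge]


def shortest_path(means, source, target):
    """Dijkstra on the current mean estimates (target assumed reachable)."""
    graph = {}
    for (u, v), w in means.items():
        graph.setdefault(u, []).append((v, w))
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


# Hypothetical network: two routes from A to D with noisy travel times.
true_mean = {("A", "B"): 1.0, ("B", "D"): 1.0, ("A", "C"): 2.0, ("C", "D"): 2.0}
rng = random.Random(0)
means, counts = {}, {}
for _ in range(200):
    for edge, mu in true_mean.items():
        update_mean(means, counts, edge, mu + rng.uniform(-0.4, 0.4))

path, cost = shortest_path(means, "A", "D")
print(path, cost)
```

Because each estimate is refreshed as new observations arrive, the route can be recomputed at any time from the latest means, which is the basic online behaviour the abstract describes.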

research product

High-overtone bulk acoustic resonator as passive ground penetrating RADAR cooperative targets

RAdio-frequency Detection And Ranging instruments (RADARs) are widely used for applications aimed at measuring passive target velocity or ranging for various metrology applications such as ground position and localization. Within the context of using piezoelectric acoustic passive sensors as cooperative targets to RADARs probed through a radiofrequency link, this paper investigates the compatibility of narrowband resonator architectures with the classical operation mode of wideband RADAR instruments. Since single-mode resonators are poorly compatible owing to the limited bandwidth of their spectrum, the investigation has been extended to High-overtone Bulk Ac…

research product

Dendrochemical assessment of mercury releases from a pond and dredged-sediment landfill impacted by a chlor-alkali plant.

Although current Hg emissions from industrial activities may be accurately monitored, evidence of past releases to the atmosphere must rely on one or more environmental proxies. We used Hg concentrations in tree cores collected from poplars and willows to investigate the historical changes of Hg emissions from a dredged sediment landfill and compared them to a nearby control location. Our results demonstrated the potential value of using dendrochemistry to record historical Hg emissions from past industrial activities.

research product

Multivariate GARCH estimation via a Bregman-proximal trust-region method

The estimation of multivariate GARCH time series models is a difficult task, mainly because of the significant overparameterization exhibited by the problem, usually referred to as the "curse of dimensionality". For example, in the case of the VEC family, the number of parameters involved in the model grows as a polynomial of order four in the dimensionality of the problem. Moreover, these parameters are subject to convoluted nonlinear constraints necessary to ensure, for instance, the existence of stationary solutions and the positive semidefinite character of the conditional covariance matrices used in the model design. So far, this problem has been addressed in the literature only in low…
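The order-four growth can be checked with a short count. The sketch below assumes the standard VEC(1,1) specification, h_t = c + A vech(e_{t-1} e_{t-1}') + B h_{t-1}, where h_t stacks the n(n+1)/2 distinct entries of the conditional covariance matrix. The abstract does not state the model orders, so the (1,1) choice and the function name are assumptions.

```python
def vec_param_count(n):
    """Number of free parameters of a VEC(1,1) model in dimension n.

    With n_star = n(n+1)/2 distinct covariance entries, the model has an
    intercept vector of length n_star and two n_star x n_star matrices.
    """
    n_star = n * (n + 1) // 2
    return n_star + 2 * n_star ** 2


for n in (2, 3, 5, 10):
    print(n, vec_param_count(n))
```

Since n_star is quadratic in n, the dominant term 2 * n_star ** 2 is of order n^4. The bivariate case already has 21 parameters, and dimension 10 has 6105.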

research product

Finding optimal finite biological sequences over finite alphabets: the OptiFin toolbox

In this paper, we present a toolbox for a specific optimization problem that frequently arises in bioinformatics and genomics. In this problem, the state space is a set of words of specified length over a finite alphabet, and a score is associated with each word. The overall objective is to find the words that have the lowest possible score. This type of general optimization problem is encountered in, e.g., 3D conformation optimization for protein structure prediction, or largest core genes subset discovery based on the best supported phylogenetic tree for a set of species. In order to solve this problem, we propose a toolbox that can be easily launched usin…
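The problem statement itself can be written down in a few lines. The sketch below enumerates every word of a given length over a finite alphabet and keeps those with the lowest score. It is an exhaustive baseline that is only feasible for tiny instances, not the method implemented in the OptiFin toolbox; the function names and the toy Hamming-distance score are hypothetical.

```python
from itertools import product


def best_words(alphabet, length, score):
    """Return all words of the given length with minimal score, and that score."""
    best, best_score = [], None
    for letters in product(alphabet, repeat=length):
        word = "".join(letters)
        s = score(word)
        if best_score is None or s < best_score:
            best, best_score = [word], s
        elif s == best_score:
            best.append(word)
    return best, best_score


def hamming_to_target(word, target="GATTA"):
    """Toy score: number of positions where the word differs from a target."""
    return sum(a != b for a, b in zip(word, target))


# 4^5 = 1024 candidate words over the nucleotide alphabet.
words, value = best_words("ACGT", 5, hamming_to_target)
print(words, value)
```

The number of candidates grows as |alphabet|^length, which is why realistic instances call for dedicated optimization methods instead of enumeration.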

research product

A clustering package for nucleotide sequences using Laplacian Eigenmaps and Gaussian Mixture Model.

In this article, a new Python package for nucleotide sequence clustering is proposed. This package, freely available online, implements a Laplacian eigenmap embedding and a Gaussian Mixture Model for DNA clustering. It takes nucleotide sequences as input, and produces the optimal number of clusters along with a relevant visualization. Although we did not optimize for computational speed, our method still performs reasonably well in practice. Our focus was mainly on data analytics and accuracy and, as a result, our approach outperforms the state of the art, even in the case of divergent sequences. Furthermore, an a priori knowledge on the number of clust…

research product