0000000001192747
AUTHOR
Peggy Cénac
Persistent random walks
We consider a walker that at each step keeps the same direction with a probabilitythat depends on the time already spent in the direction the walker is currently moving. In this paper, we study some asymptotic properties of this persistent random walk and give the conditions of recurrence or transience in terms of "transition" probabilities to keep on the same direction or to change, without assuming that the latter admits any stationary probability. Examples are exhibited when this process is recurrent even if the random walk is not symmetric.
Persistent random walks, variable length Markov chains and piecewise deterministic Markov processes *
A classical random walk $(S_t, t\in\mathbb{N})$ is defined by $S_t:=\displaystyle\sum_{n=0}^t X_n$, where $(X_n)$ are i.i.d. When the increments $(X_n)_{n\in\mathbb{N}}$ are a one-order Markov chain, a short memory is introduced in the dynamics of $(S_t)$. This so-called "persistent" random walk is nolonger Markovian and, under suitable conditions, the rescaled process converges towards the integrated telegraph noise (ITN) as the time-scale and space-scale parameters tend to zero (see Herrmann and Vallois, 2010; Tapiero-Vallois, Tapiero-Vallois2}). The ITN process is effectively non-Markovian too. The aim is to consider persistent random walks $(S_t)$ whose increments are Markov chains with…
A fast and recursive algorithm for clustering large datasets with k-medians
Clustering with fast algorithms large samples of high dimensional data is an important challenge in computational statistics. Borrowing ideas from MacQueen (1967) who introduced a sequential version of the $k$-means algorithm, a new class of recursive stochastic gradient algorithms designed for the $k$-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to deal with large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. A particular attention is paid to the averaged versions, which…
Recursive estimation of the conditional geometric median in Hilbert spaces
International audience; A recursive estimator of the conditional geometric median in Hilbert spaces is studied. It is based on a stochastic gradient algorithm whose aim is to minimize a weighted L1 criterion and is consequently well adapted for robust online estimation. The weights are controlled by a kernel function and an associated bandwidth. Almost sure convergence and L2 rates of convergence are proved under general conditions on the conditional distribution as well as the sequence of descent steps of the algorithm and the sequence of bandwidths. Asymptotic normality is also proved for the averaged version of the algorithm with an optimal rate of convergence. A simulation study confirm…
Probability and algorithmics: a focus on some recent developments
Jean-François Coeurjolly, Adeline Leclercq-Samson Eds.; International audience; This article presents different recent theoretical results illustrating the interactions between probability and algorithmics. These contributions deal with various topics: cellular automata and calculability, variable length Markov chains and persistent random walks, perfect sampling via coupling from the past. All of them involve discrete dynamics on complex random structures.; Cet article présente différents résultats récents de nature théorique illustrant les interactions entre probabilités et algorithmique. Ces contributions traitent de sujets variés : automates cellulaires et calculabilité, chaînes de Mark…
Context Trees, Variable Length Markov Chains and Dynamical Sources
Infinite random sequences of letters can be viewed as stochastic chains or as strings produced by a source, in the sense of information theory. The relationship between Variable Length Markov Chains (VLMC) and probabilistic dynamical sources is studied. We establish a probabilistic frame for context trees and VLMC and we prove that any VLMC is a dynamical source for which we explicitly build the mapping. On two examples, the "comb" and the "bamboo blossom", we find a necessary and sufficient condition for the existence and the uniqueness of a stationary probability measure for the VLMC. These two examples are detailed in order to provide the associated Dirichlet series as well as the genera…
Enseignement et recherche sont inséparables
Les politiques publiques françaises concentrent les moyens de recherche sur quelques “sites”, aux dépens de régions entières, creusant les inégalités entre universités dites “d’élite” ou “de masse”. Mais de nombreux travaux empiriques démontrent l’inefficacité d’une telle concentration des moyens.
Recursion at the crossroads of sequence modeling, random trees, stochastic algorithms and martingales
This monograph synthesizes several studies spanning from dynamical systems in the statistical analysis of sequences, to analysis of algorithms in random trees and discrete stochastic processes. These works find applications in various fields ranging from biological sequences to linear regression models, branching processes, through functional statistics and estimates of risk indicators for insurances. All the established results use, in one way or another, the recursive property of the structure under study, by highlighting invariants such as martingales, which are at the heart of this monograph, as tools as well as objects of study.
Stochastic Approximation for Multivariate and Functional Median
We propose a very simple algorithm in order to estimate the geometric median, also called spatial median, of multivariate (Small (1990)) or functional data (Gervini (2008)) when the sample size is large. A simple and fast iterative approach based on the Robbins-Monro algorithm (Duflo (1997)) as well as its averaged version (Polyak and Juditsky (1992)) are shown to be effective for large samples of high dimension data. They are very fast and only require O(Nd) elementary operations, where N is the sample size and d is the dimension of data. The averaged approach is shown to be more effective and less sensitive to the tuning parameter. The ability of this new estimator to estimate accurately …
Risk indicators with several lines of business: comparison, asymptotic behavior and applications to optimal reserve allocation
International audience; In a multi-dimensional risk model with dependent lines of business, we propose to allocate capital with respect to the minimization of some risk indicators. These indicators are sums of expected penalties due to the insolvency of a branch while the global reserve is either positive or negative. Explicit formulas in the case of two branches are obtained for several models independent exponential, correlated Pareto). The asymptotic behavior (as the initial capital goes to infinity) is studied. For higher dimension and several periods, no explicit expression is available. Using a stochastic algorithm, we get estimations of the allocation, compare the different allocatio…
Variable Length Memory Chains: Characterization of stationary probability measures
Variable Length Memory Chains (VLMC), which are generalizations of finite order Markov chains, turn out to be an essential tool to modelize random sequences in many domains, as well as an interesting object in contemporary probability theory. The question of the existence of stationary probability measures leads us to introduce a key combinatorial structure for words produced by a VLMC: the Longest Internal Suffix. This notion allows us to state a necessary and sufficient condition for a general VLMC to admit a unique invariant probability measure. This condition turns out to get a much simpler form for a subclass of VLMC: the stable VLMC. This natural subclass, unlike the general case, enj…
La « politique de site » et le lien enseignement-recherche
Nous présentons ici des travaux empiriques qui invitent à la critique vis-à-vis de la tentation, présente en particulier au CNRS, de concentration des forces dans un plus petit nombre d'unités et de sites, et à la tendance de privilégier un profil unique de chercheur.se.s et de marginaliser certain.es enseignant.e.s-chercheur.s.es dans les laboratoires. Des travaux empiriques ont notamment montré que la concentration des moyens humains et financiers dans de gros centres est contre-productive et a déjà affiché ses limites et ses dangers. Au contraire, une plus grande homogénéité dans la distribution des fonds y est suggérée, pour conduire à une recherche plus fertile. En France, la transform…
Variable Length Markov Chains, Persistent Random Walks: a close encounter
This is the story of the encounter between two worlds: the world of random walks and the world of Variable Length Markov Chains (VLMC). The meeting point turns around the semi-Markov property of underlying processes.
Uncommon Suffix Tries
Common assumptions on the source producing the words inserted in a suffix trie with $n$ leaves lead to a $\log n$ height and saturation level. We provide an example of a suffix trie whose height increases faster than a power of $n$ and another one whose saturation level is negligible with respect to $\log n$. Both are built from VLMC (Variable Length Markov Chain) probabilistic sources; they are easily extended to families of sources having the same properties. The first example corresponds to a ''logarithmic infinite comb'' and enjoys a non uniform polynomial mixing. The second one corresponds to a ''factorial infinite comb'' for which mixing is uniform and exponential.
Characterization of stationary probability measures for Variable Length Markov Chains
By introducing a key combinatorial structure for words produced by a Variable Length Markov Chain (VLMC), the longest internal suffix, precise characterizations of existence and uniqueness of a stationary probability measure for a VLMC chain are given. These characterizations turn into necessary and sufficient conditions for VLMC associated to a subclass of probabilised context trees: the shift-stable context trees. As a by-product, we prove that a VLMC chain whose stabilized context tree is again a context tree has at most one stationary probability measure.