AUTHOR
Matti Vihola
On the stability and ergodicity of adaptive scaling Metropolis algorithms
The stability and ergodicity properties of two adaptive random walk Metropolis algorithms are considered. Both algorithms adjust the scaling of the proposal distribution continuously based on the observed acceptance probability. Unlike previously proposed forms of the algorithms, the adapted scaling parameter is not constrained to a predefined compact interval. The first algorithm is based on scale adaptation only, while the second also incorporates covariance adaptation. A strong law of large numbers is shown to hold, assuming that the target density is smooth enough and has either compact support or super-exponentially decaying tails.
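The acceptance-probability-driven scale adaptation described above can be sketched as follows. This is an illustrative sketch, not the paper's exact algorithm: the step-size sequence n^(-2/3) and the target acceptance rate 0.234 are assumptions chosen for the demo.

```python
import numpy as np

def adaptive_scaling_metropolis(log_target, x0, n_iter, target_accept=0.234, seed=0):
    # Random-walk Metropolis whose proposal scale follows the unconstrained
    # Robbins--Monro style update
    #   log(sigma_{n+1}) = log(sigma_n) + gamma_n * (alpha_n - target_accept),
    # so no predefined compact interval constrains sigma.
    rng = np.random.default_rng(seed)
    x = float(x0)
    log_sigma = 0.0
    chain = np.empty(n_iter)
    for n in range(1, n_iter + 1):
        proposal = x + np.exp(log_sigma) * rng.standard_normal()
        alpha = np.exp(min(0.0, log_target(proposal) - log_target(x)))
        if rng.random() < alpha:
            x = proposal
        log_sigma += n ** (-2 / 3) * (alpha - target_accept)  # assumed step sizes
        chain[n - 1] = x
    return chain, np.exp(log_sigma)
```

For a one-dimensional standard normal target, `log_target = lambda x: -0.5 * x * x`, the adapted scale settles in the vicinity of the classical optimum near 2.4.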
Identifying territories using presence-only citizen science data : An application to the Finnish wolf population
Citizens, community groups and local institutions participate in voluntary biological monitoring of population status and trends by providing species data, e.g., for regulations and conservation. Sophisticated statistical methods are required to unlock the potential of such data in the assessment of wildlife populations. We develop a statistical modelling framework for identifying territories based on presence-only citizen science data. The framework can be used to jointly estimate the number of active animal territories and their locations in time. Our approach is based on a data generating model which consists of a dynamic submodel for the appearance/removal of territories and an observatio…
On the convergence of unconstrained adaptive Markov chain Monte Carlo algorithms
Grapham: Graphical models with adaptive random walk Metropolis algorithms
Recently developed adaptive Markov chain Monte Carlo (MCMC) methods have been applied successfully to many problems in Bayesian statistics. Grapham is a new open source implementation covering several such methods, with emphasis on graphical models for directed acyclic graphs. The implemented algorithms include the seminal Adaptive Metropolis algorithm adjusting the proposal covariance according to the history of the chain and a Metropolis algorithm adjusting the proposal scale based on the observed acceptance probability. Different variants of the algorithms allow one, for example, to use these two algorithms together, employ delayed rejection and adjust several parameters of the algorithm…
On resampling schemes for particle filters with weakly informative observations
We consider particle filters with weakly informative observations (or `potentials') relative to the latent state dynamics. The particular focus of this work is on particle filters to approximate time-discretisations of continuous-time Feynman--Kac path integral models -- a scenario that naturally arises when addressing filtering and smoothing problems in continuous time -- but our findings are indicative about weakly informative settings beyond this context too. We study the performance of different resampling schemes, such as systematic resampling, SSP (Srinivasan sampling process) and stratified resampling, as the time-discretisation becomes finer and also identify their continuous-time l…
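As a point of reference, the two classical low-variance schemes named above can be sketched as follows. This is a minimal sketch using cumulative weights and `searchsorted`; the SSP resampler is more involved and omitted here.

```python
import numpy as np

def systematic_resampling(weights, rng):
    # One uniform draw stratifies all N positions: position i is (u + i) / N.
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(weights), positions)

def stratified_resampling(weights, rng):
    # One independent uniform per stratum [i/N, (i+1)/N).
    n = len(weights)
    positions = (rng.random(n) + np.arange(n)) / n
    return np.searchsorted(np.cumsum(weights), positions)
```

Both return ancestor indices; with uniform weights, systematic resampling deterministically keeps every particle, which is the behaviour one wants when observations are weakly informative.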
Quantitative convergence rates for subgeometric Markov chains
We provide explicit expressions for the constants involved in the characterisation of ergodicity of subgeometric Markov chains. The constants are determined in terms of those appearing in the assumed drift and one-step minorisation conditions. The results are fundamental for the study of some algorithms where uniform bounds for these constants are needed for a family of Markov kernels. Our results also accommodate some classes of inhomogeneous chains.
Stochastic order characterization of uniform integrability and tightness
We show that a family of random variables is uniformly integrable if and only if it is stochastically bounded in the increasing convex order by an integrable random variable. This result is complemented by proving analogous statements for the strong stochastic order and for power-integrable dominating random variables. In particular, we show that whenever a family of random variables is stochastically bounded by a p-integrable random variable for some p>1, there is no distinction between the strong order and the increasing convex order. These results also yield new characterizations of relative compactness in Wasserstein and Prohorov metrics.
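For reference, the increasing convex order used above is the standard one (not specific to this paper):

\[ X \le_{\mathrm{icx}} Y \quad\Longleftrightarrow\quad \mathbb{E}[f(X)] \le \mathbb{E}[f(Y)] \ \text{for all increasing convex } f \text{ for which both expectations exist.} \]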
Hierarchical log Gaussian Cox process for regeneration in uneven-aged forests
We propose a hierarchical log Gaussian Cox process (LGCP) for point patterns, where a set of points x affects another set of points y but not vice versa. We use the model to investigate the effect of large trees on the locations of seedlings. In the model, every point in x has a parametric influence kernel or signal, which together form an influence field. Conditionally on the parameters, the influence field acts as a spatial covariate in the intensity of the model, and the intensity itself is a non-linear function of the parameters. Points outside the observation window may affect the influence field inside the window. We propose an edge correction to account for this missing data. The par…
Establishing some order amongst exact approximations of MCMCs
Exact approximations of Markov chain Monte Carlo (MCMC) algorithms are a general emerging class of sampling algorithms. One of the main ideas behind exact approximations consists of replacing intractable quantities required to run standard MCMC algorithms, such as the target probability density in a Metropolis-Hastings algorithm, with estimators. Perhaps surprisingly, such approximations lead to powerful algorithms which are exact in the sense that they are guaranteed to have correct limiting distributions. In this paper we discover a general framework which allows one to compare, or order, performance measures of two implementations of such algorithms. In particular, we establish an order …
An Adaptive Parallel Tempering Algorithm
Parallel tempering is a generic Markov chain Monte Carlo sampling method which allows good mixing with multimodal target distributions, where conventional Metropolis-Hastings algorithms often fail. The mixing properties of the sampler depend strongly on the choice of tuning parameters, such as the temperature schedule and the proposal distribution used for local exploration. We propose an adaptive algorithm with a fixed number of temperatures which tunes both the temperature schedule and the parameters of the random-walk Metropolis kernel automatically. We prove the convergence of the adaptation and a strong law of large numbers for the algorithm under general conditions. We also prove as a side…
Convergence of Markovian Stochastic Approximation with discontinuous dynamics
This paper is devoted to the convergence analysis of stochastic approximation algorithms of the form $\theta_{n+1} = \theta_n + \gamma_{n+1} H_{\theta_n}({X_{n+1}})$, where ${\left\{ {\theta}_n, n \in {\mathbb{N}} \right\}}$ is an ${\mathbb{R}}^d$-valued sequence, ${\left\{ {\gamma}_n, n \in {\mathbb{N}} \right\}}$ is a deterministic stepsize sequence, and ${\left\{ {X}_n, n \in {\mathbb{N}} \right\}}$ is a controlled Markov chain. We study the convergence under weak assumptions on smoothness-in-$\theta$ of the function $\theta \mapsto H_{\theta}({x})$. It is usually assumed that this function is continuous for any $x$; in this work, we relax this condition. Our results are illustrated by c…
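A concrete instance where $\theta \mapsto H_{\theta}(x)$ is discontinuous is stochastic-approximation quantile estimation. The sketch below uses i.i.d. samples rather than a controlled Markov chain, and the step-size exponent 0.7 is an assumption chosen for the demo.

```python
import numpy as np

def sa_quantile(samples, p=0.5, theta0=0.0, gamma0=1.0, exponent=0.7):
    # Recursion theta_{n+1} = theta_n + gamma_{n+1} * H_{theta_n}(X_{n+1})
    # with H_theta(x) = p - 1{x <= theta}: discontinuous in theta,
    # yet the iterates converge to the p-quantile of the sampling distribution.
    theta = theta0
    for n, x in enumerate(samples, start=1):
        gamma = gamma0 * n ** (-exponent)   # assumed deterministic step sizes
        theta += gamma * (p - (x <= theta))
    return float(theta)
```

Starting far from the true quantile, the iterates drift towards it and then fluctuate on the scale of the decaying step size.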
Coupled conditional backward sampling particle filter
The conditional particle filter (CPF) is a promising algorithm for general hidden Markov model smoothing. Empirical evidence suggests that the variant of CPF with backward sampling (CBPF) performs well even with long time series. Previous theoretical results have not been able to demonstrate the improvement brought by backward sampling, whereas we provide rates showing that CBPF can remain effective with a fixed number of particles independent of the time horizon. Our result is based on analysis of a new coupling of two CBPFs, the coupled conditional backward sampling particle filter (CCBPF). We show that CCBPF has good stability properties in the sense that with a fixed number of particles, …
Importance sampling correction versus standard averages of reversible MCMCs in terms of the asymptotic variance
We establish an ordering criterion for the asymptotic variances of two consistent Markov chain Monte Carlo (MCMC) estimators: an importance sampling (IS) estimator, based on an approximate reversible chain and subsequent IS weighting, and a standard MCMC estimator, based on an exact reversible chain. Essentially, we relax the criterion of the Peskun type covariance ordering by considering two different invariant probabilities, and obtain, in place of a strict ordering of asymptotic variances, a bound of the asymptotic variance of IS by that of the direct MCMC. Simple examples show that IS can have arbitrarily better or worse asymptotic variance than Metropolis-Hastings and delayed-acceptanc…
Conditional particle filters with bridge backward sampling
Conditional particle filters (CPFs) with backward/ancestor sampling are powerful methods for sampling from the posterior distribution of the latent states of a dynamic model such as a hidden Markov model. However, the performance of these methods deteriorates with models involving weakly informative observations and/or slowly mixing dynamics. Both of these complications arise when sampling finely time-discretised continuous-time path integral models, but can occur with hidden Markov models too. Multinomial resampling, which is commonly employed with CPFs, resamples excessively for weakly informative observations and thereby introduces extra variance. Furthermore, slowly mixing dynamics rend…
Conditional particle filters with diffuse initial distributions
Conditional particle filters (CPFs) are powerful smoothing algorithms for general nonlinear/non-Gaussian hidden Markov models. However, CPFs can be inefficient or difficult to apply with diffuse initial distributions, which are common in statistical applications. We propose a simple but generally applicable auxiliary variable method, which can be used together with the CPF in order to perform efficient inference with diffuse initial distributions. The method only requires simulatable Markov transitions that are reversible with respect to the initial distribution, which can be improper. We focus in particular on random-walk type transitions which are reversible with respect to a uniform init…
Can the Adaptive Metropolis Algorithm Collapse Without the Covariance Lower Bound?
The Adaptive Metropolis (AM) algorithm is based on the symmetric random-walk Metropolis algorithm. The proposal distribution has the following time-dependent covariance matrix at step $n+1$ \[ S_n = \mathrm{Cov}(X_1,\ldots,X_n) + \epsilon I, \] that is, the sample covariance matrix of the history of the chain plus a (small) constant $\epsilon>0$ multiple of the identity matrix $I$. The lower bound on the eigenvalues of $S_n$ induced by the factor $\epsilon I$ is theoretically convenient, but practically cumbersome, as a good value for the parameter $\epsilon$ may not always be easy to choose. This article considers variants of the AM algorithm that do not explicitly bound the eigenvalues of $S_n$ away …
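The time-dependent proposal covariance above can be computed as in the following minimal sketch; the proposal scaling factor of Haario et al. is not part of the displayed formula and is deliberately omitted.

```python
import numpy as np

def am_proposal_cov(chain, epsilon=1e-6):
    # S_n = Cov(X_1, ..., X_n) + epsilon * I: the epsilon * I term bounds the
    # eigenvalues of S_n away from zero, which is exactly the constraint the
    # article asks whether one can remove.
    chain = np.asarray(chain, dtype=float)
    if chain.ndim == 1:
        chain = chain[:, None]
    d = chain.shape[1]
    return np.cov(chain, rowvar=False) + epsilon * np.eye(d)
```

Since the sample covariance is positive semidefinite, every eigenvalue of the result is at least epsilon; dropping the epsilon * I term can leave S_n singular, e.g. early in the run.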
On the stability of some controlled Markov chains and its applications to stochastic approximation with Markovian dynamic
We develop a practical approach to establish the stability, that is, the recurrence in a given set, of a large class of controlled Markov chains. These processes arise in various areas of applied science and encompass important numerical methods. We show in particular how individual Lyapunov functions and associated drift conditions for the parametrized family of Markov transition probabilities and the parameter update can be combined to form Lyapunov functions for the joint process, leading to the proof of the desired stability property. Of particular interest is the fact that the approach applies even in situations where the two components of the process present a time-scale separation, w…
Theoretical and methodological aspects of MCMC computations with noisy likelihoods
Approximate Bayesian computation (ABC) [11, 42] is a popular method for Bayesian inference involving an intractable, or expensive to evaluate, likelihood function but where simulation from the model is easy. The method consists of defining an alternative likelihood function which is also in general intractable but naturally lends itself to pseudo-marginal computations [5], hence making the approach of practical interest. The aim of this chapter is to show the connections of ABC Markov chain Monte Carlo with pseudo-marginal algorithms, review their existing theoretical results, and discuss how these can inform practice and hopefully lead to fruitful methodological developments.
Conditional convex orders and measurable martingale couplings
Strassen's classical martingale coupling theorem states that two real-valued random variables are ordered in the convex (resp.\ increasing convex) stochastic order if and only if they admit a martingale (resp.\ submartingale) coupling. By analyzing topological properties of spaces of probability measures equipped with a Wasserstein metric and applying a measurable selection theorem, we prove a conditional version of this result for real-valued random variables conditioned on a random element taking values in a general measurable space. We also provide an analogue of the conditional martingale coupling theorem in the language of probability kernels and illustrate how this result can be appli…
Unbiased Inference for Discretely Observed Hidden Markov Model Diffusions
We develop a Bayesian inference method for diffusions observed discretely and with noise, which is free of discretisation bias. Unlike existing unbiased inference methods, our method does not rely on exact simulation techniques. Instead, our method uses standard time-discretised approximations of diffusions, such as the Euler--Maruyama scheme. Our approach is based on particle marginal Metropolis--Hastings, a particle filter, randomised multilevel Monte Carlo, and importance sampling type correction of approximate Markov chain Monte Carlo. The resulting estimator leads to inference without a bias from the time-discretisation as the number of Markov chain iterations increases. We give conver…
On the use of approximate Bayesian computation Markov chain Monte Carlo with inflated tolerance and post-correction
Approximate Bayesian computation allows for inference of complicated probabilistic models with intractable likelihoods using model simulations. The Markov chain Monte Carlo implementation of approximate Bayesian computation is often sensitive to the tolerance parameter: low tolerance leads to poor mixing and large tolerance entails excess bias. We consider an approach using a relatively large tolerance for the Markov chain Monte Carlo sampler to ensure its sufficient mixing, and post-processing the output leading to estimators for a range of finer tolerances. We introduce an approximate confidence interval for the related post-corrected estimators, and propose an adaptive approximate Bayesi…
Importance sampling type estimators based on approximate marginal Markov chain Monte Carlo
We consider importance sampling (IS) type weighted estimators based on Markov chain Monte Carlo (MCMC) targeting an approximate marginal of the target distribution. In the context of Bayesian latent variable models, the MCMC typically operates on the hyperparameters, and the subsequent weighting may be based on IS or sequential Monte Carlo (SMC), but allows for multilevel techniques as well. The IS approach provides a natural alternative to delayed acceptance (DA) pseudo-marginal/particle MCMC, and has many advantages over DA, including a straightforward parallelisation and additional flexibility in MCMC implementation. We detail minimal conditions which ensure strong consistency of the sug…
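The weighting step can be sketched as a self-normalised importance sampling estimate over the approximate chain's output. This is a generic sketch, not the paper's estimator verbatim; supplying weights on the log scale and subtracting the maximum are standard numerical practice.

```python
import numpy as np

def is_corrected_estimate(f_values, log_weights):
    # Self-normalised IS: sum_i w_i f(x_i) / sum_i w_i, with
    # w_i = pi(x_i) / pi_approx(x_i) supplied on the log scale.
    lw = np.asarray(log_weights, dtype=float)
    w = np.exp(lw - lw.max())          # stabilise before exponentiating
    return float(np.sum(w * np.asarray(f_values, dtype=float)) / np.sum(w))
```

Any constant in the log-weights cancels in the self-normalisation, so unnormalised target and approximation densities suffice.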
bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R
We present an R package bssm for Bayesian non-linear/non-Gaussian state space modelling. Unlike the existing packages, bssm allows for easy-to-use approximate inference based on Gaussian approximations such as the Laplace approximation and the extended Kalman filter. The package also accommodates discretely observed latent diffusion processes. The inference is based on fully automatic, adaptive Markov chain Monte Carlo (MCMC) on the hyperparameters, with optional importance sampling post-correction to eliminate any approximation bias. The package also implements a direct pseudo-marginal MCMC and a delayed acceptance pseudo-marginal MCMC using intermediate approximations. The package offers …
Uniform ergodicity of the iterated conditional SMC and geometric ergodicity of particle Gibbs samplers
We establish quantitative bounds for rates of convergence and asymptotic variances for iterated conditional sequential Monte Carlo (i-cSMC) Markov chains and associated particle Gibbs samplers. Our main findings are that the essential boundedness of potential functions associated with the i-cSMC algorithm provide necessary and sufficient conditions for the uniform ergodicity of the i-cSMC Markov chain, as well as quantitative bounds on its (uniformly geometric) rate of convergence. Furthermore, we show that the i-cSMC Markov chain cannot even be geometrically ergodic if this essential boundedness does not hold in many applications of interest. Our sufficiency and quantitative bounds rely on…
Prediction of leukocyte counts during paediatric acute lymphoblastic leukaemia maintenance therapy
Maintenance chemotherapy with oral 6-mercaptopurine and methotrexate remains a cornerstone of modern therapy for acute lymphoblastic leukaemia. The dosage and intensity of therapy are based on surrogate markers such as peripheral blood leukocyte and neutrophil counts. Dosage-based leukocyte count predictions could provide support for the dosage decisions clinicians face when trying to find and maintain an appropriate dosage for the individual patient. We present two Bayesian nonlinear state space models for predicting patient leukocyte counts during the maintenance therapy. The models simplify some aspects of previously proposed models but allow for some extra flexibility. Our second model is an ext…
Unbiased Estimators and Multilevel Monte Carlo
Multilevel Monte Carlo (MLMC) and unbiased estimators recently proposed by McLeish (Monte Carlo Methods Appl., 2011) and Rhee and Glynn (Oper. Res., 2015) are closely related. This connection is elaborated by presenting a new general class of unbiased estimators, which admits previous debiasing schemes as special cases. New lower variance estimators are proposed, which are stratified versions of earlier unbiased schemes. Under general conditions, essentially when MLMC admits the canonical square root Monte Carlo error rate, the proposed new schemes are shown to be asymptotically as efficient as MLMC, both in terms of variance and cost. The experiments demonstrate that the variance reduction…
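A minimal sketch of the single-term debiasing scheme underlying these estimators; the geometric level distribution with parameter r = 0.6 is an arbitrary choice for illustration, not a recommendation from the paper.

```python
import numpy as np

def single_term_estimator(delta, rng, r=0.6):
    # Draw a random level L with P(L = l) = (1 - r) * r**l, l >= 0, and
    # reweight the level-L increment by its sampling probability. If the
    # increments satisfy E[sum_l Delta_l] = mu, the estimator is unbiased
    # for mu, provided the reweighted second moments are summable.
    level = int(rng.geometric(1.0 - r)) - 1
    prob = (1.0 - r) * r ** level
    return delta(level) / prob
```

For instance, with deterministic increments Delta_l = 2^-(l+1), which sum to 1, averaging independent draws of the estimator recovers 1 without ever evaluating the full telescoping sum.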
Adaptive Metropolis algorithm using variational Bayesian adaptive Kalman filter
Markov chain Monte Carlo (MCMC) methods are powerful computational tools for analysis of complex statistical problems. However, their computational efficiency is highly dependent on the chosen proposal distribution, which is generally difficult to find. One way to solve this problem is to use adaptive MCMC algorithms which automatically tune the statistics of a proposal distribution during the MCMC run. A new adaptive MCMC algorithm, called the variational Bayesian adaptive Metropolis (VBAM) algorithm, is developed. The VBAM algorithm updates the proposal covariance matrix using the variational Bayesian adaptive Kalman filter (VB-AKF). A strong law of large numbers for the VBAM algorithm is…