Search results for "Statistics & Probability"
Centile estimation for a proportion response variable
2015
This paper introduces two general models for computing centiles when the response variable Y can take values between 0 and 1, inclusive of 0 or 1. The models developed are more flexible alternatives to the beta inflated distribution. The first proposed model employs a flexible four parameter logit skew Student t (logitSST) distribution to model the response variable Y on the unit interval (0, 1), excluding 0 and 1. This model is then extended to the inflated logitSST distribution for Y on the unit interval, including 1. The second model developed in this paper is a generalised Tobit model for Y on the unit interval, including 1. Applying these two models to (1-Y) rather than Y enables model…
Model-Assisted Estimation Through Random Forests in Finite Population Sampling
2021
In surveys, the interest lies in estimating finite population parameters such as population totals and means. In most surveys, some auxiliary information is available at the estimation stage. This information may be incorporated in the estimation procedures to increase their precision. In this article, we use random forests (RFs) to estimate the functional relationship between the survey variable and the auxiliary variables. In recent years, RFs have become attractive as National Statistical Offices now have access to a variety of data sources, potentially exhibiting a large number of observations on a large number of variables. We establish the theoretical properties of model-assisted proc…
Establishing some order amongst exact approximations of MCMCs
2016
Exact approximations of Markov chain Monte Carlo (MCMC) algorithms are a general emerging class of sampling algorithms. One of the main ideas behind exact approximations consists of replacing intractable quantities required to run standard MCMC algorithms, such as the target probability density in a Metropolis-Hastings algorithm, with estimators. Perhaps surprisingly, such approximations lead to powerful algorithms which are exact in the sense that they are guaranteed to have correct limiting distributions. In this paper we discover a general framework which allows one to compare, or order, performance measures of two implementations of such algorithms. In particular, we establish an order …
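The idea of replacing the target density in a Metropolis-Hastings algorithm with an estimator can be illustrated with a minimal pseudo-marginal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy target (an unnormalised standard normal), the mean-one exponential noise, and the names `target`, `noisy_target` and `pseudo_marginal_mh`.

```python
import math
import random


def target(x):
    # Unnormalised standard normal density. It stands in for a density
    # that would be intractable in a real application (assumption).
    return math.exp(-0.5 * x * x)


def noisy_target(x, rng, m=8):
    # Non-negative, unbiased estimate of target(x): the true value times
    # the mean of m Exp(1) draws, which has expectation 1 (assumed noise model).
    w = sum(rng.expovariate(1.0) for _ in range(m)) / m
    return target(x) * w


def pseudo_marginal_mh(n_iter, rng, step=1.0):
    # Random-walk Metropolis-Hastings in which the density is replaced by
    # its estimate. The estimate at the current state is stored and reused,
    # not refreshed, which is what keeps the limiting distribution correct.
    x = 0.0
    est_x = noisy_target(x, rng)
    chain = []
    for _ in range(n_iter):
        y = x + rng.gauss(0.0, step)
        est_y = noisy_target(y, rng)
        # Accept with probability min(1, est_y / est_x).
        if rng.random() * est_x < est_y:
            x, est_x = y, est_y
        chain.append(x)
    return chain
```

Calling `pseudo_marginal_mh(20000, random.Random(1))` returns a chain whose sample mean and variance should be close to those of the standard normal, even though the sampler never evaluates `target` exactly in its accept step.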
What we look at in paintings: A comparison between experienced and inexperienced art viewers
2016
How do people look at art? Are there any differences between how experienced and inexperienced art viewers look at a painting? We approach these questions by analyzing and modeling eye movement data from a cognitive art research experiment, where the eye movements of twenty test subjects, ten experienced and ten inexperienced art viewers, were recorded while they were looking at paintings. Eye movements consist of stops of the gaze as well as jumps between the stops. Hence, the observed gaze stop locations can be thought of as a spatial point pattern, which can be modeled by a spatio-temporal point process. We introduce some statistical tools to analyze the spatio-temporal eye movement data, a…
Latin hypercube sampling with inequality constraints
2010
In some studies requiring predictive and CPU-time consuming numerical models, the sampling design of the model input variables has to be chosen with caution. For this purpose, Latin hypercube sampling has a long history and has shown its robustness. In this paper we propose and discuss a new algorithm to build a Latin hypercube sample (LHS) that takes into account inequality constraints between the sampled variables. This technique, called constrained Latin hypercube sampling (cLHS), consists of permuting an initial LHS to honor the desired monotonic constraints. The relevance of this approach is shown on a real example concerning the numerical w…
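For readers unfamiliar with the initial LHS that the permutations start from, the following is a minimal sketch of plain, unconstrained Latin hypercube sampling on the unit cube. It does not reproduce the cLHS permutation step, whose details are not given in the abstract; the function name `latin_hypercube` and its parameters are hypothetical.

```python
import random


def latin_hypercube(n, d, rng):
    """Return n points in the d-dimensional unit cube such that, in every
    dimension, each of the n equal-width strata contains exactly one point."""
    columns = []
    for _ in range(d):
        # One uniform draw inside each stratum [i/n, (i+1)/n).
        col = [(i + rng.random()) / n for i in range(n)]
        # Shuffling each column independently pairs the strata at random
        # across dimensions.
        rng.shuffle(col)
        columns.append(col)
    # Transpose the columns into a list of n points (tuples of length d).
    return [tuple(col[i] for col in columns) for i in range(n)]
```

Because the stratification lives in each column separately, reordering the values within one column leaves the Latin hypercube property intact, which is why a permutation-based repair such as the one the abstract describes is possible at all.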
Bayesian survival analysis with BUGS
2020
Survival analysis is one of the most important fields of statistics in medicine and biological sciences. In addition, the computational advances in the last decades have favored the use of Bayesian methods in this context, providing a flexible and powerful alternative to the traditional frequentist approach. The objective of this article is to summarize some of the most popular Bayesian survival models, such as accelerated failure time, proportional hazards, mixture cure, competing risks, multi-state, frailty, and joint models of longitudinal and survival data. Moreover, an implementation of each presented model is provided using a BUGS syntax that can be run with JAGS from the R programmin…
Multivariate nonparametric estimation of the Pickands dependence function using Bernstein polynomials
2017
Many applications in risk analysis require the estimation of the dependence among multivariate maxima, especially in environmental sciences. Such dependence can be described by the Pickands dependence function of the underlying extreme-value copula. Here, a nonparametric estimator is constructed as the sample equivalent of a multivariate extension of the madogram. Shape constraints on the family of Pickands dependence functions are taken into account by means of a representation in terms of Bernstein polynomials. The large-sample theory of the estimator is developed and its finite-sample performance is evaluated with a simulation study. The approach is illustrated with a dataset of…
Importance sampling correction versus standard averages of reversible MCMCs in terms of the asymptotic variance
2017
We establish an ordering criterion for the asymptotic variances of two consistent Markov chain Monte Carlo (MCMC) estimators: an importance sampling (IS) estimator, based on an approximate reversible chain and subsequent IS weighting, and a standard MCMC estimator, based on an exact reversible chain. Essentially, we relax the criterion of the Peskun type covariance ordering by considering two different invariant probabilities, and obtain, in place of a strict ordering of asymptotic variances, a bound of the asymptotic variance of IS by that of the direct MCMC. Simple examples show that IS can have arbitrarily better or worse asymptotic variance than Metropolis-Hastings and delayed-acceptanc…
Blind source separation for non-stationary random fields
2022
Regional data analysis is concerned with the analysis and modeling of spatially separated measurements, specifically accounting for typical features of such data. Namely, measurements in close proximity tend to be more similar than those further apart. This may also hold for cross-dependencies when multivariate spatial data are considered. Often, scientists are interested in linear transformations of such data which are easy to interpret and might be used for dimension reduction. Recently, spatial blind source separation (SBSS) was introduced for that purpose; it assumes that the observed data are formed by a linear mixture of uncorrelated, weakly stationary random …
Bayesian models for data missing not at random in health examination surveys
2018
In epidemiological surveys, data missing not at random (MNAR) due to survey nonresponse may potentially lead to a bias in the risk factor estimates. We propose an approach based on Bayesian data augmentation and survival modelling to reduce the nonresponse bias. The approach requires additional information based on follow-up data. We present a case study of smoking prevalence using FINRISK data collected between 1972 and 2007 with a follow-up to the end of 2012 and compare it to other commonly applied missing at random (MAR) imputation approaches. A simulation experiment is carried out to study the validity of the approaches. Our approach appears to reduce the nonresponse bias substantially…