Search results for "Statistics & Probability"
showing 10 of 436 documents
Register data in sample allocations for small-area estimation
2018
Inadequate control of sample sizes can occur in surveys that use stratified sampling and small-area estimation when the overall sample size is small or auxiliary information is insufficiently used, and very small sample sizes are possible for some areas. The proposed allocation, based on multi-objective optimization, uses a small-area model and estimation method together with annually collected empirical data. Its performance at the area and population levels is assessed with design-based sample simulations, with five previously developed allocations serving as references. The model-based estimator is more accurate than the design-based Horvitz–Thompson estimator and t…
Adaptive Population Importance Samplers: A General Perspective
2016
Importance sampling (IS) is a well-known Monte Carlo method, widely used to approximate a distribution of interest with a random measure composed of a set of weighted samples generated from a different density, called the proposal. Since the performance of the algorithm depends on the mismatch between the target and proposal densities, a set of proposals is often iteratively adapted in order to reduce the variance of the resulting estimator. In this paper, we review several well-known adaptive population importance samplers, providing a unified common framework and classifying them according to the nature of their estimation and adaptive procedures. Furthermore, we interpret the underlying motivation …
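The weighting scheme this abstract builds on can be sketched with a minimal, non-adaptive IS example; the specific target and proposal below are toy choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Unnormalised target density: standard normal up to a constant.
    return np.exp(-0.5 * x**2)

# Proposal: a wider normal we can sample from directly.
prop_std = 2.0
samples = rng.normal(0.0, prop_std, size=100_000)
prop_pdf = np.exp(-0.5 * (samples / prop_std) ** 2) / (prop_std * np.sqrt(2 * np.pi))

# Importance weights correct for the target/proposal mismatch.
weights = target(samples) / prop_pdf
weights /= weights.sum()  # self-normalised IS

# Estimate E[x^2] under the target (true value: 1 for a standard normal).
est = np.sum(weights * samples**2)
```

An adaptive population sampler of the kind reviewed in the paper would then move or reweight a set of such proposals between iterations to shrink the estimator's variance.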
Group Metropolis Sampling
2017
Monte Carlo (MC) methods are widely used for Bayesian inference and optimization in statistics, signal processing and machine learning. Two well-known classes of MC methods are Importance Sampling (IS) techniques and Markov Chain Monte Carlo (MCMC) algorithms. In this work, we introduce the Group Importance Sampling (GIS) framework, where different sets of weighted samples are each summarized with one summary particle and one summary weight. GIS facilitates the design of novel efficient MC techniques. For instance, we present the Group Metropolis Sampling (GMS) algorithm, which produces a Markov chain of sets of weighted samples. GMS in general outperforms other multiple try schemes…
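One plausible reading of the "summary particle / summary weight" compression can be sketched as follows; the resampling-plus-average-weight construction and the toy target/proposal are illustrative assumptions, not necessarily the paper's exact definitions:

```python
import numpy as np

rng = np.random.default_rng(1)

def summarize(samples, weights, rng):
    # Compress one weighted set into (summary particle, summary weight):
    # the particle is resampled in proportion to the weights, the weight
    # is the average unnormalised weight. A sketch of the idea only.
    probs = weights / weights.sum()
    particle = samples[rng.choice(len(samples), p=probs)]
    return particle, weights.mean()

# Toy importance sampling: target N(3, 1), proposal N(0, 3^2).
def target_pdf(x):
    return np.exp(-0.5 * (x - 3.0) ** 2) / np.sqrt(2 * np.pi)

def proposal_pdf(x):
    return np.exp(-0.5 * (x / 3.0) ** 2) / (3.0 * np.sqrt(2 * np.pi))

particles, sum_weights = [], []
for _ in range(200):                         # 200 groups of 500 samples each
    x = rng.normal(0.0, 3.0, size=500)
    w = target_pdf(x) / proposal_pdf(x)
    p, sw = summarize(x, w, rng)
    particles.append(p)
    sum_weights.append(sw)

particles = np.array(particles)
sum_weights = np.array(sum_weights)

# The summaries combine exactly like ordinary weighted samples.
est_mean = np.sum(sum_weights * particles) / sum_weights.sum()
```

The point of the compression is that downstream algorithms (such as GMS) can then manipulate one particle-weight pair per group instead of the full weighted set.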
Recycling Gibbs sampling
2017
Gibbs sampling is a well-known Markov chain Monte Carlo (MCMC) algorithm, extensively used in signal processing, machine learning and statistics. The key to the successful application of the Gibbs sampler is the ability to draw samples from the full-conditional probability density functions efficiently. In the general case this is not possible, so auxiliary samples must be generated in order to speed up the convergence of the chain. However, this intermediate information is ultimately discarded. In this work, we show that these auxiliary samples can be recycled within the Gibbs estimators, improving their efficiency at no extra cost. Theoretical and exhaustive numerical co…
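The recycling idea can be illustrated with MH-within-Gibbs on a toy bivariate Gaussian: a short inner Metropolis–Hastings chain produces auxiliary states while sampling one full conditional, and instead of keeping only the last inner state, all of them enter the estimator. The target, step sizes, and chain lengths below are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.8  # correlation of a zero-mean, unit-variance bivariate Gaussian target

def inner_mh(y, x0, n_inner, rng):
    # Draw from p(x | y) = N(rho*y, 1 - rho^2) with a short random-walk MH
    # chain, returning *all* inner states (the auxiliary samples).
    mean, var = rho * y, 1 - rho**2
    x, states = x0, []
    for _ in range(n_inner):
        prop = x + rng.normal(0, 0.5)
        log_a = -((prop - mean) ** 2 - (x - mean) ** 2) / (2 * var)
        if np.log(rng.uniform()) < log_a:
            x = prop
        states.append(x)
    return states

x, y = 0.0, 0.0
recycled = []
for _ in range(4000):
    states = inner_mh(y, x, n_inner=5, rng=rng)
    x = states[-1]                    # standard Gibbs keeps only this state
    recycled.extend(states)           # recycling reuses every inner state
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))  # exact conditional for y

est = np.mean(np.square(recycled))   # E[x^2] under the target (true value 1)
```

The recycled estimator averages five times as many states per sweep at essentially no extra simulation cost, which is the efficiency gain the abstract describes.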
CovSel
2018
Ensemble methods combine the predictions of a set of models to achieve better prediction quality than any single model. The ensemble process consists of three steps: 1) the generation phase, where the models are created; 2) the selection phase, where a set of candidate ensembles is composed and one is chosen by a selection method; 3) the fusion phase, where the individual predictions of the selected ensemble's models are combined into an ensemble estimate. This paper proposes CovSel, a selection approach for regression problems that ranks ensembles by their coverage of adequately estimated training points and selects the ensemble with the highest coverage to be used in th…
Efficient anomaly detection on sampled data streams with contaminated phase I data
2020
Control chart algorithms aim to monitor a process over time. This monitoring consists of two phases: Phase I, also called the learning phase, estimates the normal process parameters; then, in Phase II, anomalies are detected. However, the learning phase itself can contain contaminated data such as outliers. If left undetected, they can jeopardize the accuracy of the whole chart by distorting the computed parameters, which leads to faulty classifications and defective data-analysis results. This problem becomes more severe when the analysis is done on a sample of the data rather than the whole data. To avoid such a situation, Phase I quality must be guaranteed. The purpose…
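The effect of contaminated Phase I data can be shown with a small sketch: naive mean/standard-deviation control limits are inflated by outliers, while median/MAD limits resist them. The robust step here is a generic stand-in for whatever Phase I guarantee the paper develops, and all data are simulated:

```python
import numpy as np

rng = np.random.default_rng(5)

# Phase I sample: in-control N(0, 1) data contaminated with a few outliers.
phase1 = np.concatenate([rng.normal(0, 1, 200), [15.0, 18.0, -20.0]])

# Naive limits (mean +/- 3*std) are widened by the contamination;
# robust limits from the median and the MAD are largely unaffected.
naive_limits = phase1.mean() + 3 * phase1.std() * np.array([-1, 1])
mad = 1.4826 * np.median(np.abs(phase1 - np.median(phase1)))
robust_limits = np.median(phase1) + 3 * mad * np.array([-1, 1])

# Phase II: flag points falling outside the robust control limits.
phase2 = np.array([0.5, -1.2, 6.0])
flagged = (phase2 < robust_limits[0]) | (phase2 > robust_limits[1])
```

With the inflated naive limits the anomaly at 6.0 would slip through, which is exactly the failure mode the abstract warns about.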
Convergence of Markovian Stochastic Approximation with discontinuous dynamics
2016
This paper is devoted to the convergence analysis of stochastic approximation algorithms of the form $\theta_{n+1} = \theta_n + \gamma_{n+1} H_{\theta_n}({X_{n+1}})$, where ${\left\{ {\theta}_n, n \in {\mathbb{N}} \right\}}$ is an ${\mathbb{R}}^d$-valued sequence, ${\left\{ {\gamma}_n, n \in {\mathbb{N}} \right\}}$ is a deterministic stepsize sequence, and ${\left\{ {X}_n, n \in {\mathbb{N}} \right\}}$ is a controlled Markov chain. We study the convergence under weak assumptions on smoothness-in-$\theta$ of the function $\theta \mapsto H_{\theta}({x})$. It is usually assumed that this function is continuous for any $x$; in this work, we relax this condition. Our results are illustrated by c…
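A concrete instance of the recursion $\theta_{n+1} = \theta_n + \gamma_{n+1} H_{\theta_n}(X_{n+1})$ with a field that is discontinuous in $\theta$ is quantile estimation, where $H_\theta(x) = p - \mathbf{1}\{x \le \theta\}$. The sketch below uses i.i.d. noise rather than a controlled Markov chain, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stochastic approximation for the p-quantile of N(0, 1):
#   theta_{n+1} = theta_n + gamma_{n+1} * (p - 1{x_{n+1} <= theta_n}),
# where H is discontinuous in theta at theta = x.
p = 0.9
theta = 0.0
for n in range(1, 200_001):
    x = rng.normal()                      # noisy observation X_{n+1}
    gamma = 1.0 / n**0.7                  # deterministic step-size sequence
    theta += gamma * (p - (x <= theta))
```

The iterate converges to the 0.9-quantile of the standard normal (about 1.2816) even though $\theta \mapsto H_\theta(x)$ is not continuous, the situation the paper's relaxed assumptions address.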
Probabilistic interpretation of the Calderón problem
2017
In this paper, we use the theory of symmetric Dirichlet forms to give a probabilistic interpretation of Calderón's inverse conductivity problem in terms of reflecting diffusion processes and their corresponding boundary trace processes. This probabilistic interpretation comes in three equivalent formulations which open up novel perspectives on the classical question of the unique determinability of conductivities from boundary data. We aim to make this work accessible both to readers with a background in stochastic process theory and to researchers working on deterministic methods in inverse problems.
A PCA-based clustering algorithm for the identification of stratiform and convective precipitation at the event scale: an application to the sub-hour…
2021
Understanding the structure of precipitation and its separation into stratiform and convective components remains an important and interesting challenge for the scientific community. Despite this interest and the advances made in this field, the classification of rainfall into convective and stratiform components is still not trivial. This study applies a novel criterion based on a clustering approach to analyze a high-temporal-resolution precipitation dataset collected for the period 2002–2018 over Sicily (Italy). Starting from the rainfall events obtained from this dataset, the developed methodology makes it possible to classify the rainfall events into f…
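The PCA-then-clustering pipeline named in the title can be sketched on toy event descriptors; the three features and the plain 2-means step are hypothetical stand-ins for the paper's actual variables and criterion:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy event features (rows = rainfall events): e.g. mean intensity, peak
# intensity, duration -- invented stand-ins for the paper's descriptors.
convective = rng.normal([8.0, 30.0, 1.0], 0.5, size=(50, 3))
stratiform = rng.normal([2.0, 5.0, 6.0], 0.5, size=(50, 3))
X = np.vstack([convective, stratiform])

# PCA via SVD on the centred data; keep the two leading components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T

# Plain 2-means clustering in the reduced space, seeded with one point
# from each end of the dataset.
centers = scores[[0, -1]].copy()
for _ in range(25):
    dists = ((scores[:, None, :] - centers[None]) ** 2).sum(-1)
    labels = np.argmin(dists, axis=1)
    centers = np.array([scores[labels == k].mean(axis=0) for k in range(2)])
```

On well-separated event types the two clusters recover the convective/stratiform split, which is the kind of event-scale classification the study targets.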
Morphostatistical characterization of the spatial galaxy distribution through Gibbs point processes
2021
This paper proposes a morpho-statistical characterisation of the galaxy distribution through spatial statistical modelling based on inhomogeneous Gibbs point processes. The galaxy distribution is assumed to exhibit two components. The first is related to the major geometrical features of the observed galaxy field, here its filamentary pattern. The second is related to the interactions exhibited by the galaxies. Gibbs point processes are statistical models able to integrate these two aspects into a probability density controlled by a set of parameters. Several such models are fitted to real observational data via the ABC Shadow algorithm. This algorithm provides …