Search results for "sampling"
Showing 10 of 788 documents
The impact of sample reduction on PCA-based feature extraction for supervised learning
2006
"The curse of dimensionality" is pertinent to many learning algorithms, and it denotes the drastic raise of computational complexity and classification error in high dimensions. In this paper, different feature extraction (FE) techniques are analyzed as means of dimensionality reduction, and constructive induction with respect to the performance of Naive Bayes classifier. When a data set contains a large number of instances, some sampling approach is applied to address the computational complexity of FE and classification processes. The main goal of this paper is to show the impact of sample reduction on the process of FE for supervised learning. In our study we analyzed the conventional PC…
Register data in sample allocations for small-area estimation
2018
Inadequate control of sample sizes in surveys using stratified sampling and area estimation may occur when the overall sample size is small or auxiliary information is insufficiently used. Very small sample sizes are possible for some areas. The proposed allocation, based on multi-objective optimization, uses a small-area model and estimation method together with annually collected empirical data. The assessment of its performance at the area and population levels is based on design-based sample simulations. Five previously developed allocations serve as references. The model-based estimator is more accurate than the design-based Horvitz–Thompson estimator and t…
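For context, a minimal sketch of classical Neyman allocation of a fixed overall sample size across strata; the paper's multi-objective, model-based allocation is more elaborate, and the numbers below are illustrative only.

```python
# Minimal sketch of Neyman allocation: distribute a fixed overall sample
# size n across strata in proportion to N_h * S_h (stratum size times
# stratum standard deviation). Illustrative values, not the paper's data.
import numpy as np

N_h = np.array([1200, 800, 300, 150, 50])   # stratum (area) population sizes
S_h = np.array([4.0, 6.5, 3.0, 8.0, 5.0])   # stratum standard deviations
n = 400                                      # overall sample size

weights = N_h * S_h
n_h = n * weights / weights.sum()

# Guard against very small area sample sizes, one concern raised in the
# abstract: enforce a minimum of 2 units per stratum (rounding may perturb
# the total slightly).
n_h = np.maximum(np.round(n_h).astype(int), 2)
print(dict(zip(range(1, len(N_h) + 1), n_h)))
```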
Multiscale Granger causality analysis by à trous wavelet transform
2017
Since interactions in neural systems occur across multiple temporal scales, it is likely that information flow will exhibit a multiscale structure, thus requiring a multiscale generalization of classical temporal precedence causality analysis such as Granger's approach. However, the computation of multiscale measures of information dynamics is complicated by theoretical and practical issues such as filtering and undersampling. To overcome these problems, we propose a wavelet-based approach for multiscale Granger causality (GC) analysis, which is characterized by the following properties: (i) only the candidate driver variable is wavelet transformed; (ii) the decomposition is performed using the…
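A minimal sketch of the core bivariate Granger-causality regression and F-test with plain least squares (numpy and scipy assumed); the à trous wavelet decomposition of the driver variable described in the abstract is omitted.

```python
# Minimal sketch of a bivariate Granger-causality test: does adding p lags
# of a candidate driver x improve the least-squares prediction of y beyond
# y's own p lags? Only the core GC regression; no wavelet step.
import numpy as np
from scipy import stats

def granger_f(y, x, p=2):
    T = len(y)
    Y = y[p:]
    lags_y = np.column_stack([y[p - k:T - k] for k in range(1, p + 1)])
    lags_x = np.column_stack([x[p - k:T - k] for k in range(1, p + 1)])
    ones = np.ones((T - p, 1))
    Xr = np.hstack([ones, lags_y])            # restricted model
    Xf = np.hstack([ones, lags_y, lags_x])    # full model with driver lags
    rss_r = np.sum((Y - Xr @ np.linalg.lstsq(Xr, Y, rcond=None)[0]) ** 2)
    rss_f = np.sum((Y - Xf @ np.linalg.lstsq(Xf, Y, rcond=None)[0]) ** 2)
    dof2 = T - p - Xf.shape[1]
    F = ((rss_r - rss_f) / p) / (rss_f / dof2)
    return F, stats.f.sf(F, p, dof2)

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(2, 500):                       # x drives y with a one-step delay
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()
print(granger_f(y, x, p=2))
```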
Adaptive Population Importance Samplers: A General Perspective
2016
Importance sampling (IS) is a well-known Monte Carlo method, widely used to approximate a distribution of interest using a random measure composed of a set of weighted samples generated from another proposal density. Since the performance of the algorithm depends on the mismatch between the target and the proposal densities, a set of proposals is often iteratively adapted in order to reduce the variance of the resulting estimator. In this paper, we review several well-known adaptive population importance samplers, providing a unified common framework and classifying them according to the nature of their estimation and adaptive procedures. Furthermore, we interpret the underlying motivation …
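A minimal sketch of an adaptive population importance sampler in this spirit: several Gaussian proposals draw weighted samples against the target, and the proposal locations are adapted by resampling. This is an illustrative scheme, not the specific algorithms reviewed in the paper.

```python
# Minimal sketch of adaptive population importance sampling: a set of
# Gaussian proposals draws samples, samples are weighted against the
# target, and the proposal means are adapted by resampling.
import numpy as np
from scipy import stats

target = stats.norm(loc=5.0, scale=1.0)          # target density
rng = np.random.default_rng(0)
means, scale, M = np.array([-5.0, 0.0, 10.0]), 2.0, 200

for it in range(20):
    samples = np.concatenate([rng.normal(m, scale, M) for m in means])
    # Deterministic-mixture importance weights.
    q = np.mean([stats.norm.pdf(samples, m, scale) for m in means], axis=0)
    w = target.pdf(samples) / q
    w /= w.sum()
    est = np.sum(w * samples)                    # self-normalized IS estimate of E[X]
    # Adaptation: move proposal locations to resampled particles.
    means = rng.choice(samples, size=len(means), p=w)

print("estimated mean ≈", est)
```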
Group Metropolis Sampling
2017
Monte Carlo (MC) methods are widely used for Bayesian inference and optimization in statistics, signal processing and machine learning. Two well-known classes of MC methods are the Importance Sampling (IS) techniques and the Markov Chain Monte Carlo (MCMC) algorithms. In this work, we introduce the Group Importance Sampling (GIS) framework, where different sets of weighted samples are properly summarized with one summary particle and one summary weight. GIS facilitates the design of novel efficient MC techniques. For instance, we present the Group Metropolis Sampling (GMS) algorithm, which produces a Markov chain of sets of weighted samples. GMS in general outperforms other multiple try schemes…
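A hedged sketch of the group-summary idea: each set of weighted importance samples is compressed into one summary particle (resampled within the group) and one summary weight (here, the group's mean unnormalized weight). This reading is illustrative and may not match the paper's exact definitions.

```python
# Hedged sketch of group summaries: a set of weighted importance samples
# is compressed into one summary particle (resampled within the group)
# and one summary weight (the mean unnormalized weight).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
target = stats.norm(2.0, 1.0)        # target density
proposal = stats.norm(0.0, 3.0)      # proposal density

def summarize_group(n=100):
    x = rng.normal(0.0, 3.0, size=n)
    w = target.pdf(x) / proposal.pdf(x)          # unnormalized IS weights
    particle = rng.choice(x, p=w / w.sum())      # summary particle
    return particle, w.mean()                    # summary weight

groups = [summarize_group() for _ in range(500)]
particles = np.array([p for p, _ in groups])
weights = np.array([w for _, w in groups])
print("estimate of E[X]:", np.sum(weights * particles) / weights.sum())
```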
Recycling Gibbs sampling
2017
Gibbs sampling is a well-known Markov chain Monte Carlo (MCMC) algorithm, extensively used in signal processing, machine learning and statistics. The key to the successful application of the Gibbs sampler is the ability to draw samples from the full-conditional probability density functions efficiently. In the general case this is not possible, so in order to speed up the convergence of the chain it is necessary to generate auxiliary samples. However, this intermediate information is ultimately discarded. In this work, we show that these auxiliary samples can be recycled within the Gibbs estimators, improving their efficiency with no extra cost. Theoretical and exhaustive numerical co…
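A minimal sketch of a standard Gibbs sampler for a bivariate Gaussian, alternating draws from the two full conditionals; the recycling of auxiliary samples proposed in the paper is not shown.

```python
# Minimal sketch of a standard Gibbs sampler for a bivariate Gaussian
# with correlation rho, alternating draws from the two full conditionals.
import numpy as np

rng = np.random.default_rng(0)
rho, n_iter = 0.8, 5000
x, y = 0.0, 0.0
samples = np.empty((n_iter, 2))

for t in range(n_iter):
    # Full conditional x | y ~ N(rho*y, 1 - rho^2), and symmetrically for y | x.
    x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))
    y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))
    samples[t] = (x, y)

print("sample correlation:", np.corrcoef(samples[1000:].T)[0, 1])  # should be close to rho
```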
Theoretical Foundations of the Monte Carlo Method and Its Applications in Statistical Physics
2002
In this chapter we first introduce the basic concepts of Monte Carlo sampling, give some details on how Monte Carlo programs need to be organized, and then proceed to the interpretation and analysis of Monte Carlo results.
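A textbook illustration of plain Monte Carlo sampling with an error estimate (not taken from the chapter): estimating pi from uniform random points.

```python
# Plain Monte Carlo sampling: estimate pi from the fraction of uniform
# points that fall inside the unit quarter-circle, with a standard-error
# estimate for the result.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
pts = rng.random((n, 2))
inside = (pts ** 2).sum(axis=1) <= 1.0
pi_hat = 4.0 * inside.mean()
se = 4.0 * inside.std(ddof=1) / np.sqrt(n)
print(f"pi ≈ {pi_hat:.4f} ± {se:.4f}")
```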
Clinically-Driven Virtual Patient Cohorts Generation: An Application to Aorta
2021
The combination of machine learning methods together with computational modeling and simulation of the cardiovascular system brings the possibility of obtaining very valuable information about new therapies or clinical devices through in-silico experiments. However, the application of machine learning methods demands access to large cohorts of patients. As an alternative to medical data acquisition and processing, which often requires some degree of manual intervention, the generation of virtual cohorts made of synthetic patients can be automated. However, the generation of a synthetic sample can still be computationally demanding to guarantee that it is clinically meaningful and that it re…
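A hedged sketch of one simple way such a synthetic cohort could be generated: fit a multivariate Gaussian to a few descriptors of real patients and sample virtual patients, rejecting clinically implausible draws. The variables and bounds are hypothetical, not the paper's pipeline.

```python
# Hedged sketch: fit a multivariate Gaussian to anatomical descriptors of
# a small real cohort and sample synthetic patients, rejecting implausible
# draws. Variable names, data and bounds are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
# Columns: aortic diameter (mm), arch length (mm) -- illustrative data.
real = rng.normal([28.0, 45.0], [3.0, 5.0], size=(50, 2))
mu, cov = real.mean(axis=0), np.cov(real, rowvar=False)

def sample_patient():
    while True:
        p = rng.multivariate_normal(mu, cov)
        if 15.0 < p[0] < 45.0 and 25.0 < p[1] < 70.0:   # plausibility check
            return p

cohort = np.array([sample_patient() for _ in range(200)])
print("synthetic cohort mean descriptors:", cohort.mean(axis=0))
```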
Modeling Snow Dynamics Using a Bayesian Network
2015
In this paper we propose a novel snow accumulation and melt model, formulated as a Dynamic Bayesian Network (DBN). We encode uncertainty explicitly and train the DBN using Monte Carlo analysis, carried out with a deterministic hydrology model under a wide range of plausible parameter configurations. The trained DBN was tested against field observations of snow water equivalent (SWE). The results indicate that our DBN can be used to reason about uncertainty without resampling from the deterministic model. In brief, the DBN's ability to reproduce the mean of the observations was similar to what could be obtained with the deterministic hydrology model, but with a more realistic repre…
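A hedged sketch of the Monte Carlo step the abstract describes: run a toy deterministic degree-day snow model under many random parameter configurations to produce an ensemble of SWE trajectories from which a DBN could be trained; the DBN itself is not built here, and the model form and numbers are illustrative only.

```python
# Hedged sketch of the Monte Carlo step: run a toy degree-day snow model
# under many random parameter settings to get an ensemble of SWE series.
import numpy as np

rng = np.random.default_rng(0)
days = 120
temp = 5.0 * np.sin(np.linspace(0, np.pi, days)) - 1.0   # synthetic daily temperature (C)
precip = rng.exponential(2.0, size=days)                  # synthetic daily precipitation (mm)

def run_model(ddf):
    """Degree-day model: snow accumulates when T < 0, melts at ddf mm/C/day."""
    swe = np.zeros(days)
    for t in range(1, days):
        accum = precip[t] if temp[t] < 0 else 0.0
        melt = ddf * max(temp[t], 0.0)
        swe[t] = max(swe[t - 1] + accum - melt, 0.0)
    return swe

# Monte Carlo over a plausible range of the degree-day factor.
ensemble = np.array([run_model(ddf) for ddf in rng.uniform(1.0, 6.0, 500)])
print("mean peak SWE over the ensemble:", ensemble.max(axis=1).mean())
```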
Efficient anomaly detection on sampled data streams with contaminated phase I data
2020
Control chart algorithms aim to monitor a process over time; the monitoring consists of two phases. Phase I, also called the learning phase, estimates the normal process parameters; then, in Phase II, anomalies are detected. However, the learning phase itself can contain contaminated data such as outliers. If left undetected, they can jeopardize the accuracy of the whole chart by affecting the computed parameters, which leads to faulty classifications and defective data analysis results. This problem becomes more severe when the analysis is done on a sample of the data rather than the whole data. To avoid such a situation, Phase I quality must be guaranteed. The purpose…
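A minimal sketch of a Shewhart-style chart in this setting: Phase I estimates the in-control center and spread robustly (median/MAD) so a few contaminated points do not distort the limits, and Phase II flags observations outside the 3-sigma limits. This is not the paper's specific algorithm.

```python
# Minimal sketch of a Shewhart-style control chart with a robust Phase I.
import numpy as np

rng = np.random.default_rng(0)
phase1 = rng.normal(10.0, 1.0, 200)
phase1[:5] = 25.0                                      # contamination (outliers) in Phase I

center = np.median(phase1)
sigma = 1.4826 * np.median(np.abs(phase1 - center))    # MAD-based scale estimate
lcl, ucl = center - 3 * sigma, center + 3 * sigma      # control limits

phase2 = rng.normal(10.0, 1.0, 100)
phase2[[20, 70]] = [17.0, 2.0]                         # injected anomalies
alarms = np.where((phase2 < lcl) | (phase2 > ucl))[0]
print("control limits:", (round(lcl, 2), round(ucl, 2)), "alarms at:", alarms)
```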