Search results for "statistics"
showing 10 items of 1891 documents
Unsupervised Anomaly and Change Detection With Multivariate Gaussianization
2022
Anomaly detection (AD) is a field of intense research in remote sensing (RS) image processing. Identifying low probability events in RS images is a challenging problem given the high dimensionality of the data, especially when no (or little) information about the anomaly is available a priori. While plenty of methods are available, the vast majority of them do not scale well to large datasets and require the choice of some (very often critical) hyperparameters. Therefore, unsupervised and computationally efficient detection methods become strictly necessary, especially now with the data deluge problem. In this article, we propose an unsupervised method for detecting anomalies and changes …
Forecasting : theory and practice
2022
Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a varie…
Implicit differentiation of Lasso-type models for hyperparameter optimization
2020
Setting regularization parameters for Lasso-type estimators is notoriously difficult, though crucial in practice. The most popular hyperparameter optimization approach is grid-search using held-out validation data. Grid-search however requires choosing a predefined grid for each parameter, which scales exponentially in the number of parameters. Another approach is to cast hyperparameter optimization as a bi-level optimization problem that one can solve by gradient descent. The key challenge for these methods is the estimation of the gradient w.r.t. the hyperparameters. Computing this gradient via forward or backward automatic differentiation is possible yet usually s…
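The exponential scaling of grid-search mentioned in this abstract can be illustrated with a small sketch. The function names and the candidate values below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from itertools import product


def grid_size(values_per_parameter: int, n_parameters: int) -> int:
    """Number of settings in a full Cartesian grid: k ** d."""
    return values_per_parameter ** n_parameters


def build_grid(candidates, n_parameters):
    """Enumerate every combination of candidate values (one axis per parameter)."""
    return list(product(candidates, repeat=n_parameters))


# Hypothetical candidate values for each regularization parameter.
candidates = [0.01, 0.1, 1.0]

for d in (1, 2, 3, 4):
    # The grid grows as 3 ** d: 3, 9, 27, 81 settings to evaluate.
    print(d, len(build_grid(candidates, d)), grid_size(len(candidates), d))
```

With 10 candidate values per parameter, 3 parameters already require 1000 model fits, which is the cost that motivates gradient-based alternatives.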
On the Universality of Graph Neural Networks on Large Random Graphs
2021
We study the approximation power of Graph Neural Networks (GNNs) on latent position random graphs. In the large graph limit, GNNs are known to converge to certain "continuous" models known as c-GNNs, which directly enables a study of their approximation power on random graph models. In the absence of input node features however, just as GNNs are limited by the Weisfeiler-Lehman isomorphism test, c-GNNs will be severely limited on simple random graph models. For instance, they will fail to distinguish the communities of a well-separated Stochastic Block Model (SBM) with constant degree function. Thus, we consider recently proposed architectures that augment GNNs with …
Convergence and Stability of Graph Convolutional Networks on Large Random Graphs
2020
We study properties of Graph Convolutional Networks (GCNs) by analyzing their behavior on standard models of random graphs, where nodes are represented by random latent variables and edges are drawn according to a similarity kernel. This allows us to overcome the difficulties of dealing with discrete notions such as isomorphisms on very large graphs, by considering instead more natural geometric aspects. We first study the convergence of GCNs to their continuous counterpart as the number of nodes grows. Our results are fully non-asymptotic and are valid for relatively sparse graphs with an average degree that grows logarithmically with the number of nodes. We then an…
Dual Extrapolation for Sparse Generalized Linear Models
2020
Generalized Linear Models (GLM) form a wide class of regression and classification models, where prediction is a function of a linear combination of the input variables. For statistical inference in high dimension, sparsity inducing regularizations have proven to be useful while offering statistical guarantees. However, solving the resulting optimization problems can be challenging: even for popular iterative algorithms such as coordinate descent, one needs to loop over a large number of variables. To mitigate this, techniques known as screening rules and working sets diminish the size of the optimization problem at hand, either by progressively removing variables, o…
A Review of Multiple Try MCMC algorithms for Signal Processing
2018
Many applications in signal processing require the estimation of some parameters of interest given a set of observed data. More specifically, Bayesian inference needs the computation of a posteriori estimators which are often expressed as complicated multi-dimensional integrals. Unfortunately, analytical expressions for these estimators cannot be found in most real-world applications, and Monte Carlo methods are the only feasible approach. A very powerful class of Monte Carlo techniques is formed by the Markov Chain Monte Carlo (MCMC) algorithms. They generate a Markov chain such that its stationary distribution coincides with the target posterior density. In this work, we perform a t…
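As a rough illustration of how an MCMC algorithm builds a chain whose stationary distribution matches a target density, the sketch below uses plain random-walk Metropolis. This is the basic single-proposal scheme, not the Multiple Try variants the paper reviews. The target (an unnormalized standard normal), the step size, and all function names are assumptions made for the example.

```python
import math
import random


def log_target(x: float) -> float:
    """Hypothetical target: log of an unnormalized standard normal density."""
    return -0.5 * x * x


def random_walk_metropolis(log_density, x0, n_steps, step_size, seed=0):
    """Return n_steps states of a random-walk Metropolis chain started at x0."""
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    samples = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_proposal = log_density(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        accept_prob = math.exp(min(0.0, log_p_proposal - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_proposal
        # On rejection the current state is repeated in the chain.
        samples.append(x)
    return samples


chain = random_walk_metropolis(log_target, x0=0.0, n_steps=20000, step_size=1.0)
mean = sum(chain) / len(chain)
print(f"chain mean: {mean:.3f}")
```

Only ratios of the target density enter the acceptance step, so the normalizing constant of the posterior is never needed.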
Critical comments on EEG sensor space dynamical connectivity analysis
2019
Many different analysis techniques have been developed and applied to EEG recordings that allow one to investigate how different brain areas interact. One particular class of methods, based on the linear parametric representation of multiple interacting time series, is widely used to study causal connectivity in the brain. However, the results obtained by these methods should be interpreted with great care. The goal of this paper is to show, both theoretically and using simulations, that results obtained by applying causal connectivity measures on the sensor (scalp) time series do not allow interpretation in terms of interacting brain sources. This is because (1) the channel locations canno…
Modeling temporal treatment effects with zero inflated semi-parametric regression models: The case of local development policies in France
2017
A semi-parametric approach is proposed to estimate the variation over time of the effects of two distinct public policies that were intended to boost rural development in France over a similar period of time. At a micro data level, it is often observed that the dependent variable, such as local employment, does not vary over time, so that we face a kind of zero inflated phenomenon that cannot be handled by a continuous response model. We introduce a conditional mixture model which combines a mass at zero and a continuous response. The suggested zero inflated semi-parametric statistical approach relies on the flexibility and modularity of additive models with the abi…
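The idea of combining a mass at zero with a continuous response can be sketched with a simple simulation. This is only a toy mixture with fixed parameters, not the conditional semi-parametric model of the paper: the zero probability, the Gaussian continuous part, and the function names are all assumptions made for illustration.

```python
import random


def sample_zero_inflated(p_zero, mean, sd, n, seed=0):
    """Draw n responses: exactly 0 with probability p_zero, else Gaussian(mean, sd)."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < p_zero else rng.gauss(mean, sd) for _ in range(n)]


def mixture_mean(p_zero, mean):
    """Expected response of the mixture: (1 - p_zero) * mean."""
    return (1.0 - p_zero) * mean


# Hypothetical parameters: 60% of units show no change at all.
draws = sample_zero_inflated(p_zero=0.6, mean=2.0, sd=1.0, n=10000)
share_zero = sum(1 for y in draws if y == 0.0) / len(draws)
print(f"share of exact zeros: {share_zero:.3f}")
print(f"empirical mean: {sum(draws) / len(draws):.3f}")
print(f"theoretical mean: {mixture_mean(0.6, 2.0):.3f}")
```

A purely continuous model assigns probability zero to any single value, so it cannot reproduce the spike of exact zeros that the simulation shows.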
Do-search -- a tool for causal inference and study design with multiple data sources
2020
Epidemiologic evidence is based on multiple data sources including clinical trials, cohort studies, surveys, registries, and expert opinions. Merging information from different sources opens up new possibilities for the estimation of causal effects. We show how causal effects can be identified and estimated by combining experiments and observations in real and realistic scenarios. As a new tool, we present do-search, a recently developed algorithmic approach that can determine the identifiability of a causal effect. The approach is based on do-calculus, and it can utilize data with nontrivial missing data and selection bias mechanisms. When the effect is identifiable, do-search outputs an i…