Search results for "Probability"
showing 10 items of 2176 documents
BayesVarSel: Bayesian Testing, Variable Selection and model averaging in Linear Models using R
2016
This paper introduces the R package BayesVarSel which implements objective Bayesian methodology for hypothesis testing and variable selection in linear models. The package computes posterior probabilities of the competing hypotheses/models and provides a suite of tools, specifically proposed in the literature, to properly summarize the results. Additionally, BayesVarSel is armed with functions to compute several types of model averaging estimations and predictions with weights given by the posterior probabilities. BayesVarSel contains exact algorithms to perform fast computations in problems of small to moderate size and heuristic sampling methods to solve large problems. The software is inte…
Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform
2012
Motivation: The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being widely applied to the large sets of sequences often encountered as the outcome of DNA sequencing experiments. In previous work, we presented a novel algorithm that allows the BWT of human genome scale data to be computed on very moderate hardware, thus enabling us to investigate the BWT as a tool for the compression of such datasets. Results: We first used simulated reads to explore the relationship between the level of compression and the error rate, the leng…
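For readers unfamiliar with the transform named in this abstract, the following is a minimal sketch of the textbook BWT of a single string, built by sorting all rotations. It is not the large-collection algorithm the paper refers to; the function names and the `$` sentinel are assumptions chosen for illustration, and the quadratic memory use makes it suitable only for short inputs.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations.

    Illustrative only. The sentinel is assumed to sort before every
    character of the input and must not occur in it.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending and re-sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        # Prepending the last column and sorting rebuilds the rotation
        # table one character (column) at a time.
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original string is the rotation that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

The transform tends to group identical characters into runs when the input contains repeated substrings, which is why it is commonly paired with run-length or entropy coders.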
The FLUXCOM ensemble of global land-atmosphere energy fluxes
2019
Although a key driver of Earth’s climate system, global land-atmosphere energy fluxes are poorly constrained. Here we use machine learning to merge energy flux measurements from FLUXNET eddy covariance towers with remote sensing and meteorological data to estimate global gridded net radiation, latent and sensible heat and their uncertainties. The resulting FLUXCOM database comprises 147 products in two setups: (1) 0.0833° resolution using MODIS remote sensing data (RS) and (2) 0.5° resolution using remote sensing and meteorological data (RS + METEO). Within each setup we use a full factorial design across machine learning methods, forcing datasets and energy balance closure corrections. For…
Sparse and Smooth: improved guarantees for Spectral Clustering in the Dynamic Stochastic Block Model
2020
In this paper, we analyse classical variants of the Spectral Clustering (SC) algorithm in the Dynamic Stochastic Block Model (DSBM). Existing results show that, in the relatively sparse case where the expected degree grows logarithmically with the number of nodes, guarantees in the static case can be extended to the dynamic case and yield improved error bounds when the DSBM is sufficiently smooth in time, that is, the communities do not change too much between two time steps. We improve over these results by drawing a new link between the sparsity and the smoothness of the DSBM: the more regular the DSBM is, the more sparse it can be, while still guaranteeing consistent recovery. In particu…
Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-Based Approach
2021
Causal effect identification considers whether an interventional probability distribution can be uniquely determined without parametric assumptions from measured source distributions and structural knowledge on the generating system. While complete graphical criteria and procedures exist for many identification problems, there are still challenging but important extensions that have not been considered in the literature. To tackle these new settings, we present a search algorithm directly over the rules of do-calculus. Due to generality of do-calculus, the search is capable of taking more advanced data-generating mechanisms into account along with an arbitrary type of both observational and…
Conditional particle filters with diffuse initial distributions
2020
Conditional particle filters (CPFs) are powerful smoothing algorithms for general nonlinear/non-Gaussian hidden Markov models. However, CPFs can be inefficient or difficult to apply with diffuse initial distributions, which are common in statistical applications. We propose a simple but generally applicable auxiliary variable method, which can be used together with the CPF in order to perform efficient inference with diffuse initial distributions. The method only requires simulatable Markov transitions that are reversible with respect to the initial distribution, which can be improper. We focus in particular on random-walk type transitions which are reversible with respect to a uniform init…
A novel exact representation of stationary colored Gaussian processes (fractional differential approach)
2010
A novel representation of functions, called generalized Taylor form, is applied to the filtering of white noise processes. It is shown that every Gaussian colored noise can be expressed as the output of a set of linear fractional stochastic differential equations whose solution is a weighted sum of fractional Brownian motions. The exact form of the weighting coefficients is given and it is shown that it is related to the fractional moments of the target spectral density of the colored noise.
Fractal surfaces from simple arithmetic operations
2015
Fractal surfaces ('patchwork quilts') are shown to arise under most general circumstances involving simple bitwise operations between real numbers. A theory is presented for all deterministic bitwise operations on a finite alphabet. It is shown that these models give rise to a roughness exponent $H$ that shapes the resulting spatial patterns, larger values of the exponent leading to coarser surfaces.
Unbiased Inference for Discretely Observed Hidden Markov Model Diffusions
2021
We develop a Bayesian inference method for diffusions observed discretely and with noise, which is free of discretisation bias. Unlike existing unbiased inference methods, our method does not rely on exact simulation techniques. Instead, our method uses standard time-discretised approximations of diffusions, such as the Euler-Maruyama scheme. Our approach is based on particle marginal Metropolis-Hastings, a particle filter, randomised multilevel Monte Carlo, and importance sampling type correction of approximate Markov chain Monte Carlo. The resulting estimator leads to inference without a bias from the time-discretisation as the number of Markov chain iterations increases. We give conver…
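As background for the discretisation mentioned in this abstract, here is a minimal sketch of the Euler-Maruyama scheme for a scalar diffusion dX = a(X) dt + b(X) dW. It shows only the time-stepping approximation, not the paper's inference method; the function name, its parameters, and the drift and diffusion callables are assumptions made for illustration.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, t_end, n_steps, rng=None):
    """Simulate one path of dX = drift(X) dt + diffusion(X) dW on [0, t_end].

    Hypothetical helper for illustration: returns the list of n_steps + 1
    states, starting at x0, on an equally spaced time grid.
    """
    rng = rng or random.Random()
    dt = t_end / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Brownian increment over one step: Normal(0, dt).
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path
```

For example, `euler_maruyama(lambda x: -x, lambda x: 0.5, 1.0, 1.0, 1000)` simulates an Ornstein-Uhlenbeck-type process chosen here purely as an example. The step size `dt` controls the discretisation bias that the abstract refers to.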
Local inhomogeneous weighted summary statistics for marked point processes
2023
We introduce a family of local inhomogeneous mark-weighted summary statistics, of order two and higher, for general marked point processes. Depending on how the involved weight function is specified, these summary statistics capture different kinds of local dependence structures. We first derive some basic properties and show how these new statistical tools can be used to construct most existing summary statistics for (marked) point processes. We then propose a local test of random labelling. This procedure allows us to identify points, and consequently regions, where the random labelling assumption does not hold, e.g. when the (functional) marks are spatially dependent. Through a simulatio…