Search results for "missing data"
showing 10 items of 83 documents
Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-Based Approach
2021
Causal effect identification considers whether an interventional probability distribution can be uniquely determined without parametric assumptions from measured source distributions and structural knowledge on the generating system. While complete graphical criteria and procedures exist for many identification problems, there are still challenging but important extensions that have not been considered in the literature. To tackle these new settings, we present a search algorithm directly over the rules of do-calculus. Due to the generality of do-calculus, the search is capable of taking more advanced data-generating mechanisms into account along with an arbitrary type of both observational and…
Estimating with kernel smoothers the mean of functional data in a finite population setting. A note on variance estimation in presence of partially o…
2014
In the near future, millions of load curves measuring the electricity consumption of French households in small time grids (probably half hours) will be available. All these collected load curves represent a huge amount of information which could be exploited using survey sampling techniques. In particular, the total consumption of a specific customer group (for example all the customers of an electricity supplier) could be estimated using unequal probability random sampling methods. Unfortunately, data collection may undergo technical problems resulting in missing values. In this paper we study a new estimation method for the mean curve in the presence of missing values which consists in…
Study design in causal models
2012
The causal assumptions, the study design and the data are the elements required for scientific inference in empirical research. The research is adequately communicated only if all of these elements and their relations are described precisely. Causal models with design describe the study design and the missing data mechanism together with the causal structure and allow the direct application of causal calculus in the estimation of the causal effects. The flow of the study is visualized by ordering the nodes of the causal diagram in two dimensions by their causal order and the time of the observation. Conclusions whether a causal or observational relationship can be estimated from the collect…
Comparing Spatial and Spatio-temporal FPCA to Impute Large Continuous Gaps in Space
2018
Multivariate spatio-temporal data analysis methods usually assume fairly complete data, while a number of gaps often occur along time or in space. In air quality data long gaps may be due to instrument malfunctions; moreover, not all the pollutants of interest are measured in all the monitoring stations of a network. In the literature, many statistical methods have been proposed for imputing short sequences of missing values, but most of them are not valid when the fraction of missing values is high. Furthermore, a limitation of commonly used methods is that they exploit only the temporal, or only the spatial, correlation of the data. The objective of this paper is to provide an approach based …
Model averaging estimation of generalized linear models with imputed covariates
2015
We address the problem of estimating generalized linear models when some covariate values are missing but imputations are available to fill in the missing values. This situation generates a bias-precision trade-off in the estimation of the model parameters. Extending the generalized missing-indicator method proposed by Dardanoni et al. (2011) for linear regression, we handle this trade-off as a problem of model uncertainty using Bayesian averaging of classical maximum likelihood estimators (BAML). We also propose a block model averaging strategy that incorporates information on the missing-data patterns and is computationally simple. An empirical application illustrates our…
Impact of missing data mechanism on the estimate of change: a case study on cognitive function and polypharmacy among older persons
2015
Piia Lavikainen (1,2), Esko Leskinen (3), Sirpa Hartikainen (1,2), Jyrki Möttönen (4), Raimo Sulkava (5), Maarit J Korhonen (6). Affiliations: (1) Kuopio Research Centre of Geriatric Care, University of Eastern Finland, Kuopio, Finland; (2) School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio, Finland; (3) Department of Mathematics and Statistics, University of Jyväskylä, Jyväskylä, Finland; (4) Department of Social Research, University of Helsinki, Helsinki, Finland; (5) Department of Geriatrics, Institute of Public Health and Clinical Nutrition, Faculty of Health Sciences, University of Eastern Finland, Kuopio, Finland; (6) Department of Pharmacology, D…
Comparative analysis of different techniques for spatial interpolation of rainfall data to create a serially complete monthly time series of precipit…
2011
The availability of good and reliable rainfall data is fundamental for most hydrological analyses and for the design and management of water resources systems. However, in practice, precipitation records often suffer from missing data values, mainly due to rain gauge malfunctions during specific time periods. This is an important issue in practical hydrology because it affects the continuity of rainfall data and ultimately influences the results of hydrologic studies which use rainfall as input. Many methods to estimate missing rainfall data have been proposed in the literature and, among these, most are based on spatial interpolation algorithms. In this paper different spatial interpo…
Robust Principal Component Analysis of Data with Missing Values
2015
Principal component analysis is one of the most popular machine learning and data mining techniques. Having its origins in statistics, principal component analysis is used in numerous applications. However, there seems to have been little systematic testing and assessment of principal component analysis for cases with erroneous and incomplete data. The purpose of this article is to propose multiple robust approaches for carrying out principal component analysis and, especially, to estimate the relative importance of the principal components in explaining the data variability. Computational experiments are first focused on carefully designed simulated tests where the ground truth is known and can b…
2014
This paper considers parameter estimation for linear time-invariant (LTI) systems in an input-output setting with an output error (OE) time-delay model structure. Missing data are commonly encountered in industry due to irregular sampling, sensor failure, data deletion in data preprocessing, network transmission faults, and so forth. To identify LTI systems with time delay from incomplete data, the generalized expectation-maximization (GEM) algorithm is adopted to estimate the model parameters and the time delay simultaneously. Numerical examples are provided to demonstrate the effectiveness of the proposed method.
Robust estimation of mean electricity consumption curves by sampling for small areas in presence of missing values
2017
In this thesis, we address the problem of robust estimation of mean or total electricity consumption curves by sampling in a finite population for the entire population and for small areas. We are also interested in estimating mean curves by sampling in the presence of partially missing trajectories. Indeed, many studies carried out at the French electricity company EDF, for marketing or power grid management purposes, are based on the analysis of mean or total electricity consumption curves at a fine time scale, for different groups of clients sharing some common characteristics. Because of privacy issues and financial costs, it is not possible to measure the electricity consumption curve of eac…