Search results for "missing data"
showing 10 items of 83 documents
WEIGHTS AND IMPUTATIONS
2019
This chapter provides a description of the weighting and imputation strategies used to address problems of unit nonresponse, sample attrition and item nonresponse in the seventh wave of SHARE.
A Generalized Missing-Indicator Approach to Regression with Imputed Covariates
2011
We consider estimation of a linear regression model using data where some covariate values are missing but imputations are available to fill in the missing values. This situation generates a tradeoff between bias and precision when estimating the regression parameters of interest. Using only the subsample of complete observations does not cause bias but may imply a substantial loss of precision because the complete cases may be too few. On the other hand, filling in the missing values with imputations may cause bias. We provide the new Stata command gmi, which handles such tradeoff by using either model reduction or Bayesian model averaging techniques in the context of the generalized miss…
EOFs for gap filling in multivariate air quality data: a FDA approach
2010
Missing values are a common concern in spatiotemporal data sets. During recent years a great number of methods have been developed for gap filling. One of the emerging approaches is based on the Empirical Orthogonal Function (EOF) methodology, applied mainly on raw and univariate data sets presenting irregular missing patterns. In this paper EOF is carried out on a multivariate space-time data set, related to concentrations of pollutants recorded at different sites, after denoising raw data by FDA approach. Some performance indicators are computed on simulated incomplete data sets with also long gaps in order to show that the EOF reconstruction appears to be an improved procedure especially…
Physics-aware Gaussian processes in remote sensing
2018
Abstract Earth observation from satellite sensory data poses challenging problems, where machine learning is currently a key player. In recent years, Gaussian Process (GP) regression has excelled in biophysical parameter estimation tasks from airborne and satellite observations. GP regression is based on solid Bayesian statistics, and generally yields efficient and accurate parameter estimates. However, GPs are typically used for inverse modeling based on concurrent observations and in situ measurements only. Very often a forward model encoding the well-understood physical relations between the state vector and the radiance observations is available though and could be useful to improve pre…
Estimating person parameters via item response model and simple sum score in small samples with few polytomous items: A simulation study
2018
Background The Item Response Theory (IRT) is becoming increasingly popular for item analysis. Theoretical considerations and simulation studies suggest that parameter estimates will become precise only by utilizing many items in large samples. Method A simulation study focusing on a single scale was performed on data with (a) n = 40, 60, 80, 120, 200, 300, 500, and 900 cases utilizing (b) 4, 8, 16, or 32 items. The items were (c) symmetrically distributed vs. skew (skewness 0, 1, and 2). Item loadings were (d) homogeneous vs. heterogeneous. Item loadings were (e) low vs. high. Half of the items had (f) a correlated error or not. The number of answering categories (g) was four vs. five. A to…
Forecasting time series with missing data using Holt's model
2009
This paper deals with the prediction of time series with missing data using an alternative formulation for Holt's model with additive errors. This formulation simplifies both the calculus of maximum likelihood estimators of all the unknowns in the model and the calculus of point forecasts. In the presence of missing data, the EM algorithm is used to obtain maximum likelihood estimates and point forecasts. Based on this application we propose a leave-one-out algorithm for the data transformation selection problem which allows us to analyse Holt's model with multiplicative errors. Some numerical results show the performance of these procedures for obtaining robust forecasts.
Correcting for non-ignorable missingness in smoking trends
2015
Data missing not at random (MNAR) is a major challenge in survey sampling. We propose an approach based on registry data to deal with non-ignorable missingness in health examination surveys. The approach relies on follow-up data available from administrative registers several years after the survey. For illustration we use data on smoking prevalence in Finnish National FINRISK study conducted in 1972-1997. The data consist of measured survey information including missingness indicators, register-based background information and register-based time-to-disease survival data. The parameters of missingness mechanism are estimable with these data although the original survey data are MNAR. The u…
Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?
2017
Summary Principal component analysis (PCA) is a method of choice for dimension reduction. In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable to perform the PCA of streaming data and/or massive data. Despite the wide availability of recursive algorithms that can efficiently update the PCA when new data are observed, the literature offers little guidance on how to select a suitable algorithm for a given application. This paper reviews the main approaches to online PCA, namely, perturbation techniques, incremental methods and stochastic optimisation, and compares the most widely employed techniques in terms statistical a…
Correction: Correcting for non-ignorable missingness in smoking trends
2017
Extending graphical models for applications: on covariates, missingness and normality
2021
The authors of the paper “Bayesian Graphical Models for Modern Biological Applications” have put forward an important framework for making graphical models more useful in applied settings. In this discussion paper, we give a number of suggestions for making this framework even more suitable for practical scenarios. Firstly, we show that an alternative and simplified definition of covariate might make the framework more manageable in high-dimensional settings. Secondly, we point out that the inclusion of missing variables is important for practical data analysis. Finally, we comment on the effect that the Gaussianity assumption has in identifying the underlying conditional independence graph…