Search results for "missing"
showing 10 items of 174 documents
Regression with Imputed Covariates: A Generalized Missing Indicator Approach
2011
A common problem in applied regression analysis is that covariate values may be missing for some observations but imputed values may be available. This situation generates a trade-off between bias and precision: the complete cases are often disarmingly few, but replacing the missing observations with the imputed values to gain precision may lead to bias. In this paper we formalize this trade-off by showing that one can augment the regression model with a set of auxiliary variables so as to obtain, under weak assumptions about the imputations, the same unbiased estimator of the parameters of interest as complete-case analysis. Given this augmented model, the bias-precision trade-off may then…
WEIGHTS AND IMPUTATIONS
2019
This chapter provides a description of the weighting and imputation strategies used to address problems of unit nonresponse, sample attrition and item nonresponse in the seventh wave of SHARE.
A Generalized Missing-Indicator Approach to Regression with Imputed Covariates
2011
We consider estimation of a linear regression model using data where some covariate values are missing but imputations are available to fill in the missing values. This situation generates a tradeoff between bias and precision when estimating the regression parameters of interest. Using only the subsample of complete observations does not cause bias but may imply a substantial loss of precision because the complete cases may be too few. On the other hand, filling in the missing values with imputations may cause bias. We provide the new Stata command gmi, which handles this tradeoff by using either model reduction or Bayesian model averaging techniques in the context of the generalized miss…
EOFs for gap filling in multivariate air quality data: a FDA approach
2010
Missing values are a common concern in spatiotemporal data sets. In recent years a great number of methods have been developed for gap filling. One of the emerging approaches is based on the Empirical Orthogonal Function (EOF) methodology, applied mainly to raw, univariate data sets with irregular missing patterns. In this paper EOF is carried out on a multivariate space-time data set of pollutant concentrations recorded at different sites, after denoising the raw data with an FDA approach. Some performance indicators are computed on simulated incomplete data sets that also contain long gaps, in order to show that the EOF reconstruction appears to be an improved procedure especially…
Air quality and integration of short-term and long-term pollutant data
2008
Modelling PM10 is an important problem in statistical methodology, above all for explaining its behaviour in space and time, since PM10 has been linked to many adverse effects on human and environmental health. However, the large spatial variability of the main traffic-related pollutants, PM10 in particular, makes it impossible to obtain a complete picture of atmospheric pollution in urban areas from fixed-station data alone. Information from fixed monitoring stations (long-term measurements) is therefore integrated with information from mobile stations (short-term measurements). Short-term measurements are incomplete and so it is necessary to integrate …
Physics-aware Gaussian processes in remote sensing
2018
Earth observation from satellite sensory data poses challenging problems, where machine learning is currently a key player. In recent years, Gaussian Process (GP) regression has excelled in biophysical parameter estimation tasks from airborne and satellite observations. GP regression is based on solid Bayesian statistics, and generally yields efficient and accurate parameter estimates. However, GPs are typically used for inverse modeling based on concurrent observations and in situ measurements only. Very often a forward model encoding the well-understood physical relations between the state vector and the radiance observations is available though and could be useful to improve pre…
Measurements of Higgs boson production and couplings in diboson final states with the ATLAS detector at the LHC
2013
We acknowledge the support of ANPCyT, Argentina; YerPhI, Armenia; ARC, Australia; BMWF and FWF, Austria; ANAS, Azerbaijan; SSTC, Belarus; CNPq and FAPESP, Brazil; NSERC, NRC and CFI, Canada; CERN; CONICYT, Chile; CAS, MOST and NSFC, China; COLCIENCIAS, Colombia; MSMT CR, MPO CR and VSC CR, Czech Republic; DNRF, DNSRC and Lundbeck Foundation, Denmark; EPLANET, ERC and NSRF, European Union; IN2P3-CNRS, CEA-DSM/IRFU, France; GNSF, Georgia; BMBF, DFG, HGF, MPG and AvH Foundation, Germany; GSRT and NSRF, Greece; ISF, MINERVA, GIF, DIP and Benoziyo Center, Israel; INFN, Italy; MEXT and JSPS, Japan; CNRST, Morocco; FOM and NWO, Netherlands; BRF and RCN, Norway; MNiSW, Poland; GRICES and FCT, Portu…
Estimating person parameters via item response model and simple sum score in small samples with few polytomous items: A simulation study
2018
Background: Item Response Theory (IRT) is becoming increasingly popular for item analysis. Theoretical considerations and simulation studies suggest that parameter estimates become precise only when many items are used in large samples. Method: A simulation study focusing on a single scale was performed on data with (a) n = 40, 60, 80, 120, 200, 300, 500, and 900 cases utilizing (b) 4, 8, 16, or 32 items. The items were (c) symmetrically distributed vs. skewed (skewness 0, 1, and 2). Item loadings were (d) homogeneous vs. heterogeneous. Item loadings were (e) low vs. high. Half of the items had (f) a correlated error or not. The number of answering categories (g) was four vs. five. A to…
Forecasting time series with missing data using Holt's model
2009
This paper deals with the prediction of time series with missing data using an alternative formulation of Holt's model with additive errors. This formulation simplifies the computation of both the maximum likelihood estimators of all the unknowns in the model and the point forecasts. In the presence of missing data, the EM algorithm is used to obtain maximum likelihood estimates and point forecasts. Based on this application we propose a leave-one-out algorithm for the data transformation selection problem, which allows us to analyse Holt's model with multiplicative errors. Some numerical results show the performance of these procedures for obtaining robust forecasts.
Correcting for non-ignorable missingness in smoking trends
2015
Data missing not at random (MNAR) is a major challenge in survey sampling. We propose an approach based on registry data to deal with non-ignorable missingness in health examination surveys. The approach relies on follow-up data available from administrative registers several years after the survey. For illustration we use data on smoking prevalence in the Finnish National FINRISK study conducted in 1972-1997. The data consist of measured survey information including missingness indicators, register-based background information and register-based time-to-disease survival data. The parameters of the missingness mechanism are estimable with these data although the original survey data are MNAR. The u…