Search results for "missing data"

Showing 10 of 83 documents

WEIGHTS AND IMPUTATIONS

2019

This chapter provides a description of the weighting and imputation strategies used to address problems of unit nonresponse, sample attrition and item nonresponse in the seventh wave of SHARE.
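
As a minimal, hedged illustration of the kind of nonresponse weighting this chapter discusses (not the actual SHARE calibration procedure; the covariates and response model here are entirely hypothetical), one could estimate response propensities from frame variables and weight respondents by their inverse:

```python
# Minimal sketch of inverse-probability weighting for unit nonresponse.
# Generic illustration only, not the SHARE weighting strategy; the
# variables (age, educ, responded) are hypothetical simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
age = rng.integers(50, 90, n)
educ = rng.integers(1, 4, n)
# Response propensity depends on observed covariates (a MAR-type assumption).
p_resp = 1 / (1 + np.exp(-(-1.0 + 0.02 * age + 0.3 * educ)))
responded = rng.random(n) < p_resp

# Estimate response probabilities from sampling-frame covariates ...
X = np.column_stack([age, educ])
model = LogisticRegression().fit(X, responded)
p_hat = model.predict_proba(X)[:, 1]

# ... and weight each respondent by the inverse of its estimated probability.
weights = 1 / p_hat[responded]
print("mean weight among respondents:", weights.mean())
```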

Settore SECS-P/05 - Econometria; Weights; Imputations; nonresponse errors; attrition; missing data
researchProduct

A Generalized Missing-Indicator Approach to Regression with Imputed Covariates

2011

We consider estimation of a linear regression model using data where some covariate values are missing but imputations are available to fill in the missing values. This situation generates a tradeoff between bias and precision when estimating the regression parameters of interest. Using only the subsample of complete observations does not cause bias but may imply a substantial loss of precision because the complete cases may be too few. On the other hand, filling in the missing values with imputations may cause bias. We provide the new Stata command gmi, which handles this tradeoff by using either model reduction or Bayesian model averaging techniques in the context of the generalized miss…
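
A toy illustration of the bias-precision tradeoff described in this abstract: complete-case OLS versus OLS on imputed covariates with a missing-data indicator. This is a simplified Python sketch on simulated data, not the Stata gmi command or its model reduction/averaging machinery.

```python
# Complete-case regression vs. missing-indicator regression on imputed data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Make 40% of the covariate values missing and impute them crudely by the mean.
miss = rng.random(n) < 0.4
x_obs = np.where(miss, np.nan, x)
x_imp = np.where(miss, np.nanmean(x_obs), x_obs)

# Complete-case estimate: unbiased here, but uses only ~60% of the sample.
cc = sm.OLS(y[~miss], sm.add_constant(x_obs[~miss])).fit()

# Imputed-data estimate with a missing indicator: full sample, possibly biased.
X_full = sm.add_constant(np.column_stack([x_imp, miss.astype(float)]))
ind = sm.OLS(y, X_full).fit()

print("complete-case slope:", cc.params[1], "se:", cc.bse[1])
print("missing-indicator slope:", ind.params[1], "se:", ind.bse[1])
```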

Settore SECS-P/05; Computer science; Settore SECS-P/05 - Econometria; Missing data; Bayesian inference; Regression; gmi; missing covariates; imputation; bias–precision tradeoff; model reduction; model averaging; Mathematics (miscellaneous); Covariate; Linear regression; Statistics; Econometrics; Statistics::Methodology; Imputation (statistics); Settore SECS-P/01 - Economia Politica; The Stata Journal: Promoting communications on statistics and Stata
researchProduct

EOFs for gap filling in multivariate air quality data: a FDA approach

2010

Missing values are a common concern in spatiotemporal data sets. During recent years a great number of methods have been developed for gap filling. One of the emerging approaches is based on the Empirical Orthogonal Function (EOF) methodology, applied mainly on raw and univariate data sets presenting irregular missing patterns. In this paper EOF is carried out on a multivariate space-time data set, related to concentrations of pollutants recorded at different sites, after denoising raw data by FDA approach. Some performance indicators are computed on simulated incomplete data sets with also long gaps in order to show that the EOF reconstruction appears to be an improved procedure especially…

Settore SECS-S/01 - Statistica; FDA; EOF; missing data; gap filling
researchProduct

Physics-aware Gaussian processes in remote sensing

2018

Earth observation from satellite sensory data poses challenging problems, where machine learning is currently a key player. In recent years, Gaussian Process (GP) regression has excelled in biophysical parameter estimation tasks from airborne and satellite observations. GP regression is based on solid Bayesian statistics and generally yields efficient and accurate parameter estimates. However, GPs are typically used for inverse modeling based on concurrent observations and in situ measurements only. Very often, though, a forward model encoding the well-understood physical relations between the state vector and the radiance observations is available and could be useful to improve pre…
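
For context, a minimal GP regression sketch in the spirit of the abstract, using scikit-learn rather than the authors' physics-aware formulation: a plain GP is fit on simulated observation/parameter pairs, and the forward-model coupling discussed in the paper is not reproduced.

```python
# Plain GP regression with predictive uncertainty on simulated data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(80, 1))               # e.g. simulated radiances
y = np.sin(X).ravel() + 0.1 * rng.normal(size=80)  # e.g. biophysical parameter

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

X_test = np.linspace(0, 10, 5).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)
print(np.column_stack([mean, std]))                # predictive mean and std
```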

Signal Processing (eess.SP); FOS: Computer and information sciences; 010504 meteorology & atmospheric sciences; 0211 other engineering and technologies; 02 engineering and technology; Statistics - Applications; 01 natural sciences; FOS: Electrical engineering, electronic engineering, information engineering; Applications (stat.AP); Electrical Engineering and Systems Science - Signal Processing; Gaussian process; Gaussian process emulator; 021101 geological & geomatics engineering; 0105 earth and related environmental sciences; Estimation theory; Bayesian optimization; State vector; Missing data; Bayesian statistics; Global Positioning System; Algorithm; Software; Applied Soft Computing
researchProduct

Estimating person parameters via item response model and simple sum score in small samples with few polytomous items: A simulation study

2018

Background The Item Response Theory (IRT) is becoming increasingly popular for item analysis. Theoretical considerations and simulation studies suggest that parameter estimates will become precise only by utilizing many items in large samples. Method A simulation study focusing on a single scale was performed on data with (a) n = 40, 60, 80, 120, 200, 300, 500, and 900 cases utilizing (b) 4, 8, 16, or 32 items. The items were (c) symmetrically distributed vs. skew (skewness 0, 1, and 2). Item loadings were (d) homogeneous vs. heterogeneous. Item loadings were (e) low vs. high. Half of the items had (f) a correlated error or not. The number of answering categories (g) was four vs. five. A to…
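
A small illustration of the data-generating step of the kind of simulation described above: responses to a few polytomous items are drawn from a graded response model and the simple sum score is compared with the true person parameter. Fitting the IRT model itself, which the study also evaluates, is omitted, and all settings here are hypothetical.

```python
# Simulate polytomous item responses from a graded response model and
# correlate the simple sum score with the true latent trait.
import numpy as np

rng = np.random.default_rng(4)
n_persons, n_items, n_cats = 200, 8, 5
theta = rng.normal(size=n_persons)              # true person parameters
loadings = rng.uniform(1.0, 2.0, n_items)       # item discriminations
thresholds = np.sort(rng.normal(size=(n_items, n_cats - 1)), axis=1)

# Graded response model: P(X >= k) = logistic(a * (theta - b_k)).
p_ge = 1 / (1 + np.exp(-loadings[None, :, None]
                       * (theta[:, None, None] - thresholds[None, :, :])))
u = rng.random((n_persons, n_items, 1))
responses = (u < p_ge).sum(axis=2)              # categories 0..n_cats-1

sum_score = responses.sum(axis=1)
print("corr(sum score, true theta):", np.corrcoef(sum_score, theta)[0, 1])
```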

Statistics and Probability; Analysis of Variance; Scale (ratio); Epidemiology; Item analysis; Skew; Polytomous Rasch model; Missing data; 01 natural sciences; 010104 statistics & probability; 03 medical and health sciences; 0302 clinical medicine; Simple (abstract algebra); Skewness; Sample Size; Statistics; Item response theory; Humans; Regression Analysis; Computer Simulation; 030212 general & internal medicine; 0101 mathematics; Correlation of Data; Mathematics; Statistics in Medicine
researchProduct

Forecasting time series with missing data using Holt's model

2009

This paper deals with the prediction of time series with missing data using an alternative formulation for Holt's model with additive errors. This formulation simplifies both the calculation of maximum likelihood estimators of all the unknowns in the model and the calculation of point forecasts. In the presence of missing data, the EM algorithm is used to obtain maximum likelihood estimates and point forecasts. Based on this application, we propose a leave-one-out algorithm for the data transformation selection problem, which allows us to analyse Holt's model with multiplicative errors. Some numerical results show the performance of these procedures for obtaining robust forecasts.
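
As a simplified sketch of Holt's linear (additive) exponential smoothing on a series with gaps: when an observation is missing, the recursion keeps the current level and trend, and the one-step forecast stands in for the update. This is a heuristic illustration only, not the maximum likelihood / EM treatment developed in the paper, and the smoothing constants are fixed by hand.

```python
# Holt's additive exponential smoothing with missing observations skipped.
import numpy as np

def holt_forecast(y, alpha=0.3, beta=0.1, horizon=5):
    level, trend = y[0], 0.0
    for obs in y[1:]:
        forecast = level + trend
        if np.isnan(obs):
            obs = forecast                 # missing: no new information
        new_level = alpha * obs + (1 - alpha) * forecast
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return level + trend * np.arange(1, horizon + 1)

rng = np.random.default_rng(5)
y = np.cumsum(rng.normal(0.5, 1.0, 60))    # trending series
y[rng.random(60) < 0.2] = np.nan           # 20% missing
y[0] = 0.0                                 # ensure the first value is observed
print(holt_forecast(y))
```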

Statistics and Probability; Applied Mathematics; Autocorrelation; Exponential smoothing; Linear model; Data transformation (statistics); Estimator; Missing data; Expectation–maximization algorithm; Statistics; Statistics, Probability and Uncertainty; Additive model; Algorithm; Mathematics; Journal of Statistical Planning and Inference
researchProduct

Correcting for non-ignorable missingness in smoking trends

2015

Data missing not at random (MNAR) are a major challenge in survey sampling. We propose an approach based on registry data to deal with non-ignorable missingness in health examination surveys. The approach relies on follow-up data available from administrative registers several years after the survey. For illustration we use data on smoking prevalence in the Finnish National FINRISK study conducted in 1972-1997. The data consist of measured survey information including missingness indicators, register-based background information and register-based time-to-disease survival data. The parameters of the missingness mechanism are estimable with these data although the original survey data are MNAR. The u…
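
A deliberately simplified toy version of the idea above: response depends on the quantity of interest (smoking), so respondents alone give a biased prevalence, but if follow-up information allows the response probability to be estimated within each outcome group, respondents can be reweighted. The paper itself uses register-based time-to-disease data and a full Bayesian model; none of that machinery is reproduced here, and the response rates below are simply assumed known.

```python
# Toy correction of outcome-dependent (MNAR) nonresponse by reweighting.
import numpy as np

rng = np.random.default_rng(6)
n = 20000
smoker = rng.random(n) < 0.30              # true prevalence 30%
p_resp = np.where(smoker, 0.55, 0.80)      # smokers respond less often (MNAR)
responded = rng.random(n) < p_resp

naive = smoker[responded].mean()           # biased downwards

# Suppose follow-up data identify response rates by smoking status.
p_hat = np.where(smoker[responded], 0.55, 0.80)
weights = 1 / p_hat
corrected = np.average(smoker[responded], weights=weights)

print(f"naive prevalence:     {naive:.3f}")
print(f"weighted prevalence:  {corrected:.3f}  (true 0.300)")
```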

Statistics and Probability; Background information; FOS: Computer and information sciences; ta112; Test data generation; Computer science; Survey sampling; non-participation; ta3142; Smoking prevalence; Bayesian inference; Missing data; Statistics - Applications; Registry data; Methodology (stat.ME); Statistics; Survey data collection; Applications (stat.AP); Statistics, Probability and Uncertainty; Statistics - Methodology; health examination survey
researchProduct

Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?

2017

Principal component analysis (PCA) is a method of choice for dimension reduction. In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable to perform the PCA of streaming data and/or massive data. Despite the wide availability of recursive algorithms that can efficiently update the PCA when new data are observed, the literature offers little guidance on how to select a suitable algorithm for a given application. This paper reviews the main approaches to online PCA, namely, perturbation techniques, incremental methods and stochastic optimisation, and compares the most widely employed techniques in terms of statistical a…
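
A quick sketch of one family of methods the review covers (incremental updates), using scikit-learn's IncrementalPCA to process a data stream in mini-batches instead of holding everything in memory. The review's other families, perturbation techniques and stochastic optimisation, are not shown here, and the streaming data are simulated.

```python
# Online PCA on simulated streaming data via incremental mini-batch updates.
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(7)
d, k, batch = 100, 5, 200
loadings = rng.normal(size=(k, d))                # fixed low-rank structure

ipca = IncrementalPCA(n_components=k)
for _ in range(50):                               # 50 mini-batches of data
    # Low-rank signal plus noise, generated on the fly to mimic a stream.
    X = rng.normal(size=(batch, k)) @ loadings + 0.1 * rng.normal(size=(batch, d))
    ipca.partial_fit(X)

print("explained variance ratio:", ipca.explained_variance_ratio_.round(3))
```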

Statistics and Probability; Computer science; Computation; Dimensionality reduction; Incremental methods; 02 engineering and technology; Missing data; 01 natural sciences; 010104 statistics & probability; Data explosion; Streaming data; Principal component analysis; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 0101 mathematics; Statistics, Probability and Uncertainty; Algorithm; Eigendecomposition of a matrix; International Statistical Review
researchProduct

Correction: Correcting for non-ignorable missingness in smoking trends

2017

Statistics and Probability; Computer science; Statistics; Statistics, Probability and Uncertainty; Missing data; Stat
researchProduct

Extending graphical models for applications: on covariates, missingness and normality

2021

The authors of the paper “Bayesian Graphical Models for Modern Biological Applications” have put forward an important framework for making graphical models more useful in applied settings. In this discussion paper, we give a number of suggestions for making this framework even more suitable for practical scenarios. Firstly, we show that an alternative and simplified definition of covariate might make the framework more manageable in high-dimensional settings. Secondly, we point out that the inclusion of missing variables is important for practical data analysis. Finally, we comment on the effect that the Gaussianity assumption has in identifying the underlying conditional independence graph…
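
A frequentist sketch of the object discussed in the final point above: under a Gaussianity assumption, zeros in the estimated precision matrix encode the conditional independence graph. This uses scikit-learn's GraphicalLasso on simulated chain-structured data, not the Bayesian copula graphical models the paper discusses.

```python
# Estimate a conditional independence graph from Gaussian data via graphical lasso.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(8)
# Chain structure X1 - X2 - X3 - X4, so e.g. X1 and X3 are conditionally
# independent given X2.
n = 2000
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)
x3 = 0.7 * x2 + rng.normal(size=n)
x4 = 0.7 * x3 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])

model = GraphicalLasso(alpha=0.05).fit(X)
adjacency = (np.abs(model.precision_) > 1e-2).astype(int)
np.fill_diagonal(adjacency, 0)
print(adjacency)        # should recover the chain's edges
```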

Statistics and Probability; Computer science; Missing data; Conditional graphical models; Copula graphical models; Covariate; Econometrics; Sparse inference; Graphical model; Statistics, Probability and Uncertainty; Normality
researchProduct