Search results for "missing data"
showing 10 items of 83 documents
Regression with imputed covariates: A generalized missing-indicator approach
2011
A common problem in applied regression analysis is that covariate values may be missing for some observations but imputed values may be available. This situation generates a trade-off between bias and precision: the complete cases are often disarmingly few, but replacing the missing observations with the imputed values to gain precision may lead to bias. In this paper, we formalize this trade-off by showing that one can augment the regression model with a set of auxiliary variables so as to obtain, under weak assumptions about the imputations, the same unbiased estimator of the parameters of interest as complete-case analysis. Given this augmented model, the bias-precision trade-off may the…
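The equivalence described in this abstract can be illustrated with a small numerical sketch. It assumes a single missingness pattern and a fully interacted missing-indicator specification (every regressor multiplied by the indicator), which may differ from the paper's exact construction. The simulated data, the noisy-proxy imputations, and the function name `complete_case_vs_augmented` are hypothetical.

```python
import numpy as np

def complete_case_vs_augmented(n=500, seed=0):
    """Compare complete-case OLS with OLS on a missing-indicator-augmented model."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

    # Assumption: x2 is unobserved for about 40% of rows, and a noisy proxy
    # is available as the imputed value for those rows.
    missing = rng.random(n) < 0.4
    x2_filled = np.where(missing, x2 + rng.normal(scale=2.0, size=n), x2)

    # Regressors with imputed values filled in.
    X = np.column_stack([np.ones(n), x1, x2_filled])

    # Auxiliary variables: the missing indicator times every regressor.
    d = missing.astype(float)
    X_aug = np.column_stack([X, d[:, None] * X])

    # Complete-case OLS uses only rows where x2 was actually observed.
    beta_cc, *_ = np.linalg.lstsq(X[~missing], y[~missing], rcond=None)
    # Augmented OLS uses all rows; the first three coefficients are the
    # parameters of interest.
    beta_aug, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return beta_cc, beta_aug[:3]

beta_cc, beta_aug = complete_case_vs_augmented()
print(np.allclose(beta_cc, beta_aug))
```

In this sketch the two coefficient vectors agree up to floating-point error, because a fully interacted model fits the complete cases and the imputed cases separately. Any precision gain would come from restricting some of the auxiliary coefficients, which the sketch does not attempt.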
A new methodology based on functional principal component analysis to study postural stability post-stroke
2018
[EN] Background. A major goal in stroke rehabilitation is the establishment of more effective physical therapy techniques to recover postural stability. Functional Principal Component Analysis provides greater insight into recovery trends. However, when missing values exist, obtaining functional data presents some difficulties. The purpose of this study was to reveal an alternative technique for obtaining the Functional Principal Components without requiring the conversion to functional data beforehand and to investigate this methodology to determine the effect of specific physical therapy techniques in balance recovery trends in elderly subjects with hemiplegia post-stroke. Methods: A rand…
Physics-Aware Gaussian Processes for Earth Observation
2017
Earth observation from satellite sensory data poses challenging problems, where machine learning is currently a key player. In recent years, Gaussian Process (GP) regression and other kernel methods have excelled in biophysical parameter estimation tasks from space. GP regression is based on solid Bayesian statistics, and generally yields efficient and accurate parameter estimates. However, GPs are typically used for inverse modeling based on concurrent observations and in situ measurements only. Very often a forward model encoding the well-understood physical relations is available though. In this work, we review three GP models that respect and learn the physics of the underlying processes …
Regression imputation for Space-Time datasets with missing values
2009
Data consisting of repeated observations on a series of fixed units are very common in different contexts, such as the biological, environmental and social sciences, and different terminology is often used to indicate this kind of data: panel data, longitudinal data, time series-cross section data (TSCS), spatio-temporal data. Missing information is inevitable in longitudinal studies, and can produce biased estimates and loss of power. The aim of this paper is to propose a new regression (single) imputation method that, considering the particular structure and characteristics of the data set, creates a “complete” data set that can be analyzed by any researcher on different occasions and using diff…
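For orientation, plain regression (single) imputation can be sketched as follows. This is the generic textbook technique, not the space-time method proposed in the paper above, whose details are not given in the truncated abstract. The function name `regression_impute` and the use of a time index as the only predictor are assumptions made for illustration.

```python
import numpy as np

def regression_impute(values, predictors):
    """Fill NaN entries of `values` with OLS predictions from `predictors`."""
    values = np.asarray(values, dtype=float)
    # Design matrix with an intercept column.
    X = np.column_stack([np.ones(len(values)), predictors])
    observed = ~np.isnan(values)
    # Fit the regression on the observed rows only.
    coef, *_ = np.linalg.lstsq(X[observed], values[observed], rcond=None)
    # Replace each missing entry with its fitted value (single imputation).
    filled = values.copy()
    filled[~observed] = X[~observed] @ coef
    return filled

# Hypothetical series observed at six time points, with two gaps.
t = np.arange(6.0)
series = 2.0 * t + 1.0
series[[2, 4]] = np.nan
print(regression_impute(series, t))
```

Because each gap is filled with a single fitted value, the completed series understates the uncertainty due to missingness. That is a general limitation of single imputation.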
Regression with Imputed Covariates: A Generalized Missing Indicator Approach
2011
A common problem in applied regression analysis is that covariate values may be missing for some observations but imputed values may be available. This situation generates a trade-off between bias and precision: the complete cases are often disarmingly few, but replacing the missing observations with the imputed values to gain precision may lead to bias. In this paper we formalize this trade-off by showing that one can augment the regression model with a set of auxiliary variables so as to obtain, under weak assumptions about the imputations, the same unbiased estimator of the parameters of interest as complete-case analysis. Given this augmented model, the bias-precision trade-off may then…
cglasso: An R Package for Conditional Graphical Lasso Inference with Censored and Missing Values
2023
Sparse graphical models have revolutionized multivariate inference. With the advent of high-dimensional multivariate data in many applied fields, these methods are able to detect a much lower-dimensional structure, often represented via a sparse conditional independence graph. There have been numerous extensions of such methods in the past decade. Many practical applications have additional covariates or suffer from missing or censored data. Despite the development of these extensions of sparse inference methods for graphical models, there have been so far no implementations for, e.g., conditional graphical models. Here we present the general-purpose package cglasso for estimating sparse co…
The Hierarchical Agglomerative Clustering with Gower index: a methodology for automatic design of OLAP cube in ecological data processing context
2015
In Press, Corrected Proof; OLAP systems can be an improvement for ecological studies. In fact, ecology studies, follows, and analyzes phenomena across space and time according to several parameters. OLAP systems let ecologists browse a large dataset. One focus of current research on OLAP systems is the automatic design of OLAP cubes and of data warehouse schemas. This kind of work makes OLAP technology accessible to non-experts in information technology. But to be efficient, automatic OLAP building must take various cases into account. Moreover, OLAP technology is based on the concept of hierarchy. Thereby the hierarchical clustering m…
A robust evolutionary algorithm for the recovery of rational Gielis curves
2013
Gielis curves (GC) can represent a wide range of shapes and patterns ranging from star shapes to symmetric and asymmetric polygons, and even self-intersecting curves. Such patterns appear in natural objects or phenomena, such as flowers, crystals, pollen structures, animals, or even wave propagation. Gielis curves and surfaces are an extension of Lamé curves and surfaces (superquadrics), which have benefited over the last two decades from extensive research on retrieving their parameters from various data types, such as range images, 2D and 3D point clouds, etc. Unfortunately, the most efficient techniques for superquadrics recovery, based on deterministic methods, cannot…
Correction: Correcting for non-ignorable missingness in smoking trends
2017
Cost-description and multiple imputation of missing values: the SATisfaction and adherence to COPD treatment (SAT) study
2018
Aim: This article reports on a retrospective quarterly cost description (CD) performed on 401 patients with stable chronic obstructive pulmonary disease (COPD) at enrolment in the national, multicen...