Search results for "missing data"
showing 10 items of 83 documents
Single imputation method of missing values in environmental pollution data sets
2006
Missing data represent a general problem in many scientific fields, above all in environmental research. Several methods have been proposed in the literature for handling missing data, and the choice of an appropriate method depends, among other things, on the missing-data pattern and on the missing-data mechanism. One approach to the problem is to impute them to yield a complete data set. The goal of this paper is to propose a new single imputation method and to compare its performance to other single and multiple imputation methods known in the literature. Considering a data set of PM10 concentration measured every 2 h by eight monitoring stations distributed over the metropolitan area of Paler…
Systematic handling of missing data in complex study designs : experiences from the Health 2000 and 2011 Surveys
2016
We present a systematic approach to the practical and comprehensive handling of missing data motivated by our experiences of analyzing longitudinal survey data. We consider the Health 2000 and 2011 Surveys (BRIF8901) where increased non-response and non-participation from 2000 to 2011 was a major issue. The model assumptions involved in the complex sampling design, repeated measurements design, non-participation mechanisms and associations are presented graphically using methodology previously defined as a causal model with design, i.e. a functional causal model extended with the study design. This tool forces the statistician to make the study design and the missing-data mechanism explicit…
Using Deep Learning to Extrapolate Protein Expression Measurements
2020
Mass spectrometry (MS)-based quantitative proteomics experiments typically assay a subset of up to 60% of the ≈20 000 human protein coding genes. Computational methods for imputing the missing values using RNA expression data usually allow only for imputations of proteins measured in at least some of the samples. In silico methods for comprehensively estimating abundances across all proteins are still missing. Here, a novel method is proposed using deep learning to extrapolate the observed protein expression values in label-free MS experiments to all proteins, leveraging gene functional annotations and RNA measurements as key predictive attributes. This method is tested on four datasets, in…
Estimating with kernel smoothers the mean of functional data in a finite population setting. A note on variance estimation in presence of partially o…
2014
In the near future, millions of load curves measuring the electricity consumption of French households in small time grids (probably half hours) will be available. All these collected load curves represent a huge amount of information which could be exploited using survey sampling techniques. In particular, the total consumption of a specific customer group (for example all the customers of an electricity supplier) could be estimated using unequal probability random sampling methods. Unfortunately, data collection may undergo technical problems resulting in missing values. In this paper we study a new estimation method for the mean curve in the presence of missing values which consists in…
P-1294 - Utility of the world health organization disability assessment schedule II in schizophrenia
2012
Aim: The World Health Organization Disability Assessment Schedule II (WHODAS II) was developed for assessing disability. This study provides data on the validity and utility of the Spanish version of the WHODAS II in a large sample of patients with schizophrenia. Methods: The sample included 352 patients with a schizophrenia spectrum disorder. They completed a comprehensive assessment battery including measures of psychopathology, functionality and quality-of-life. A sub-sample of 36 patients was retested after six months to assess its temporal stability. Results: Participation in society (6.3%) and Life activities (4.0%) were the domains with the highest percentage of missing data. The intern…
Bayesian joint modeling for assessing the progression of chronic kidney disease in children.
2016
Joint models are rich and flexible models for analyzing longitudinal data with nonignorable missing data mechanisms. This article proposes a Bayesian random-effects joint model to assess the evolution of a longitudinal process in terms of a linear mixed-effects model that accounts for heterogeneity between the subjects, serial correlation, and measurement error. Dropout is modeled in terms of a survival model with competing risks and left truncation. The model is applied to data coming from ReVaPIR, a project involving children with chronic kidney disease whose evolution is mainly assessed through longitudinal measurements of glomerular filtration rate.
2021
Data collected in criminal investigations may suffer from issues like: (i) incompleteness, due to the covert nature of criminal organizations; (ii) incorrectness, caused by either unintentional data collection errors or intentional deception by criminals; (iii) inconsistency, when the same information is collected into law enforcement databases multiple times, or in different formats. In this paper we analyze nine real criminal networks of different nature (i.e., Mafia networks, criminal street gangs and terrorist organizations) in order to quantify the impact of incomplete data, and to determine which network type is most affected by it. The networks are firstly pruned using two specific m…
deaR-Shiny: An Interactive Web App for Data Envelopment Analysis
2021
In this paper, we describe an interactive web application (deaR-shiny) to measure efficiency and productivity using data envelopment analysis (DEA). deaR-shiny aims to fill the gap that currently exists in the availability of online DEA software offering practitioners and researchers free access to a very wide variety of DEA models (both conventional and fuzzy models). We illustrate how to use the web app by replicating the main results obtained by Carlucci, Cirà and Coccorese in 2018, who investigate the efficiency and economic sustainability of Italian regional airports by using two conventional DEA models, and the results given by Kao and Liu in their papers published in 2000 and 2003, wh…
2013
Currently, a growing number of programs for multiple imputation of missing values are becoming available in statistical software. Among others, two algorithms are mainly implemented: Expectation Maximization (EM) and Multiple Imputation by Chained Equations (MICE). They have been shown to work well in large samples or when only small proportions of missing data are to be imputed. However, some researchers have begun to impute large proportions of missing data or to apply the method to small samples. A simulation was performed using MICE on datasets with 50, 100 or 200 cases and four or eleven variables. A varying proportion of data (3% to 63%) was set as missing completely at random and subsequent…
Model averaging estimation of generalized linear models with imputed covariates
2015
We address the problem of estimating generalized linear models when some covariate values are missing but imputations are available to fill in the missing values. This situation generates a bias-precision trade-off in the estimation of the model parameters. Extending the generalized missing-indicator method proposed by Dardanoni et al. (2011) for linear regression, we handle this trade-off as a problem of model uncertainty using Bayesian averaging of classical maximum likelihood estimators (BAML). We also propose a block model averaging strategy that incorporates information on the missing-data patterns and is computationally simple. An empirical application illustrates our…