Search results for "missing data"

Single imputation method of missing values in environmental pollution data sets

2006

Missing data represent a general problem in many scientific fields, above all in environmental research. Several methods have been proposed in the literature for handling missing data, and the choice of an appropriate method depends, among other things, on the missing-data pattern and on the missing-data mechanism. One approach to the problem is to impute the missing values to yield a complete data set. The goal of this paper is to propose a new single imputation method and to compare its performance with other single and multiple imputation methods known in the literature. Considering a data set of PM10 concentrations measured every 2 h by eight monitoring stations distributed over the metropolitan area of Paler…

Keywords: Data set; Atmospheric Science; Correlation coefficient; Statistics; Environmental pollution; Imputation (statistics); Performance indicator; Time series; Missing data; Root-mean-square deviation; General Environmental Science; Mathematics
Venue: Atmospheric Environment
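The abstract above compares imputation methods using performance indicators such as the root-mean-square deviation and the correlation coefficient (both among its keywords). A minimal sketch of that kind of evaluation follows, assuming a complete time-by-station matrix in which some known values are hidden and then re-imputed. The imputation rule used here (mean of the other stations at the same time step), the function names and the synthetic data are hypothetical stand-ins, not the single imputation method proposed in the paper.

```python
import math
import random


def impute_row_mean(data):
    """Fill each None with the mean of the observed stations at the same time step.

    Hypothetical baseline for illustration only, NOT the paper's method.
    Falls back to the overall observed mean when a whole time step is missing.
    """
    observed = [v for row in data for v in row if v is not None]
    overall = sum(observed) / len(observed)
    filled = []
    for row in data:
        present = [v for v in row if v is not None]
        fill = sum(present) / len(present) if present else overall
        filled.append([fill if v is None else v for v in row])
    return filled


def rmse(truth, est):
    """Root-mean-square deviation between true and imputed values."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, est)) / len(truth))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def evaluate(complete, frac=0.1, seed=0):
    """Hide a fraction of known cells, impute them, and score the imputations."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(complete)) for j in range(len(complete[0]))]
    hidden = rng.sample(cells, max(2, int(frac * len(cells))))
    masked = [row[:] for row in complete]
    for i, j in hidden:
        masked[i][j] = None
    filled = impute_row_mean(masked)
    truth = [complete[i][j] for i, j in hidden]
    est = [filled[i][j] for i, j in hidden]
    return rmse(truth, est), pearson(truth, est)


# Synthetic data: 120 time steps x 8 stations (made up, not the paper's data).
rng = random.Random(42)
series = [[20 + 5 * math.sin(t / 4) + s + rng.gauss(0, 1) for s in range(8)] for t in range(120)]
err, r = evaluate(series, frac=0.1)
print(f"RMSE = {err:.2f}, r = {r:.2f}")
```

Masking values that are actually known is what makes the comparison possible: the true values of the hidden cells are available to score each method against.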

Systematic handling of missing data in complex study designs : experiences from the Health 2000 and 2011 Surveys

2016

We present a systematic approach to the practical and comprehensive handling of missing data, motivated by our experiences of analyzing longitudinal survey data. We consider the Health 2000 and 2011 Surveys (BRIF8901), where increased non-response and non-participation from 2000 to 2011 were a major issue. The model assumptions involved in the complex sampling design, repeated measurements design, non-participation mechanisms and associations are presented graphically using a methodology previously defined as a causal model with design, i.e., a functional causal model extended with the study design. This tool forces the statistician to make the study design and the missing-data mechanism explicit…

Keywords: Statistics and Probability; multiple imputation; Computer science; 01 natural sciences; 010104 statistics & probability; 03 medical and health sciences; 0302 clinical medicine; non-response; Sampling design; 030212 general & internal medicine; 0101 mathematics; Causal model; ta112; Clinical study design; Inverse probability weighting; Sampling (statistics); non-participation; Missing data; Data science; doubly robust methods; Survey data collection; Data mining; Statistics Probability and Uncertainty; Statistician; causal model with design
Venue: Journal of Applied Statistics
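Inverse probability weighting appears among this record's keywords as one way to deal with non-participation. As a rough illustration only, not the authors' procedure, the sketch below computes a Hájek-type IPW mean in which each participant is weighted by the inverse of an assumed participation probability. In practice those probabilities would have to be estimated, and the function name and all numbers here are invented.

```python
def ipw_mean(values, responded, propensities):
    """Hajek-type inverse-probability-weighted mean.

    values:       outcome per subject (None where the subject did not participate)
    responded:    True if the subject participated
    propensities: assumed participation probability per subject, in (0, 1]
    """
    num = den = 0.0
    for y, r, p in zip(values, responded, propensities):
        if r:
            w = 1.0 / p  # groups that rarely participate get larger weights
            num += w * y
            den += w
    return num / den


# Invented example: group A participates with prob. 0.8, group B with prob. 0.4.
values = [10, 10, 10, 10, None, 20, 20, None, None, None]
responded = [v is not None for v in values]
propensities = [0.8] * 5 + [0.4] * 5

observed = [v for v in values if v is not None]
complete_case = sum(observed) / len(observed)  # 80 / 6, pulled toward group A
print(complete_case, ipw_mean(values, responded, propensities))
```

If non-participants resemble participants within their group (missing at random given group), the full-data mean here would be 15; the weighted estimate recovers it, while the complete-case mean of about 13.3 does not.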

Using Deep Learning to Extrapolate Protein Expression Measurements

2020

Mass spectrometry (MS)-based quantitative proteomics experiments typically assay up to 60% of the ≈20 000 human protein-coding genes. Computational methods for imputing the missing values using RNA expression data usually only allow imputation of proteins measured in at least some of the samples. In silico methods for comprehensively estimating abundances across all proteins are still missing. Here, a novel method is proposed that uses deep learning to extrapolate the observed protein expression values in label-free MS experiments to all proteins, leveraging gene functional annotations and RNA measurements as key predictive attributes. This method is tested on four datasets, in…

Keywords: Proteomics; In silico; Quantitative proteomics; Computational biology; Biology; Biochemistry; protein abundance prediction; Mass Spectrometry; Protein expression; Mice; 03 medical and health sciences; Deep Learning; Abundance (ecology); Animals; Molecular Biology; Gene; Research Articles; 030304 developmental biology; deep learning networks; 0303 health sciences; UniProt keywords; 030302 biochemistry & molecular biology; Proteins; RNA; Molecular Sequence Annotation; Missing data; Gene Ontology; Artificial intelligence; Research Article
Venue: PROTEOMICS

Estimating with kernel smoothers the mean of functional data in a finite population setting. A note on variance estimation in presence of partially o…

2014

In the near future, millions of load curves measuring the electricity consumption of French households on fine time grids (probably half-hourly) will be available. All these collected load curves represent a huge amount of information that could be exploited using survey sampling techniques. In particular, the total consumption of a specific customer group (for example, all the customers of an electricity supplier) could be estimated using unequal probability random sampling methods. Unfortunately, data collection may suffer from technical problems that result in missing values. In this paper we study a new estimation method for the mean curve in the presence of missing values which consists in…

Keywords: FOS: Computer and information sciences; Statistics and Probability; Population; Ratio estimator; Linearization; 01 natural sciences; Survey sampling; Horvitz–Thompson estimator; Methodology (stat.ME); 010104 statistics & probability; Hájek estimator; 0502 economics and business; Applied mathematics; Missing values; 0101 mathematics; Statistics - Methodology; 050205 econometrics; Mathematics; Pointwise; [STAT.ME] Statistics [stat]/Methodology [stat.ME]; 05 social sciences; Nonparametric statistics; Estimator; 16. Peace & justice; Missing data; Functional data; Kernel (statistics); Statistics Probability and Uncertainty; Nonparametric estimation
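For reference, the Horvitz–Thompson and Hájek estimators named in this record's keywords can be applied pointwise, time point by time point, to sampled load curves. The sketch below does this for fully observed curves only; how the paper handles missing values is cut off in the abstract and is not reproduced here. The function names, curves, inclusion probabilities and population size are invented, and the units (e.g. kWh per half hour) are an assumption.

```python
def ht_mean_curve(curves, incl_probs, pop_size):
    """Horvitz-Thompson estimate of the population mean curve, point by point.

    curves:     sampled curves, each a list of values on a common time grid
    incl_probs: first-order inclusion probability pi_i of each sampled unit
    pop_size:   known population size N
    """
    n_times = len(curves[0])
    return [sum(c[t] / p for c, p in zip(curves, incl_probs)) / pop_size
            for t in range(n_times)]


def hajek_mean_curve(curves, incl_probs):
    """Hajek estimate: same weighted sums, divided by the estimated size sum(1/pi_i)."""
    n_hat = sum(1.0 / p for p in incl_probs)
    n_times = len(curves[0])
    return [sum(c[t] / p for c, p in zip(curves, incl_probs)) / n_hat
            for t in range(n_times)]


# Invented sample of 3 households, 4 time points each, from a population of N = 12.
curves = [[1.0, 2.0, 3.0, 2.0], [0.5, 0.5, 1.0, 1.0], [2.0, 1.0, 1.0, 0.0]]
incl_probs = [0.5, 0.25, 0.25]
print(ht_mean_curve(curves, incl_probs, pop_size=12))
print(hajek_mean_curve(curves, incl_probs))
```

The Hájek version divides by the estimated population size (here 2 + 4 + 4 = 10) instead of the known N = 12, which is why the two curves differ; it is often preferred when the weights vary strongly between units.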

P-1294 - Utility of the world health organization disability assessment schedule II in schizophrenia

2012

Aim: The World Health Organization Disability Assessment Schedule II (WHODAS II) was developed for assessing disability. This study provides data on the validity and utility of the Spanish version of the WHODAS II in a large sample of patients with schizophrenia. Methods: The sample included 352 patients with a schizophrenia spectrum disorder. They completed a comprehensive assessment battery including measures of psychopathology, functionality and quality of life. A subsample of 36 patients was retested after six months to assess the instrument's temporal stability. Results: Participation in society (6.3%) and Life activities (4.0%) were the domains with the highest percentage of missing data. The intern…

Keywords: Intraclass correlation; Context (language use); Missing data; Mental health; Psychiatry and Mental health; Cronbach's alpha; Schizophrenia; Scale (social sciences); Psychiatry; Psychology; Clinical psychology; Psychopathology
Venue: European Psychiatry
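The Results sentence above reports, per WHODAS II domain, the percentage of missing data. A minimal sketch of that tabulation follows, assuming missingness is counted per patient and domain and that each patient's answers are stored as a dict of domain scores with None for a missing score. The domain names come from the abstract; the function name and the responses are invented, not study data.

```python
def percent_missing_by_domain(responses):
    """Percentage of patients with a missing (None) score in each domain.

    responses: list of dicts mapping domain name -> score (None if missing)
    """
    domains = responses[0].keys()
    n = len(responses)
    return {d: 100.0 * sum(r[d] is None for r in responses) / n for d in domains}


# Invented responses for 4 patients (not study data).
responses = [
    {"Participation in society": 12, "Life activities": 9},
    {"Participation in society": None, "Life activities": 11},
    {"Participation in society": 15, "Life activities": 8},
    {"Participation in society": 10, "Life activities": 7},
]
print(percent_missing_by_domain(responses))
```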

Bayesian joint modeling for assessing the progression of chronic kidney disease in children.

2016

Joint models are rich and flexible models for analyzing longitudinal data with nonignorable missing data mechanisms. This article proposes a Bayesian random-effects joint model to assess the evolution of a longitudinal process in terms of a linear mixed-effects model that accounts for heterogeneity between the subjects, serial correlation, and measurement error. Dropout is modeled in terms of a survival model with competing risks and left truncation. The model is applied to data coming from ReVaPIR, a project involving children with chronic kidney disease whose evolution is mainly assessed through longitudinal measurements of glomerular filtration rate.

Keywords: Statistics and Probability; Epidemiology; Computer science; Bayesian probability; 030232 urology & nephrology; Renal function; 01 natural sciences; 010104 statistics & probability; 03 medical and health sciences; 0302 clinical medicine; Health Information Management; Statistics; Econometrics; Humans; 0101 mathematics; Renal Insufficiency Chronic; Child; Joint (geology); Dropout (neural networks); Survival analysis; Autocorrelation; Bayes Theorem; Missing data; Child Preschool; Disease Progression; Kidney disease
Venue: Statistical Methods in Medical Research

2021

Data collected in criminal investigations may suffer from issues such as (i) incompleteness, due to the covert nature of criminal organizations; (ii) incorrectness, caused by either unintentional data collection errors or intentional deception by criminals; and (iii) inconsistency, when the same information is entered into law enforcement databases multiple times or in different formats. In this paper we analyze nine real criminal networks of different nature (i.e., Mafia networks, criminal street gangs and terrorist organizations) in order to quantify the impact of incomplete data and to determine which network type is most affected by it. The networks are first pruned using two specific m…

Keywords: Multidisciplinary; Data collection; Computer science; Node (networking); Law enforcement; Deception; Missing data; Criminal investigation; Euclidean distance; Covert; Terrorism; Adjacency list; Graph (abstract data type); Data mining
Venue: PLOS ONE

deaR-Shiny: An Interactive Web App for Data Envelopment Analysis

2021

In this paper, we describe an interactive web application (deaR-shiny) to measure efficiency and productivity using data envelopment analysis (DEA). deaR-shiny aims to fill the gap that currently exists in the availability of online DEA software by offering practitioners and researchers free access to a very wide variety of DEA models (both conventional and fuzzy models). We illustrate how to use the web app by replicating the main results obtained by Carlucci, Cirà and Coccorese in 2018, who investigate the efficiency and economic sustainability of Italian regional airports using two conventional DEA models, and the results given by Kao and Liu in their papers published in 2000 and 2003, wh…

Keywords: fuzzy DEA; Operations research; Computer science; Geography Planning and Development; 0211 other engineering and technologies; TJ807-830; 02 engineering and technology; Management Monitoring Policy and Law; TD194-195; Fuzzy logic; Economic Sciences [UNESCO]; R software; Renewable energy sources; Malmquist index; Software; DEA; 0202 electrical engineering electronic engineering information engineering; Data envelopment analysis; Fuzzy number; Web application; GE1-350; Measure (data warehouse); 021103 operations research; Environmental effects of industries and plants; Renewable Energy Sustainability and the Environment; shiny; deaR package; Missing data; Variety (cybernetics); Environmental sciences; efficiency; 020201 artificial intelligence & image processing
Venue: Sustainability

2013

A growing number of programs for the multiple imputation of missing values are becoming available in statistical software. Two algorithms are mainly implemented: Expectation Maximization (EM) and Multiple Imputation by Chained Equations (MICE). They have been shown to work well in large samples or when only small proportions of missing data are to be imputed. However, some researchers have begun to impute large proportions of missing data or to apply the method to small samples. A simulation was performed using MICE on datasets with 50, 100 or 200 cases and four or eleven variables. A varying proportion of data (3%–63%) was set as missing completely at random and subsequent…

Keywords: Binary response; Sample size determination; Statistics; Expectation–maximization algorithm; Econometrics; Main effect; Imputation (statistics); Missing data; Interaction; Logistic regression; Mathematics
Venue: Open Journal of Statistics
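The simulation described in this abstract sets a proportion of the data missing completely at random (MCAR) before imputing with MICE. A minimal sketch of the masking step alone follows, assuming a cases-by-variables list of lists; the MICE imputation itself is not shown, and the function name and data are hypothetical.

```python
import random


def make_mcar(data, proportion, seed=None):
    """Return a copy of `data` with `proportion` of its cells set to None.

    Cells are chosen uniformly at random, independently of the values,
    which is what makes the missingness MCAR.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    k = round(proportion * len(cells))
    masked = [list(row) for row in data]
    for i, j in rng.sample(cells, k):
        masked[i][j] = None
    return masked


# 50 cases x 4 variables, matching the smallest design mentioned in the abstract.
rng = random.Random(1)
data = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)]
for prop in (0.03, 0.63):
    masked = make_mcar(data, prop, seed=0)
    n_missing = sum(v is None for row in masked for v in row)
    print(prop, n_missing)
```

Because the cells to delete are drawn without looking at the data, the probability of being missing is the same for every cell, which is the defining property of MCAR; mechanisms that depend on observed or unobserved values would need a different masking rule.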

Model averaging estimation of generalized linear models with imputed covariates

2015

We address the problem of estimating generalized linear models when some covariate values are missing but imputations are available to fill in the missing values. This situation generates a bias-precision trade-off in the estimation of the model parameters. Extending the generalized missing-indicator method proposed by Dardanoni et al. (2011) for linear regression, we handle this trade-off as a problem of model uncertainty using Bayesian averaging of classical maximum likelihood estimators (BAML). We also propose a block model averaging strategy that incorporates information on the missing-data patterns and is computationally simple. An empirical application illustrates our…

Keywords: Generalized linear model; Economics and Econometrics; Applied Mathematics; Sector SECS-P/05 - Econometrics; Estimator; Missing data; Generalized linear mixed model; Model averaging; Bayesian averaging of maximum likelihood estimators; Generalized linear models; Missing covariates; Generalized missing-indicator method; Hierarchical generalized linear model; Statistics; Linear regression; Covariate; Generalized estimating equation; Mathematics
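As its name suggests, the generalized missing-indicator method builds on the basic missing-indicator idea: fill each missing covariate value with an available imputation and add a 0/1 column flagging which values were imputed. The sketch below builds only that basic augmented design matrix. It is not the generalized method of Dardanoni et al. (2011) and does not include the BAML model-averaging step; the function name and data are hypothetical.

```python
def missing_indicator_design(rows, col, imputations):
    """Augment a design matrix with the basic missing-indicator method.

    rows:        covariate vectors (lists); None marks a missing value in column `col`
    col:         index of the covariate that has missing values
    imputations: imputed value for each row, used only where the covariate is missing
    Returns new rows with the gap filled and a 0/1 missingness indicator appended.
    """
    design = []
    for row, imp in zip(rows, imputations):
        missing = row[col] is None
        new_row = list(row)  # copy so the input rows are left untouched
        if missing:
            new_row[col] = imp
        design.append(new_row + [1.0 if missing else 0.0])
    return design


# Intercept plus one covariate; the second observation's covariate is missing.
rows = [[1.0, 2.5], [1.0, None], [1.0, 4.0]]
imputations = [2.5, 3.1, 4.0]  # hypothetical imputed values
print(missing_indicator_design(rows, col=1, imputations=imputations))
```

Including the indicator column lets a fitted model shift its prediction for observations whose covariate was imputed; whether to include such terms is one form of the bias-precision trade-off the abstract describes.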