Search results for "Missing data"

Showing 10 of 83 documents

Regression imputation for Space-Time datasets with missing values

2009

Data consisting of repeated observations on a series of fixed units are very common in many contexts, such as the biological, environmental and social sciences, and different terminology is often used to indicate this kind of data: panel data, longitudinal data, time series-cross section (TSCS) data, spatio-temporal data. Missing data are inevitable in longitudinal studies and can produce biased estimates and a loss of power. The aim of this paper is to propose a new regression (single) imputation method that, considering the particular structure and characteristics of the data set, creates a “complete” data set that can be analyzed by any researcher on different occasions and using diff…
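As a rough illustration of regression-based single imputation (a generic sketch, not the method proposed in the paper, whose details are truncated above), missing values in one unit's series can be predicted from a fully observed series of another unit, using an ordinary least-squares line fitted on the time points where the target is observed. The helper name `regression_impute` and the use of a single predictor series are assumptions made for illustration.

```python
from statistics import fmean


def regression_impute(target, predictor):
    """Fill None entries of `target` with predictions from a simple linear
    regression of `target` on `predictor`.

    Hypothetical helper: assumes `predictor` is fully observed and that at
    least two time points have an observed target value.
    """
    if any(x is None for x in predictor):
        raise ValueError("predictor series must be fully observed")
    # Fit the regression only on time points where the target is observed.
    pairs = [(x, y) for x, y in zip(predictor, target) if y is not None]
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Keep observed values; replace gaps with fitted values.
    return [y if y is not None else intercept + slope * x
            for x, y in zip(predictor, target)]


station_a = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical fully observed unit
station_b = [2.0, 4.0, None, 8.0, 10.0]    # hypothetical unit with a gap
print(regression_impute(station_b, station_a))  # [2.0, 4.0, 6.0, 8.0, 10.0]
```

Real space-time data would usually draw on several neighbouring units and temporal terms; the one-predictor version only shows the mechanics of fitting on complete pairs and predicting the gaps.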

Keywords: Cross-sectional data; Space time; Missing data; Regression; Terminology; Geography; Statistics; Space-time data imputation; Performance indicator; Imputation (statistics); Data mining; Sector SECS-S/01 - Statistics; Panel data

Single imputation method of missing values in environmental pollution data sets

2006

Missing data are a general problem in many scientific fields, above all in environmental research. Several methods have been proposed in the literature for handling missing data, and the choice of an appropriate method depends, among other things, on the missing-data pattern and on the missing-data mechanism. One approach to the problem is to impute the missing values to yield a complete data set. The goal of this paper is to propose a new single imputation method and to compare its performance to other single and multiple imputation methods known in the literature. Considering a data set of PM10 concentrations measured every 2 h by eight monitoring stations distributed over the metropolitan area of Paler…
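Imputation methods are commonly compared by hiding known values, imputing them, and scoring the imputations against the hidden truth; the keywords list the root-mean-square deviation and the correlation coefficient among such performance indicators. The plain-Python sketch below computes both; the function names and the example numbers are hypothetical and are not taken from the paper.

```python
import math


def rmse(true_values, imputed_values):
    """Root-mean-square deviation between held-out values and their imputations."""
    n = len(true_values)
    return math.sqrt(sum((t - i) ** 2 for t, i in zip(true_values, imputed_values)) / n)


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical held-out concentrations and the values an imputation method produced.
held_out = [40.0, 55.0, 38.0, 61.0]
imputed = [42.0, 52.0, 40.0, 58.0]
print(rmse(held_out, imputed), pearson_r(held_out, imputed))
```

Lower RMSE and a correlation closer to 1 indicate better imputations; in practice the comparison is repeated over many random deletion patterns.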

Keywords: Data set; Atmospheric Science; Correlation coefficient; Statistics; Environmental pollution; Imputation (statistics); Performance indicator; Time series; Missing data; Root-mean-square deviation; General Environmental Science; Mathematics; Atmospheric Environment

Latent force models for earth observation time series prediction

2016

We introduce latent force models (LFMs) for Earth observation time series analysis. The model uses Gaussian processes and differential equations to combine data-driven modelling with a physical model of the system. The LFM presented here performs multi-output structured regression, adapts to the signal characteristics, copes with missing data in the time series, and provides explicit latent functions that allow system analysis and evaluation. We successfully illustrate its performance in challenging scenarios of crop monitoring from space, providing time-resolved time series predictions.

Keywords: Earth observation; 010504 meteorology & atmospheric sciences; Series (mathematics); Differential equation; Computer science; Mathematics; 02 engineering and technology; Missing data; 01 natural sciences; Data-driven; Data modeling; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Geology; Time series; Gaussian process; Algorithm; Simulation; 0105 earth and related environmental sciences

Regression with imputed covariates: A generalized missing-indicator approach

2011

A common problem in applied regression analysis is that covariate values may be missing for some observations but imputed values may be available. This situation generates a trade-off between bias and precision: the complete cases are often disarmingly few, but replacing the missing observations with the imputed values to gain precision may lead to bias. In this paper, we formalize this trade-off by showing that one can augment the regression model with a set of auxiliary variables so as to obtain, under weak assumptions about the imputations, the same unbiased estimator of the parameters of interest as complete-case analysis. Given this augmented model, the bias-precision trade-off may the…
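The equivalence claimed above can be checked numerically in its simplest, fully augmented form: if every regressor is also interacted with a missingness indicator, the incomplete cases get their own free coefficients, so the coefficients on the original regressors are estimated from the complete cases alone, whatever the imputations are. This NumPy sketch uses simulated data and arbitrary random stand-ins for the imputations (both assumptions); it shows only this starting point, not the paper's subsequent model reduction or model averaging.

```python
import numpy as np


def ols(X, y):
    """Ordinary least-squares coefficients via a least-squares solve."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)                 # always-observed covariate
x = rng.normal(size=n)                 # covariate with missing values
y = 1.0 + 2.0 * x - 0.5 * z + rng.normal(scale=0.1, size=n)
m = rng.random(n) < 0.3                # True where x is missing

# Hypothetical imputations: arbitrary stand-ins unrelated to the true x.
x_imp = np.where(m, rng.normal(size=n), x)

X = np.column_stack([np.ones(n), x_imp, z])
# Auxiliary variables: each regressor multiplied by the missingness indicator.
X_aug = np.column_stack([X, X * m[:, None]])

beta_augmented = ols(X_aug, y)[:3]     # coefficients on the original regressors
beta_complete_case = ols(X[~m], y[~m])
print(np.allclose(beta_augmented, beta_complete_case))  # True
```

Dropping some of the interaction columns is what reintroduces information from the imputed cases, trading a possible bias for precision.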

Keywords: Economics and Econometrics; Applied Mathematics; Regression analysis; Missing data; Regression; Set (abstract data type); Reduction (complexity); Economic data; Bias of an estimator; Statistics; Covariate; Econometrics; Statistics::Methodology; C12; C13; C19; Missing covariates; Imputations; Bias-precision trade-off; Model reduction; Model averaging; BMI and income; Mathematics

Domestic load forecasting using neural network and its use for missing data analysis

2015

Domestic demand prediction is very important for home energy management systems and for peak reduction in power system networks. In this work, a prediction model for active and reactive power consumption is developed and analysed for a typical Southern Norwegian house, using hourly active and reactive power consumption and time information as inputs. In the proposed model, a neural network is adopted as the main technique, and around 2 years of historical domestic load data are used as input. The available data contain some measurement errors and missing segments. Missing and inaccurate data are taken into account before the data are used for training, and the data are then used to test the model. It is ob…

Keywords: Energy management system; Engineering; Electric power system; Observational error; Artificial neural network; Operations research; Distribution management system; AC power; Missing data; Reliability engineering; Power (physics)

DATimeS: A machine learning time series GUI toolbox for gap-filling and vegetation phenology trends detection

2020

Optical remotely sensed data are typically discontinuous, with missing values due to cloud cover. Consequently, gap-filling solutions are needed for accurate crop phenology characterization. The Decomposition and Analysis of Time Series software (DATimeS) presented here expands established time series interpolation methods with a diversity of advanced machine learning fitting algorithms (e.g., Gaussian Process Regression, GPR) that are particularly effective for reconstructing multiple-season vegetation temporal patterns. DATimeS is freely available as powerful image time series software that generates cloud-free composite maps and captures seasonal vegetation dynamics from regula…
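To make GPR-based gap filling concrete, here is a bare-bones NumPy sketch of Gaussian process regression with a squared-exponential kernel and fixed, hand-picked hyperparameters (`length_scale`, `noise`, both assumptions); gaps marked as NaN are replaced by the posterior mean given the observed dates. It is not DATimeS code, which offers other fitting methods as well and would normally tune hyperparameters from the data.

```python
import numpy as np


def rbf_kernel(a, b, length_scale):
    """Squared-exponential covariance between two 1-D arrays of time stamps."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_fill(t, y, length_scale=10.0, noise=1e-2):
    """Replace NaNs in `y` with the GP posterior mean at those time stamps."""
    obs = ~np.isnan(y)
    t_obs, y_obs = t[obs], y[obs]
    mean = y_obs.mean()                        # constant prior mean
    K = rbf_kernel(t_obs, t_obs, length_scale) + noise * np.eye(t_obs.size)
    K_star = rbf_kernel(t[~obs], t_obs, length_scale)
    alpha = np.linalg.solve(K, y_obs - mean)
    filled = y.copy()
    filled[~obs] = mean + K_star @ alpha       # observed values are left untouched
    return filled


# Synthetic "vegetation index" sampled every 5 days, with two cloudy dates.
t = np.arange(0.0, 100.0, 5.0)
truth = np.sin(t / 15.0)
y = truth.copy()
y[[6, 12]] = np.nan
filled = gp_fill(t, y)
print(np.round(filled[[6, 12]], 2), np.round(truth[[6, 12]], 2))
```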

Keywords: Environmental Engineering; 010504 meteorology & atmospheric sciences; Computer science; 0211 other engineering and technologies; 02 engineering and technology; Machine learning; 01 natural sciences; Software; Kriging; Time series; Leaf area index; 021101 geological & geomatics engineering; 0105 earth and related environmental sciences; Series (mathematics); Ecological Modeling; Vegetation; 15. Life on land; Missing data; Artificial intelligence; Interpolation; Environmental Modelling & Software

The Association Between Epigenetic Clocks and Physical Functioning in Older Women: A 3-Year Follow-up

2021

Background: Epigenetic clocks are composite markers developed to predict chronological age or mortality risk from DNA methylation (DNAm) data. The present study investigated the associations between four epigenetic clocks (Horvath’s and Hannum’s DNAmAge, DNAm GrimAge, and PhenoAge) and physical functioning during a 3-year follow-up. Method: We studied 63- to 76-year-old women (N = 413) from the Finnish Twin Study on Aging. DNAm was measured from blood samples at baseline. Age acceleration (AgeAccel), that is, the discrepancy between chronological age and DNAm age, was determined as the residuals from a linear model. Physical functioning was assessed under standardized laboratory conditions at b…
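Age acceleration defined as a residual can be sketched in a few lines, assuming (as is common for epigenetic clocks, though the abstract does not spell it out) that DNAm age is regressed on chronological age. By construction the residuals sum to zero and are uncorrelated with chronological age. The ages below are invented for illustration and are not study data.

```python
def age_acceleration(chronological_age, dnam_age):
    """Residuals of an OLS fit dnam_age ~ a + b * chronological_age."""
    n = len(chronological_age)
    mean_x = sum(chronological_age) / n
    mean_y = sum(dnam_age) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chronological_age, dnam_age))
    sxx = sum((x - mean_x) ** 2 for x in chronological_age)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Positive residual: epigenetically "older" than expected for the chronological age.
    return [y - (intercept + slope * x) for x, y in zip(chronological_age, dnam_age)]


ages = [63.0, 66.0, 70.0, 73.0, 76.0]        # hypothetical participants
dnam = [60.1, 65.3, 68.0, 74.2, 75.5]        # hypothetical clock estimates
accel = age_acceleration(ages, dnam)
print([round(a, 2) for a in accel])
```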

Keywords: Epigenomics; Aging; Epigenesis, Genetic; 03 medical and health sciences; 0302 clinical medicine; Physical functioning; Medicine; Humans; 030212 general & internal medicine; Epigenetics; Association (psychology); 030304 developmental biology; 0303 health sciences; Linear model; Repeated measures design; DNAm; DNA Methylation; Missing data; Twin study; Cross-Sectional Studies; Biological aging; 3121 General medicine, internal medicine and other clinical medicine; Female; Geriatrics and Gerontology; Epigenetic clock; Demography; Follow-Up Studies

Criminal networks analysis in missing data scenarios through graph distances

2021

Data collected in criminal investigations may suffer from issues such as (i) incompleteness, due to the covert nature of criminal organizations; (ii) incorrectness, caused either by unintentional data collection errors or by intentional deception by criminals; (iii) inconsistency, when the same information is entered into law enforcement databases multiple times or in different formats. In this paper we analyze nine real criminal networks of different kinds (i.e., Mafia networks, criminal street gangs and terrorist organizations) to quantify the impact of incomplete data and to determine which network type is most affected by it. The networks are first pruned using two specific m…
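One simple graph distance, suggested by the Euclidean-distance keyword, is the Euclidean (Frobenius) distance between the adjacency matrices of an original network and a pruned version on the same node set. The paper's actual pruning strategies and distance measures are truncated above and are not reproduced here; the pruning in this sketch (removing all edges of one node, as if that member went unobserved) is a hypothetical example. For an undirected, unweighted graph the distance equals the square root of twice the number of lost edges.

```python
import math


def adjacency_matrix(n_nodes, edges):
    """Symmetric 0/1 adjacency matrix of an undirected, unweighted graph."""
    a = [[0] * n_nodes for _ in range(n_nodes)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


def euclidean_distance(a, b):
    """Euclidean (Frobenius) distance between two equally sized matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
# Hypothetical pruning: drop every edge incident to node 0, keeping the node set.
pruned = [(u, v) for u, v in edges if 0 not in (u, v)]
d = euclidean_distance(adjacency_matrix(4, edges), adjacency_matrix(4, pruned))
print(d)  # 2.0, i.e. sqrt(2 * 2 lost edges)
```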

Keywords: Euclidean distance; Data collection; Computer science; Node (networking); Law enforcement; Graph (abstract data type); Adjacency list; Data mining; Missing data; Criminal investigation; CrimRxiv

Gap Filling of Biophysical Parameter Time Series with Multi-Output Gaussian Processes

2018

In this work we evaluate multi-output (MO) Gaussian Process (GP) models based on the linear model of coregionalization (LMC) for the estimation of biophysical parameter variables in a gap-filling setup. In particular, we focus on leaf area index (LAI) and the fraction of absorbed photosynthetically active radiation (fAPAR) over rice areas. We show that this problem cannot be solved with standard single-output (SO) GP models, and that the proposed MO-GP models are able to successfully predict these variables even in high missing-data regimes, by implicitly performing an across-domain information transfer.
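Under the LMC, the covariance of the stacked outputs is a sum of Kronecker products, K = Σ_q B_q ⊗ k_q(t, t'), where each coregionalization matrix B_q couples the outputs; conditioning on a fully observed output is what lets the model fill gaps in a correlated one. The NumPy sketch below uses a single latent RBF kernel and two synthetic, proportional signals standing in for two biophysical variables. The coregionalization matrix, length scale and noise level are hand-picked assumptions, not the paper's fitted model.

```python
import numpy as np


def rbf(a, b, length_scale):
    """Squared-exponential kernel matrix between two 1-D arrays of times."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def lmc_covariance(t, coregionalization, length_scales):
    """Covariance of the stacked outputs [f_1(t); ...; f_D(t)] under the LMC."""
    return sum(np.kron(B, rbf(t, t, ls))
               for B, ls in zip(coregionalization, length_scales))


t = np.linspace(0.0, 10.0, 30)
n = t.size
signal = np.sin(t)
y = np.concatenate([signal, 0.8 * signal])      # stacked [output 1; output 2]

a = np.array([1.0, 0.8])
B = np.outer(a, a) + np.diag([0.01, 0.01])      # rank-1 coupling + small independent part
K = lmc_covariance(t, [B], [2.0]) + 1e-4 * np.eye(2 * n)

# Output 1 is fully observed; output 2 is missing for t > 5.
missing = np.concatenate([np.zeros(n, dtype=bool), t > 5.0])
obs = ~missing
# Zero-mean GP posterior mean for the missing block.
pred = K[np.ix_(missing, obs)] @ np.linalg.solve(K[np.ix_(obs, obs)], y[obs])
print(np.max(np.abs(pred - 0.8 * signal[t > 5.0])))
```

A single-output GP on output 2 alone would have to extrapolate across the whole second half; here the gap is filled through its correlation with output 1.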

Keywords: FOS: Computer and information sciences; Computer Science - Machine Learning; 010504 meteorology & atmospheric sciences; 0211 other engineering and technologies; FOS: Physical sciences; Machine Learning (stat.ML); 02 engineering and technology; 01 natural sciences; Quantitative Biology - Quantitative Methods; Machine Learning (cs.LG); Data modeling; Statistics - Machine Learning; Applied mathematics; Time series; Gaussian process; Quantitative Methods (q-bio.QM); 021101 geological & geomatics engineering; 0105 earth and related environmental sciences; Mathematics; Series (mathematics); Linear model; Probability and statistics; Missing data; FOS: Biological sciences; Physics - Data Analysis, Statistics and Probability; Focus (optics); Data Analysis, Statistics and Probability (physics.data-an)

Do-search – a tool for causal inference and study design with multiple data sources

2020

Epidemiologic evidence is based on multiple data sources including clinical trials, cohort studies, surveys, registries, and expert opinions. Merging information from different sources opens up new possibilities for the estimation of causal effects. We show how causal effects can be identified and estimated by combining experiments and observations in real and realistic scenarios. As a new tool, we present do-search, a recently developed algorithmic approach that can determine the identifiability of a causal effect. The approach is based on do-calculus, and it can utilize data with nontrivial missing data and selection bias mechanisms. When the effect is identifiable, do-search outputs an i…

Keywords: FOS: Computer and information sciences; Epidemiology; Computer science; Information Storage and Retrieval; Machine learning; 01 natural sciences; Statistics - Applications; Methodology (stat.ME); 010104 statistics & probability; 03 medical and health sciences; 0302 clinical medicine; Humans; Applications (stat.AP); 030212 general & internal medicine; 0101 mathematics; Salt intake; Statistics - Methodology; Selection bias; Nutrition Surveys; Missing data; Causality; Research Design; Causal inference; Meta-analysis; Survey data collection; Identifiability; Artificial intelligence