Search results for "Missing data"
Showing 10 of 83 documents
Empirical Orthogonal Function and Functional Data Analysis Procedures to Impute Long Gaps in Environmental Data
2016
Air pollution data sets are usually spatio-temporal multivariate data consisting of time series of different pollutants recorded by a monitoring network. To improve the estimation of functional data when missing values, and especially long gaps, are present in the original data set, several procedures are proposed here that jointly use Functional Data Analysis and Empirical Orthogonal Function approaches. To compare and validate the proposed procedures, a simulation plan is carried out and several performance indicators are computed. The results show that one of the proposed procedures outperforms the others, providing a better reconstruction especially in the presence of long gaps.
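The paper's exact procedures are not reproduced here, but the core Empirical Orthogonal Function idea can be sketched as an iterative low-rank (truncated SVD) reconstruction of the space-time data matrix, in the spirit of DINEOF-type gap filling. The function name and the rank-1 test data are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def eof_gap_fill(X, n_modes=2, n_iter=50):
    """Iteratively fill missing entries of a space-time matrix X
    (rows = times, columns = stations) with a truncated-SVD (EOF)
    reconstruction -- a DINEOF-style sketch, not the paper's method."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    # initialise the gaps with column (station) means
    filled = np.where(mask, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # reconstruct from the leading EOF modes only
        recon = (U[:, :n_modes] * s[:n_modes]) @ Vt[:n_modes]
        # update only the missing entries; observed values stay fixed
        filled[mask] = recon[mask]
    return filled
```

On data that are genuinely low-rank, the iteration converges quickly and can bridge a long gap in one series using the covarying series at other stations.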
Application of multivariate statistics to the problems of Upper Palaeolithic and Mesolithic samples
1987
Multivariate statistics (discriminant function analysis and principal component analysis) have been applied to a broad sample of Upper Palaeolithic and Mesolithic skulls. In addition to some methodological problems concerning the handling of missing data in principal component analysis, we discuss the possibility of misclassifications (14%).
A robust evolutionary algorithm for the recovery of rational Gielis curves
2013
Gielis curves (GC) can represent a wide range of shapes and patterns, from star shapes to symmetric and asymmetric polygons, and even self-intersecting curves. Such patterns appear in natural objects and phenomena such as flowers, crystals, pollen structures, animals, or even wave propagation. Gielis curves and surfaces are an extension of Lamé curves and surfaces (superquadrics), which have benefited over the last two decades from extensive research into retrieving their parameters from various data types, such as range images and 2D and 3D point clouds. Unfortunately, the most efficient techniques for superquadric recovery, based on deterministic methods, cannot…
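For context, a Gielis curve is generated by the "superformula" sketched below. This only samples points for given parameters; recovering the parameters from noisy points is the evolutionary-search problem the paper addresses and is not shown here (function names are illustrative):

```python
import numpy as np

def gielis_radius(phi, m, a, b, n1, n2, n3):
    """Radius of a Gielis curve (the 'superformula') at angle phi."""
    term = (np.abs(np.cos(m * phi / 4.0) / a) ** n2
            + np.abs(np.sin(m * phi / 4.0) / b) ** n3)
    return term ** (-1.0 / n1)

def gielis_points(n_pts=200, **params):
    """Sample (x, y) points along a closed Gielis curve."""
    phi = np.linspace(0.0, 2.0 * np.pi, n_pts)
    r = gielis_radius(phi, **params)
    return r * np.cos(phi), r * np.sin(phi)
```

A handy sanity check: with m=4, a=b=1 and n1=n2=n3=2 the formula reduces to |cos φ|² + |sin φ|² = 1, i.e. the unit circle.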
Deep Learning and Cultural Heritage: The CEPROQHA Project Case Study
2019
Cultural heritage forms an important part of the history of humankind, as it is one of the most powerful tools for the transfer and preservation of moral identity. As a result, these cultural assets are considered highly valuable and sometimes priceless. Digital technologies provide multiple tools that address challenges related to promotion of, and access to, information in the cultural context. However, with recent progress in artificial intelligence (AI), deep learning, and data mining tools, the large collections of cultural information have even more potential to add value and address current challenges in this context. In this paper, we investigate several approaches tha…
Using Deep Learning to Extrapolate Protein Expression Measurements
2020
Mass spectrometry (MS)-based quantitative proteomics experiments typically assay a subset of up to 60% of the ≈20 000 human protein-coding genes. Computational methods for imputing the missing values using RNA expression data usually allow only for imputation of proteins measured in at least some of the samples. In silico methods for comprehensively estimating abundances across all proteins are still missing. Here, a novel method is proposed that uses deep learning to extrapolate the observed protein expression values in label-free MS experiments to all proteins, leveraging gene functional annotations and RNA measurements as key predictive attributes. This method is tested on four datasets, in…
Missing value imputation in proximity extension assay-based targeted proteomics data
2020
Targeted proteomics utilizing antibody-based proximity extension assays provides sensitive and highly specific quantifications of plasma protein levels. Multivariate analysis of this data is hampered by frequent missing values (random or left censored), calling for imputation approaches. While appropriate missing-value imputation methods exist, benchmarks of their performance in targeted proteomics data are lacking. Here, we assessed the performance of two methods for imputation of values missing completely at random, the previously top-benchmarked ‘missForest’ and the recently published ‘GSimp’ method. Evaluation was accomplished by comparing imputed with remeasured relative concentrations…
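missForest and GSimp are R packages, so as a hedged illustration the benchmark protocol the abstract describes (mask values completely at random, impute, compare against the held-out truth) is sketched below with scikit-learn's `IterativeImputer` wrapping a random forest, which is analogous in spirit to missForest; the synthetic correlated "analyte" data are an assumption:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# synthetic correlated data: 200 samples x 5 analytes sharing one latent factor
latent = rng.normal(size=(200, 1))
X_true = latent @ rng.normal(size=(1, 5)) + 0.1 * rng.normal(size=(200, 5))

# mask 10% of entries completely at random (MCAR)
mask = rng.random(X_true.shape) < 0.10
X_obs = X_true.copy()
X_obs[mask] = np.nan

# random-forest-based iterative imputation (missForest-like in spirit)
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=30, random_state=0),
    max_iter=5, random_state=0)
X_imp = imputer.fit_transform(X_obs)
rmse = np.sqrt(np.mean((X_imp[mask] - X_true[mask]) ** 2))

# naive baseline: impute each analyte with its observed mean
col_means = np.nanmean(X_obs, axis=0)
X_mean = np.where(np.isnan(X_obs), col_means, X_obs)
rmse_mean = np.sqrt(np.mean((X_mean[mask] - X_true[mask]) ** 2))
```

Because the masked values are known, the root-mean-square error against the truth gives a direct performance ranking, mirroring the imputed-versus-remeasured comparison used in the paper.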
Evolutionary Spectrum for Random Field and Missing Observations
2012
In many situations, the data observed from a non-stationary random field are collected with missing values. In this work, a consistent estimate of the evolutionary spectral density is given when some observations are randomly missing.
Selection bias was reduced by recontacting nonparticipants
2016
Objective: One of the main goals of health examination surveys is to provide unbiased estimates of health indicators at the population level. We demonstrate how multiple imputation methods may help to reduce the selection bias if partial data on some nonparticipants are collected. Study Design and Setting: In the FINRISK 2007 study, a population-based health study conducted in Finland, a random sample of 10,000 men and women aged 25–74 years was invited to participate. The study included a questionnaire data collection and a health examination. A total of 6,255 individuals participated in the study. Out of 3,745 nonparticipants, 473 returned a simplified questionnaire after a recontact. Both…
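The FINRISK analysis itself cannot be reproduced from the abstract, but the mechanism — use auxiliary data known for everyone to multiply impute the outcome of nonparticipants, then pool the estimates (Rubin's rules for the point estimate) — can be sketched on synthetic data; all variable names and numbers below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
age = rng.uniform(25, 75, n)
y = 120 + 0.5 * age + rng.normal(0, 10, n)  # e.g. a blood-pressure-like indicator

# participation probability rises with age, so the complete-case mean is biased
participates = rng.random(n) < (0.3 + 0.008 * age)
y_obs = np.where(participates, y, np.nan)

def impute_once(rng):
    """One stochastic regression imputation of y from age (known for all)."""
    obs = ~np.isnan(y_obs)
    A = np.column_stack([np.ones(n), age])
    beta, *_ = np.linalg.lstsq(A[obs], y_obs[obs], rcond=None)
    resid_sd = np.std(y_obs[obs] - A[obs] @ beta)
    y_imp = y_obs.copy()
    # draw imputations with residual noise, not just the regression mean
    y_imp[~obs] = A[~obs] @ beta + rng.normal(0, resid_sd, (~obs).sum())
    return y_imp.mean()

M = 20
pooled = np.mean([impute_once(rng) for _ in range(M)])  # pooled point estimate
cc_mean = np.nanmean(y_obs)  # complete-case mean, biased by the selection
```

Here the pooled multiply-imputed mean lands much closer to the full-population mean than the complete-case estimate, which is the bias reduction the paper demonstrates with real recontact data.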
A new methodology for Functional Principal Component Analysis from scarce data. Application to stroke rehabilitation.
2015
Functional Principal Component Analysis (FPCA) is an increasingly used methodology for the analysis of biomedical data. This methodology aims to obtain Functional Principal Components (FPCs) from Functional Data (time-dependent functions). However, the most common scenario for this analysis of biomedical data starts from discrete time values. Standard procedures for FPCA require obtaining the functional data from these discrete values before extracting the FPCs. A problem appears when there are missing values for a non-negligible number of subjects, especially at the beginning or the end of the study, because this approach can compromise the analysis due to the need to extrapolate or dismiss subje…
Regression with Imputed Covariates: A Generalized Missing Indicator Approach
2011
A common problem in applied regression analysis is that covariate values may be missing for some observations but imputed values may be available. This situation generates a trade-off between bias and precision: the complete cases are often disarmingly few, but replacing the missing observations with the imputed values to gain precision may lead to bias. In this paper we formalize this trade-off by showing that one can augment the regression model with a set of auxiliary variables so as to obtain, under weak assumptions about the imputations, the same unbiased estimator of the parameters of interest as complete-case analysis. Given this augmented model, the bias-precision trade-off may then…
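A tiny simulation (with made-up numbers) can illustrate the equivalence the abstract states: with a constant (here, mean) imputation, augmenting the regression with a missing-data dummy reproduces the complete-case coefficients exactly, since the dummy fully absorbs the imputed rows. The paper's generalized approach covers richer imputations and auxiliary variables not shown here:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)

miss = rng.random(n) < 0.3        # 30% of the covariate is missing
x_imp = x.copy()
x_imp[miss] = x[~miss].mean()     # mean imputation (any constant works)

def ols(A, b):
    """Ordinary least-squares coefficients."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

# complete-case estimate of (intercept, slope)
beta_cc = ols(np.column_stack([np.ones((~miss).sum()), x[~miss]]), y[~miss])

# augmented model: imputed covariate plus a missing-indicator dummy
d = miss.astype(float)
beta_aug = ols(np.column_stack([np.ones(n), x_imp, d]), y)
```

Minimizing the augmented sum of squares, the dummy's coefficient adjusts freely to fit the imputed group's mean, so the (intercept, slope) pair is determined by the complete cases alone — matching the paper's claim that the augmented model recovers the complete-case estimator under weak assumptions about the imputations.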