Search results for "Missing data"

showing 10 items of 83 documents

Empirical Orthogonal Function and Functional Data Analysis Procedures to Impute Long Gaps in Environmental Data

2016

Air pollution data sets are usually spatio-temporal multivariate data related to time series of different pollutants recorded by a monitoring network. To improve the estimation of functional data when missing values, and especially long gaps, are present in the original data set, several procedures are proposed here that jointly use Functional Data Analysis and Empirical Orthogonal Function approaches. To compare and validate the proposed procedures, a simulation plan is carried out and some performance indicators are computed. The results show that one of the proposed procedures works better than the others, providing a better reconstruction, especially in the presence of long gaps.

Multivariate statistics; Computer science; Functional data analysis; Empirical orthogonal functions; Missing data; computer.software_genre; Environmental data; EOF FDA Missing data Environmental data; Set (abstract data type); Singular value decomposition; Performance indicator; Data mining; Settore SECS-S/01 - Statistica; computer

researchProduct

Application of multivariate statistics to the problems of upper palaeolithic and mesolithic samples

1987

Multivariate statistics (discriminant function analysis and principal component analysis) have been applied to a broad sample of Upper Paleolithic and Mesolithic skulls. In addition to some methodological problems concerning the evaluation of missing data by principal component analysis, we discuss the possibility of misclassifications (14%).

Multivariate statistics; Geography; Discriminant function analysis; Anthropology; Statistics; Principal component analysis; Upper Paleolithic; Sample (statistics); Missing data; Mesolithic; Human Evolution

researchProduct

A robust evolutionary algorithm for the recovery of rational Gielis curves

2013

Gielis curves (GC) can represent a wide range of shapes and patterns ranging from star shapes to symmetric and asymmetric polygons, and even self-intersecting curves. Such patterns appear in natural objects or phenomena, such as flowers, crystals, pollen structures, animals, or even wave propagation. Gielis curves and surfaces are an extension of Lamé curves and surfaces (superquadrics), which have benefited over the last two decades from extensive research on retrieving their parameters from various data types, such as range images, 2D and 3D point clouds, etc. Unfortunately, the most efficient techniques for superquadrics recovery, based on deterministic methods, cannot…

Optimization; Evolutionary algorithm; Initialization; R-functions; 02 engineering and technology; [ INFO.INFO-CV ] Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]; Artificial Intelligence; Robustness (computer science); Evolutionary algorithm; Superquadrics; Gielis curves; 0202 electrical engineering electronic engineering information engineering; Biology; Mathematics; Computer. Automation; Superquadrics; [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]; 020207 software engineering; Missing data; Euclidean distance; Maxima and minima; Signal Processing; 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Gradient descent; Algorithm; Engineering sciences. Technology; Software; Pattern recognition

researchProduct

Deep Learning and Cultural Heritage: The CEPROQHA Project Case Study

2019

Cultural heritage is an important part of the history of humankind, as it is one of the most powerful tools for the transfer and preservation of moral identity. As a result, these cultural assets are considered highly valuable and sometimes priceless. Digital technologies have provided multiple tools that address challenges related to promotion and information access in the cultural context. However, the large data collections of cultural information have more potential to add value and address current challenges in this context, given the recent progress in artificial intelligence (AI) with deep learning and data mining tools. Through the present paper, we investigate several approaches tha…

Progress in artificial intelligence; Value (ethics); Computer science; business.industry; Deep learning; media_common.quotation_subject; Information access; Context (language use); Cultural Heritage; Missing data; Data science; Cultural heritage; CEPROQHA Project; Deep Learning; Promotion (rank); Artificial Intelligence; Artificial intelligence; business; Digital Heritage; media_common; 2019 13th International Conference on Software, Knowledge, Information Management and Applications (SKIMA)

researchProduct

Using Deep Learning to Extrapolate Protein Expression Measurements

2020

Mass spectrometry (MS)-based quantitative proteomics experiments typically assay a subset of up to 60% of the ≈20 000 human protein coding genes. Computational methods for imputing the missing values using RNA expression data usually allow only for imputations of proteins measured in at least some of the samples. In silico methods for comprehensively estimating abundances across all proteins are still missing. Here, a novel method is proposed using deep learning to extrapolate the observed protein expression values in label-free MS experiments to all proteins, leveraging gene functional annotations and RNA measurements as key predictive attributes. This method is tested on four datasets, in…

Proteomics; In silico; Quantitative proteomics; Computational biology; Biology; Biochemistry; protein abundance prediction; Mass Spectrometry; Protein expression; Mice; 03 medical and health sciences; Deep Learning; Abundance (ecology); Animals; Molecular Biology; Gene; Research Articles; 030304 developmental biology; deep learning networks; 0303 health sciences; UniProt keywords; business.industry; Deep learning; 030302 biochemistry & molecular biology; Proteins; RNA; Molecular Sequence Annotation; Missing data; Gene Ontology; Artificial intelligence; business; Research Article; PROTEOMICS

researchProduct

Missing value imputation in proximity extension assay-based targeted proteomics data

2020

Targeted proteomics utilizing antibody-based proximity extension assays provides sensitive and highly specific quantifications of plasma protein levels. Multivariate analysis of this data is hampered by frequent missing values (random or left censored), calling for imputation approaches. While appropriate missing-value imputation methods exist, benchmarks of their performance in targeted proteomics data are lacking. Here, we assessed the performance of two methods for imputation of values missing completely at random, the previously top-benchmarked ‘missForest’ and the recently published ‘GSimp’ method. Evaluation was accomplished by comparing imputed with remeasured relative concentrations…

Proteomics; Male; Multivariate analysis; Protein Expression; Biochemistry; Protein expression; Database and Informatics Methods; Limit of Detection; Statistics; Medicine and Health Sciences; Biochemical Simulations; Imputation (statistics); Immune Response; Mathematics; Multidisciplinary; Proteomic Databases; Q; R; Eukaryota; Blood Proteins; Venous Thromboembolism; Plants; Middle Aged; Legumes; Targeted proteomics; symbols; Engineering and Technology; Medicine; Female; Algorithms; Research Article; Quality Control; Adult; Science; Immunology; Research and Analysis Methods; symbols.namesake; Signs and Symptoms; Bias; Industrial Engineering; Protein Concentration Assays; Gene Expression and Vector Techniques; Missing value imputation; Humans; Molecular Biology Techniques; Molecular Biology; Aged; Inflammation; Molecular Biology Assays and Analysis Techniques; Interleukin-6; Organisms; Peas; Biology and Life Sciences; Computational Biology; Missing data; Pearson product-moment correlation coefficient; Biological Databases; Multivariate Analysis; Clinical Medicine; Venous thromboembolism; PLOS ONE

researchProduct

Evolutionary Spectrum for Random Field and Missing Observations

2012

There are innumerable situations where the data observed from a non-stationary random field are collected with missing values. In this work a consistent estimate of the evolutionary spectral density is given where some observations are randomly missing.

Random field; Spectrum (functional analysis); Statistics; Spectral density; Periodogram; Statistical physics; Missing data; Mathematics

researchProduct

Selection bias was reduced by recontacting nonparticipants

2016

Objective: One of the main goals of health examination surveys is to provide unbiased estimates of health indicators at the population level. We demonstrate how multiple imputation methods may help to reduce the selection bias if partial data on some nonparticipants are collected. Study Design and Setting: In the FINRISK 2007 study, a population-based health study conducted in Finland, a random sample of 10,000 men and women aged 25–74 years were invited to participate. The study included a questionnaire data collection and a health examination. A total of 6,255 individuals participated in the study. Out of 3,745 nonparticipants, 473 returned a simplified questionnaire after a recontact. Both…

Research design; Adult; Male; Biomedical Research; bias; multiple imputation; Epidemiology; Cross-sectional study; media_common.quotation_subject; Population; 01 natural sciences; Proxy (climate); 010104 statistics & probability; 03 medical and health sciences; missing data; 0302 clinical medicine; non-response; Statistics; Humans; survey; 030212 general & internal medicine; 0101 mathematics; education; Finland; Selection Bias; media_common; Aged; Response rate (survey); Selection bias; Aged 80 and over; education.field_of_study; ta112; Patient Selection; ta3142; Middle Aged; Missing data; Health indicator; Cross-Sectional Studies; Research Design; Female; Psychology; Demography; Follow-Up Studies

researchProduct

A new methodology for Functional Principal Component Analysis from scarce data. Application to stroke rehabilitation.

2015

Functional Principal Component Analysis (FPCA) is an increasingly used methodology for analysis of biomedical data. This methodology aims to obtain Functional Principal Components (FPCs) from Functional Data (time dependent functions). However, in biomedical data, the most common scenario of this analysis is from discrete time values. Standard procedures for FPCA require obtaining the functional data from these discrete values before extracting the FPCs. The problem appears when there are missing values in a non-negligible sample of subjects, especially at the beginning or the end of the study, because this approach can compromise the analysis due to the need to extrapolate or dismiss subje…

Scarce data; Functional principal component analysis; Principal Component Analysis; Computer science; Process (engineering); Stroke Rehabilitation; Sample (statistics); Missing data; computer.software_genre; Stroke; Principal component analysis; Humans; Data mining; computer; Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference

researchProduct

Regression with Imputed Covariates: A Generalized Missing Indicator Approach

2011

A common problem in applied regression analysis is that covariate values may be missing for some observations but imputed values may be available. This situation generates a trade-off between bias and precision: the complete cases are often disarmingly few, but replacing the missing observations with the imputed values to gain precision may lead to bias. In this paper we formalize this trade-off by showing that one can augment the regression model with a set of auxiliary variables so as to obtain, under weak assumptions about the imputations, the same unbiased estimator of the parameters of interest as complete-case analysis. Given this augmented model, the bias-precision trade-off may then…

Set (abstract data type); Reduction (complexity); Relation (database); Bias of an estimator; Statistics; Covariate; Settore SECS-P/05 - Econometria; Statistics::Methodology; Regression analysis; Missing data; Regression; Mathematics; SSRN Electronic Journal

researchProduct