Search results for "missing data"

Showing 10 of 83 documents

Criminal networks analysis in missing data scenarios through graph distances

2021

Data collected in criminal investigations may suffer from issues such as: (i) incompleteness, due to the covert nature of criminal organizations; (ii) incorrectness, caused by either unintentional data collection errors or intentional deception by criminals; (iii) inconsistency, when the same information is entered into law enforcement databases multiple times, or in different formats. In this paper we analyze nine real criminal networks of different kinds (i.e., Mafia networks, criminal street gangs and terrorist organizations) in order to quantify the impact of incomplete data and to determine which network type is most affected by it. The networks are first pruned using two specific m…
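As a rough illustration of the pruning experiment described above, the sketch below removes a growing fraction of nodes from a stand-in graph and tracks a simple spectral distance; the Erdos-Renyi placeholder network and the distance choice are assumptions, not the paper's data or measures.

```python
# Hedged sketch: quantify how random node removal ("missing data") perturbs a network,
# using a simple adjacency-spectrum distance. The stand-in graph and the distance are
# illustrative assumptions, not the paper's networks or graph distances.
import numpy as np
import networkx as nx

def spectral_distance(g1, g2):
    """L2 distance between sorted adjacency spectra, zero-padded to equal length."""
    s1 = np.sort(np.linalg.eigvalsh(nx.to_numpy_array(g1)))[::-1]
    s2 = np.sort(np.linalg.eigvalsh(nx.to_numpy_array(g2)))[::-1]
    n = max(len(s1), len(s2))
    s1 = np.pad(s1, (0, n - len(s1)))
    s2 = np.pad(s2, (0, n - len(s2)))
    return float(np.linalg.norm(s1 - s2))

rng = np.random.default_rng(0)
g = nx.erdos_renyi_graph(60, 0.1, seed=0)            # placeholder for a criminal network
for frac in (0.05, 0.10, 0.20):                      # fraction of nodes assumed unobserved
    drop = rng.choice(list(g.nodes), size=int(frac * g.number_of_nodes()), replace=False)
    pruned = g.copy()
    pruned.remove_nodes_from(drop)
    print(f"{frac:.0%} nodes removed -> spectral distance {spectral_distance(g, pruned):.2f}")
```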

Euclidean distance; Data collection; Computer science; Node (networking); Law enforcement; Graph (abstract data type); Adjacency list; Data mining; Missing data; Criminal investigation; CrimRxiv

Correcting for non-ignorable missingness in smoking trends

2015

Data missing not at random (MNAR) is a major challenge in survey sampling. We propose an approach based on registry data to deal with non-ignorable missingness in health examination surveys. The approach relies on follow-up data available from administrative registers several years after the survey. For illustration, we use data on smoking prevalence from the Finnish National FINRISK study conducted in 1972-1997. The data consist of measured survey information including missingness indicators, register-based background information and register-based time-to-disease survival data. The parameters of the missingness mechanism are estimable from these data even though the original survey data are MNAR. The u…
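The stand-in sketch below conveys only the core idea that register variables observed for everyone, participants and nonparticipants alike, can inform the participation mechanism. It uses a simple inverse-probability-weighting correction on simulated data, not the Bayesian model and survival follow-up the paper proposes.

```python
# Hedged sketch: NOT the paper's Bayesian approach. A register covariate observed for the
# full sample is used to model participation, and respondents are reweighted accordingly.
# With MNAR data this reduces, but need not remove, the bias of the naive estimate.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
register_risk = rng.normal(size=n)                                  # register-based covariate, fully observed
smoker = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.8 * register_risk))))
participates = rng.binomial(1, 1 / (1 + np.exp(-(1.0 - 1.2 * smoker))))  # MNAR: smokers respond less

naive = smoker[participates == 1].mean()                            # biased complete-case estimate

# participation model uses only register data, so it can be fitted on the full sample
prop = LogisticRegression().fit(register_risk.reshape(-1, 1), participates)
w = 1.0 / prop.predict_proba(register_risk.reshape(-1, 1))[:, 1]
weighted = np.average(smoker[participates == 1], weights=w[participates == 1])

print(f"true {smoker.mean():.3f}  naive {naive:.3f}  register-weighted {weighted:.3f}")
```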

Statistics and Probability; Background information; Test data generation; Computer science; Survey sampling; Non-participation; Smoking prevalence; Bayesian inference; Missing data; Registry data; Statistics; Survey data collection; Statistics, Probability and Uncertainty; Statistics - Applications (stat.AP); Statistics - Methodology (stat.ME); Health examination survey

Domestic load forecasting using neural network and its use for missing data analysis

2015

Domestic demand prediction is very important for home energy management systems and for peak reduction in the power system network. In this work, an active and reactive power consumption prediction model is developed and analysed for a typical southern Norwegian house, with hourly power (active and reactive) consumption and time information as inputs. In the proposed model, a neural network is adopted as the main technique, and historical domestic load data covering around 2 years are used as input. The available data contain some measurement errors and missing segments. Before the data are used for training, the missing and inaccurate values are treated, and the data are then used for testing the model. It is ob…
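A minimal sketch of the general setup, assuming synthetic hourly data, sine/cosine hour-of-day features and linear interpolation of the gaps; the paper's actual features, network architecture and handling of inaccurate measurements are not reproduced.

```python
# Hedged sketch: an hour-of-day neural-network load model on synthetic data, with a
# missing segment interpolated before training. A few months of simulated hourly data
# stand in for the roughly two years of measurements used in the paper.
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
hours = pd.date_range("2013-01-01", periods=120 * 24, freq="h")
load = 1.5 + np.sin(2 * np.pi * hours.hour / 24) + 0.1 * rng.normal(size=len(hours))  # kW, synthetic
load[1000:1036] = np.nan                                    # a missing segment, as in real meter data
s = pd.Series(load, index=hours).interpolate(limit_direction="both")

X = np.column_stack([np.sin(2 * np.pi * s.index.hour / 24),
                     np.cos(2 * np.pi * s.index.hour / 24),
                     s.index.dayofweek])
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
model.fit(X[:-24], s.values[:-24])                          # hold out the last day
print("next-day forecast (kW):", np.round(model.predict(X[-24:]), 2))
```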

Energy management system; Engineering; Electric power system; Observational error; Artificial neural network; Operations research; Distribution management system; AC power; Missing data; Reliability engineering; Power (physics)

Selection bias was reduced by recontacting nonparticipants

2016

Objective: One of the main goals of health examination surveys is to provide unbiased estimates of health indicators at the population level. We demonstrate how multiple imputation methods may help to reduce the selection bias if partial data on some nonparticipants are collected. Study Design and Setting: In the FINRISK 2007 study, a population-based health study conducted in Finland, a random sample of 10,000 men and women aged 25–74 years was invited to participate. The study included a questionnaire data collection and a health examination. A total of 6,255 individuals participated in the study. Out of 3,745 nonparticipants, 473 returned a simplified questionnaire after a recontact. Both…
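A hedged sketch of the multiple-imputation idea on simulated data, using scikit-learn's IterativeImputer as a generic engine; the FINRISK variables and the paper's imputation model are not reproduced.

```python
# Hedged sketch: multiple imputation of an examination outcome for simulated nonparticipants
# who supplied only a short questionnaire item, followed by pooling of the completed-data means.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables IterativeImputer)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(3)
n = 2000
age = rng.uniform(25, 74, n)
smoker = rng.binomial(1, 0.45 - 0.003 * (age - 25), n)        # item available for recontacted nonparticipants
sbp = 110 + 0.5 * age + 8 * smoker + rng.normal(0, 10, n)     # examination outcome (systolic blood pressure)
examined = rng.binomial(1, 0.75 - 0.2 * smoker, n).astype(bool)
sbp_obs = np.where(examined, sbp, np.nan)                     # outcome missing for nonparticipants

data = np.column_stack([age, smoker, sbp_obs])
pooled = np.mean([IterativeImputer(sample_posterior=True, random_state=m)
                  .fit_transform(data)[:, 2].mean() for m in range(10)])
print(f"complete cases {np.nanmean(sbp_obs):.1f}  MI pooled {pooled:.1f}  truth {sbp.mean():.1f}")
```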

Research design; Adult; Male; Female; Biomedical Research; Bias; Multiple imputation; Epidemiology; Cross-sectional study; Population; Proxy; Missing data; Non-response; Statistics; Humans; Survey; Finland; Selection bias; Patient Selection; Response rate (survey); Aged; Aged 80 and over; Middle Aged; Health indicator; Psychology; Demography; Follow-Up Studies

Multiple Comparisons of Treatments with Stable Multivariate Tests in a Two‐Stage Adaptive Design, Including a Test for Non‐Inferiority

2000

The application of stabilized multivariate tests is demonstrated in the analysis of a two-stage adaptive clinical trial with three treatment arms. Because of the clinical problem, the multiple comparisons include tests of superiority as well as a test for non-inferiority, where non-inferiority is (in the absence of absolute tolerance limits) expressed as a linear contrast of the three treatments. Special emphasis is placed on the combination of the three sources of multiplicity: multiple endpoints, multiple treatments, and the two stages of the adaptive design. In particular, the adaptation after the first stage comprises a change of the a priori order of hypotheses.
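A generic sketch of two-stage combination testing with a linear contrast of three treatment means, combining the stage-wise p-values with Fisher's combination function; this is an illustrative stand-in, not the stabilized multivariate procedure applied in the paper.

```python
# Hedged sketch: a one-sided linear-contrast t test per stage, with the two stage-wise
# p-values combined by Fisher's combination test. Contrast weights, simulated data and
# group sizes are assumptions for illustration only.
import numpy as np
from scipy import stats

def contrast_pvalue(groups, weights):
    """One-sided p-value for sum(w_i * mu_i) > 0, pooled-variance t test."""
    means = np.array([g.mean() for g in groups])
    ns = np.array([len(g) for g in groups])
    ss = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df = ns.sum() - len(groups)
    s2 = ss / df
    t = weights @ means / np.sqrt(s2 * (weights ** 2 / ns).sum())
    return stats.t.sf(t, df)

rng = np.random.default_rng(4)
w = np.array([1.0, -0.5, -0.5])                        # e.g. new treatment vs. mean of the other two
stage_p = []
for stage in (1, 2):
    groups = [rng.normal(loc, 1.0, 40) for loc in (0.3, 0.0, 0.1)]  # simulated stage data
    stage_p.append(contrast_pvalue(groups, w))

fisher = -2 * np.sum(np.log(stage_p))                  # combine the two independent stages
p_comb = stats.chi2.sf(fisher, df=4)
print(f"stage p-values {np.round(stage_p, 4)}  combined p {p_comb:.4f}")
```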

Statistics and Probability; Multivariate statistics; Adaptive clinical trial; Multivariate analysis; Multiple comparisons problem; Statistics; Contrast (statistics); Regression analysis; General Medicine; Statistics, Probability and Uncertainty; Missing data; Statistical hypothesis testing; Mathematics; Biometrical Journal

Missing values in deduplication of electronic patient data

2011

Data deduplication refers to the process in which records referring to the same real-world entities are detected in datasets so that duplicated records can be eliminated. The term ‘record linkage’ is used here for the same problem [1]. A typical application is the deduplication of medical registry data [2, 3]. Medical registries are institutions that collect medical and personal data in a standardized and comprehensive way. The primary aims are the creation of a pool of patients eligible for clinical or epidemiological studies and the computation of certain indices, such as the incidence, in order to monitor the development of diseases. The latter task in particular requires a database in wh…
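A toy sketch of duplicate detection in which a field contributes to the similarity score only when it is observed in both records, so missing values count neither as agreement nor as disagreement; the fields and scoring rule are placeholders, not the registry's actual linkage rules.

```python
# Hedged sketch: pairwise record comparison with missing-value-aware field scoring.
import pandas as pd

records = pd.DataFrame({
    "surname":   ["Meier", "Meier", "Schmidt"],
    "birthyear": [1950,    1950,    None],
    "zip":       ["55122", None,    "55122"],
})

def similarity(a, b):
    """Fraction of agreeing fields among the fields observed in BOTH records."""
    scores = []
    for field in records.columns:
        if pd.notna(a[field]) and pd.notna(b[field]):      # compare observed fields only
            scores.append(1.0 if a[field] == b[field] else 0.0)
    return sum(scores) / len(scores) if scores else float("nan")

pairs = [(i, j) for i in records.index for j in records.index if i < j]
for i, j in pairs:
    print(f"records {i} and {j}: similarity {similarity(records.loc[i], records.loc[j])}")
```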

Computer science; Inference; Health Informatics; Ambiguity; Patient data; Missing data; Research and Applications; Regression; Neoplasms; Statistics; Data deduplication; Electronic Health Records; Humans; Data mining; Imputation (statistics); Medical Record Linkage; Registries; Record linkage

A new methodology for Functional Principal Component Analysis from scarce data. Application to stroke rehabilitation.

2015

Functional Principal Component Analysis (FPCA) is an increasingly used methodology for the analysis of biomedical data. This methodology aims to obtain Functional Principal Components (FPCs) from functional data (time-dependent functions). In biomedical data, however, the analysis most commonly starts from values observed at discrete time points. Standard procedures for FPCA require obtaining the functional data from these discrete values before extracting the FPCs. The problem appears when there are missing values for a non-negligible sample of subjects, especially at the beginning or the end of the study, because this approach can compromise the analysis due to the need to extrapolate or dismiss subje…
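A hedged sketch of one generic route from sparsely observed curves to principal components: fit a small polynomial basis per subject on the observed points only, then run ordinary PCA on the coefficients. This is a stand-in, not the methodology proposed in the paper.

```python
# Hedged sketch: basis-expansion FPCA surrogate for subjects with missing assessment times.
import numpy as np

rng = np.random.default_rng(5)
t = np.linspace(0, 1, 20)                                   # nominal assessment times
subjects = 30
coefs = []
for _ in range(subjects):
    y = np.sin(2 * np.pi * t) + rng.normal(0, 0.2, t.size)
    observed = rng.random(t.size) > 0.3                     # roughly 30% of visits missing per subject
    coefs.append(np.polyfit(t[observed], y[observed], deg=3))  # fit on observed points only

C = np.array(coefs)
C -= C.mean(axis=0)                                         # centre the per-subject coefficients
_, s, _ = np.linalg.svd(C, full_matrices=False)             # PCA of the coefficient matrix
explained = s**2 / (s**2).sum()
print("variance explained by the first two components:", np.round(explained[:2], 3))
```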

Scarce data; Functional principal component analysis; Principal component analysis; Computer science; Process (engineering); Stroke Rehabilitation; Sample (statistics); Missing data; Stroke; Humans; Data mining; Annual International Conference of the IEEE Engineering in Medicine and Biology Society

Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?

2017

Principal component analysis (PCA) is a method of choice for dimension reduction. In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable for performing PCA on streaming data and/or massive data. Despite the wide availability of recursive algorithms that can efficiently update the PCA when new data are observed, the literature offers little guidance on how to select a suitable algorithm for a given application. This paper reviews the main approaches to online PCA, namely perturbation techniques, incremental methods and stochastic optimisation, and compares the most widely employed techniques in terms of statistical a…
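As a concrete example of the stochastic-optimisation family reviewed, the sketch below runs Oja's rule to track the leading principal component of a simulated data stream; the dimensions, step size and data are illustrative.

```python
# Hedged sketch: Oja's rule for online estimation of the first principal component,
# processing one observation at a time without storing past data.
import numpy as np

rng = np.random.default_rng(6)
d, n_steps, lr = 50, 20000, 0.01
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)                   # true leading direction of the stream

w = rng.normal(size=d)
w /= np.linalg.norm(w)
for _ in range(n_steps):
    x = rng.normal(size=d) + 3.0 * rng.normal() * true_dir   # one streaming observation
    y = w @ x
    w += lr * y * (x - y * w)                          # Oja update
    w /= np.linalg.norm(w)                             # keep the estimate on the unit sphere

print("alignment with the true first PC:", abs(w @ true_dir).round(3))
```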

Statistics and Probability; Computer science; Computation; Dimensionality reduction; Incremental methods; Missing data; Data explosion; Streaming data; Principal component analysis; Statistics, Probability and Uncertainty; Algorithm; Eigendecomposition of a matrix; International Statistical Review

Imputation Strategies for Missing Data in Environmental Time Series for An Unlucky Situation

2005

After a detailed review of the main specific solutions for the treatment of missing data in environmental time series, this paper deals with the unlucky situation in which, in an hourly series, missing data immediately follow an absolutely anomalous period for which no similar period is available to use for imputation. A tentative multivariate and multiple imputation is put forward and evaluated; it is based on the possibility, typical of environmental time series, of resorting to correlations or physical laws that characterize the relationships between air pollutants.
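A minimal sketch of the cross-pollutant idea on simulated hourly series: a gap in NO2 is filled by regressing it on co-measured NOx. The series, the gap and the regression form are assumptions for illustration, not the paper's procedure.

```python
# Hedged sketch: filling a gap in one pollutant series from a correlated co-measured pollutant.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
hours = pd.date_range("2004-01-01", periods=24 * 60, freq="h")
nox = 40 + 15 * np.sin(2 * np.pi * hours.hour / 24) + rng.normal(0, 3, len(hours))
no2 = 0.6 * nox + 5 + rng.normal(0, 2, len(hours))          # physically related series (synthetic)

no2_missing = no2.copy()
no2_missing[500:572] = np.nan                               # the gap to be imputed

observed = ~np.isnan(no2_missing)
reg = LinearRegression().fit(nox[observed].reshape(-1, 1), no2_missing[observed])
filled = no2_missing.copy()
filled[~observed] = reg.predict(nox[~observed].reshape(-1, 1))

print("RMSE over the gap:", np.sqrt(np.mean((filled[500:572] - no2[500:572]) ** 2)).round(2))
```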

Multivariate statistics; Air pollutants; Computer science; Statistics; Autoregressive–moving-average model; Imputation (statistics); Missing data

Study Design in Causal Models

2014

The causal assumptions, the study design and the data are the elements required for scientific inference in empirical research. The research is adequately communicated only if all of these elements and their relations are described precisely. Causal models with design describe the study design and the missing-data mechanism together with the causal structure and allow the direct application of causal calculus in the estimation of the causal effects. The flow of the study is visualized by ordering the nodes of the causal diagram in two dimensions by their causal order and the time of the observation. Conclusions on whether a causal or observational relationship can be estimated from the coll…
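A small sketch of the representation described above, assuming generic placeholder variable names: a directed acyclic graph that includes a selection/missingness node, with the causal order read off by topological sorting.

```python
# Hedged sketch: a study design encoded as a DAG with a missingness-mechanism node;
# the variable names are generic placeholders, not the paper's example.
import networkx as nx

design = nx.DiGraph([
    ("Background", "Exposure"),
    ("Exposure", "Outcome"),
    ("Background", "Selection/Missingness"),
    ("Exposure", "Selection/Missingness"),
    ("Outcome", "Observed outcome"),
    ("Selection/Missingness", "Observed outcome"),
])
print("causal order:", list(nx.topological_sort(design)))
```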

Statistics and Probability; Empirical research; Theoretical computer science; Graph (abstract data type); Graphical model; Statistics, Probability and Uncertainty; Causal structure; Missing data; Causality; Structural equation modeling; Causal model; Mathematics; Scandinavian Journal of Statistics