Search results for "missing data"
Showing 10 of 83 documents
Criminal networks analysis in missing data scenarios through graph distances
2021
Data collected in criminal investigations may suffer from issues such as: (i) incompleteness, due to the covert nature of criminal organizations; (ii) incorrectness, caused by either unintentional data-collection errors or intentional deception by criminals; and (iii) inconsistency, when the same information enters law enforcement databases multiple times, or in different formats. In this paper we analyze nine real criminal networks of different natures (i.e., Mafia networks, criminal street gangs and terrorist organizations) in order to quantify the impact of incomplete data and to determine which network type is most affected by it. The networks are first pruned using two specific m…
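The pruning idea in the abstract above can be sketched minimally: simulate incompleteness by dropping a random fraction of edges and measure how far the degraded network drifts from the original. The edge list, pruning model, and Jaccard edge distance below are illustrative assumptions, not the paper's actual metrics:

```python
import random

def prune_edges(edges, frac, seed=0):
    """Simulate incomplete intelligence data by dropping a fraction of edges (illustrative)."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() >= frac]

def jaccard_edge_distance(edges_a, edges_b):
    """One simple graph distance: 1 minus the Jaccard similarity of the edge sets."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)
```

Comparing the distance across network types at increasing pruning fractions would then show which type degrades fastest.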
Correcting for non-ignorable missingness in smoking trends
2015
Data missing not at random (MNAR) are a major challenge in survey sampling. We propose an approach based on registry data to deal with non-ignorable missingness in health examination surveys. The approach relies on follow-up data available from administrative registers several years after the survey. For illustration, we use data on smoking prevalence from the Finnish National FINRISK study conducted in 1972–1997. The data consist of measured survey information including missingness indicators, register-based background information, and register-based time-to-disease survival data. The parameters of the missingness mechanism are estimable with these data even though the original survey data are MNAR. The u…
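Why naive estimates fail under MNAR can be illustrated with a small simulation (the response model and rates below are invented for illustration; this shows the problem the registry-based approach addresses, not the correction itself):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
smoke = rng.binomial(1, 0.3, n)          # true smoking prevalence: 30%
# MNAR: response probability depends on the unobserved value itself —
# smokers are assumed less likely to respond (40% vs 80%, hypothetical rates)
respond = rng.random(n) < np.where(smoke == 1, 0.4, 0.8)
naive = smoke[respond].mean()            # biased downward relative to 0.3
```

The complete-case estimate here converges to 0.12 / 0.68 ≈ 0.18 rather than 0.30, which is why auxiliary register data are needed to recover the truth.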
Domestic load forecasting using neural network and its use for missing data analysis
2015
Domestic demand prediction is very important for home energy management systems and for peak reduction in the power system network. In this work, an active and reactive power consumption prediction model is developed and analysed for a typical Southern Norwegian house, with hourly power (active and reactive) consumption and time information as inputs. In the proposed model, a neural network is adopted as the main technique, and around two years of historical domestic load data are used as input. The available data contain some measurement errors and missing segments. Before the data are used for training, missing and inaccurate values are identified and handled, and the data are then used to test the model. It is ob…
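A minimal sketch of one common way to handle missing segments before training, assuming linear interpolation over each gap (the paper's actual treatment may differ):

```python
import numpy as np

def fill_missing(series):
    """Linearly interpolate NaN gaps in an hourly load series before model
    training (an illustrative pre-processing step, not the paper's method)."""
    s = np.array(series, dtype=float)
    mask = np.isnan(s)
    idx = np.arange(s.size)
    # interpolate the missing positions from the observed ones
    s[mask] = np.interp(idx[mask], idx[~mask], s[~mask])
    return s
```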
Selection bias was reduced by recontacting nonparticipants
2016
Objective: One of the main goals of health examination surveys is to provide unbiased estimates of health indicators at the population level. We demonstrate how multiple imputation methods may help to reduce selection bias if partial data on some nonparticipants are collected. Study Design and Setting: In the FINRISK 2007 study, a population-based health study conducted in Finland, a random sample of 10,000 men and women aged 25–74 years was invited to participate. The study included a questionnaire data collection and a health examination. A total of 6,255 individuals participated in the study. Of the 3,745 nonparticipants, 473 returned a simplified questionnaire after recontact. Both…
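A toy version of the idea, assuming the recontacted group's answers can serve as the imputation model for the remaining nonrespondents (the function name and resampling model below are hypothetical, not the study's actual imputation model):

```python
import numpy as np

def mi_prevalence(participants, recontacted, n_remaining, m=50, seed=0):
    """Toy multiple imputation: impute the remaining nonrespondents by
    resampling from the recontacted group's answers, estimate prevalence
    in each of m completed datasets, and average the point estimates
    (Rubin-style pooling of point estimates only)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(participants, dtype=float)
    r = np.asarray(recontacted, dtype=float)
    estimates = []
    for _ in range(m):
        imputed = rng.choice(r, size=n_remaining, replace=True)
        estimates.append(np.concatenate([p, r, imputed]).mean())
    return float(np.mean(estimates))
```

A full analysis would also pool the within- and between-imputation variances, which this sketch omits.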
Multiple Comparisons of Treatments with Stable Multivariate Tests in a Two‐Stage Adaptive Design, Including a Test for Non‐Inferiority
2000
The application of stabilized multivariate tests is demonstrated in the analysis of a two-stage adaptive clinical trial with three treatment arms. Owing to the clinical problem, the multiple comparisons include tests of superiority as well as a test for non-inferiority, where non-inferiority is (in the absence of absolute tolerance limits) expressed as a linear contrast of the three treatments. Special emphasis is placed on the combination of the three sources of multiplicity: multiple endpoints, multiple treatments, and the two stages of the adaptive design. In particular, the adaptation after the first stage comprises a change of the a priori order of hypotheses.
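One standard way to handle an a priori ordered family of hypotheses is fixed-sequence testing, sketched below; the paper's stabilized multivariate procedure is more involved, so this illustrates only the ordering idea:

```python
def fixed_sequence_test(p_values, alpha=0.05):
    """Fixed-sequence testing of an a priori ordered hypothesis family:
    test each hypothesis at the full level alpha, in order, and stop at
    the first non-rejection. Returns the indices of rejected hypotheses."""
    rejected = []
    for i, p in enumerate(p_values):
        if p <= alpha:
            rejected.append(i)
        else:
            break  # all later hypotheses are retained without testing
    return rejected
```

Changing the a priori order after stage one, as the trial above does, changes which hypotheses this sequence reaches.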
Missing values in deduplication of electronic patient data
2011
Data deduplication refers to the process in which records referring to the same real-world entities are detected in datasets so that duplicated records can be eliminated. The term 'record linkage' is used here for the same problem [1]. A typical application is the deduplication of medical registry data [2, 3]. Medical registries are institutions that collect medical and personal data in a standardized and comprehensive way. The primary aims are the creation of a pool of patients eligible for clinical or epidemiological studies and the computation of certain indices, such as incidence, in order to oversee the development of diseases. The latter task in particular requires a database in wh…
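A minimal sketch of the core difficulty: when comparing two candidate duplicate records, fields missing in either record contribute no evidence either way (field names are hypothetical; real record-linkage systems use weighted, probabilistic comparisons rather than this simple ratio):

```python
def record_similarity(rec_a, rec_b, fields):
    """Compare two records field by field, ignoring fields that are
    missing (None) in either record; return the fraction of compared
    fields that agree."""
    compared = matched = 0
    for f in fields:
        va, vb = rec_a.get(f), rec_b.get(f)
        if va is None or vb is None:
            continue  # missing value: this field contributes no evidence
        compared += 1
        matched += (va == vb)
    return matched / compared if compared else 0.0
```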
A new methodology for Functional Principal Component Analysis from scarce data. Application to stroke rehabilitation.
2015
Functional Principal Component Analysis (FPCA) is an increasingly used methodology for the analysis of biomedical data. It aims to obtain Functional Principal Components (FPCs) from functional data (time-dependent functions). In biomedical applications, however, the data are most commonly observed at discrete time points, and standard FPCA procedures require reconstructing the functional data from these discrete values before extracting the FPCs. A problem arises when there are missing values in a non-negligible fraction of subjects, especially at the beginning or the end of the study, because this approach can compromise the analysis due to the need to extrapolate or dismiss subje…
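One way to sidestep reconstructing full curves is to estimate the covariance surface directly from whatever observations are present, as sparse-FPCA approaches do. The rough numpy sketch below uses available-case means and pairwise-complete covariances on a common time grid; it is an illustrative sketch of that idea, not the paper's algorithm:

```python
import numpy as np

def fpcs_from_sparse(X):
    """Estimate eigenvalues/eigenvectors on a common grid from a
    subjects-by-timepoints array containing NaNs, via available-case
    means and pairwise-complete covariances (rough sparse-FPCA sketch)."""
    X = np.asarray(X, dtype=float)
    n, T = X.shape
    mu = np.nanmean(X, axis=0)            # available-case mean curve
    Xc = X - mu
    C = np.empty((T, T))
    for s in range(T):
        for t in range(T):
            # use only subjects observed at both time points
            both = ~np.isnan(Xc[:, s]) & ~np.isnan(Xc[:, t])
            C[s, t] = np.mean(Xc[both, s] * Xc[both, t])
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]
```

Note the pairwise-complete covariance matrix is not guaranteed positive semi-definite; production methods smooth it before the eigendecomposition.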
Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?
2017
Summary: Principal component analysis (PCA) is a method of choice for dimension reduction. In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable for performing PCA on streaming and/or massive data. Despite the wide availability of recursive algorithms that can efficiently update the PCA when new data are observed, the literature offers little guidance on how to select a suitable algorithm for a given application. This paper reviews the main approaches to online PCA, namely perturbation techniques, incremental methods, and stochastic optimisation, and compares the most widely employed techniques in terms of statistical a…
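Of the three families the review names, stochastic optimisation is the easiest to sketch; below is a minimal Oja's-rule update for the leading principal component of a stream (the learning rate and per-step renormalisation are illustrative choices, not recommendations from the paper):

```python
import numpy as np

def oja_first_pc(stream, lr=0.05):
    """Oja's stochastic-approximation update for the leading principal
    component of streaming (assumed centred) data; each sample is seen
    once and never stored."""
    w = None
    for x in stream:
        x = np.asarray(x, dtype=float)
        if w is None:
            w = x / (np.linalg.norm(x) + 1e-12)  # initialise from first sample
            continue
        y = w @ x
        w = w + lr * y * (x - y * w)   # Oja's rule
        w = w / np.linalg.norm(w)      # renormalise for numerical stability
    return w
```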
Imputation Strategies for Missing Data in Environmental Time Series for An Unlucky Situation
2005
After a detailed review of the main specific solutions for the treatment of missing data in environmental time series, this paper deals with the unlucky situation in which, in an hourly series, missing data immediately follow an absolutely anomalous period for which no similar period is available to use for imputation. A tentative multivariate, multiple-imputation approach is put forward and evaluated; it is based on the possibility, typical of environmental time series, of resorting to correlations or physical laws that characterize relationships between air pollutants.
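The multivariate idea, exploiting correlation between co-measured pollutants, can be sketched as a single regression imputation (series names are hypothetical; the paper proposes a fuller multiple-imputation procedure with several draws, not one deterministic fill):

```python
import numpy as np

def regress_impute(target, covariate):
    """Fill NaN gaps in one pollutant series by linear regression on a
    correlated, fully observed co-pollutant series (illustrative sketch)."""
    t = np.array(target, dtype=float)       # copy so the input is untouched
    c = np.asarray(covariate, dtype=float)
    obs = ~np.isnan(t)
    slope, intercept = np.polyfit(c[obs], t[obs], 1)
    t[~obs] = slope * c[~obs] + intercept   # predict the missing hours
    return t
```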
Study Design in Causal Models
2014
The causal assumptions, the study design and the data are the elements required for scientific inference in empirical research. The research is adequately communicated only if all of these elements and their relations are described precisely. Causal models with design describe the study design and the missing-data mechanism together with the causal structure and allow the direct application of causal calculus in the estimation of the causal effects. The flow of the study is visualized by ordering the nodes of the causal diagram in two dimensions by their causal order and the time of the observation. Conclusions on whether a causal or observational relationship can be estimated from the coll…
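The causal ordering of diagram nodes mentioned above corresponds to a topological order of the causal DAG; a minimal sketch with hypothetical node names, using Python's standard-library graphlib (each node maps to its direct causes):

```python
from graphlib import TopologicalSorter

# Hypothetical causal diagram including a missing-data (selection) node.
diagram = {
    "exposure": set(),
    "disease": {"exposure"},
    "selection": {"disease"},  # participation depends on disease status
}
# static_order() yields every node after all of its causes
causal_order = list(TopologicalSorter(diagram).static_order())
```

Plotting nodes by this causal order on one axis and by observation time on the other gives the two-dimensional layout the abstract describes.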