Search results for "IMPUTATION"
showing 10 items of 57 documents
Interpretable machine learning models for single-cell ChIP-seq imputation
2019
Motivation: Single-cell ChIP-seq (scChIP-seq) analysis is challenging due to data sparsity. High degree of data sparsity in biological high-throughput single-cell data is generally handled with imputation methods that complete the data, but specific methods for scChIP-seq are lacking. We present SIMPA, a scChIP-seq data imputation method leveraging predictive information within bulk data from ENCODE to impute missing protein-DNA interacting regions of target histone marks or transcription factors. Results: Imputations using machine learning models trained for each single cell, each target, and each genomic region accurately preserve cell type clustering and improve pathway-related gene i…
Regression imputation for Space-Time datasets with missing values
2009
Data consisting of repeated observations on a series of fixed units are very common in different contexts, such as the biological, environmental and social sciences, and different terminology is often used to indicate this kind of data: panel data, longitudinal data, time series-cross section data (TSCS), spatio-temporal data. Missing information is inevitable in longitudinal studies, and can produce biased estimates and loss of power. The aim of this paper is to propose a new regression (single) imputation method that, considering the particular structure and characteristics of the data set, creates a “complete” data set that can be analyzed by any researcher on different occasions and using diff…
Genome-Wide Haplotype Analysis of Cis Expression Quantitative Trait Loci in Monocytes
2013
In order to assess whether gene expression variability could be influenced by several SNPs acting in cis, either through additive or more complex haplotype effects, a systematic genome-wide search for cis haplotype expression quantitative trait loci (eQTL) was conducted in a sample of 758 individuals, part of the Cardiogenics Transcriptomic Study, for which genome-wide monocyte expression and GWAS data were available. 19,805 RNA probes were assessed for cis haplotypic regulation through investigation of ∼2.1×10⁹ haplotypic combinations. 2,650 probes demonstrated haplotypic p-values >10⁴-fold smaller than the best single SNP p-value. Replication of significant haplotype effects were tested f…
Air quality and integration of short-term and long-term pollutant data
2008
Modelling PM10 is an important problem in statistical methodology, above all to explain the PM10 behaviour in space and time, since it has been linked to many adverse effects on human and environmental health. But the large spatial variability of the main traffic-related pollutants, and in particular here the PM10, implies the impossibility of obtaining from the data of the fixed stations a complete picture of the atmospheric pollution in the urban areas. Information from fixed monitoring stations (long-term measurements) is therefore integrated with that deriving from mobile stations (short-term measurements). Short-term measurements are incomplete and so it is necessary to integrate …
2015
Hearing loss and individual differences in normal hearing both have a substantial genetic basis. Although many new genes contributing to deafness have been identified, very little is known about genes/variants modulating the normal range of hearing ability. To fill this gap, we performed a two-stage meta-analysis on hearing thresholds (tested at 0.25, 0.5, 1, 2, 4, 8 kHz) and on pure-tone averages (low-, medium- and high-frequency thresholds grouped) in several isolated populations from Italy and Central Asia (total N = 2636). Here, we detected two genome-wide significant loci close to PCDH20 and SLC28A3 (top hits: rs78043697, P = 4.71E-10 and rs7032430, P = 2.39E-09, respectively). For bot…
Bayesian models for data missing not at random in health examination surveys
2018
In epidemiological surveys, data missing not at random (MNAR) due to survey nonresponse may potentially lead to a bias in the risk factor estimates. We propose an approach based on Bayesian data augmentation and survival modelling to reduce the nonresponse bias. The approach requires additional information based on follow-up data. We present a case study of smoking prevalence using FINRISK data collected between 1972 and 2007 with a follow-up to the end of 2012 and compare it to other commonly applied missing at random (MAR) imputation approaches. A simulation experiment is carried out to study the validity of the approaches. Our approach appears to reduce the nonresponse bias substantially…
Item nonresponse and imputation strategies in SHARE Wave 5
2015
This chapter focuses on item nonresponse in the fifth wave of SHARE and the imputation strategies adopted to fill-in the missing values.
Identification of patterns of change in longitudinal data, illustrated by two examples: study of hospital pathways in the management of cancer. Cons…
2014
Context: In the healthcare domain, data mining for knowledge discovery represents a growing issue. Questions about the organisation of the healthcare system and the study of the relation between treatment and perceived quality of life (QoL) could be addressed that way. The evolution of technologies provides us with efficient data mining tools and statistical packages containing advanced methods available to non-experts. We illustrate this approach through two issues: 1/ What organisation of the healthcare system for cancer disease management? 2/ Exploring, in patients suffering from metastatic cancer, the relationship between perceived health-related QoL and treatment received as part of a clinical tr…
Comparison of HapMap and 1000 genomes reference panels in a large-scale genome-wide association study
2017
An increasing number of genome-wide association (GWA) studies are now using the higher resolution 1000 Genomes Project reference panel (1000G) for imputation, with the expectation that 1000G imputation will lead to the discovery of additional associated loci when compared to HapMap imputation. In order to assess the improvement of 1000G over HapMap imputation in identifying associated loci, we compared the results of GWA studies of circulating fibrinogen based on the two reference panels. Using both HapMap and 1000G imputation we performed a meta-analysis of 22 studies comprising the same 91,953 individuals. We identified six additional signals using 1000G imputation, while 29 loci were ass…
Genome-wide Analyses Identify KIF5A as a Novel ALS Gene
2018
© 2018 Elsevier Inc.