Search results for "IMPUTATION"
Examining facial emotion recognition as an intermediate phenotype for psychosis: Findings from the EUGEI study
2022
The EUGEI project was supported by the European Community’s Seventh Framework Program under grant agreement No. HEALTH-F2-2009-241909 (Project EU-GEI). Dr. Arango was supported by the Spanish Ministry of Science and Innovation; Instituto de Salud Carlos III (SAM16-PE07CP1, PI16/02012, PI19/024); CIBERSAM (...)
Imputation Procedures in Surveys Using Nonparametric and Machine Learning Methods: An Empirical Comparison
2020
Nonparametric and machine learning methods are flexible methods for obtaining accurate predictions. Nowadays, data sets with a large number of predictors and complex structures are fairly common. In the presence of item nonresponse, nonparametric and machine learning procedures may thus provide a useful alternative to traditional imputation procedures for deriving a set of imputed values, which are then used to estimate study parameters defined as solutions of population estimating equations. In this paper, we conduct an extensive empirical investigation that compares a number of imputation procedures in terms of bias and efficiency in a wide variety of settings, including high-dimens…
Polygenic Risk Scores and Physical Activity
2020
Supplemental digital content is available in the text.
CLUSTERING INCOMPLETE SPECTRAL DATA WITH ROBUST METHODS
2018
Missing value imputation is a common approach for preprocessing incomplete data sets. In the case of data clustering, imputation methods may cause unexpected bias because they may change the underlying structure of the data. To avoid prior imputation of missing values, the computational operations must be projected onto the available data values. In this paper, we apply a robust nan-K-spatmed algorithm to the clustering problem on hyperspectral image data. Robust statistics, such as multivariate medians, are less sensitive to outliers than classical statistics relying on Gaussian assumptions. They are, however, computationally more intractable due to the lack of closed-for…
A low-frequency haplotype spanning SLX4/FANCP constitutes a new risk locus for early-onset breast cancer (<60 years) and is associated with reduce…
2017
Only a fraction of breast cancer (BC) cases can yet be explained by mutations in genes or genomic variants discovered in linkage, genome-wide association and sequencing studies. The known genes entailing medium or high risk for BC are strongly enriched for a function in DNA double strand repair. Thus, aiming at identifying low frequency variants conferring an intermediate risk, we here investigated 17 variants (MAF: 0.01-0.1) in 10 candidate genes involved in DNA repair or cell cycle control. In an exploration cohort of 437 cases and 1189 controls, we show the variant rs3810813 in the SLX4/FANCP gene to be significantly associated with both BC (≤60 years; OR = 2.6 (1.6-3.9), p = 1.6E-05) and…
The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based metho…
2017
We analyse new genomic data (0.05–2.95x) from 14 ancient individuals from Portugal distributed from the Middle Neolithic (4200–3500 BC) to the Middle Bronze Age (1740–1430 BC) and impute genomewide diploid genotypes in these together with published ancient Eurasians. While discontinuity is evident in the transition to agriculture across the region, sensitive haplotype-based analyses suggest a significant degree of local hunter-gatherer contribution to later Iberian Neolithic populations. A more subtle genetic influx is also apparent in the Bronze Age, detectable from analyses including haplotype sharing with both ancient and modern genomes, D-statistics and Y-chromosome lineages. However, t…
Imputation of missing data: the effect on the parameters of an Extended Logistic Rasch Model
2008
The problem of missing data is quite common in empirical research, especially in the social sciences, where the attempt to measure quantities that are not directly observable (latent variables) is made by administering tests or questionnaires consisting of multiple items. Statistical models aimed at solving this problem generally require a large number of observations for each unit involved in the analysis. In a multivariate context the problem is amplified, because the model considers multiple items for each observation: the probability of having at least one missing value is therefore not negligible and, moreover, increases as the number …