RESEARCH PRODUCT
Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling
Stefanie Hieke, Axel Benner, Richard F. Schlenk, Martin Schumacher, Harald Binder, Lars Bullinger
subject
0301 basic medicine; Multivariate analysis; Microarrays; Test Statistics; Gene Expression; lcsh:Medicine; Bioinformatics; 01 natural sciences; Hematologic Cancers and Related Disorders; Cohort Studies; 010104 statistics & probability; Mathematical and Statistical Techniques; Resampling; Medicine and Health Sciences; lcsh:Science; Statistical Data; Univariate analysis; Multidisciplinary; Simulation and Modeling; Multivariable calculus; Regression analysis; Hematology; Myeloid Leukemia; Prognosis; Regression; Bioassays and Physiological Analysis; Oncology; Research Design; Physical Sciences; Statistics (Mathematics); Research Article; Acute Myeloid Leukemia; Permutation; Single-nucleotide polymorphism; Computational biology; Biology; Research and Analysis Methods; Polymorphism Single Nucleotide; 03 medical and health sciences; Leukemias; Genetics; Humans; Statistical Methods; 0101 mathematics; Discrete Mathematics; lcsh:R; Univariate; Cancers and Neoplasms; Biology and Life Sciences; Models Theoretical; 030104 developmental biology; Combinatorics; Case-Control Studies; Multivariate Analysis; lcsh:Q; Mathematics
description
Clinical cohorts with time-to-event endpoints are increasingly characterized by measurements of a number of single nucleotide polymorphisms (SNPs) that is an order of magnitude larger than the number of measurements typically considered at the gene level. At the same time, the size of clinical cohorts is often still limited, calling for novel analysis strategies for identifying potentially prognostic SNPs that can help to better characterize disease processes. We propose such a strategy, drawing on univariate testing ideas from epidemiological case-control studies on the one hand, and multivariable regression techniques as developed for gene expression data on the other hand. In particular, we focus on stable selection of a small set of SNPs and corresponding genes for subsequent validation. For univariate analysis, a permutation-based approach is proposed to test at the gene level. We use regularized multivariable regression models for considering all SNPs simultaneously and selecting a small set of potentially important prognostic SNPs. Stability is judged according to resampling inclusion frequencies for both the univariate and the multivariable approach. The overall strategy is illustrated with data from a cohort of acute myeloid leukemia patients and explored in a simulation study. The multivariable approach is seen to automatically focus on a smaller set of SNPs compared to the univariate approach, roughly in line with blocks of correlated SNPs. This more targeted extraction of SNPs results in more stable selection at the SNP level as well as at the gene level. Thus, within the proposed analysis strategy for SNP data in clinical cohorts, the multivariable regression approach with resampling highlights what regularized regression techniques can add to univariate analyses.
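The idea of judging stability by resampling inclusion frequencies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the selector `select_top_k` (top-k SNPs by absolute correlation with a continuous outcome) is a hypothetical stand-in for the permutation-based tests and regularized time-to-event regression described above, and the subsample fraction of 0.632, the number of resamples, and the simulated data are all assumptions made for illustration.

```python
import random
from collections import Counter


def abs_correlation(x, y):
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def select_top_k(genotypes, outcome, k):
    """Toy selector (assumption): keep the k SNPs most correlated with the outcome."""
    n_snps = len(genotypes[0])
    scores = []
    for j in range(n_snps):
        column = [row[j] for row in genotypes]
        scores.append((abs_correlation(column, outcome), j))
    scores.sort(reverse=True)
    return {j for _, j in scores[:k]}


def inclusion_frequencies(genotypes, outcome, k=2, n_resamples=100,
                          fraction=0.632, seed=0):
    """Fraction of subsamples (drawn without replacement) in which each SNP is selected."""
    rng = random.Random(seed)
    n = len(genotypes)
    size = max(2, int(round(fraction * n)))
    counts = Counter()
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        sub_genotypes = [genotypes[i] for i in idx]
        sub_outcome = [outcome[i] for i in idx]
        counts.update(select_top_k(sub_genotypes, sub_outcome, k))
    return [counts[j] / n_resamples for j in range(len(genotypes[0]))]


# Simulated toy cohort (assumption): 60 patients, 5 SNPs coded 0/1/2,
# with only SNP 0 affecting the outcome.
data_rng = random.Random(1)
genotypes = [[data_rng.choice([0, 1, 2]) for _ in range(5)] for _ in range(60)]
outcome = [2.0 * row[0] + data_rng.gauss(0.0, 0.5) for row in genotypes]

freqs = inclusion_frequencies(genotypes, outcome, k=2)
for j, f in enumerate(freqs):
    print(f"SNP {j}: inclusion frequency {f:.2f}")
```

In this toy setting, a SNP with a genuine effect should be selected in nearly every subsample, while the remaining slot rotates among noise SNPs and yields lower frequencies. Swapping `select_top_k` for any other selection procedure leaves the resampling loop unchanged, which is what allows the same stability criterion to be applied to both a univariate and a multivariable approach.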
year | journal | country | edition | language |
---|---|---|---|---|
2016-05-01 | PLOS ONE | | | |