
RESEARCH PRODUCT

Bayesian model to detect phenotype-specific genes for copy number data

Carlos Abellan, J. J. Abellán, Juan R. González

subject

Male; Genotype; Gene Dosage; HapMap Project; Biology; lcsh:Computer applications to medicine. Medical informatics; Population stratification; Bayesian inference; Polymorphism Single Nucleotide; Biochemistry; 03 medical and health sciences; Bayes' theorem; 0302 clinical medicine; Structural Biology; medicine; Humans; Computer Simulation; Genetic Predisposition to Disease; Genetic Testing; Copy-number variation; International HapMap Project; lcsh:QH301-705.5; Molecular Biology; 030304 developmental biology; Genetic testing; Genetics; 0303 health sciences; Models Statistical; Models Genetic; medicine.diagnostic_test; Methodology Article; Applied Mathematics; Confounding; Bayes Theorem; 3. Good health; Computer Science Applications; Phenotype; lcsh:Biology (General); 030220 oncology & carcinogenesis; lcsh:R858-859.7; Female; DNA microarray; Algorithms

description

Abstract

Background: An important question in genetic studies is which genetic variants, in particular copy number variants (CNVs), are specific to different groups of individuals. Answering it could help elucidate differences in disease predisposition and in response to pharmaceutical treatments. We propose a Bayesian model designed to analyze thousands of CNVs, of which only a few are expected to be associated with a specific phenotype.

Results: The model is illustrated by analyzing three major human populations from the HapMap data. We also show how the model can be used to identify specific CNVs related to treatment response in patients diagnosed with ovarian cancer. The model is further extended to adjust for confounding covariates (e.g., population stratification). A simulation study shows that the proposed model outperforms approaches typically used for these data when analyzing either common copy-number polymorphisms (CNPs) or complex CNVs. We have developed an R package, called bayesGen, that implements the model and its estimation algorithms.

Conclusions: The proposed model is useful for discovering specific genetic variants when different subgroups of individuals are analyzed. It can handle studies with or without a control group. By integrating all data in a single model, we obtain a list of genes associated with a given phenotype as well as a separate list of genes shared among the different subtypes of cases.

https://doi.org/10.1186/1471-2105-13-130