
RESEARCH PRODUCT

Removing Batch Effects from Longitudinal Gene Expression - Quantile Normalization Plus ComBat as Best Approach for Microarray Transcriptome Data

Harald Binder, Christian P. Müller, Philipp S. Wild, Stefan Blankenberg, Caroline Röthemeier, David-Alexandre Trégouët, Norbert Pfeiffer, Tanja Zeller, Renate B. Schnabel, Karl J. Lackner, Manfred E. Beutel, Carole Proust, Laurence Tiret, Andreas Ziegler, Arne Schillert

subject

Male; 0301 basic medicine; Molecular biology; Microarrays; lcsh:Medicine; Gene Expression; Polynomials; Monocytes; Mathematical and Statistical Techniques; 0302 clinical medicine; Longitudinal Studies; Prospective Studies; lcsh:Science; Oligonucleotide Array Sequence Analysis; Genetics; Principal Component Analysis; [SDV.MHEP] Life Sciences [q-bio]/Human health and pathology; Multidisciplinary; Genomics; Replicate; Middle Aged; Regression; RNA isolation; Bioassays and Physiological Analysis; 030220 oncology & carcinogenesis; Physical Sciences; Female; RNA hybridization; DNA microarray; Transcriptome Analysis; Statistics (Mathematics); Research Article; Adult; Computational biology; Biology; Biomolecular isolation; Generalized linear mixed model; 03 medical and health sciences; Deming regression; Extraction techniques; Humans; Statistical Methods; Aged; Quantile normalization; Molecular probe techniques; Gene Expression Profiling; lcsh:R; Biology and Life Sciences; Genome Analysis; Probe hybridization; RNA extraction; Research and analysis methods; Molecular biology techniques; Algebra; 030104 developmental biology; Nonlinear Dynamics; Multivariate Analysis; lcsh:Q; Mathematics

description

Technical variation plays an important role in microarray-based gene expression studies, and batch effects explain a large proportion of this noise. It is therefore mandatory to eliminate technical variation while maintaining biological variability. Several strategies have been proposed for the removal of batch effects, but they have not been evaluated in large-scale longitudinal gene expression data. In this study, we aimed to identify a suitable method for batch effect removal in a large study of microarray-based longitudinal gene expression. Monocytic gene expression was measured in 1092 participants of the Gutenberg Health Study at baseline and at 5-year follow-up. Replicates of selected samples were measured at both time points to identify technical variability. Deming regression, Passing-Bablok regression, linear mixed models, non-linear models, ReplicateRUV and ComBat were applied to eliminate batch effects between replicates. In a second step, each method was repeated with quantile normalization applied before batch effect correction. Technical variation between batches was evaluated by principal component analysis. Associations between body mass index and transcriptomes were calculated before and after batch removal, and the results of these association analyses were compared to evaluate whether biological variability was maintained. Quantile normalization, performed separately in each batch and combined with ComBat, successfully reduced batch effects and maintained biological variability. ReplicateRUV performed perfectly in the replicate data subset of the study but failed when applied to all samples. None of the other methods substantially reduced batch effects in the replicate data subset. Quantile normalization plus ComBat appears to be a valuable approach for batch correction in longitudinal gene expression data.
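
The first half of the recommended procedure, quantile normalization performed separately in each batch, can be illustrated with a minimal NumPy sketch. This is not the study's code: the function names, the genes-by-samples matrix layout, the simulated data and the batch labels are all assumptions made for illustration, and ties are not averaged as some implementations do. The ComBat step that follows in the study is not shown.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Hypothetical helper: every sample is forced onto the same reference
    distribution, the mean of the sorted values across samples.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")   # per-sample ordering of genes
    reference = np.sort(x, axis=0).mean(axis=1)    # mean value at each rank
    out = np.empty_like(x)
    # The gene ranked r within a sample receives the r-th reference value.
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out


def quantile_normalize_by_batch(x, batches):
    """Apply quantile normalization separately within each batch."""
    x = np.asarray(x, dtype=float)
    batches = np.asarray(batches)
    out = np.empty_like(x)
    for batch in np.unique(batches):
        cols = batches == batch                    # samples belonging to this batch
        out[:, cols] = quantile_normalize(x[:, cols])
    return out


# Simulated data (assumption): 200 genes, 3 samples per time point,
# with an artificial shift of 2.0 added to the second batch.
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 6))
expr[:, 3:] += 2.0
batches = np.array(["baseline"] * 3 + ["followup"] * 3)

normalized = quantile_normalize_by_batch(expr, batches)
```

After this step all samples within a batch share one distribution, while the shift between batches is still present. That remaining between-batch difference is what a batch correction method such as ComBat is then meant to address.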

10.1371/journal.pone.0156594
https://hal.sorbonne-universite.fr/hal-01344173/document