
RESEARCH PRODUCT

Computational identification of cell-specific variable regions in ChIP-seq data.

Steffen Albrecht, Tommaso Andreani, Miguel A. Andrade-Navarro, Jean-Fred Fontaine

subject

Cell type; AcademicSubjects/SCI00010; Computational biology; Plasma protein binding; Biology; Genome; Cell Line; Evolution Molecular; 03 medical and health sciences; chemistry.chemical_compound; Mice; 0302 clinical medicine; Narese/3; Cell Line Tumor; Genetics; Animals; Humans; Epigenetics; Binding site; Promoter Regions Genetic; Transcription factor; Embryonic Stem Cells; 030304 developmental biology; 0303 health sciences; Principal Component Analysis; Binding Sites; Nucleotides; Genetic Variation; Promoter; Genomics; Chromatin; chemistry; CpG site; MCF-7 Cells; Chromatin Immunoprecipitation Sequencing; Methods Online; R-Loop Structures; K562 Cells; Chromatin immunoprecipitation; 030217 neurology & neurosurgery; Function (biology); DNA; Transcription Factors

description

ABSTRACT Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is used to identify genome-wide DNA regions bound by proteins. Several sources of variation can affect the reproducibility of a ChIP-seq assay, which can lead to misinterpretation of where the protein under investigation binds to the genome in a particular cell type. Given one ChIP-seq experiment with replicates, binding sites not observed in all the replicates are usually interpreted as noise and discarded. However, the recent discovery of high-occupancy target (HOT) regions suggests that there are regions where binding of multiple transcription factors can be identified. To investigate these regions, we developed a reproducibility score and a method that identifies cell-specific variable regions in ChIP-seq data by integrating replicated ChIP-seq experiments for multiple protein targets in a particular cell type. Using our method, we found variable regions, defined as protein binding regions with non-reproducible results across replicated experiments, in the human cell lines K562, GM12878, HepG2 and MCF-7, and in mouse embryonic stem cells. These variable-occupancy target (VOT) regions are CG dinucleotide rich and show enrichment at promoters and R-loops. They overlap significantly with HOT regions, but are not blacklisted regions producing non-specific binding ChIP-seq peaks. Interestingly, among various genomic features, DNA accessibility is a better predictor of VOTs than CpG islands or epigenetic marks. Our method can help pinpoint such regions along the genome in a given cell type of interest, to improve downstream interpretative analysis before follow-up experiments.
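The abstract does not define the reproducibility score, so the following is only a toy illustration of the general idea of integrating replicated experiments for several protein targets. It assumes, hypothetically, that peaks have already been mapped to fixed genomic bins, and it scores each bin by the fraction of targets whose binding there is seen in some but not all replicates. The function name, the bin representation, and the 0.5 cutoff are all assumptions for illustration and are not taken from the paper.

```python
from typing import Dict, List, Set


def variability_scores(experiments: Dict[str, List[Set[int]]]) -> Dict[int, float]:
    """Toy per-bin variability score (hypothetical, not the published score).

    `experiments` maps a protein target name to its replicates; each
    replicate is the set of genomic bin indices where a peak was called.
    """
    bound: Dict[int, int] = {}     # bin -> targets with a peak in >= 1 replicate
    variable: Dict[int, int] = {}  # bin -> targets with a peak in some, not all, replicates
    for replicates in experiments.values():
        if not replicates:
            continue  # no replicates, nothing to compare
        in_any = set().union(*replicates)
        in_all = set.intersection(*replicates)
        for b in in_any:
            bound[b] = bound.get(b, 0) + 1
            if b not in in_all:
                variable[b] = variable.get(b, 0) + 1
    # Fraction of bound targets whose binding in this bin is not reproducible.
    return {b: variable.get(b, 0) / n for b, n in bound.items()}


# Made-up example: three targets, two replicates each.
toy = {
    "TF_A": [{1, 2, 3}, {2, 3}],
    "TF_B": [{2, 5}, {2}],
    "TF_C": [{3}, {3, 5}],
}
scores = variability_scores(toy)
# Bins whose score reaches an arbitrary illustrative cutoff of 0.5.
variable_bins = sorted(b for b, s in scores.items() if s >= 0.5)
print(scores, variable_bins)
```

In this toy data, bins 2 and 3 are bound reproducibly by every target that binds them, whereas bins 1 and 5 are seen only in a subset of replicates and would be flagged as candidate variable regions under these assumptions.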

10.1093/nar/gkaa180
https://pubmed.ncbi.nlm.nih.gov/32187374