0000000000942420

AUTHOR

Jonas Ibn-salem

Evolutionary stability of topologically associating domains is associated with conserved gene regulation

Abstract. Background: The human genome is highly organized in the three-dimensional nucleus. Chromosomes fold locally into topologically associating domains (TADs) defined by increased intra-domain chromatin contacts. TADs contribute to gene regulation by restricting chromatin interactions of regulatory sequences, such as enhancers, with their target genes. Disruption of TADs can result in altered gene expression and is associated with genetic diseases and cancers. However, it is not clear to what extent TAD regions are conserved in evolution and whether disruption of TADs by evolutionary rearrangements can alter gene expression. Results: Here, we hypothesize that TADs represent essential functiona…

research product

MOESM5 of 7C: Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs

Additional file 5: Figure S2. 7C model parameters and optimal cut-offs for binary prediction. (A) Parameter values of the logistic regression model in 7C for different features (columns), separated for different models (rows). Average of model parameters of model training in 10-fold cross-validation is shown with error bars indicating the standard deviations. While the first six rows represent the models with the indicated TF ChIP-seq data and the genomic features, “Avg. all TF” is the average across all 124 TFs analyzed and “Avg. best 10 TF” is the average across the best ten performing TF models. (B) Prediction performance as f1 score (y-axis) for different cutoffs on the prediction proba…

research product

Additional file 3: of Evolutionary stability of topologically associating domains is associated with conserved gene regulation

Figure S3. Distance between rearrangement breakpoints and random controls to closest TAD boundary. For each species (y-axis) and fill size threshold (vertical panels), the distances from all identified rearrangement breakpoints to their closest TAD boundary (x-axis) are compared between actual rearrangements (blue) and 100 times randomized background controls (gray). The left panel shows distances to the closest hESC TAD boundary and the right panel distances to the closest GM12878 contact domain boundary. P-values according to Wilcoxon's rank-sum test. (PDF 14 kb)

research product

The Developmental Transcriptome for Lytechinus variegatus Exhibits Temporally Punctuated Gene Expression Changes

Abstract. Embryonic development is arguably the most complex process an organism undergoes during its lifetime, and understanding this complexity is best approached with a systems-level perspective. The sea urchin has become a highly valuable model organism for understanding developmental specification, morphogenesis, and evolution. As a non-chordate deuterostome, the sea urchin occupies an important evolutionary niche between protostomes and vertebrates. Lytechinus variegatus (Lv) is an Atlantic species that has been well studied, and which has provided important insights into signal transduction, patterning, and morphogenetic changes during embryonic and larval development. The Pacific specie…

research product

MOESM2 of The distributions of protein coding genes within chromatin domains in relation to human disease

Additional file 2: Figure S2. Distribution of the distances from the TSS of the genes to their closest TAD borders depending on the gene association with disease. The TAD border is represented with a vertical black line. Blue and salmon colors represent genes associated and not associated with disease, respectively. If the TSS is within a TAD a negative distance is calculated, otherwise the distance is positive. a. HK genes. b. non-HK genes. Insets: The densities for the same data are shown. Genes not associated with disease have a higher preference for TAD borders, but this is only significant for non-HK genes (p-value = 9 × 10⁻¹¹, Wilcoxon rank test).

research product

Q-nexus: a comprehensive and efficient analysis pipeline designed for ChIP-nexus

Background: ChIP-nexus, an extension of the ChIP-exo protocol, can be used to map the borders of protein-bound DNA sequences at nucleotide resolution, requires less input DNA and enables selective PCR duplicate removal using random barcodes. However, the use of random barcodes requires additional preprocessing of the mapping data, which complicates the computational analysis. To date, only a very limited number of software packages are available for the analysis of ChIP-exo data, which have not yet been systematically tested and compared on ChIP-nexus data. Results: Here, we present a comprehensive software package for ChIP-nexus data that exploits the random barcodes for selective removal …

research product

MOESM9 of The distributions of protein coding genes within chromatin domains in relation to human disease

Additional file 9: Figure S8. Distribution of TAD lengths depending on the number of TSSs they contain. A horizontal black line indicates the median for each TAD category.

research product

MOESM5 of The distributions of protein coding genes within chromatin domains in relation to human disease

Additional file 5: Figure S4. Fraction of genes for HK and non-HK genes associated with disease (ordinates) depending on the number of genes contained within the TADs (n; abscissas); the numbers have been aggregated for n ≥ 6. The lower the number of genes inside the TAD, the higher the fraction of the genes associated with disease: a. HK genes; a p-value = 3.6 × 10⁻⁵ from a Chi-square test, comparing the number of genes associated and non-associated with disease for the six TAD categories, was obtained. The green dotted line represents the genome-wide fraction of HK genes associated with disease (0.309). b. non-HK genes; a p-value = 1.2 × 10⁻⁴³ from a Chi-square test has been obtained. The gree…

research product

Additional file 2: of Evolutionary stability of topologically associating domains is associated with conserved gene regulation

Figure S2. Distribution of evolutionary rearrangement breakpoints between human and 12 vertebrate genomes around domains. Relative breakpoint numbers from human and different species (horizontal panels) around hESC TADs (left), GM12878 contact domains (center), and GRBs (right). Blue color scale represents breakpoints from different fill-size thresholds. Dotted lines in gray show simulated background controls of randomly placed breakpoints. (PDF 42 kb)

research product

MOESM8 of The distributions of protein coding genes within chromatin domains in relation to human disease

Additional file 8: Figure S7. Mean ratios of the number of enhancers per gene within the TADs versus the number of genes within the TAD associated with disease (0 ≤ k ≤ n), where n is the total number of genes within the TAD. The value of n, which determines the TAD category, is represented for TADs with n = 1, 2, 3, and 4 genes (red, blue, green and purple lines, respectively). TADs with fewer TSSs have higher ratios of enhancers to TSSs. Moreover, for each TAD category, the higher the number of genes associated with disease, the higher the average number of enhancers per gene.

research product

Highlights of the 1st Student Symposium on Computational Genomics

On 30 November 2016, over 70 junior researchers in computational biology from diverse countries met in Mainz, Germany, for the 1st Student Symposium on Computational Genomics. Overall, the symposium was a great success and featured four outstanding keynote lectures, nine selected student talks, and over 38 poster presentations. This report briefly highlights the scientific outcomes and activities of this student-driven event.

research product

MOESM3 of The distributions of protein coding genes within chromatin domains in relation to human disease

Additional file 3: Figure S3. Number of TADs depending on the number of genes within the TADs. The counts are displayed behind each bar. Many TADs contain few genes and from a total of 9274 TADs, 2017 TADs (21.7%) have no gene within them.

research product

MOESM1 of The distributions of protein coding genes within chromatin domains in relation to human disease

Additional file 1: Figure S1. Distribution of the distances from the TSS of genes to their closest TAD borders. The TAD borders are represented with a vertical black line. Blue and salmon colors represent HK and non-HK genes, respectively. If the TSS is within a TAD a negative distance is calculated, otherwise the distance is positive. Each bin represents 500 nt. Inset: the density for the same data is shown. The preference of HKs toward the TAD borders is significant (p-value = 3 × 10⁻⁴, Wilcoxon rank test).

research product

MOESM6 of 7C: Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs

Additional file 6: Figure S3. High resolution Hi-C map with 7C loop predictions. The red color intensity shows Hi-C interaction frequencies at an example locus of chromosome 1. The blue squares indicate 7C loop predictions using a Rad21 ChIP-seq experiment. The figure was created using the Juicebox tool by loading the combined Hi-C data set in GM12878 from [13] with mapping quality MAPQ ≥30 at a resolution of 5 kb.

research product

Additional file 1 of Q-nexus: a comprehensive and efficient analysis pipeline designed for ChIP-nexus

Supplementary figures and tables. The following additional data are available with the online version of this paper. Additional data file 1 contains an explanatory figure for duplication levels as well as figures and tables for additional analyses including duplication rate plots, examples for mapping artifacts, 5′ end coverage around motif-centered binding sites, cross-correlation plots, qfrag-length distributions, scatterplots of signal scores of overlapping peaks and corresponding IDR plots, as well as two tables containing the total numbers of overlapping peaks and overlapping peaks with IDR ≤ 0.01 for all pairs of biological replicates. (PDF 3840 kb)

research product

MOESM7 of The distributions of protein coding genes within chromatin domains in relation to human disease

Additional file 7: Figure S6. Distribution of the ratios of the number of enhancers to genes depending on the number of genes within a TAD. Mean and median values of each boxplot are shown by white diamonds and black horizontal lines, respectively.

research product

Additional file 1: of Evolutionary stability of topologically associating domains is associated with conserved gene regulation

Figure S1. Breakpoint identification accuracy as compared to gene synteny. Considered are adjacent pairs of human genes with one-to-one orthologs and intergenic distance below a size threshold. (A) Positive predictive value as the fraction of non-syntenic gene pairs with breakpoint from all considered gene pairs (syntenic and non-syntenic) with breakpoint. (B) False positive rate as the percent of syntenic gene pairs with breakpoint from the sum of syntenic pairs with breakpoint and non-syntenic gene pairs without breakpoint. (PDF 21 kb)
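The two accuracy measures in this caption can be written out as simple ratios. The following is a minimal sketch that follows the caption's wording literally; the function name, argument names, and example counts are hypothetical and are not taken from the paper. Note that a conventional false positive rate would use syntenic pairs without breakpoint in the denominator, whereas the sketch keeps the denominator exactly as the caption states it.

```python
def breakpoint_accuracy(syn_bp, nonsyn_bp, nonsyn_no_bp):
    """Accuracy measures as worded in the Figure S1 caption.

    syn_bp       -- syntenic gene pairs with a breakpoint between them
    nonsyn_bp    -- non-syntenic gene pairs with a breakpoint
    nonsyn_no_bp -- non-syntenic gene pairs without a breakpoint
    All argument names are illustrative assumptions.
    """
    # (A) fraction of pairs with a breakpoint that are non-syntenic
    ppv = nonsyn_bp / (nonsyn_bp + syn_bp)
    # (B) percent of syntenic pairs with breakpoint, relative to the sum
    #     given in the caption (kept verbatim, see the note above)
    fpr_percent = 100.0 * syn_bp / (syn_bp + nonsyn_no_bp)
    return ppv, fpr_percent


# Made-up counts, for illustration only
ppv, fpr_percent = breakpoint_accuracy(syn_bp=10, nonsyn_bp=90, nonsyn_no_bp=30)
```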

research product

7C: Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs.

Abstract. Background: Knowledge of the three-dimensional structure of the genome is necessary to understand how gene expression is regulated. Recent experimental techniques such as Hi-C or ChIA-PET measure long-range chromatin interactions genome-wide but are experimentally elaborate, have limited resolution and such data is only available for a limited number of cell types and tissues. Results: While ChIP-seq was not designed to detect chromatin interactions, the formaldehyde treatment in the ChIP-seq protocol cross-links proteins with each other and with DNA. Consequently, also regions that are not directly bound by the targeted TF but interact with the binding site via chromatin looping are…

research product

The distributions of protein coding genes within chromatin domains in relation to human disease.

Abstract. Background: Our understanding of the nuclear chromatin structure has increased hugely in recent years, mainly as a consequence of the advances in chromatin conformation capture methods like Hi-C. The unprecedented resolution of genome-wide interaction maps shows functional consequences that extend the initial thought of an efficient DNA packaging mechanism: gene regulation, DNA repair, chromosomal translocations and evolutionary rearrangements seem to be only the tip of the iceberg. One key concept emerging from this research is that of topologically associating domains (TADs), whose functional role in gene regulation and association with disease are not fully untangled. Resul…

research product

Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs

Background: Transcription factors (TFs) bind to gene promoters or distal regulatory elements that interact with the promoter via chromatin looping. While the TF binding sites themselves are detected genome-wide by ChIP-seq experiments, it is difficult to associate them with regulated genes without information on chromatin looping. Recent experimental techniques such as Hi-C or ChIA-PET measure long-range interactions genome-wide but are experimentally elaborate and have limited resolution. Here, we present Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs (7C). Results: While ChIP-seq was not designed to detect contacts, the formaldehyde treatment in the ChI…

research product

MOESM4 of 7C: Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs

Additional file 4: Figure S1. Hi-C and ChIA-PET interactions and their overlap with CTCF motif pairs. (A) Number of genome-wide CTCF motifs by motif hit significance cutoff. (B) Number of CTCF motif pairs within 1 Mb distance by motif hit significance. (C) Percent of CTCF motif pairs that overlap with experimentally measured Hi-C and ChIA-PET loops by the motif hit significance. (D) Upset plot of true loop data sets (rows) and their size (horizontal bars) with their intersections (columns, and vertical bars) based on the number of overlapping CTCF motif pairs. (E) Distribution of interaction span (distance between anchors) of Hi-C loops and ChIA-PET loops in GM12878 that are used as gold st…

research product

Computational processing and quality control of Hi-C, capture Hi-C and capture-C data

Hi-C, capture Hi-C (CHC) and Capture-C have contributed greatly to our present understanding of the three-dimensional organization of genomes in the context of transcriptional regulation by characterizing the roles of topologically associating domains, enhancer-promoter loops and other three-dimensional genomic interactions. The analysis is based on counts of chimeric read pairs that map to interacting regions of the genome. However, the processing and quality control present a number of unique challenges. We review here the experimental and computational foundations and explain how the characteristics of restriction digests, sonication fragments and read pairs can be exploited to distinguish…

research product

MOESM7 of 7C: Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs

Additional file 7: Figure S4. (A) Prediction performance (auPRC) of 7C when trained and evaluated on different datasets of experimentally measured loops as gold standard. Rao_GM12878 refers to Hi-C loops from [13], Tang2015_GM12878_CTCF and Tang2015_GM12878_RNAPII to ChIA-PET loops using CTCF or Polymerase II as the target [16]. In Union, all datasets were taken together, and in Intersection, only those CTCF motif pairs that were measured in all datasets were considered positive. (B) Prediction performance (auPRC) of 7C compared to a logistic regression model that uses only the total coverage signal within +/− 500 bp around the motif center at both loop anchor sites separately. In both…

research product

MOESM6 of The distributions of protein coding genes within chromatin domains in relation to human disease

Additional file 6: Figure S5. Distribution of the number of enhancers within TADs versus the number of genes contained within the TADs. Mean and median values of each boxplot are shown by white diamonds and black horizontal lines, respectively. The more genes within a TAD, the larger the number of enhancers.

research product

Computational Prediction of Position Effects of Apparently Balanced Human Chromosomal Rearrangements.

Interpretation of variants of uncertain significance, especially chromosomal rearrangements in non-coding regions of the human genome, remains one of the biggest challenges in modern molecular diagnosis. To improve our understanding and interpretation of such variants, we used high-resolution three-dimensional chromosomal structural data and transcriptional regulatory information to predict position effects and their association with pathogenic phenotypes in 17 subjects with apparently balanced chromosomal abnormalities. We found that the rearrangements predict disruption of long-range chromatin interactions between several enhancers and genes whose annotated clinical features are strongly …

research product

Co-regulation of paralog genes in the three-dimensional chromatin architecture.

Paralog genes arise from gene duplication events during evolution, which often lead to similar proteins that cooperate in common pathways and in protein complexes. Consequently, paralogs show correlated gene expression, but the mechanisms of co-regulation remain unclear. In eukaryotes, genes are regulated in part by distal enhancer elements through looping interactions with gene promoters. These looping interactions can be measured by genome-wide chromatin conformation capture (Hi-C) experiments, which revealed self-interacting regions called topologically associating domains (TADs). We hypothesize that paralogs share common regulatory mechanisms to enable coordinated expression acco…

research product

MOESM1 of 7C: Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs

Additional file 1: Table S1. Metadata of ChIP-seq experiments from ENCODE in human GM12878 cells with accession ID and download link.

research product

Additional file 4: of Evolutionary stability of topologically associating domains is associated with conserved gene regulation

Table S1. Matching tissues and samples with CAGE expression data in human and mouse. (TSV 2 kb)

research product

MOESM12 of The distributions of protein coding genes within chromatin domains in relation to human disease

Additional file 12: Table S3. Distance of each TSS to the closest TAD border. The distance (negative) has been calculated for each TAD where the TSS is contained. If the TSS is within no TAD the closest distance (positive) to a TAD border has been calculated. Each entry of the table displays the following information by columns: geneId, gene strand, gene locus, TSS of gene, distance to the TAD border, and TAD.
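The sign convention described for this table (negative distance when the TSS lies inside a TAD, positive otherwise) can be illustrated with a short sketch. It assumes TADs on a single chromosome given as (start, end) coordinate pairs; the function name and the example coordinates are hypothetical and do not come from the paper's data or code.

```python
def signed_distance_to_tad_border(tss, tads):
    """Distance from a TSS to the closest TAD border on one chromosome.

    tads is a list of (start, end) tuples (an assumed input format).
    The result is negative if the TSS lies within a TAD and positive
    if it lies outside all TADs, following the caption's convention.
    """
    # Every TAD contributes two borders: its start and its end
    borders = [border for tad in tads for border in tad]
    distance = min(abs(tss - border) for border in borders)
    # Check whether the TSS falls inside any TAD (borders included)
    inside = any(start <= tss <= end for start, end in tads)
    return -distance if inside else distance


# Illustrative coordinates only
example_tads = [(100, 200), (300, 400)]
inside_value = signed_distance_to_tad_border(150, example_tads)
outside_value = signed_distance_to_tad_border(250, example_tads)
```

Here `inside_value` is negative because position 150 lies within the first TAD, while `outside_value` is positive because position 250 lies in the gap between the two TADs.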

research product

Additional file 5: of Evolutionary stability of topologically associating domains is associated with conserved gene regulation

Table S2. Ortholog genes in human and mouse with gene expression correlation across tissues. (TSV 1036 kb)

research product

MOESM3 of 7C: Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs

Additional file 3: Table S3. Accession numbers and download URLs for data sets used in data type comparisons.

research product

MOESM11 of The distributions of protein coding genes within chromatin domains in relation to human disease

Additional file 11: Table S2. The 3650 different protein coding HKs.

research product

MOESM4 of The distributions of protein coding genes within chromatin domains in relation to human disease

Additional file 4: Table S4. TADs that contain only one gene.

research product

MOESM10 of The distributions of protein coding genes within chromatin domains in relation to human disease

Additional file 10: Table S1. The 18,141 different protein coding genes. Each row has the following information in the columns: geneid, gene locus, transcription start site (TSS), and whether or not the gene is associated with disease according to CTD.

research product

MOESM2 of 7C: Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs

Additional file 2: Table S2. Metadata of ChIP-seq experiments from ENCODE human HeLa cells with accession ID and download link.

research product