AUTHOR
Steffen Albrecht
ChIP-Seq from Limited Starting Material of K562 Cells and Drosophila Neuroblasts Using Tagmentation Assisted Fragmentation Approach
Chromatin immunoprecipitation is extensively used to investigate the epigenetic profile and transcription factor binding sites in the genome. However, when the starting material is limited, the conventional ChIP-Seq approach cannot be implemented. This protocol describes a method that can be used to generate chromatin profiles from as few as 100 human or 1,000 Drosophila cells. The method employs tagmentation to fragment the chromatin with concomitant addition of sequencing adaptors. The method generates datasets with a high signal-to-noise ratio that can be analyzed with standard tools for ChIP-Seq analysis.
Computational identification of cell-specific variable regions in ChIP-seq data.
ABSTRACT Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is used to identify genome-wide DNA regions bound by proteins. Several sources of variation can affect the reproducibility of a particular ChIP-seq assay, which can lead to a misinterpretation of where the protein under investigation binds to the genome in a particular cell type. Given one ChIP-seq experiment with replicates, binding sites not observed in all the replicates will usually be interpreted as noise and discarded. However, the recent discovery of high-occupancy target (HOT) regions suggests that there are regions where binding of multiple transcription factors can be identified. To investigate these regions,…
Interpretable machine learning models for single-cell ChIP-seq imputation
Abstract Motivation: Single-cell ChIP-seq (scChIP-seq) analysis is challenging due to data sparsity. High degree of data sparsity in biological high-throughput single-cell data is generally handled with imputation methods that complete the data, but specific methods for scChIP-seq are lacking. We present SIMPA, a scChIP-seq data imputation method leveraging predictive information within bulk data from ENCODE to impute missing protein-DNA interacting regions of target histone marks or transcription factors. Results: Imputations using machine learning models trained for each single cell, each target, and each genomic region accurately preserve cell type clustering and improve pathway-related gene i…
Single-cell ChIP-seq imputation with SIMPA by leveraging bulk ENCODE data
Abstract Single-cell ChIP-seq analysis is challenging due to data sparsity. We present SIMPA (https://github.com/salbrec/SIMPA), a single-cell ChIP-seq data imputation method leveraging predictive information within bulk ENCODE data to impute missing protein-DNA interacting regions of target histone marks or transcription factors. Machine learning models trained for each single cell, each target, and each genomic region enable drastic improvement in cell type clustering and gene identification.
Automated quality control of next generation sequencing data using machine learning
Abstract Controlling quality of next generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterized common NGS quality features and developed a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal data and external disease diagnostic datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at the following …
TAF-ChIP: An ultra-low input approach for genome wide chromatin immunoprecipitation assay
Chromatin immunoprecipitation (ChIP) followed by next generation sequencing is an invaluable and powerful technique to understand transcriptional regulation. However, ChIP is currently limited by the requirement of a large amount of starting material. This renders studying rare cell populations very challenging, or even impossible. Here, we present a tagmentation-assisted fragmentation ChIP (TAF-ChIP) and sequencing method to generate high-quality datasets from low cell numbers. The method relies on Tn5 transposon activity to fragment the chromatin that is immunoprecipitated, thus circumventing the need for sonication or MNase digestion to fragment. Furthermore, Tn5 adds the sequencing adapto…
TAF-ChIP: an ultra-low input approach for genome-wide chromatin immunoprecipitation assay
The authors present a novel method for obtaining chromatin profiles from low cell numbers without prior nuclei isolation. The method is successfully implemented in generating an epigenetic profile from 100 cells with a high signal-to-noise ratio.
m6A RNA methylation regulates promoter proximal pausing of RNA Polymerase II
Abstract RNA Polymerase II (RNAP II) pausing is essential to precisely control gene expression and is critical for development of metazoans. Here, we show that the m6A RNA modification regulates promoter-proximal RNAP II pausing. The m6A methyltransferase complex (MTC), with the nuclear reader Ythdc1, are recruited to gene promoters. Depleting the m6A MTC leads to a decrease in RNAP II pause release and in Ser2P occupancy on the gene body, and affects nascent RNA transcription. Tethering Mettl3 to a heterologous gene promoter is sufficient to increase RNAP II pause release, an effect that relies on its m6A catalytic domain. Collectively, our data reveal an important link between RNAP II paus…
Quality control guidelines and machine learning predictions for next generation sequencing data
Abstract Controlling the quality of next generation sequencing (NGS) data files is usually not fully automatized because of its complexity and involves strong assumptions and arbitrary choices. We have statistically characterized common NGS quality features of a large set of files and optimized the complex quality control procedure using a machine learning approach including tree-based algorithms and deep learning. Predictive models were validated using internal and external data, including applications to disease diagnosis datasets. Models are unbiased, accurate and to some extent generalizable to unseen data types and species. Given enough labelled data for training, this approach could p…
Allelic loss but absence of mutations in the polyspecific transporter gene BWR1A on 11p15.5 in hepatoblastoma
Chromosomal region 11p15.5 shows frequent maternal allelic loss in embryonal tumors, including rhabdomyosarcoma (RMS), Wilms' tumor (WT) and hepatoblastoma (HB), consistent with the presence of at least one tumor suppressor gene in this region, which should be paternally imprinted, i.e., expressed from the maternal allele only. The BWR1A gene encodes a polyspecific transmembrane transporter and is located on 11p15.5. It is highly expressed in liver, paternally imprinted and was found to be mutated in an RMS cell line, making it a plausible tumor suppressor gene for HB. We therefore screened 62 HBs, 3 HB cell lines and 1 pediatric hepatocellular carcinoma for BWR1A mutations using single-str…
Quality-preserving low-cost probabilistic 3D denoising with applications to Computed Tomography
Abstract We propose a pipeline for the synthetic generation of personalized Computed Tomography (CT) images, with a radiation exposure evaluation and a lifetime attributable risk (LAR) assessment. We perform a patient-specific performance evaluation for a broad range of denoising algorithms (including the most popular Deep Learning denoising approaches, wavelet-based methods, methods based on Mumford-Shah denoising, etc.), focusing both on assessing the capability to reduce the patient-specific CT-induced LAR and on computational cost scalability. We introduce a parallel probabilistic Mumford-Shah denoising model (PMS), showing that it markedly outperforms the compared common denoising methods…