AUTHOR

Jean-Fred Fontaine

showing 17 related works from this author

Computational identification of cell-specific variable regions in ChIP-seq data.

2019

ABSTRACT Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is used to identify genome-wide DNA regions bound by proteins. Several sources of variation can affect the reproducibility of a particular ChIP-seq assay, which can lead to a misinterpretation of where the protein under investigation binds to the genome in a particular cell type. Given one ChIP-seq experiment with replicates, binding sites not observed in all the replicates will usually be interpreted as noise and discarded. However, the recent discovery of high-occupancy target (HOT) regions suggests that there are regions where binding of multiple transcription factors can be identified. To investigate these regions,…

Cell type; AcademicSubjects/SCI00010; Computational biology; Plasma protein binding; Biology; Genome; Cell Line; Evolution Molecular; 03 medical and health sciences; chemistry.chemical_compound; Mice; 0302 clinical medicine; Narese/3; Cell Line Tumor; Genetics; Animals; Humans; Epigenetics; Binding site; Promoter Regions Genetic; Transcription factor; Embryonic Stem Cells; 030304 developmental biology; 0303 health sciences; Principal Component Analysis; Binding Sites; Nucleotides; Genetic Variation; Promoter; Genomics; Chromatin; chemistry; CpG site; MCF-7 Cells; Chromatin Immunoprecipitation Sequencing; Methods Online; R-Loop Structures; K562 Cells; Chromatin immunoprecipitation; 030217 neurology & neurosurgery; Function (biology); DNA; Transcription Factors; Nucleic acids research
Interpretable machine learning models for single-cell ChIP-seq imputation

2019

Abstract Motivation: Single-cell ChIP-seq (scChIP-seq) analysis is challenging due to data sparsity. A high degree of data sparsity in biological high-throughput single-cell data is generally handled with imputation methods that complete the data, but specific methods for scChIP-seq are lacking. We present SIMPA, a scChIP-seq data imputation method leveraging predictive information within bulk data from ENCODE to impute missing protein-DNA interacting regions of target histone marks or transcription factors. Results: Imputations using machine learning models trained for each single cell, each target, and each genomic region accurately preserve cell type clustering and improve pathway-related gene i…

Computer science; business.industry; Cell chip; Python (programming language); Machine learning; computer.software_genre; ENCODE; Identification (information); Simulated data; Feature (machine learning); Imputation (statistics); Artificial intelligence; Cluster analysis; business; computer; computer.programming_language
Automated quality control of next generation sequencing data using machine learning

2019

Abstract Controlling the quality of next generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterized common NGS quality features and developed a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal data and external disease diagnostic datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at the following …

business.industry; Computer science; media_common.quotation_subject; Deep learning; Machine learning; computer.software_genre; DNA sequencing; Statistical classification; Tree (data structure); Task (computing); Software; Resource (project management); Data file; Quality (business); Artificial intelligence; business; computer; media_common
TAF-ChIP: An ultra-low input approach for genome wide chromatin immunoprecipitation assay

2018

Chromatin immunoprecipitation (ChIP) followed by next generation sequencing is an invaluable and powerful technique to understand transcriptional regulation. However, ChIP is currently limited by the requirement for large amounts of starting material. This renders studying rare cell populations very challenging, or even impossible. Here, we present a tagmentation-assisted fragmentation ChIP (TAF-ChIP) and sequencing method to generate high-quality datasets from low cell numbers. The method relies on Tn5 transposon activity to fragment the chromatin that is immunoprecipitated, thus circumventing the need for sonication or MNase digestion for fragmentation. Furthermore, Tn5 adds the sequencing adapto…

Transposable element; Cell type; biology; Computer science; Immunoprecipitation; Cell; Genomics; Computational biology; ENCODE; Genome; DNA sequencing; Chromatin; medicine.anatomical_structure; Transcriptional regulation; biology.protein; medicine; H3K4me3; Epigenetics; Chromatin immunoprecipitation; Micrococcal nuclease
TAF-ChIP: an ultra-low input approach for genome-wide chromatin immunoprecipitation assay

2019

The authors present a novel method for obtaining chromatin profiles from low cell numbers without prior nuclei isolation. The method is successfully applied to generate epigenetic profiles from 100 cells with a high signal-to-noise ratio.

Health Toxicology and Mutagenesis; Plant Science; Computational biology; Signal-To-Noise Ratio; Biochemistry Genetics and Molecular Biology (miscellaneous); Genome; DNA sequencing; Epigenesis Genetic; Histones; 03 medical and health sciences; 0302 clinical medicine; Transcriptional regulation; Methods; Animals; Humans; Epigenetics; 030304 developmental biology; Whole genome sequencing; 0303 health sciences; Ecology; biology; Whole Genome Sequencing; Chemistry; High-Throughput Nucleotide Sequencing; Chip; 11; Histone; biology.protein; Chromatin Immunoprecipitation Sequencing; Drosophila; K562 Cells; Chromatin immunoprecipitation; 030217 neurology & neurosurgery; Software; Life Science Alliance
Gene Set to Diseases (GS2D): disease enrichment analysis on human gene sets with literature data

2016

Large sets of candidate genes derived from high-throughput biological experiments can be characterized by functional enrichment analysis. The analysis consists of comparing the functions of one gene set against those of a background gene set. Functions related to a significant number of genes in the gene set are then expected to be relevant. Web tools offering disease enrichment analysis on gene sets are often based on gene-disease associations from manually curated or experimental data that are accurate but do not cover all diseases discussed in the literature. Using associations automatically derived from literature data could be a cost-effective method to improve the coverage of disease…

Candidate gene; business.industry; Big data; Experimental data; Genomics; Biology; computer.software_genre; Set (abstract data type); Workflow; Data mining; Toxicogenomics; business; computer; Gene; Genomics and Computational Biology
Defining Human Tyrosine Kinase Phosphorylation Networks Using Yeast as an In Vivo Model Substrate.

2017

Systematic assessment of tyrosine kinase-substrate relationships is fundamental to a better understanding of cellular signaling and its profound alterations in human diseases such as cancer. In human cells, such assessments are confounded by complex signaling networks, feedback loops, conditional activity, and intra-kinase redundancy. Here we address this challenge by exploiting the yeast proteome as an in vivo model substrate. We individually expressed 16 human non-receptor tyrosine kinases (NRTKs) in Saccharomyces cerevisiae and identified 3,279 kinase-substrate relationships involving 1,351 yeast phosphotyrosine (pY) sites. Based on the yeast data without prior information, we generated …

0301 basic medicine; Cell signaling; Histology; Saccharomyces cerevisiae Proteins; Saccharomyces cerevisiae; Amino Acid Motifs; Saccharomyces cerevisiae; Interactome; Receptor tyrosine kinase; Article; Pathology and Forensic Medicine; 03 medical and health sciences; chemistry.chemical_compound; Humans; Protein Interaction Maps; Phosphorylation; biology; Tyrosine phosphorylation; Cell Biology; Protein-Tyrosine Kinases; biology.organism_classification; Yeast; Cell biology; 030104 developmental biology; chemistry; biology.protein; Phosphorylation; Tyrosine kinase; Sequence Alignment; Cell systems
RNA Sequencing of Human Peripheral Blood Cells Indicates Upregulation of Immune-Related Genes in Huntington's Disease

2020

Huntington's disease (HD) is an autosomal dominantly inherited neurodegenerative disorder caused by a trinucleotide repeat expansion in the Huntingtin gene. As disease-modifying therapies for HD are being developed, peripheral blood cells may be used to indicate disease progression and to monitor treatment response. In order to investigate whether gene expression changes can be found in the blood of individuals with HD that distinguish them from healthy controls, we performed transcriptome analysis by next-generation sequencing (RNA-seq). We detected a gene expression signature consistent with dysregulation of immune-related functions and inflammatory response in peripheral blood from HD ca…

inflammation; Huntington's disease; RNA-Seq; differential gene expression; disease markers; lcsh:Neurology. Diseases of the nervous system; lcsh:RC346-429; Frontiers in Neurology
Disease–Genes Must Guide Data Source Integration in the Gene Prioritization Process

2019

One of the main issues in detecting the genes involved in the etiology of genetic human diseases is the integration of different types of available functional relationships between genes. Numerous approaches exploited the complementary evidence coded in heterogeneous sources of data to prioritize disease-genes, such as functional profiles or expression quantitative trait loci, but none of them to our knowledge posed the scarcity of known disease-genes as a feature of their integration methodology. Nevertheless, in contexts where data are unbalanced, that is, where one class is largely under-represented, imbalance-unaware approaches may suffer a strong decrease in performance. We claim that …

0301 basic medicine; Class (computer programming); Boosting (machine learning); Computer science; Process (engineering); media_common.quotation_subject; Computational biology; Scarcity; 03 medical and health sciences; ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; Expression quantitative trait loci; Key (cryptography); Feature (machine learning); Gene prioritization; media_common
Posttranslational modifications by ADAM10 shape myeloid antigen-presenting cell homeostasis in the splenic marginal zone

2021

The spleen contains phenotypically and functionally distinct conventional dendritic cell (cDC) subpopulations, termed cDC1 and cDC2, which each can be divided into several smaller and less well-characterized subsets. Despite advances in understanding the complexity of cDC ontogeny by transcriptional programming, the significance of posttranslational modifications in controlling tissue-specific cDC subset immunobiology remains elusive. Here, we identified the cell-surface–expressed A-disintegrin-and-metalloproteinase 10 (ADAM10) as an essential regulator of cDC1 and cDC2 homeostasis in the splenic marginal zone (MZ). Mice with a CD11c-specific deletion of ADAM10 (ADAM10(ΔCD11c)) exhibited a …

Male; Langerin; Lymphoid Tissue; Notch signaling pathway; Antigen-Presenting Cells; CD11c; Spleen; ADAM10 Protein; Mice; Phosphatidylinositol 3-Kinases; medicine; Animals; Homeostasis; Myeloid Cells; Protein kinase B; PI3K/AKT/mTOR pathway; Cell Proliferation; Multidisciplinary; biology; Macrophages; Membrane Proteins; Cell Differentiation; Dendritic Cells; Biological Sciences; CD11c Antigen; Cell biology; Mice Inbred C57BL; medicine.anatomical_structure; biology.protein; Female; Amyloid Precursor Protein Secretases; Signal transduction; Protein Processing Post-Translational; Spleen; Conventional Dendritic Cell; Signal Transduction; Proceedings of the National Academy of Sciences
Lost Strings in Genomes: What Sense Do They Make?

2017

We studied the sets of avoided strings observed over a family of genomes. The length of the minimal avoided string rarely exceeds 9 nucleotides, regardless of the phylogeny of the genome under consideration. The lists of avoided strings observed over sets of (related) genomes were analyzed, and a very low correlation was found between the phylogeny and the set of those strings.

0301 basic medicine; Genetics; animal structures; genetic structures; information science; String (physics); Genome; Combinatorics; Set (abstract data type); 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Phylogenetics; cardiovascular system; Low correlation; 030217 neurology & neurosurgery; Selection (genetic algorithm); Mathematics
DiseaseLinc: Disease Enrichment Analysis of Sets of Differentially Expressed LincRNAs

2021

Long intergenic non-coding RNAs (lincRNAs) are long RNAs that do not encode proteins. Functional evidence is lacking for most of them. Their biogenesis is not well known, but it is thought that many lincRNAs originate from genomic duplication of coding material, resulting in pseudogenes, gene copies that lose their original function and can accumulate mutations. While most pseudogenes eventually stop producing a transcript and are erased by mutations, many of these pseudogene-based lincRNAs retain similarity to the parental gene from which they originated, possibly for functional reasons. For example, they can act as decoys for miRNAs targeting the parental gene. Enrichment analysis of fun…

Pseudogene; Breast Neoplasms; Kaplan-Meier Estimate; Computational biology; Disease; Biology; web tool; ENCODE; Article; enrichment analysis; diseases; User-Computer Interface; Intergenic region; microRNA; Humans; Disease; lcsh:QH301-705.5; Gene; Internet; Gene Expression Profiling; lincRNAs; General Medicine; Prognosis; Gene Expression Regulation Neoplastic; lcsh:Biology (General); Female; RNA Long Noncoding; Function (biology); Biogenesis; Cells
LipiDisease: associate lipids to diseases using literature mining

2021

Abstract Summary: Lipids play an essential role in cellular assembly and signaling. Dysregulation of these functions has been linked with many complications including obesity, diabetes, metabolic disorders, cancer and more. Investigating lipid profiles in such conditions can provide insights into cellular functions and possible interventions. Hence the field of lipidomics has been expanding in recent years. Even though the role of individual lipids in diseases has been investigated, there is no resource to perform disease enrichment analysis considering the cumulative association of a lipid set. To address this, we have implemented the LipiDisease web server. The tool analyzes millions of recor…

Statistics and Probability; Supplementary data; Web server; AcademicSubjects/SCI01060; Computer science; Cellular functions; Computational biology; Disease; computer.software_genre; Applications Notes; Biochemistry; Field (computer science); Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Lipidomics; Data and Text Mining; Molecular Biology; computer; Bioinformatics
Evaluation of in vivo and in vitro models of toxicity by comparison of toxicogenomics data with the literature.

2017

Toxicity affecting humans is studied by observing the effects of chemical substances in animal organisms (in vivo) or in animal and human cultivated cell lines (in vitro). Toxicogenomics studies collect gene expression profiles and histopathology assessment data for hundreds of drugs and pollutants in standardized experimental designs using different model systems. These data are an invaluable source for analyzing genome-wide drug response in biological systems. However, the problem remains of how to evaluate the suitability of heterogeneous in vitro and in vivo systems to model the many different aspects of human toxicity. We propose here that a given model system (cell type or animal o…

0301 basic medicine; Candidate gene; Cell type; Drug Evaluation Preclinical; Biology; Bioinformatics; Toxicogenetics; General Biochemistry Genetics and Molecular Biology; In vitro; Rats; 03 medical and health sciences; 030104 developmental biology; In vivo; Toxicity; Hepatocytes; Animals; Humans; Toxicogenomics; Transcriptome; Molecular Biology; Gene; Function (biology); Cells Cultured; Methods (San Diego, Calif.)
Statistical guidelines for quality control of next-generation sequencing techniques.

2021

Condition-specific statistical guidelines and accurate classification trees for quality control of functional genomics NGS files (RNA-seq, ChIP-seq and DNase-seq) have been generated using thousands of reference files from the ENCODE project and made available to the community.

Quality Control; Computer science; Health Toxicology and Mutagenesis; media_common.quotation_subject; Control (management); genetic processes; 26; Plant Science; Biochemistry Genetics and Molecular Biology (miscellaneous); Humans; Quality (business); Statistical analysis; Relevance (information retrieval); natural sciences; Research Articles; media_common; Ecology; Scope (project management); Genome Human; Computational Biology; High-Throughput Nucleotide Sequencing; 15; Sequence Analysis DNA; 11; Data science; ComputingMethodologies_PATTERNRECOGNITION; TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES; Software; Research Article; Life science alliance
Single-cell ChIP-seq imputation with SIMPA by leveraging bulk ENCODE data

2019

Abstract Single-cell ChIP-seq analysis is challenging due to data sparsity. We present SIMPA (https://github.com/salbrec/SIMPA), a single-cell ChIP-seq data imputation method leveraging predictive information within bulk ENCODE data to impute missing protein-DNA interacting regions of target histone marks or transcription factors. Machine learning models trained for each single cell, each target, and each genomic region enable drastic improvement in cell-type clustering and gene identification.

Quality control guidelines and machine learning predictions for next generation sequencing data

2019

Abstract Controlling the quality of next generation sequencing (NGS) data files is usually not fully automated because of its complexity, and it involves strong assumptions and arbitrary choices. We have statistically characterized common NGS quality features of a large set of files and optimized the complex quality control procedure using a machine learning approach including tree-based algorithms and deep learning. Predictive models were validated using internal and external data, including applications to disease diagnosis datasets. Models are unbiased, accurate and to some extent generalizable to unseen data types and species. Given enough labelled data for training, this approach could p…
