Search results for "Sequence analysis"
showing 10 items of 1349 documents
ATR expands embryonic stem cell fate potential in response to replication stress
2020
Fondazione Italiana per la Ricerca sul Cancro FIRC 18112 Sina Atashpaz; Fondazione Umberto Veronesi Sina Atashpaz; Associazione Italiana per la Ricerca sul Cancro AIRC 5xmille METAMECH program Vincenzo Costanzo; Giovanni Armenise-Harvard Foundation Vincenzo Costanzo; European Research Council Consolidator grant 614541 Vincenzo Costanzo; Associazione Italiana per la Ricerca sul Cancro Fellowship 23961 Negar Arghavanifar; Danish Cancer Society KBVU-2014 Andres Joaquin Lopez-Contreras; Danish Council for Independent Research Sapere Aude, DFF Starting Grant 2014 Andres Joaquin Lopez-Contreras; European Research Council ERC-2015-STG-679068 Andres Joaquin Lopez-Contreras; Danish National Research Foundatio…
Genetic and Epigenetic Characteristics of Inflammatory Bowel Disease-Associated Colorectal Cancer.
2021
doi: 10.1053/j.gastro.2021.04.042. Background & Aims: Inflammatory bowel disease (IBD) is a chronic, relapsing inflammatory disorder associated with an elevated risk of colorectal cancer (CRC). IBD-associated CRC (IBD-CRC) may represent a distinct pathway of tumorigenesis compared to sporadic CRC (sCRC). Our aim was to comprehensively characterize IBD-associated tumorigenesis integrating multiple high-throughput approaches, and to compare the results with in-house data sets from sCRCs. Methods: Whole-genome sequencing, single nucleotide polymorphism arrays, RNA sequencing, genome-wide methylation analysis, and immunohistochemistry were performed using fresh-frozen and formalin-fixed tissue sam…
Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms
2018
Motivation: Information-theoretic and compositional/linguistic analyses of genomes have a central role in bioinformatics, even more so since the associated methodologies are also becoming very valuable for epigenomic and meta-genomic studies. The kernel of those methods is the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}^k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data now available in applications calls for parallel and distributed computing. Indeed, algorithms of this type have been developed to collect k-mer statistics in…
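The k-mer statistics described in this abstract can be illustrated on a single machine with a minimal sketch. This is not the paper's Hadoop implementation: the function name `count_kmers` and the choice to skip windows containing characters other than A, C, G, T are assumptions made for illustration only.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")


def count_kmers(sequence, k):
    """Count how many times each k-mer over {A,C,G,T} occurs in a DNA sequence.

    Hypothetical helper for illustration. Windows that contain any other
    character (for example N) are skipped, which is an assumption of this
    sketch rather than something stated in the abstract.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    counts = Counter()
    # Slide a window of length k over the sequence, one position at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= DNA_ALPHABET:
            counts[kmer] += 1
    return counts


if __name__ == "__main__":
    # Overlapping 2-mers of ACGTACGT: AC, CG, GT, TA, AC, CG, GT
    print(dict(count_kmers("ACGTACGT", 2)))
```

A sequence of length n yields at most n - k + 1 overlapping windows, so the work is linear in n for fixed k; the distributed setting mentioned in the abstract becomes relevant only when the input no longer fits comfortably on one machine.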
Maternal DNA lineages at the gate of Europe in the 10th century AD
2018
Given the paucity of archaeogenetic data available for medieval European populations in comparison to other historical periods, the genetic landscape of this age appears as a puzzle of dispersed, small, known pieces. In particular, Southeastern Europe has been scarcely investigated to date. In this paper, we report the study of mitochondrial DNA in 10th century AD human samples from Capidava necropolis, located in Dobruja (Southeastern Romania, Southeastern Europe). This geographical region is particularly interesting because of the extensive population flux following diverse migration routes, and the complex interactions between distinct population groups during the medieval period. We suc…
Diversification of spatiotemporal expression and copy number variation of the echinoid hbox12/pmar1/micro1 multigene family
2017
Changes occurring during evolution in the cis-regulatory landscapes of individual members of multigene families might impart diversification in their spatiotemporal expression and function. The archetypal member of the echinoid hbox12/pmar1/micro1 family is hbox12-a, a homeobox-containing gene expressed exclusively by dorsal blastomeres, where it governs the dorsal/ventral gene regulatory network during embryogenesis of the sea urchin Paracentrotus lividus. Here we describe the inventory of the hbox12/pmar1/micro1 genes in P. lividus, highlighting that gene copy number variation occurs across individual sea urchins of the same species. We show that the various hbox12/pmar1/micro1 genes grou…
iDamIDseq and iDEAR: an improved method and computational pipeline to profile chromatin-binding proteins
2016
DNA adenine methyltransferase identification (DamID) has emerged as an alternative method to profile protein-DNA interactions; however, critical issues limit its widespread applicability. Here, we present iDamIDseq, a protocol that improves specificity and sensitivity by inverting the steps DpnI-DpnII and adding steps that involve a phosphatase and exonuclease. To determine genome-wide protein-DNA interactions efficiently, we present the analysis tool iDEAR (iDamIDseq Enrichment Analysis with R). The combination of DamID and iDEAR permits the establishment of consistent profiles for transcription factors, even in transient assays, as we exemplify using the small teleost medaka (Oryzias lati…
FASTdoop: A versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications
2017
Summary: MapReduce Hadoop bioinformatics applications require special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide any built-in support for the most popular sequence file formats like FASTA or BAM. Moreover, developing these routines is not easy, both because of the diversity of these formats and because of the need to efficiently manage sequence datasets that may count up to billions of characters. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files. We show that, with respect to analogous input management routines that have appeared in the literature, it offers…
Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions.
2020
Genetic diseases are driven by aberrations of the human genome. Identification of such aberrations including structural variations (SVs) is key to our understanding. Conventional short-reads whole genome sequencing (cWGS) can identify SVs to base-pair resolution, but utilizes only short-range information and suffers from high false discovery rate (FDR). Linked-reads sequencing (10XWGS) utilizes long-range information by linkage of short-reads originating from the same large DNA molecule. This can mitigate alignment-based artefacts especially in repetitive regions and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, w…
Feasibility of sample size calculation for RNA-seq studies
2017
Sample size calculation is a crucial step in study design but is not yet fully established for RNA sequencing (RNA-seq) analyses. To evaluate feasibility and provide guidance, we evaluated RNA-seq sample size tools identified from a systematic search. The focus was on whether real pilot data would be needed for reliable results and on identifying tools that would perform well in scenarios with different levels of biological heterogeneity and fold changes (FCs) between conditions. We used simulations based on real data for tool evaluation. In all settings, the six evaluated tools provided widely different answers, which were strongly affected by FC. Although all tools failed for small FCs, s…
MiasDB: A Database of Molecular Interactions Associated with Alternative Splicing of Human Pre-mRNAs.
2016
Alternative splicing (AS) is pervasive in human multi-exon genes and is a major contributor to expansion of the transcriptome and proteome diversity. The accurate recognition of alternative splice sites is regulated by information contained in networks of protein-protein and protein-RNA interactions. However, the mechanisms leading to splice site selection are not fully understood. Although numerous databases have been built to describe AS, molecular interaction databases associated with AS have only recently emerged. In this study, we present a new database, MiasDB, that provides a description of molecular interactions associated with human AS events. This database covers 938 interactions …