Search results for "algorithm."
showing 10 items of 4617 documents
The colored longest common prefix array computed via sequential scans
2018
Due to the increased availability of large datasets of biological sequences, tools for sequence comparison now rely to a greater extent on efficient alignment-free approaches. Most alignment-free approaches require the computation of statistics of the sequences in the dataset. Such computations become impractical in internal memory when very large collections of long sequences are considered. In this paper, we present a new conceptual data structure, the colored longest common prefix array (cLCP), that allows several problems to be tackled efficiently with an alignment-free approach. In fact, we show that such a data structure can be computed via sequential scans in semi-exter…
Q-nexus: a comprehensive and efficient analysis pipeline designed for ChIP-nexus
2016
Background: ChIP-nexus, an extension of the ChIP-exo protocol, can be used to map the borders of protein-bound DNA sequences at nucleotide resolution, requires less input DNA and enables selective PCR duplicate removal using random barcodes. However, the use of random barcodes requires additional preprocessing of the mapping data, which complicates the computational analysis. To date, only a very limited number of software packages are available for the analysis of ChIP-exo data, which have not yet been systematically tested and compared on ChIP-nexus data. Results: Here, we present a comprehensive software package for ChIP-nexus data that exploits the random barcodes for selective removal …
Alignment-free sequence comparison using absent words
2018
Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not oc…
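As a minimal sketch of the composition-based measure mentioned in this abstract, the snippet below computes a q-gram distance as the sum of absolute differences between q-gram counts. The function names `qgram_profile` and `qgram_distance` are illustrative, not taken from the paper, and the paper's own absent-word method is not reproduced here.

```python
from collections import Counter


def qgram_profile(seq: str, q: int) -> Counter:
    """Count every substring of length q in seq (empty if seq is shorter than q)."""
    return Counter(seq[i:i + q] for i in range(len(seq) - q + 1))


def qgram_distance(x: str, y: str, q: int) -> int:
    """Sum of absolute differences between the q-gram counts of x and y."""
    px, py = qgram_profile(x, q), qgram_profile(y, q)
    # Counter returns 0 for q-grams missing from one of the profiles.
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())


if __name__ == "__main__":
    print(qgram_distance("ACGT", "ACGA", 2))
```

For a fixed q, one pass over each sequence builds the profile, which is why such measures scale roughly linearly with sequence length.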
Use of deep learning methods to translate drug-induced gene expression changes from rat to human primary hepatocytes
2020
In clinical trials, animal and cell line models are often used to evaluate the potential toxic effects of a novel compound or candidate drug before progressing to human trials. However, relating the results of animal and in vitro model exposures to relevant clinical outcomes in the human in vivo system still proves challenging, relying on often putative orthologs. In recent years, multiple studies have demonstrated that the repeated dose rodent bioassay, the current gold standard in the field, lacks sufficient sensitivity and specificity in predicting toxic effects of pharmaceuticals in humans. In this study, we evaluate the potential of deep learning techniques to translate the pattern of …
Measuring the clustering effect of BWT via RLE
2017
The Burrows–Wheeler Transform (BWT) is a reversible transformation on which several text compressors and many other tools used in Bioinformatics and Computational Biology are based. The BWT is not actually a compressor, but a transformation that performs a context-dependent permutation of the letters of the input text that often creates runs of equal letters (clusters) longer than the ones in the original text, usually referred to as the “clustering effect” of BWT. In particular, from a combinatorial point of view, great attention has been given to the case in which the BWT produces the fewest clusters (cf. [5], [16], [21], [23]). In this paper we are concerned about t…
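The clustering effect described in this abstract can be illustrated with a small sketch that builds the BWT naively from sorted rotations and counts runs of equal letters before and after the transform. This is an illustrative quadratic-space construction, not the paper's method; the `$` sentinel and the function names `bwt` and `count_runs` are assumptions made for the example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: last column of the sorted rotations of text + sentinel.

    Assumes the sentinel does not occur in text and sorts before its letters.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal letters, i.e. the length of the RLE."""
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


if __name__ == "__main__":
    text = "banana"
    transformed = bwt(text)
    print(transformed, count_runs(text + "$"), count_runs(transformed))
```

Comparing the two run counts for a given input gives a rough sense of how much the transform clusters equal letters; practical tools would use a suffix-array-based construction instead of explicit rotations.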
Preselection statistics and Random Forest classification identify population informative single nucleotide polymorphisms in cosmopolitan and autochth…
2018
Commercial single nucleotide polymorphism (SNP) arrays have been recently developed for several species and can be used to identify informative markers to differentiate breeds or populations for several downstream applications. To identify the most discriminating genetic markers among thousands of genotyped SNPs, a few statistical approaches have been proposed. In this work, we compared several methods of SNP preselection (Delta, Fst and principal component analyses (PCA)) in addition to Random Forest classifications to analyse SNP data from six dairy cattle breeds, including cosmopolitan (Holstein, Brown and Simmental) and autochthonous Italian breeds raised in two different regions and …
Comparison of CRISPR and Marker-Based Methods for the Engineering of Phage T7
2020
This article belongs to the Section Bacterial Viruses.
Previously Undescribed Family Mutation in the JAG1 Gene as a Cause for Alagille Syndrome
2017
A Novel Role for CSRP1 in a Lebanese Family with Congenital Cardiac Defects
2017
Despite an obvious role for consanguinity in congenital heart disease (CHD), most studies fail to document a monogenic model of inheritance except in a few cases. We hereby describe a consanguineous Lebanese family of first-degree cousins with 7 conceived children: 2 died in utero of unknown causes, 3 have CHD, and 4 have polydactyly. The aim of the study is to unveil the genetic variant(s) causing these phenotypes using next generation sequencing (NGS) technology. Targeted exome sequencing identified a heterozygous duplication in CSRP1 which leads to a potential frameshift mutation at position 154 of the protein. This mutation is inherited from the father, and segregates only with the CHD phen…
Lost Strings in Genomes: What Sense Do They Make?
2017
We studied the sets of avoided strings observed over a family of genomes. It was found that the length of the minimal avoided string rarely exceeds 9 nucleotides, regardless of the phylogeny of the genome under consideration. The lists of the avoided strings observed over the sets of (related) genomes have been analyzed. Very low correlation between the phylogeny and the set of those strings has been found.
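The notion of a minimal avoided string can be sketched with a brute-force search for the smallest k at which some k-mer over the nucleotide alphabet never occurs in a sequence. This is a hedged illustration only: the function names, the `ACGT` alphabet, and the `max_k` cutoff are assumptions, and the authors' actual procedure is not described in the abstract.

```python
from itertools import product


def present_kmers(genome: str, k: int, alphabet: str = "ACGT") -> set:
    """All k-mers occurring in genome that use only letters of the alphabet."""
    letters = set(alphabet)
    return {
        genome[i:i + k]
        for i in range(len(genome) - k + 1)
        if set(genome[i:i + k]) <= letters
    }


def shortest_absent_length(genome: str, alphabet: str = "ACGT", max_k: int = 12):
    """Smallest k <= max_k with at least one absent k-mer, or None if none found."""
    for k in range(1, max_k + 1):
        if len(present_kmers(genome, k, alphabet)) < len(alphabet) ** k:
            return k
    return None


def absent_words(genome: str, k: int, alphabet: str = "ACGT") -> list:
    """Sorted list of all k-mers over the alphabet that do not occur in genome."""
    present = present_kmers(genome, k, alphabet)
    candidates = ("".join(p) for p in product(alphabet, repeat=k))
    return sorted(w for w in candidates if w not in present)


if __name__ == "__main__":
    print(shortest_absent_length("ACGT"), absent_words("AAAA", 1))
```

Enumerating all k-mers is exponential in k, so this is only practical for short lengths such as those the abstract reports.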