Search results for "BWT"

showing 10 items of 17 documents

Detecting mutations by eBWT

2018

In this paper we develop a theory describing how the extended Burrows-Wheeler Transform (eBWT) of a collection of DNA fragments tends to cluster together the copies of nucleotides sequenced from a genome G. Our theory accurately predicts how many copies of any nucleotide are expected inside each such cluster, and how an elegant and precise LCP array based procedure can locate these clusters in the eBWT. Our findings are very general and can be applied to a wide range of different problems. In this paper, we consider the case of alignment-free and reference-free SNPs discovery in multiple collections of reads. We note that, in accordance with our theoretical results, SNPs are clustered in th…

0301 basic medicineFOS: Computer and information sciences000 Computer science knowledge general worksBWT LCP Array SNPs Reference-free Assembly-freeLCP ArraySettore INF/01 - Informatica[SDV]Life Sciences [q-bio]Reference-freeAssembly-freeSNP03 medical and health sciences030104 developmental biologyBWTBWT; LCP Array; SNPs; Reference-free; Assembly-freeComputer ScienceComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)[INFO]Computer Science [cs]SoftwareSNPs
researchProduct

Measuring the clustering effect of BWT via RLE

2017

Abstract The Burrows–Wheeler Transform (BWT) is a reversible transformation on which are based several text compressors and many other tools used in Bioinformatics and Computational Biology. The BWT is not actually a compressor, but a transformation that performs a context-dependent permutation of the letters of the input text that often create runs of equal letters (clusters) longer than the ones in the original text, usually referred to as the “clustering effect” of BWT. In particular, from a combinatorial point of view, great attention has been given to the case in which the BWT produces the fewest number of clusters (cf. [5] , [16] , [21] , [23] ). In this paper we are concerned about t…

0301 basic medicineGeneral Computer SciencePermutationComputer Science (all)Binary number0102 computer and information sciencesQuantitative Biology::Genomics01 natural sciencesUpper and lower boundsTheoretical Computer ScienceCombinatorics03 medical and health sciencesPermutation030104 developmental biologyTransformation (function)BWT010201 computation theory & mathematicsRun-length encodingComputer Science::Data Structures and AlgorithmsCluster analysisPrimitive root modulo nBWT; Permutation; Run-length encoding; Theoretical Computer Science; Computer Science (all)Word (computer architecture)Run-length encodingMathematics
researchProduct

Computing the Original eBWT Faster, Simpler, and with Less Memory

2021

Mantaci et al. [TCS 2007] defined the \(\mathrm {eBWT}\) to extend the definition of the \(\mathrm {BWT}\) to a collection of strings. However, since this introduction, it has been used more generally to describe any \(\mathrm {BWT}\) of a collection of strings, and the fundamental property of the original definition (i.e., the independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algorithm for the construction of the original \(\mathrm {eBWT}\), which does not require the preprocessing of Bannai et al. [CPM 2021]. As a byproduct, we obtain the first linear-time algorithm for computing the \(\mathrm {BWT}\) of a single string that uses …

2019-20 coronavirus outbreakSpeedupString collectionsBig BWTSettore INF/01 - InformaticaSevere acute respiratory syndrome coronavirus 2 (SARS-CoV-2)String (computer science)Suffix arrayOrder (ring theory)omega-orderQuantitative Biology::GenomicsBurrows-Wheeler-TransformBurrows-Wheeler-Transform String collections SAIS Big BWT prefix-free parsing extended BWTlaw.inventionCombinatoricsprefix-free parsingSimple (abstract algebra)lawSAISSAIS algorithmIndependence (probability theory)extended BWTMathematics
researchProduct

Burrows-Wheeler Transform on Purely Morphic Words

2022

The study of the compressibility of repetitive sequences is an issue that is attracting great interest. We consider purely morphic words, which are highly repetitive sequences generated by iterating a morphism φ that admits a fixed point (denoted by φ^∞(a) ) starting from a given character a belonging to the finite alphabet A , i.e. φ^∞(a)=lim_{i→∞}φ^i(a) . Such morphisms are called prolongable on a . Here we focus on the compressibility via the Burrows-Wheeler Transform (BWT) of infinite families of finite sequences generated by morphisms. In particular, denoted by r(w) the number of equal-letter runs of a word w , we provide new upper bounds on r(bwt(φ^i(a))) , i.e. the number of equal-le…

BWT Morphism Equal-letter runsSettore INF/01 - Informatica
researchProduct

Variable-order reference-free variant discovery with the Burrows-Wheeler Transform

2020

Abstract Background In [Prezza et al., AMB 2019], a new reference-free and alignment-free framework for the detection of SNPs was suggested and tested. The framework, based on the Burrows-Wheeler Transform (BWT), significantly improves sensitivity and precision of previous de Bruijn graphs based tools by overcoming several of their limitations, namely: (i) the need to establish a fixed value, usually small, for the order k, (ii) the loss of important information such as k-mer coverage and adjacency of k-mers within the same read, and (iii) bad performance in repeated regions longer than k bases. The preliminary tool, however, was able to identify only SNPs and it was too slow and memory con…

Burrows–Wheeler transformComputer science[SDV]Life Sciences [q-bio]Value (computer science)SNPAssembly-free0102 computer and information scienceslcsh:Computer applications to medicine. Medical informatics01 natural sciencesBiochemistryPolymorphism Single Nucleotide03 medical and health sciencesBWTChromosome (genetic algorithm)Structural BiologyHumansSensitivity (control systems)Molecular Biologylcsh:QH301-705.5Alignment-free; Assembly-free; BWT; INDEL; SNP030304 developmental biologyAlignment-free; Assembly-free; BWT; INDEL; SNP;De Bruijn sequence0303 health sciencesSettore INF/01 - InformaticaAlignment-freeApplied MathematicsResearchGenomicsSequence Analysis DNAINDELData structureGraphComputer Science ApplicationsVariable (computer science)lcsh:Biology (General)010201 computation theory & mathematicsAdjacency listlcsh:R858-859.7Suffix[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]AlgorithmAlgorithmsBMC Bioinformatics
researchProduct

Balanced Words Having Simple Burrows-Wheeler Transform

2009

The investigation of the "clustering effect" of the Burrows-Wheeler transform (BWT) leads to study the words having simple BWT , i.e. words w over an ordered alphabet $A=\{a_1,a_2,\ldots,a_k\}$, with $a_1 < a_2 < \ldots <a_k$, such that $bwt(w)$ is of the form $a_k^{n_k} a_{k-1}^{n_{k-1}} \cdots a_1^{n_1}$, for some non-negative integers $n_1, n_2, \ldots, n_k$. We remark that, in the case of binary alphabets, there is an equivalence between words having simple BWT, the family of (circular) balanced words and the conjugates of standard words. In the case of alphabets of size greater than two, there is no more equivalence between these notions. As a main result of this paper we prove that, u…

CombinatoricsConjugacy classClustering effectBurrows–Wheeler transformSettore INF/01 - InformaticaBurrows Wheeler Transform Combinatorics on Words Balanced sequences epistandard rich words words having simple BWTBinary numberBurrows-Wheeler TransformAlphabetBinary alphabetBurrows-Wheeler Transform; Clustering effectMathematics
researchProduct

Sorting suffixes of a text via its Lyndon Factorization

2013

The process of sorting the suffixes of a text plays a fundamental role in Text Algorithms. They are used for instance in the constructions of the Burrows-Wheeler transform and the suffix array, widely used in several fields of Computer Science. For this reason, several recent researches have been devoted to finding new strategies to obtain effective methods for such a sorting. In this paper we introduce a new methodology in which an important role is played by the Lyndon factorization, so that the local suffixes inside factors detected by this factorization keep their mutual order when extended to the suffixes of the whole word. This property suggests a versatile technique that easily can b…

FOS: Computer and information sciencesBWTLyndon FactorizationSettore INF/01 - InformaticaSorting Suffixes; Lyndon Factorization; Lyndon WordsSuffix arrayComputer Science - Data Structures and AlgorithmsData_FILESData Structures and Algorithms (cs.DS)Lyndon wordSorting suffixeSorting SuffixesLyndon Words
researchProduct

Suffixes, Conjugates and Lyndon Words

2013

In this paper we are interested in the study of the combinatorial aspects connecting three important constructions in the field of string algorithms: the suffix array, the Burrows-Wheeler transform (BWT) and the extended Burrows-Wheeler transform (EBWT). Such constructions involve the notions of suffixes and conjugates of words and are based on two different order relations, denoted by $\plex$ and $\pom$, that, even if strictly connected, are quite different from the computational point of view. In this study an important role is played by Lyndon words. In particular, we improve the upper bound on the number of symbol comparisons needed to establish the $\pom$ order between two primitive wo…

MultisetReduction (recursion theory)BWT; Lyndon factorization; Suffix ArrayString (computer science)Suffix arrayLyndon words Lyndon factorization BWT Suffix array EBWT Circular words ConjugacyLexicographical orderlaw.inventionSuffix ArrayCombinatoricsBWTLyndon factorizationlawOrder (group theory)Symbol (formal)Word (group theory)Mathematics
researchProduct

r-Indexing the eBWT

2021

The extended Burrows Wheeler Transform (\(\mathrm {eBWT}\)) was introduced by Mantaci et al. [TCS 2007] to extend the definition of the \(\mathrm {BWT}\) to a collection of strings. In our prior work [SPIRE 2021], we give a linear-time algorithm for the \(\mathrm {eBWT}\) that preserves the fundamental property of the original definition (i.e., the independence from the input order). The algorithm combines a modification of the Suffix Array Induced Sorting (SAIS) algorithm [IEEE Trans Comput 2011] with Prefix Free Parsing [AMB 2019; JCB 2020]. In this paper, we show how this construction algorithm leads to r-indexing the \(\mathrm {eBWT}\), i.e., run-length encoded \(\mathrm {eBWT}\) and \(…

Physicsstring compressionBurrows–Wheeler transformSettore INF/01 - InformaticaSearch engine indexingSuffix arrayOrder (ring theory)Burrows-Wheeler-Transform r-index string compression extended BWT compressed indexingBurrows-Wheeler-Transformlaw.inventionCombinatoricsr-indexcompressed indexinglawIndexingextended BWT
researchProduct

Lightweight BWT Construction for Very Large String Collections

2011

A modern DNA sequencing machine can generate a billion or more sequence fragments in a matter of days. The many uses of the BWT in compression and indexing are well known, but the computational demands of creating the BWT of datasets this large have prevented its applications from being widely explored in this context. We address this obstacle by presenting two algorithms capable of computing the BWT of very large string collections. The algorithms are lightweight in that the first needs O(m log m) bits of memory to process m strings and the memory requirements of the second are constant with respect to m. We evaluate our algorithms on collections of up to 1 billion strings and compare thei…

SequenceTheoretical computer scienceConstant (computer programming)BWTtext indexesComputer scienceString (computer science)Search engine indexingProcess (computing)Context (language use)next-generation sequencingAlphabetBWT; text indexes; next-generation sequencing
researchProduct