Search results for "Suffix array"

Showing 6 of 16 documents

Suffix Array Construction on Multi-GPU Systems

2019

Suffix arrays are prevalent data structures that are fundamental to a wide range of applications, including bioinformatics, data compression, and information retrieval. Therefore, various algorithms for (parallel) suffix array construction on both CPUs and GPUs have been proposed over the years. Although they provide significant speedup over their CPU-based counterparts, existing GPU implementations share a common disadvantage: input text sizes are limited by the scarce memory of a single GPU. In this paper, we overcome the aforementioned memory limitations by exploiting multi-GPU nodes featuring fast NVLink interconnects. In order to achieve high performance for this communication-intensive task, we …

Multi-core processor; Speedup; Computer science; Suffix array; 0102 computer and information sciences; 02 engineering and technology; Parallel computing; Data structure; 01 natural sciences; CUDA; Shared memory; 010201 computation theory & mathematics; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Suffix; Data compression
Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing
researchProduct

Suffixes, Conjugates and Lyndon Words

2013

In this paper we study the combinatorial aspects connecting three important constructions in the field of string algorithms: the suffix array, the Burrows-Wheeler transform (BWT) and the extended Burrows-Wheeler transform (EBWT). Such constructions involve the notions of suffixes and conjugates of words and are based on two different order relations, denoted by $\preceq_{lex}$ and $\preceq_{\omega}$, that, even if strictly connected, are quite different from the computational point of view. In this study an important role is played by Lyndon words. In particular, we improve the upper bound on the number of symbol comparisons needed to establish the $\preceq_{\omega}$ order between two primitive wo…

Multiset; Reduction (recursion theory); BWT; Lyndon factorization; Suffix array; String (computer science); Lyndon words; EBWT; Circular words; Conjugacy; Lexicographical order; Combinatorics; Order (group theory); Symbol (formal); Word (group theory); Mathematics
researchProduct
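
The connection between the suffix array and the BWT mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `suffix_array` and `bwt_from_suffix_array` and the `$` end marker are assumptions made for illustration, and the naive sort stands in for a real construction algorithm.

```python
def suffix_array(text):
    """Starting positions of all suffixes of text, in lexicographic order.

    Naive construction by sorting suffix slices; fine for short examples only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text, sentinel="$"):
    """Read the BWT off the suffix array of text + sentinel.

    Assumption: the sentinel does not occur in text and compares smaller
    than every other character (true for '$' versus ASCII letters).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    terminated = text + sentinel
    sa = suffix_array(terminated)
    # The BWT symbol for a suffix is the character just before it;
    # index -1 wraps around to the sentinel for the suffix at position 0.
    return "".join(terminated[i - 1] for i in sa)


print(suffix_array("banana$"))          # [6, 5, 3, 1, 0, 4, 2]
print(bwt_from_suffix_array("banana"))  # annb$aa
```

The sketch covers only the single-string BWT; the EBWT of a collection relies on the order on conjugates discussed in the abstract and is not shown here.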

r-Indexing the eBWT

2021

The extended Burrows-Wheeler Transform (\(\mathrm{eBWT}\)) was introduced by Mantaci et al. [TCS 2007] to extend the definition of the \(\mathrm{BWT}\) to a collection of strings. In our prior work [SPIRE 2021], we gave a linear-time algorithm for the \(\mathrm{eBWT}\) that preserves the fundamental property of the original definition (i.e., the independence from the input order). The algorithm combines a modification of the Suffix Array Induced Sorting (SAIS) algorithm [IEEE Trans Comput 2011] with Prefix Free Parsing [AMB 2019; JCB 2020]. In this paper, we show how this construction algorithm leads to r-indexing the \(\mathrm{eBWT}\), i.e., run-length encoded \(\mathrm{eBWT}\) and \(…

Physics; string compression; Burrows–Wheeler transform; Settore INF/01 - Informatica; Search engine indexing; Suffix array; Order (ring theory); Burrows-Wheeler-Transform; r-index; extended BWT; compressed indexing; Combinatorics; Indexing
researchProduct

Suffix array and Lyndon factorization of a text

2014

The main goal of this paper is to highlight the relationship between the suffix array of a text and its Lyndon factorization. It is proved in [15] that one can obtain the Lyndon factorization of a text from its suffix array. Conversely, here we show a new method for constructing the suffix array of a text that takes advantage of its Lyndon factorization. The surprising consequence of our results is that, in order to construct the suffix array, the local suffixes inside each Lyndon factor can be processed separately, allowing different implementation scenarios, such as online, external and internal memory, or parallel implementations. Based on our results, the algorithm that we prop…

Sorting suffixes; BWT; Suffix array; Lyndon word; Lyndon factorization; Compressed suffix array; Settore INF/01 - Informatica; Generalized suffix tree; Order (ring theory); Construct (python library); Theoretical Computer Science; Computational Theory and Mathematics; Factorization; Factor (programming language); Internal memory; Discrete Mathematics and Combinatorics; Arithmetic; Mathematics
Journal of Discrete Algorithms
researchProduct
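
The first direction mentioned in the abstract above (Lyndon factorization from the suffix array) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's algorithm: it uses the known property that a Lyndon factor starts at every position whose suffix is lexicographically smaller than all suffixes starting earlier, and the names `suffix_array` and `lyndon_factors_from_sa` are hypothetical.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions by the suffix text."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lyndon_factors_from_sa(text, sa):
    """Split text into its Lyndon factors using suffix ranks.

    A new factor begins at position i exactly when the suffix starting at i
    ranks below every suffix that starts before i.
    """
    if not text:
        return []
    rank = [0] * len(text)
    for r, pos in enumerate(sa):
        rank[pos] = r          # inverse suffix array
    factors = []
    start = 0
    smallest = rank[0]         # smallest rank seen among positions < i
    for i in range(1, len(text)):
        if rank[i] < smallest:
            factors.append(text[start:i])
            start = i
            smallest = rank[i]
    factors.append(text[start:])
    return factors


text = "banana"
print(lyndon_factors_from_sa(text, suffix_array(text)))  # ['b', 'an', 'an', 'a']
```

The converse direction described in the abstract, building the suffix array by sorting the local suffixes inside each Lyndon factor separately, is not attempted here.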

On-line Construction of Two-Dimensional Suffix Trees

1999

We say that a data structure is built on-line if, at any instant, we have the data structure corresponding to the input we have seen up to that instant. For instance, consider the suffix tree of a string x[1,n]. An algorithm building it on-line is such that, when we have read the first i symbols of x[1,n], we have the suffix tree for x[1,i]. We present a new technique, which we refer to as implicit updates, based on which we obtain: (a) an algorithm for the on-line construction of the Lsuffix tree of an n×n matrix A (this data structure is the two-dimensional analog of the suffix tree of a string); (b) simple algorithms implementing primitive operations for LZ1-type on-line lossless image compression m…

Statistics and Probability; Compressed suffix array; Numerical Analysis; Control and Optimization; Algebra and Number Theory; Theoretical computer science; Applied Mathematics; General Mathematics; Suffix tree; String (computer science); Generalized suffix tree; Longest common substring problem; Tree (data structure); Suffix; Algorithm; FM-index; Mathematics
Journal of Complexity
researchProduct

Acceleration of short and long DNA read mapping without loss of accuracy using suffix array

2014

HPG Aligner applies suffix arrays for DNA read mapping. This implementation produces a highly sensitive and extremely fast mapping of DNA reads that scales up almost linearly with read length. The approach presented here is faster (over 20× for long reads) and more sensitive (over 98% in a wide range of read lengths) than the current state-of-the-art mappers. HPG Aligner is not only an optimal alternative for current sequencers but also the only solution available to cope with longer reads and growing throughputs produced by forthcoming sequencing technologies.

Statistics and Probability; Computer science; Sequence analysis; Sequence alignment; database searches; Biochemistry; Acceleration; Computer Science and Artificial Intelligence; Animals; Humans; Molecular Biology; Database; sequencing data; Suffix array; High-Throughput Nucleotide Sequencing; alignment; Sequence Analysis DNA; Applications Notes; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Drosophila; Suffix; Algorithm; Algorithms; Software; DNA
researchProduct
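
As a rough illustration of why suffix arrays suit read mapping, the Python sketch below locates all exact occurrences of a short read in a reference by binary search over the suffix array. It is not HPG Aligner's implementation, which the abstract does not describe in detail: the function names and the toy reference are assumptions, and real mappers also handle mismatches, gaps, and far larger inputs.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions by the suffix text."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted start positions of exact matches of pattern in text.

    Suffixes sharing a prefix are contiguous in the suffix array, so two
    binary searches on the length-m prefixes delimit the matching block.
    """
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


reference = "GATTACAGATTA"   # toy reference sequence (hypothetical)
sa = suffix_array(reference)
print(find_occurrences(reference, sa, "ATTA"))  # [1, 8]
```

Each lookup costs O(m log n) character comparisons for a read of length m and a reference of length n, once the suffix array has been built.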