Search results for "Suffix array"

showing 6 items of 16 documents

Suffix array and Lyndon factorization of a text

2014

Abstract The main goal of this paper is to highlight the relationship between the suffix array of a text and its Lyndon factorization. It is proved in [15] that one can obtain the Lyndon factorization of a text from its suffix array. Conversely, here we show a new method for constructing the suffix array of a text that takes advantage of its Lyndon factorization. The surprising consequence of our results is that, in order to construct the suffix array, the local suffixes inside each Lyndon factor can be separately processed, allowing different implementative scenarios, such as online, external and internal memory, or parallel implementations. Based on our results, the algorithm that we prop…
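The direction the abstract attributes to [15], obtaining the Lyndon factorization from the suffix array, can be illustrated with a minimal sketch. It relies on the known property that a Lyndon factor starts at position i exactly when the suffix starting at i is lexicographically smaller than every suffix starting before i. The function names are hypothetical, the suffix array is built naively by sorting, and Duval's algorithm is included only as a cross-check; none of this is the construction method proposed in the paper.

```python
def suffix_array(text):
    # Naive construction: sort suffix start positions by the suffixes themselves.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lyndon_factors_from_suffix_array(text):
    # rank[i] is the position of suffix i in the suffix array (inverse suffix array).
    sa = suffix_array(text)
    rank = [0] * len(text)
    for r, i in enumerate(sa):
        rank[i] = r
    # A factor starts wherever the rank drops below all ranks seen so far.
    starts = []
    min_rank = len(text)
    for i in range(len(text)):
        if rank[i] < min_rank:
            starts.append(i)
            min_rank = rank[i]
    starts.append(len(text))
    return [text[a:b] for a, b in zip(starts, starts[1:])]


def duval(text):
    # Duval's linear-time Lyndon factorization, used here only for comparison.
    n, i, factors = len(text), 0, []
    while i < n:
        j, k = i + 1, i
        while j < n and text[k] <= text[j]:
            k = i if text[k] < text[j] else k + 1
            j += 1
        while i <= k:
            factors.append(text[i:i + j - k])
            i += j - k
    return factors


print(lyndon_factors_from_suffix_array("banana"))  # ['b', 'an', 'an', 'a']
```

Both functions return the same factors on the same input, which makes the sketch easy to check against longer texts.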

Sorting suffixes; BWT; Suffix array; Lyndon word; Lyndon factorization; Compressed suffix array; Settore INF/01 - Informatica; Generalized suffix tree; Order (ring theory); Construct (python library); Theoretical Computer Science; Computational Theory and Mathematics; Factorization; Factor (programming language); Internal memory; Discrete Mathematics and Combinatorics; Arithmetic; Mathematics

Journal of Discrete Algorithms

researchProduct

Inducing the Lyndon Array

2019

In this paper we propose a variant of the induced suffix sorting algorithm by Nong (TOIS, 2013) that computes simultaneously the Lyndon array and the suffix array of a text in $O(n)$ time using $\sigma + O(1)$ words of working space, where $n$ is the length of the text and $\sigma$ is the alphabet size. Our result improves the previous best space requirement for linear time computation of the Lyndon array. In fact, all the known linear algorithms for Lyndon array computation use suffix sorting as a preprocessing step and use $O(n)$ words of working space in addition to the Lyndon array and suffix array. Experimental results with real and synthetic datasets show that our algorithm is not onl…
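For orientation, the baseline the abstract contrasts itself with (suffix sorting as a preprocessing step, followed by O(n) extra words of working space) can be sketched as follows. The sketch assumes the standard characterization that the Lyndon array entry at position i equals the distance to the next lexicographically smaller suffix. The function name is hypothetical and the naive sort stands in for a real suffix sorter; this is not the induced-sorting variant the paper proposes.

```python
def lyndon_array(text):
    # lam[i] = length of the longest Lyndon word starting at position i.
    n = len(text)
    # Preprocessing: naive suffix sorting, then the inverse suffix array.
    sa = sorted(range(n), key=lambda i: text[i:])
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lam = [0] * n
    stack = []  # candidate positions to the right of i, with increasing rank toward the top
    for i in range(n - 1, -1, -1):
        # Discard positions whose suffix is larger than suffix i.
        while stack and rank[stack[-1]] > rank[i]:
            stack.pop()
        next_smaller = stack[-1] if stack else n
        lam[i] = next_smaller - i
        stack.append(i)
    return lam


print(lyndon_array("banana"))  # [1, 2, 1, 2, 1, 1]
```

The `rank` array and the stack are exactly the kind of O(n)-word auxiliary storage that the abstract says earlier linear-time methods require.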

FOS: Computer and information sciences; 050101 languages & linguistics; Computer science; Computation; Induced suffix sorting; 02 engineering and technology; Space (mathematics); Suffix sorting; Suffix array; Computer Science - Data Structures and Algorithms; 0202 electrical engineering electronic engineering information engineering; Data_FILES; Preprocessor; Data Structures and Algorithms (cs.DS); 0501 psychology and cognitive sciences; Computer Science::Data Structures and Algorithms; Time complexity; Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Settore INF/01 - Informatica; 05 social sciences; Lightweight algorithm; Sigma; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Lightweight algorithms; Lyndon array; Working space; 020201 artificial intelligence & image processing; Algorithm; Computer Science::Formal Languages and Automata Theory

Acceleration of short and long DNA read mapping without loss of accuracy using suffix array

2014

HPG Aligner applies suffix arrays for DNA read mapping. This implementation produces a highly sensitive and extremely fast mapping of DNA reads that scales up almost linearly with read length. The approach presented here is faster (over 20× for long reads) and more sensitive (over 98% in a wide range of read lengths) than the current state-of-the-art mappers. HPG Aligner is not only an optimal alternative for current sequencers but also the only solution available to cope with longer reads and growing throughputs produced by forthcoming sequencing technologies.
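The basic idea behind suffix-array read mapping can be illustrated with an exact-match lookup: all occurrences of a read form a contiguous range of the sorted suffixes, which two binary searches can locate. This is a minimal sketch under that assumption only. The function names and the toy reference are hypothetical, and it does not reproduce HPG Aligner's actual algorithm, which must also handle mismatches and gaps.

```python
def build_suffix_array(reference):
    # Naive construction, adequate for a toy reference sequence.
    return sorted(range(len(reference)), key=lambda i: reference[i:])


def find_occurrences(reference, sa, read):
    m = len(read)

    def prefix(k):
        # First m symbols of the k-th smallest suffix.
        return reference[sa[k]:sa[k] + m]

    # Lower bound: first suffix whose m-symbol prefix is >= read.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < read:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose m-symbol prefix is > read.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= read:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


reference = "ACGTACGTGA"
sa = build_suffix_array(reference)
print(find_occurrences(reference, sa, "ACG"))  # [0, 4]
```

Each lookup costs O(m log n) symbol comparisons for a read of length m against a reference of length n, independent of how many reads are mapped against the same index.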

Statistics and Probability; Computer science; Sequence analysis; Sequence alignment; database searches; Biochemistry; Acceleration; CIENCIAS DE LA COMPUTACION E INTELIGENCIA ARTIFICIAL; Animals; Humans; Molecular Biology; Database; sequencing data; Suffix array; High-Throughput Nucleotide Sequencing; alignment; Sequence Analysis DNA; Applications Notes; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Drosophila; Suffix; Algorithm; Algorithms; Software; DNA

On the construction of classes of suffix trees for square matrices: Algorithms and applications

1995

Given an $n \times n$ TEXT matrix with entries defined over an ordered alphabet $\Sigma$, we introduce $4^{n-1}$ classes of index data structures for TEXT. Those indices are informally the two-dimensional analog of the suffix tree of a string [15], allowing on-line searches and statistics to be performed on TEXT. We provide one simple algorithm that efficiently builds any chosen index in those classes in $O(n^2 \log n)$ worst-case time using $O(n^2)$ space. The algorithm can be modified to require optimal $O(n^2)$ expected time for bounded $\Sigma$.

Combinatorics; Compressed suffix array; Suffix tree; String (computer science); Generalized suffix tree; Suffix array; Suffix; Algorithm; FM-index; Mathematics; Longest common substring problem

On-line Construction of Two-Dimensional Suffix Trees

1999

Abstract: We say that a data structure is built on-line if, at any instant, we have the data structure corresponding to the input we have seen up to that instant. For instance, consider the suffix tree of a string x[1,n]. An algorithm building it on-line is such that, when we have read the first i symbols of x[1,n], we have the suffix tree for x[1,i]. We present a new technique, which we refer to as implicit updates, based on which we obtain: (a) an algorithm for the on-line construction of the Lsuffix tree of an n×n matrix A (this data structure is the two-dimensional analog of the suffix tree of a string); (b) simple algorithms implementing primitive operations for LZ1-type on-line lossless image compression m…
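The on-line property defined in this abstract can be illustrated in one dimension with a naive suffix trie: after each symbol is read, the structure indexes every substring of the prefix seen so far. This is only a sketch of the definition. The class and method names are hypothetical, the trie takes quadratic space, and it is neither a suffix tree nor the two-dimensional Lsuffix tree or the implicit-updates technique of the paper.

```python
class OnlineSuffixTrie:
    def __init__(self):
        self.root = {}
        # Nodes spelling every suffix of the text read so far (the root is the empty suffix).
        self.active = [self.root]

    def extend(self, symbol):
        # Append one symbol: every current suffix grows by that symbol,
        # and the empty suffix (root) stays active for the next step.
        next_active = [self.root]
        for node in self.active:
            next_active.append(node.setdefault(symbol, {}))
        self.active = next_active

    def contains(self, pattern):
        # A pattern is a substring of the text read so far iff it labels a path from the root.
        node = self.root
        for symbol in pattern:
            if symbol not in node:
                return False
            node = node[symbol]
        return True


trie = OnlineSuffixTrie()
for symbol in "ab":
    trie.extend(symbol)
print(trie.contains("ab"), trie.contains("ba"))  # True False
trie.extend("a")
print(trie.contains("ba"))  # True
```

Queries are valid between any two calls to `extend`, which is the behavior the abstract's definition asks of an on-line construction.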

Statistics and Probability; Compressed suffix array; Numerical Analysis; Control and Optimization; Algebra and Number Theory; Theoretical computer science; Applied Mathematics; General Mathematics; Suffix tree; String (computer science); Generalized suffix tree; Longest common substring problem; Tree (data structure); Suffix; Algorithm; FM-index; Mathematics

Journal of Complexity

Uncommon Suffix Tries

2011

Common assumptions on the source producing the words inserted in a suffix trie with $n$ leaves lead to a $\log n$ height and saturation level. We provide an example of a suffix trie whose height increases faster than a power of $n$ and another one whose saturation level is negligible with respect to $\log n$. Both are built from VLMC (Variable Length Markov Chain) probabilistic sources; they are easily extended to families of sources having the same properties. The first example corresponds to a "logarithmic infinite comb" and enjoys a nonuniform polynomial mixing. The second one corresponds to a "factorial infinite comb" for which mixing is uniform and exponential.

FOS: Computer and information sciences; Compressed suffix array; Polynomial; Logarithm; General Mathematics; Suffix tree; variable length Markov chain; [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]; Generalized suffix tree; probabilistic source; 0102 computer and information sciences; 02 engineering and technology; suffix trie; 01 natural sciences; Combinatorics; Computer Science - Data Structures and Algorithms; Trie; FOS: Mathematics; 0202 electrical engineering electronic engineering information engineering; Data Structures and Algorithms (cs.DS); Mixing (physics); Mathematics; Discrete mathematics; Applied Mathematics; Probability (math.PR); 020206 networking & telecommunications; Computer Graphics and Computer-Aided Design; [MATH.MATH-PR] Mathematics [math]/Probability [math.PR]; 010201 computation theory & mathematics; mixing properties; 60J05; 37E05; Suffix; Mathematics - Probability; Software