Search results for "Suffix"
showing 10 items of 75 documents
Acceleration of short and long DNA read mapping without loss of accuracy using suffix array
2014
HPG Aligner applies suffix arrays for DNA read mapping. This implementation produces a highly sensitive and extremely fast mapping of DNA reads that scales up almost linearly with read length. The approach presented here is faster (over 20× for long reads) and more sensitive (over 98% in a wide range of read lengths) than the current state-of-the-art mappers. HPG Aligner is not only an optimal alternative for current sequencers but also the only solution available to cope with longer reads and growing throughputs produced by forthcoming sequencing technologies.
Parallel Construction and Query of Index Data Structures for Pattern Matching on Square Matrices
1999
We describe fast parallel algorithms for building index data structures that can be used to gather various statistics on square matrices. The main data structure is the Lsuffix tree, which is a generalization of the classical suffix tree for strings. Given an n×n text matrix A, we build our data structures in O(log n) time with n² processors on a CRCW PRAM, so that we can quickly process A in parallel as follows: (i) report some statistical information about A, e.g., find the largest repeated square submatrices that appear at least twice in A or determine, for each position in A, the smallest submatrix that occurs only there; (ii) given, on-line, an m×m pattern matrix PAT, check whether it occurs i…
Variable Length Memory Chains: Characterization of stationary probability measures
2021
Variable Length Memory Chains (VLMC), which are generalizations of finite order Markov chains, turn out to be an essential tool to model random sequences in many domains, as well as an interesting object in contemporary probability theory. The question of the existence of stationary probability measures leads us to introduce a key combinatorial structure for words produced by a VLMC: the Longest Internal Suffix. This notion allows us to state a necessary and sufficient condition for a general VLMC to admit a unique invariant probability measure. This condition turns out to get a much simpler form for a subclass of VLMC: the stable VLMC. This natural subclass, unlike the general case, enj…
A loop-free two-close Gray-code algorithm for listing k-ary Dyck words
2006
P. Chase and F. Ruskey each published a Gray code for length n binary strings with m occurrences of 1, coding m-combinations of n objects, which is two-close; that is, in passing from one binary string to its successor a single 1 exchanges positions with a 0 which is either adjacent to the 1 or separated from it by a single 0. If we impose the restriction that any suffix of a string contains at least k−1 times as many 0's as 1's, we obtain k-suffixes: suffixes of k-ary Dyck words. Combinations are retrieved as a special case by setting k=1 and k-ary Dyck words are retrieved as a special case by imposing the additional condition that the entire string has exactly k−1 times as many 0's a…
Boosting Textual Compression in Optimal Linear Time
2005
We provide a general boosting technique for Textual Data Compression. Qualitatively, it takes a good compression algorithm and turns it into an algorithm with a better compression performance guarantee. It displays the following remarkable properties: (a) it can turn any memoryless compressor into a compression algorithm that uses the “best possible” contexts; (b) it is very simple and optimal in terms of time; and (c) it admits a decompression algorithm again optimal in time. To the best of our knowledge, this is the first boosting technique displaying these properties. Technically, our boosting technique builds upon three main ingredients: the Burrows-Wheeler Transform, the Suffix Tree d…
Algorithmics for the Life Sciences
2013
The life sciences, in particular molecular biology and medicine, have witnessed fundamental progress since the discovery of “the Double Helix”. A relevant part of such an incredible advancement in knowledge has been possible thanks to synergies with the mathematical sciences, on the one hand, and computer science, on the other. Here we review some of the most relevant aspects of this cooperation, focusing on contributions given by the design, analysis and engineering of fast algorithms for the life sciences.
Multi-Dimensional motivic pattern extraction founded on adaptive redundancy filtering
2005
We present a computational model for discovering repeated patterns in symbolic representations of monodic music. Patterns are discovered through an incremental adaptive identification along a multi-dimensional parametric space. The difficulties of pattern discovery mainly come from combinatorial redundancies, which our model is able to control efficiently. A specificity relation is defined between pattern descriptions, unifying suffix and inclusion relations and enabling a filtering of redundant descriptions. Combinatorial proliferation caused by successive repetitions of patterns is managed using cyclic patterns. The modelling of these redundancy control mechanisms enables an autom…
$O(n^2 \log n)$ Time On-line Construction of Two-Dimensional Suffix Trees
2007
The two-dimensional suffix tree of an n × n square matrix A is a compacted trie that represents all square submatrices of A [11]. For the off-line case, i.e., A is given in advance to the algorithm, it is known how to build it in optimal time, for any type of alphabet size [11], [18]. Motivated by applications in Image Compression [22], Giancarlo and Guaiana [14] considered the on-line version of the two-dimensional suffix tree and presented an O(n² log² n)-time algorithm, which we refer to as GG. That algorithm is a nontrivial generalization of Ukkonen’s on-line algorithm for standard suffix trees [23]. The main contribution in this paper is an O(log n) factor improvement in the time comple…
Los procesos de derivación locucional en el continuum discursivo de la literatura medieval de castigos [Processes of locutional derivation in the discursive continuum of medieval castigos literature]
2016
The formal and stylistic features of the wisdom genre appear fundamental to the historical study of phraseology. Because of their brevity and concision, and their capacity to adapt linguistically to literary traditions that arise and develop in very diverse cultural contexts, the medieval collections of maxims that flourished in the thirteenth century and persisted into the fifteenth as the result of a discursive continuum were characterized by a rather abundant use of locutions. The main aim of this study is to characterize all prepositional locutions whose nucleus underwent a process of lexical derivation, …
Le suffixe diminutif: un marqueur d'appropriation du signifiant [The diminutive suffix: a marker of appropriation of the signifier]
2010
International audience