AUTHOR

Roberto Grossi

Multi-dimensional pattern matching with dimensional wildcards

We introduce a new multi-dimensional pattern matching problem, which is a natural generalization of on-line search in string matching. We are given a text matrix A[1:n_1, ..., 1:n_d] of size N = n_1 × n_2 × ... × n_d, which we may preprocess. Then we are given, on-line, an r-dimensional pattern matrix B[1:m_1, ..., 1:m_r] of size M = m_1 × m_2 × ... × m_r, with 1 ≤ r ≤ d. We would like to know whether B* = B*[*, 1:m_1, *, ..., 1:m_r, *] occurs in A, where * is a dimensional wildcard such that B* is any d-dimensional matrix having size 1 × ... × m_1 × ... × 1 × m_r × ... × 1 and containing the same elements as B. Notice that there might be (d choose r) ≤ 2^d occurrences of B* for each position of A. We give CRCW-PRAM algorithms for pr…
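The problem statement above can be made concrete with a naive sequential check. This is only a brute-force sketch of the definition, not the CRCW-PRAM algorithms of the paper; it assumes matrices are stored as rectangular nested Python lists, and the names `shape`, `get`, and `occurs_with_wildcards` are hypothetical helpers introduced for illustration.

```python
from itertools import combinations, product

def shape(m):
    """Return the dimensions of a rectangular, non-empty nested list."""
    dims = []
    while isinstance(m, list):
        dims.append(len(m))
        m = m[0]
    return tuple(dims)

def get(m, idx):
    """Read the entry of nested list m at the index sequence idx."""
    for i in idx:
        m = m[i]
    return m

def occurs_with_wildcards(A, B):
    """Brute force: does B occur in A along some choice of r of A's d axes?"""
    n, m = shape(A), shape(B)
    d, r = len(n), len(m)
    # Each increasing choice of r axes is one placement of the wildcards,
    # so there are (d choose r) placements to try.
    for axes in combinations(range(d), r):
        # Extent of B* along each axis of A: m_j on chosen axes, 1 elsewhere.
        ext = [1] * d
        for j, ax in enumerate(axes):
            ext[ax] = m[j]
        if any(ext[k] > n[k] for k in range(d)):
            continue
        for origin in product(*(range(n[k] - ext[k] + 1) for k in range(d))):
            matched = True
            for off in product(*(range(x) for x in m)):
                pos = list(origin)
                for j, ax in enumerate(axes):
                    pos[ax] += off[j]
                if get(A, pos) != get(B, off):
                    matched = False
                    break
            if matched:
                return True
    return False
```

For the 2 × 3 text `A = [[1, 2, 3], [4, 5, 6]]`, the one-dimensional pattern `[2, 3]` occurs along the second axis and `[1, 4]` along the first, while `[1, 5]` does not occur along either axis.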

research product

Parallel Construction and Query of Index Data Structures for Pattern Matching on Square Matrices

We describe fast parallel algorithms for building index data structures that can be used to gather various statistics on square matrices. The main data structure is the Lsuffix tree, which is a generalization of the classical suffix tree for strings. Given an n × n text matrix A, we build our data structures in O(log n) time with n^2 processors on a CRCW PRAM, so that we can quickly process A in parallel as follows: (i) report some statistical information about A, e.g., find the largest repeated square submatrices that appear at least twice in A or determine, for each position in A, the smallest submatrix that occurs only there; (ii) given, on-line, an m × m pattern matrix PAT, check whether it occurs i…
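To illustrate the first statistic mentioned above, the following is a naive sequential sketch that finds the side length of the largest square submatrix appearing at least twice. It does not use the Lsuffix tree or any parallelism; it simply hashes every k × k block. The function name `largest_repeated_square` is hypothetical, and the sketch assumes that two occurrences may overlap as long as they start at different positions.

```python
def largest_repeated_square(A):
    """Return the largest k such that some k x k block of the square
    matrix A occurs at two different positions, or 0 if none does."""
    n = len(A)
    for k in range(n, 0, -1):  # try the largest side length first
        seen = set()
        for i in range(n - k + 1):
            for j in range(n - k + 1):
                # Freeze the block as a tuple of row tuples so it is hashable.
                block = tuple(tuple(A[i + a][j:j + k]) for a in range(k))
                if block in seen:
                    return k
                seen.add(block)
    return 0
```

This brute force inspects every block of every size, so it is far slower than an index-based approach; it is meant only to pin down what is being computed.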

research product

A trie-based approach for compacting automata

We describe a new technique for reducing the number of nodes and symbols in automata based on tries. The technique stems from some results on anti-dictionaries for data compression and, unlike other methods based on compact automata, does not need to retain the input string. The net effect is a lighter automaton than the directed acyclic word graph (DAWG) of Blumer et al.: it uses fewer nodes, while its arcs are still labeled by single characters.

research product

Multi-Dimensional Pattern Matching with Dimensional Wildcards: Data Structures and Optimal On-Line Search Algorithms

We introduce a new multidimensional pattern matching problem that is a natural generalization of string matching, a well-studied problem. The motivation for its algorithmic study is mainly theoretical. Let A[1:n_1, ..., 1:n_d] be a text matrix with N = n_1 × ... × n_d entries and B[1:m_1, ..., 1:m_r] be a pattern matrix with M = m_1 × ... × m_r entries, where d ≥ r ≥ 1 (the matrix entries are taken from an ordered alphabet Σ). We study the problem of checking whether some r-dimensional submatrix of A is equal to B (i.e., a decision query). A can be preprocessed and B is given on-line. We define a new data structure for preprocessing A and propose CRCW-PRAM algorithms that build it in O(log N) time with N^2/n_max processors, where n_max = max(n_1, ..., n_d), such that …

research product

Linear-size suffix tries

Suffix trees are highly regarded data structures for text indexing and string algorithms [McCreight 76, Weiner 73]. For any given string w of length n = |w|, a suffix tree for w takes O(n) nodes and links. It is often presented as a compacted version of a suffix trie for w, where the latter is the trie (or digital search tree) built on the suffixes of w. Here the compaction process replaces each maximal chain of unary nodes with a single arc. For this, the suffix tree requires that the labels of its arcs are substrings encoded as pointers to w (or equivalent information). On the contrary, the arcs of the suffix trie are labeled by single symbols but there can be Θ(n^2) nodes and lin…
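The quadratic size of the plain suffix trie can be observed directly. The sketch below builds an uncompacted suffix trie with nested dictionaries and counts its nodes; it is a naive construction for illustration only, not the linear-size structure of the paper, and `suffix_trie_nodes` is a hypothetical name.

```python
def suffix_trie_nodes(w):
    """Build the (uncompacted) suffix trie of w and return its node count,
    including the root. Every arc is labeled by a single symbol."""
    root = {}
    count = 1  # the root
    for i in range(len(w)):
        node = root
        for c in w[i:]:  # insert the suffix starting at position i
            if c not in node:
                node[c] = {}
                count += 1
            node = node[c]
    return count
```

On strings of the form a^k b^k, with n = 2k, every string a^i b^j with 0 ≤ i, j ≤ k is a substring, so the trie has (k + 1)^2 nodes: `suffix_trie_nodes("aaabbb")` gives 16 for n = 6, which grows quadratically in n.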

research product

On the Construction of Classes of Suffix Trees for Square Matrices: Algorithms and Applications

We provide a uniform framework for the study of index data structures for a two-dimensional matrix TEXT[1:n, 1:n] whose entries are drawn from an ordered alphabet Σ. An index for TEXT can be informally seen as the two-dimensional analog of the suffix tree for a string. It allows on-line searches and statistics to be performed on TEXT by representing compactly the Θ(n^3) square submatrices of TEXT in optimal O(n^2) space. We identify 4^(n−1) families of indices for TEXT, each containing ∏_{i=1}^{n} (2i−1)! isomorphic data structures. We also develop techniques leading to a single algorithm that efficiently builds any index in any family in O(n^2 log n) time and O(n^2) space. Such an algorithm improves in various …
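The cubic count of square submatrices mentioned above follows from a short calculation, given here as a worked equation (the index k for the side length is introduced only for this derivation). A k × k submatrix of an n × n matrix is determined by its top-left corner, which can be placed in (n − k + 1)^2 positions:

```latex
\sum_{k=1}^{n} (n-k+1)^2
  \;=\; \sum_{j=1}^{n} j^2
  \;=\; \frac{n(n+1)(2n+1)}{6}
  \;=\; \Theta(n^3).
```

Storing each submatrix explicitly would therefore be far more expensive than the O(n^2) space of the index, which is why a compact representation matters.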

research product

On the construction of classes of suffix trees for square matrices: Algorithms and applications

Given an n × n TEXT matrix with entries defined over an ordered alphabet Σ, we introduce 4^(n−1) classes of index data structures for TEXT. Those indices are informally the two-dimensional analog of the suffix tree of a string [15], allowing on-line searches and statistics to be performed on TEXT. We provide one simple algorithm that efficiently builds any chosen index in those classes in O(n^2 log n) worst-case time using O(n^2) space. The algorithm can be modified to require optimal O(n^2) expected time for bounded Σ.

research product