Search results for "pattern matching"
showing 10 items of 44 documents
Bit-Parallel Approximate Pattern Matching on the Xeon Phi Coprocessor
2014
Bit-parallel pattern matching encodes calculated values in bit arrays. This approach gains its efficiency by performing multiple updates within a machine word. An important parameter is therefore the machine word size (e.g. 32 or 64 bits). With the increasing length of vector registers, the efficient mapping of bit-parallel pattern matching algorithms onto modern high performance computing architectures is becoming increasingly important. In this paper, we investigate an efficient implementation of the Wu-Manber approximate pattern matching algorithm on the Intel Xeon Phi coprocessor. This architecture features a 512-bit long vector processing unit (VPU) as well as a large number of process…
Optimal Partitions of Strings: A New Class of Burrows-Wheeler Compression Algorithms
2003
The Burrows-Wheeler transform [1] is one of the mainstays of lossless data compression. In most cases, its output is fed to Move to Front or other variations of symbol ranking compression. One of the main open problems [2] is to establish whether Move to Front, or more in general symbol ranking compression, is an essential part of the compression process. We settle this question positively by providing a new class of Burrows-Wheeler algorithms that use optimal partitions of strings, rather than symbol ranking, for the additional step. Our technique is a quite surprising specialization to strings of partitioning techniques devised by Buchsbaum et al. [3] for two-dimensional table compression…
Generalizations of the periodicity Theorem of Fine and Wilf
2005
We provide three generalizations to the two-dimensional case of the well known periodicity theorem by Fine and Wilf [4] for strings (the one-dimensional case). The first and the second generalizations can be further extended to hold in the more general setting of Cayley graphs of groups. Weak forms of two of our results have been developed for the design of efficient algorithms for two-dimensional pattern matching [2, 3, 6].
Searching for Jumbled Patterns in Strings
2009
On Approximate Jumbled Pattern Matching in Strings
2011
Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of a Parikh vector q in the text s requires finding a substring t of s with p(t) = q. This can be viewed as the task of finding a jumbled (permuted) version of a query pattern, hence the term Jumbled Pattern Matching. We present several algorithms for the approximate version of the problem: Given a string s and two Parikh vectors u, v (the query bounds), find all maximal occurrences in s of some Parikh vector q such that u <= q <= v. This definition encompasses several natural versions of approximate Parikh vector search. We present an algorithm solving this problem …
Learning of regular expressions by pattern matching
1995
We consider the problem of restoring regular expressions from good examples. We describe a natural learning algorithm for obtaining a “plausible” regular expression from one example. The algorithm is based on finding the longest substring which can be matched by some part of the so far obtained expression. We believe that the algorithm to a certain extent mimics humans guessing regular expressions from the same sort of examples. We show that for regular expressions of bounded length successful learning takes time linear in the length of the example, provided that the example is “good”. Under certain natural restrictions the run-time of the learning algorithm is polynomial also in unsuccessf…
A computer system to perform structure comparison using TOPS representations of protein structure
2001
We describe the design and implementation of a fast topology-based method for protein structure comparison. The approach uses the TOPS topological representation of protein structure, aligning two structures using a common discovered pattern and generating measure of distance derived from an insert score. Heavy use is made of a constraint-based pattern-matching algorithm for TOPS diagrams that we have designed and described elsewhere (Bioinformatics 15(4) (1999) 317). The comparison system is maintained at the European Bioinformatics Institute and is available over the Web at tops.ebi.ac.uk/tops. Users submit a structure description in Protein Data Bank (PDB) format and can compare it with …