6533b853fe1ef96bd12ad4e2
RESEARCH PRODUCT
Languages with mismatches and an application to approximate indexing
Alessandra GabrieleChiara EpifanioFilippo Mignosisubject
FactorialCombinatorics on wordsString (computer science)Function (mathematics)formal languagesmatching indexingCombinatoricsCombinatorics on wordsIntegerapproximate stringPosition (vector)TrieAlgorithmWord (group theory)Mathematicsdescription
In this paper we describe a factorial language, denoted by L(S, k,r), that contains all words that occur in a string 5 up to k mismatches every r symbols. Then we give some combinatorial properties of a parameter, called repetition index and denoted by R(S,k,r), defined as the smallest integer h ? 1 such that all strings of this length occur at most in a unique position of the text S up to k mismatches every r symbols. We prove that R(S, k, r) is a non-increasing function of r and a non-decreasing function of k and that the equation r = R(S, k, r) admits a unique solution. The repetition index plays an important role in the construction of an indexing data structure based on a trie that represents the set of all factors of L(S,k,r) having length equal to R(S,k,r). For each word x ?L(S, k, r) this data structure allows us to find the list occ(x) of all occurrences of the word x in a text S up to k mismatches every r symbols in time proportional to |x| + |occ(x)|.
year | journal | country | edition | language |
---|---|---|---|---|
2005-01-01 |