6533b858fe1ef96bd12b656c
RESEARCH PRODUCT
On the longest common factor problem
Filippo MignosiAlessandra GabrieleMaxime CrochemoreMaxime CrochemoreMauriana Pesaresisubject
Discrete mathematicsSettore INF/01 - InformaticaSuffix tree[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Generalized suffix treeDAWGsuffix tree[INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]Data structureLongest common substring problemlaw.inventionCombinatoricsSet (abstract data type)Range (mathematics)lawLongest Common Factor ProblemSuffixLowest common ancestorMathematicsdescription
The Longest Common Factor (LCF) of a set of strings is a well studied problem having a wide range of applications in Bioinformatics: from microarrays to DNA sequences analysis. This problem has been solved by Hui (2000) who uses a famous constant-time solution to the Lowest Common Ancestor (LCA) problem in trees coupled with use of suffix trees. A data structure for the LCA problem, although linear in space and construction time, introduces a multiplicative constant in both space and time that reduces the range of applications in many biological applications. In this article we present a new method for solving the LCF problem using the suffix tree structure with an auxiliary array that take space O(n). Our algorithm works in time O(nlog a), where n is the total input size and a is the multiplicative constant introduced by the alphabet. a is the size of the alphabet. We also consider a different version of our algorithm that applies to DAWGs. In this case, we prove that the algorithm works in both time and space proportional to data DAWG’s size.
year | journal | country | edition | language |
---|---|---|---|---|
2008-09-01 |