6533b853fe1ef96bd12ad462

RESEARCH PRODUCT

Longest Common Subsequence from Fragments via Sparse Dynamic Programming

Brenda S. BakerRaffaele Giancarlo

subject

Dynamic programmingCombinatoricsSet (abstract data type)Longest common subsequence problemOptimization problemMatching (graph theory)Combinatorial optimizationData structureSubstringMathematics

description

Sparse Dynamic Programming has emerged as an essential tool for the design of efficient algorithms for optimization problems coming from such diverse areas as Computer Science, Computational Biology and Speech Recognition [7,11,15]. We provide a new Sparse Dynamic Programming technique that extends the Hunt-Szymanski [2,9,8] paradigm for the computation of the Longest Common Subsequence (LCS) and apply it to solve the LCS from Fragments problem: given a pair of strings X and Y (of length n and m, resp.) and a set M of matching substrings of X and Y, find the longest common subsequence based only on the symbol correspondences induced by the substrings. This problem arises in an application to analysis of software systems. Our algorithm solves the problem in O(|M| log |M|) time using balanced trees, or O(|M| log logmin(|M|;nm=|M|)) time using Johnson's version of Flat Trees [10]. These bounds apply for two cost measures. The algorithm can also be adapted to finding the usual LCS in O((m + n) log |Σ| + |M| log |M|) using balanced trees or O((m + n) log |Σ| + |M| log logmin(|M|; nm=|M|)) using Johnson's Flat Trees, where M is the set of maximal matches between substrings of X and Y and Σ is the alphabet.

https://doi.org/10.1007/3-540-68530-8_7