
RESEARCH PRODUCT

Sparse Dynamic Programming for Longest Common Subsequence from Fragments

Brenda S. Baker, Raffaele Giancarlo

subject

Longest common subsequence problem; Combinatorics; Dynamic programming; Set (abstract data type); Computational Mathematics; Control and Optimization; Optimization problem; Computational Theory and Mathematics; Matching (graph theory); Symbol (programming); Computation; Substring; Mathematics

description

Sparse Dynamic Programming has emerged as an essential tool for the design of efficient algorithms for optimization problems coming from such diverse areas as computer science, computational biology, and speech recognition. We provide a new sparse dynamic programming technique that extends the Hunt-Szymanski paradigm for the computation of the longest common subsequence (LCS) and apply it to solve the LCS from Fragments problem: given a pair of strings X and Y (of length n and m, respectively) and a set M of matching substrings of X and Y, find the longest common subsequence based only on the symbol correspondences induced by the substrings. This problem arises in an application to analysis of software systems. Our algorithm solves the problem in O(|M| log |M|) time using balanced trees, or O(|M| log log min(|M|, nm/|M|)) time using Johnson's version of Flat Trees. These bounds apply for two cost measures. The algorithm can also be adapted to finding the usual LCS in O((m+n) log |Σ| + |M| log |M|) time using balanced trees or O((m+n) log |Σ| + |M| log log min(|M|, nm/|M|)) time using Johnson's version of Flat Trees, where M is the set of maximal matches between substrings of X and Y and Σ is the alphabet. These bounds improve on those of the original Hunt-Szymanski algorithm while retaining the overall approach.
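For orientation, the classic Hunt-Szymanski paradigm that the description refers to can be sketched as follows. This is a minimal illustration of the original single-symbol-match approach, not the fragment-based algorithm of the paper; the function name `lcs_length_hunt_szymanski` is hypothetical, and a sorted Python list searched with `bisect` stands in for the balanced tree or Flat Tree mentioned above.

```python
from bisect import bisect_left
from collections import defaultdict


def lcs_length_hunt_szymanski(x, y):
    """Length of the LCS of x and y, visiting only matching positions.

    Illustrative sketch of the classic Hunt-Szymanski idea (assumed
    helper, not code from the paper).
    """
    # For each symbol, record its positions in y in DECREASING order.
    occ = defaultdict(list)
    for j in range(len(y) - 1, -1, -1):
        occ[y[j]].append(j)

    # thresh[k] = smallest index in y at which a common subsequence
    # of length k + 1 can end, given the prefix of x processed so far.
    thresh = []
    for ch in x:
        # Decreasing order of j keeps two matches from the same
        # position of x from being chained together.
        for j in occ.get(ch, ()):
            k = bisect_left(thresh, j)
            if k == len(thresh):
                thresh.append(j)   # extends the longest subsequence
            else:
                thresh[k] = j      # improves an existing threshold
    return len(thresh)
```

A call such as `lcs_length_hunt_szymanski("ABCBDAB", "BDCABA")` does work proportional to the number of matching position pairs (times a logarithmic search cost) rather than to the full n × m table, which is the sparsity that the fragment-based technique exploits at the level of whole substrings.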

https://doi.org/10.1006/jagm.2002.1214