6533b7d3fe1ef96bd1260af2

RESEARCH PRODUCT

A New Combinatorial Approach to Sequence Comparison

Marinella SciortinoGiovanna RosoneSabrina MantaciAntonio Restivo

subject

MultisetTheoretical computer scienceBurrows–Wheeler transformSettore INF/01 - InformaticaComputer scienceBurrows-Wheeler transform; Sequence comparisonString (computer science)Context (language use)Extension (predicate logic)ComparisonInformation theoryGenomeBurrows-Wheeler transform; ComparisonTheoretical Computer ScienceTransformation (function)CategorizationComputational Theory and MathematicsPhylogeneticsSequence comparisonTheory of computationBurrows-Wheeler TransformSequence ComparisonAlgorithmMathematicsData compression

description

In this paper we introduce a new alignment-free method for comparing sequences which is combinatorial by nature and does not use any compressor nor any information-theoretic notion. Such a method is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in the context of Data Compression. The new extended transformation takes as input a multiset of sequences and produces as output a string obtained by a suitable rearrangement of the characters of all the input sequences. By using such a transformation we give a general method for comparing sequences that takes into account how much the characters coming from the different input sequences are mixed in the output string. Such a method is tested on a real data set for the whole mitochondrial genome phylogeny problem. However, the goal of this paper is to introduce a new and general methodology for automatic categorization of sequences.

10.1007/s00224-007-9078-6http://hdl.handle.net/10447/13319