Comparison of genomic sequences clustering using Normalized Compression Distance and Evolutionary Distance

Alfonso Urso, Massimo La Rosa, Salvatore Gaglio, Riccardo Rizzo

subject

Kolmogorov complexity; universal similarity metric; Computer science; DNA sequence; Pattern recognition; Genomic Sequence Clustering; Compression (functional analysis); Normalized compression distance; Artificial intelligence; Cluster analysis; Distance matrices in phylogeny; clustering

description

Genomic sequences are usually compared using evolutionary distance, a procedure that requires the sequences to be aligned first. Aligning long sequences is time-consuming, and the resulting dissimilarity is not a metric. The normalized compression distance was recently introduced as a method for computing the distance between two generic digital objects, and it appears to be a suitable way to compare genomic strings. In this paper, the clusterings and the maps obtained with a self-organizing map (SOM) using the traditional evolutionary distance and using the compression distance are compared in order to determine whether the two sets of distances are similar. Preliminary results indicate that the two distances capture different aspects of the genomic sequences, and further investigation is needed before a definitive conclusion can be drawn.
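To make the contrast in the abstract concrete, the sketch below computes both kinds of distance for DNA strings. The normalized compression distance is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length under a real compressor. This is a minimal sketch, assuming Python's zlib as the compressor C; the record does not say which compressor the authors used. The Jukes-Cantor correction stands in for "evolutionary distance" purely as an illustrative assumption (the paper may use a different substitution model), and it only works on sequences that have already been aligned, which is exactly the costly step NCD avoids. The function names are hypothetical.

```python
import math
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (level 9).

    zlib is an assumed, illustrative choice of compressor C.
    """
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    No alignment is needed: the sequences are simply concatenated.
    Values are close to 0 for near-identical inputs and close to 1
    for unrelated inputs (real compressors can slightly exceed 1).
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def jukes_cantor(a: str, b: str) -> float:
    """Jukes-Cantor distance between two ALREADY ALIGNED sequences.

    Used here only as an example of an evolutionary distance.
    Positions where either sequence has a gap ('-') are skipped.
    d = -3/4 * ln(1 - 4/3 * p), with p the fraction of differing sites.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(s, t) for s, t in zip(a, b) if s != "-" and t != "-"]
    if not sites:
        raise ValueError("no comparable (gap-free) sites")
    p = sum(1 for s, t in sites if s != t) / len(sites)
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)
```

For example, `jukes_cantor("AAAA", "AAAC")` gives about 0.304 (p = 0.25), while `ncd` can be called on unaligned strings of different lengths. Because NCD depends on how well the chosen compressor models DNA, swapping zlib for `bz2` or `lzma` can change the distances noticeably.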

http://hdl.handle.net/10447/48454