
AUTHOR

D. La Neve


Alignment free Dissimilarities for sequence classification

2015

One way to represent a DNA sequence is to break it down into substrings of length L, called L-tuples, and count the occurrences of each L-tuple in the sequence. This representation maps a sequence to a numerical feature vector of fixed length, which allows sequence similarity to be measured in an alignment-free way simply by applying dissimilarity functions between vectors. This work presents a benchmark study of 4 alignment-free dissimilarity functions between sequences, computed on their L-tuple representation, for the purpose of sequence classification. In our experiments, we have tested the classes of geometric-based, correlation-based and information-based …
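The L-tuple representation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `ltuple_counts` and `euclidean` are hypothetical, the sliding window is assumed to advance one position at a time (overlapping L-tuples), and Euclidean distance is used only as a familiar example of a geometric dissimilarity, since the abstract does not name the four functions it benchmarks.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, no ambiguity codes


def ltuple_counts(seq, L):
    """Count every L-tuple of `seq` into a fixed-length vector of 4**L entries.

    Entries follow the lexicographic order of the L-tuples over ALPHABET.
    Windows containing characters outside ALPHABET (e.g. 'N') are skipped.
    """
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=L))}
    counts = [0] * len(index)
    seq = seq.upper()
    # Assumption: overlapping windows, shifted by one position each step.
    for start in range(len(seq) - L + 1):
        pos = index.get(seq[start:start + L])
        if pos is not None:
            counts[pos] += 1
    return counts


def euclidean(u, v):
    """Euclidean distance between two count vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    x = ltuple_counts("ACGTACGT", 2)
    y = ltuple_counts("ACGTTTTT", 2)
    print(len(x), euclidean(x, y))
```

Because every sequence maps to a vector of the same length (4**L), sequences of different lengths can be compared directly, and the resulting dissimilarities can be passed to a nearest-neighbour classifier of the kind listed in the keywords.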

Settore INF/01 - Informatica; k-mers; L-tuples; DNA sequence similarity; DNA sequence classification; Knn classifier