AUTHOR

L. Pinello

A new feature selection strategy for K-mers sequence representation

2014

Decomposing a DNA sequence into k-mers (substrings of length k) and counting their frequencies defines a mapping of the sequence into a numerical space, where it is represented by a feature vector of fixed length. This simple process allows sequences to be compared without alignment, using common similarity and distance functions on the numerical codomain of the mapping. The most commonly used decomposition takes all the substrings of length k, so the codomain has exponential dimension (4^k for the DNA alphabet {A, C, G, T}). This can affect the time complexity of the similarity computation and, more generally, of the machine learning algorithm used for sequence classification. Moreover, the presence of possible n…
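The mapping and the alignment-free comparison described above can be illustrated with a minimal sketch. It assumes the plain {A, C, G, T} alphabet, a lexicographic ordering of all 4^k possible k-mers, and Euclidean distance and cosine similarity as example comparison functions; the helper names (`kmer_index`, `kmer_vector`, `euclidean`, `cosine_similarity`) are hypothetical. It shows only the full k-mer representation whose size motivates feature selection, not the selection strategy proposed in the paper.

```python
from itertools import product
import math

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, no ambiguity codes

def kmer_index(k: int) -> dict:
    """Map each of the 4**k possible k-mers to a fixed position (lexicographic order)."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}

def kmer_vector(seq: str, k: int) -> list[int]:
    """Fixed-length count vector: one slot per possible k-mer, 4**k slots in total."""
    index = kmer_index(k)
    counts = [0] * len(index)
    # Slide a window of length k over the sequence (overlapping k-mers).
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip k-mers containing symbols outside ACGT (e.g. N)
            counts[index[kmer]] += 1
    return counts

def euclidean(u: list[int], v: list[int]) -> float:
    """Euclidean distance between two count vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u: list[int], v: list[int]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

x = kmer_vector("ACGTACGT", 2)
y = kmer_vector("ACGTTGCA", 2)
print(len(x))                               # 16
print(round(euclidean(x, y), 3))            # 2.828
print(round(cosine_similarity(x, y), 3))    # 0.629

# The vector length grows as 4**k, independently of sequence length.
for k in (3, 6, 12):
    print(k, 4 ** k)                        # 3 64, then 6 4096, then 12 16777216
```

Both example sequences yield vectors of the same length (16 for k = 2), so any vector-space distance applies directly. The last loop shows why the full decomposition becomes costly: already at k = 12 each sequence is mapped to a vector with more than 16 million coordinates, most of them zero for short sequences.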

Scientific sector: INF/01 - Informatica (Computer Science)

Keywords: k-mers; DNA sequence similarity; feature selection; DNA sequence classification