Search results for "k-mers"

showing 4 items of 4 documents

Alignment Free Dissimilarities for Nucleosome Classification

2016

Epigenetic mechanisms such as nucleosome positioning, histone modifications and DNA methylation play an important role in the regulation of cell type-specific gene activities, yet how epigenetic patterns are established and maintained remains poorly understood. Recent studies have shown that DNA sequences play a role in the recruitment of epigenetic regulators. For this reason, more suitable similarity or dissimilarity measures between DNA sequences could help in epigenetic studies. In particular, alignment-free dissimilarities have already been successfully applied to identify distinct sequence features that are associated with epigenetic patterns and to predict epigenomic profiles…

0301 basic medicine; Nearest neighbour classifiers; Knn classifier; Settore INF/01 - Informatica; 030102 biochemistry & molecular biology; biology; Computer science; Speech recognition; Epigenetic; Context (language use); Computational biology; L-tuples; 03 medical and health sciences; 030104 developmental biology; Histone; Similarity (network science); DNA methylation; biology.protein; Nucleosome; Epigenetics; Alignment free DNA sequence dissimilarities; k-mers; Nucleosome classification; Epigenomics

researchProduct

A new feature selection strategy for K-mers sequence representation

2014

DNA sequence decomposition into k-mers (substrings of length k) and their frequency counting define a mapping of a sequence into a numerical space by a numerical feature vector of fixed length. This simple process makes it possible to compare sequences in an alignment-free way, using common similarity and distance functions on the numerical codomain of the mapping. The most commonly used decomposition uses all the substrings of length k, making the dimension of the codomain exponential in k. This can affect the time complexity of the similarity computation and, more generally, of the machine learning algorithm used for sequence classification. Moreover, the presence of possible n…

Settore INF/01 - Informatica; k-mers; DNA sequence similarity; feature selection; DNA sequence classification

researchProduct
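
The k-mer mapping described in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `kmer_vector`, the default alphabet `ACGT`, the lexicographic ordering of features, and the choice to skip windows containing other symbols are all assumptions made for illustration.

```python
from itertools import product


def kmer_vector(seq, k, alphabet="ACGT"):
    """Map a sequence to a fixed-length vector of overlapping k-mer counts.

    Features are all len(alphabet)**k possible k-mers in lexicographic order
    (an illustrative choice, not taken from the paper).
    """
    # Assign one vector position to every possible k-mer.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    # Slide a window of length k over the sequence, one position at a time.
    for start in range(len(seq) - k + 1):
        position = index.get(seq[start:start + k])
        # Windows with symbols outside the alphabet (e.g. N) are skipped.
        if position is not None:
            counts[position] += 1
    return counts


vector = kmer_vector("ACGTAC", 2)
print(len(vector), sum(vector))
```

For the sequence `ACGTAC` and k = 2 the vector has 16 entries, one per possible 2-mer, and the counts sum to the 5 overlapping windows.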

A New Feature Selection Methodology for K-mers Representation of DNA Sequences

2015

DNA sequence decomposition into k-mers and their frequency counting define a mapping of a sequence into a numerical space by a numerical feature vector of fixed length. This simple process makes it possible to compare sequences in an alignment-free way, using common similarity and distance functions on the numerical codomain of the mapping. The most commonly used decomposition uses all the substrings of a fixed length k, making the dimension of the codomain exponential in k. This can affect the time complexity of the similarity computation and, more generally, of the machine learning algorithm used for sequence analysis. Moreover, the presence of possible noisy features can also affect the…

k-mers; DNA sequence similarity; feature selection; DNA sequence classification; Settore INF/01 - Informatica; Computer science; Sequence analysis; business.industry; Feature vector; Pattern recognition; Feature selection; DNA sequencing; Substring; Exponential function; Artificial intelligence; business; Algorithm; Time complexity

researchProduct
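
The exponential growth of the feature space mentioned in the abstract above can be made concrete with a short calculation. The helper names `feature_space_size` and `max_nonzero_features` are hypothetical, and the sketch assumes the four-letter DNA alphabet; it does not reproduce the feature selection method of the paper, whose description is truncated here.

```python
def feature_space_size(k, alphabet_size=4):
    """Number of possible k-mers, i.e. the dimension of the full count vector."""
    return alphabet_size ** k


def max_nonzero_features(seq_len, k, alphabet_size=4):
    """Upper bound on the nonzero entries in the count vector of one sequence.

    A sequence of length seq_len contains seq_len - k + 1 overlapping windows,
    so it cannot contain more distinct k-mers than that.
    """
    windows = max(seq_len - k + 1, 0)
    return min(windows, feature_space_size(k, alphabet_size))


for k in (2, 4, 8):
    print(k, feature_space_size(k), max_nonzero_features(1000, k))
```

For a sequence of 1000 bases, the full vector at k = 8 has 65536 dimensions while at most 993 of them can be nonzero, which gives some intuition for why reducing the number of features is attractive.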

Alignment free Dissimilarities for sequence classification

2015

One way to represent a DNA sequence is to break it down into substrings of length L, called L-tuples, and count the occurrences of each L-tuple in the sequence. This representation defines a mapping of a sequence into a numerical space by a numerical feature vector of fixed length, which makes it possible to measure sequence similarity in an alignment-free way simply by using dissimilarity functions between vectors. This work presents a benchmark study of 4 alignment-free dissimilarity functions between sequences, computed on their L-tuple representation, for the purpose of sequence classification. In our experiments, we have tested the classes of geometric-based, correlation-based and information-based …

Settore INF/01 - Informatica; k-mers; L-tuples; DNA sequence similarity; DNA sequence classification; Knn classifier

researchProduct
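
To give a feel for the three classes of dissimilarity named in the abstract above, here is a minimal sketch with one representative of each class, applied to count vectors. The truncated abstract does not say which four functions were benchmarked, so the choices below (Euclidean distance, one minus Pearson correlation, Jensen-Shannon divergence) and all function names are assumptions for illustration only.

```python
import math


def euclidean(x, y):
    """Geometric-based: straight-line distance between two count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_dissimilarity(x, y):
    """Correlation-based: one minus the Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        # Correlation is undefined for constant vectors; returning 1.0
        # (treating them as uncorrelated) is an assumption of this sketch.
        return 1.0
    return 1.0 - cov / (norm_x * norm_y)


def jensen_shannon(x, y):
    """Information-based: Jensen-Shannon divergence in bits.

    Counts are first normalised to frequencies; both vectors must have a
    positive sum.
    """
    total_x, total_y = sum(x), sum(y)
    p = [a / total_x for a in x]
    q = [b / total_y for b in y]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing to the divergence.
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


x, y = [2, 1, 0, 1], [1, 1, 1, 1]
print(euclidean(x, y), correlation_dissimilarity(x, y), jensen_shannon(x, y))
```

Any of these functions could serve as the distance in a nearest-neighbour classifier over L-tuple count vectors, which is the kind of setting the keywords of this entry suggest.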