Variable Ranking Feature Selection for the Identification of Nucleosome Related Sequences
Antonino Fiannaca, Massimo La Rosa, Alfonso Urso, Giosuè Lo Bosco, Riccardo Rizzo
Subject: 0301 basic medicine; Sequence; Settore INF/01 - Informatica; Epigenomic; 030102 biochemistry & molecular biology; business.industry; Computer science; Deep learning; Pattern recognition; Feature selection; DNA sequences; Nucleosomes; Ranking (information retrieval); Set (abstract data type); 03 medical and health sciences; Variable (computer science); 030104 developmental biology; Dimension (vector space); Deep learning models; Artificial intelligence; Representation (mathematics); business
Description:
Several recent works have shown that the k-mer representation of a DNA sequence can be used to classify or identify sequences related to nucleosome positioning. This representation becomes computationally expensive as k grows: since there are 4^k possible k-mers over the alphabet {A, C, G, T}, the feature space has exponential dimension. This significantly affects the classification task performed by any general machine learning algorithm used for sequence classification. In this paper, we investigate the advantage offered by the so-called Variable Ranking Feature Selection method in selecting the most informative k-mers associated with a set of DNA sequences, with the final purpose of nucleosome/linker classification by a deep learning network. Results computed on three public datasets show the effectiveness of the adopted feature selection method.
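The pipeline described above (k-mer representation, then variable ranking, then a classifier) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which per-feature criterion the Variable Ranking method uses, nor the values of k or of the number m of retained k-mers, so the sketch assumes a two-class Fisher score as a stand-in criterion. All function names are hypothetical, the toy sequences and labels are made up, and the deep learning network is not shown. The idea it illustrates is that each of the 4^k k-mer features is scored independently, ranked, and only the top m are kept.

```python
from itertools import product

def kmer_vocabulary(k):
    """All 4**k DNA k-mers over {A, C, G, T}, in lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]

def kmer_vector(seq, k, vocab_index):
    """Relative frequencies of overlapping k-mers in one sequence."""
    counts = [0] * len(vocab_index)
    total = 0
    for i in range(len(seq) - k + 1):
        j = vocab_index.get(seq[i:i + k].upper())
        if j is not None:  # skip windows containing N or other non-ACGT symbols
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(counts)

def fisher_score(column, labels):
    """Two-class Fisher criterion (mu1 - mu0)^2 / (var1 + var0) for one feature.
    Assumption: this stands in for the paper's unspecified ranking criterion.
    Assumes both classes (labels 1 and 0) are present."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    m1, m0 = sum(pos) / len(pos), sum(neg) / len(neg)
    v1 = sum((x - m1) ** 2 for x in pos) / len(pos)
    v0 = sum((x - m0) ** 2 for x in neg) / len(neg)
    if v1 + v0 == 0:
        # Constant within each class: perfectly separating if the means differ.
        return float("inf") if m1 != m0 else 0.0
    return (m1 - m0) ** 2 / (v1 + v0)

def rank_and_select(X, labels, vocab, m):
    """Score every k-mer independently, rank by descending score, keep the top m.
    Returns the reduced matrix and the names of the kept k-mers."""
    scored = [(fisher_score([row[j] for row in X], labels), j)
              for j in range(len(vocab))]
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties broken by vocabulary index
    keep = [j for _, j in scored[:m]]
    reduced = [[row[j] for j in keep] for row in X]
    return reduced, [vocab[j] for j in keep]

# Toy usage with made-up sequences (NOT data from the paper).
k = 2
vocab = kmer_vocabulary(k)
index = {kmer: j for j, kmer in enumerate(vocab)}
seqs = ["GCGCGCGC", "GCGCGCAT", "ATATATAT", "ATATATGC"]
labels = [1, 1, 0, 0]  # hypothetical: 1 = nucleosome, 0 = linker
X = [kmer_vector(s, k, index) for s in seqs]
X_small, kept = rank_and_select(X, labels, vocab, m=2)
print(len(vocab), kept)  # 16 ['CG', 'TA']
```

Because each k-mer is scored on its own, the ranking costs one pass per feature and scales linearly with the 4^k vocabulary, which is what makes a filter of this kind attractive before training a network; the trade-off is that interactions between k-mers are ignored during selection.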
| year | journal | country | edition | language |
|---|---|---|---|---|
| 2018-01-01 | | | | |