
RESEARCH PRODUCT

Soft Topographic Map for Clustering and Classification of Bacteria

Massimo La Rosa, Alfonso Urso, Giuseppe Di Fatta, Salvatore Gaglio, Giovanni M. Giammanco, Riccardo Rizzo

subject

Settore MED/07 - Microbiologia E Microbiologia Clinica; topographic map; Computer science; Class (philosophy); Genome; Algorithms; Database systems; DNA; Genes; Taxonomies; taxonomy; Similarity (network science); Computer vision; bacteria; Cluster analysis; Gene; Bioinformatic; housekeeping gene; Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Settore INF/01 - Informatica; business.industry; Bacterial taxonomy; Pattern recognition; Genomic Sequence Clustering; Topographic map; Housekeeping gene; Settore ING-INF/06 - Bioingegneria Elettronica E Informatica; Artificial intelligence; business; clustering

description

In this work a new method for clustering and building a topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different type strains can be visually inspected and verified. Two well-known DNA alignment algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaproteobacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.
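The step from aligned sequences to evolutionary distances can be illustrated with a minimal sketch. The abstract does not name the two alignment algorithms or the distance model, so everything below is an assumption chosen for illustration: Needleman-Wunsch global alignment with arbitrary scoring values, and the Jukes-Cantor correction applied to the fraction of mismatched aligned positions. The function names (`global_align`, `jukes_cantor`, `distance_matrix`) are hypothetical and do not come from the paper.

```python
import math


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment (assumed algorithm, assumed scores).

    Returns the two input strings padded with '-' so they have equal length.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def jukes_cantor(aligned_a, aligned_b):
    """Jukes-Cantor distance over gap-free columns (assumed distance model)."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free aligned positions")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(sequences):
    """Symmetric matrix of pairwise distances between raw sequences."""
    k = len(sequences)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d = jukes_cantor(*global_align(sequences[i], sequences[j]))
            dist[i][j] = dist[j][i] = d
    return dist
```

A matrix of this kind is the sort of pairwise dissimilarity input that a topographic map algorithm working on distances, rather than on vector data, could take. For full-length 16S rRNA sequences (roughly 1,500 bases each) the quadratic-time pure-Python alignment above would be slow, so an optimized alignment library would be the practical choice.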

https://doi.org/10.1007/978-3-540-74825-0_30