0000000000483926

AUTHOR

Riccardo Rizzo

Semantics driven interaction using natural language in students tutoring

The aim of this work is to introduce a semantic integration between an ontology and a chatbot in an Intelligent Tutoring System (ITS) to interact with students using natural language. The interaction process is driven by a purposely defined ontology, in which two types of conceptual relations are defined. Besides the usual relations, which define the domain's structure, another type of relation defines the navigation schema inside the ontology, according to the need to manage uncertainty. The uncertainty level is related to the student's knowledge of the involved concepts. In this work we propose an ITS for the Java programming language called TutorJ…

research product

Identifying small pelagic Mediterranean fish schools from acoustic and environmental data using optimized artificial neural networks

The Common Fisheries Policy of the European Union aims to exploit fish stocks at a level of Maximum Sustainable Yield by 2020 at the latest. At the Mediterranean level, the General Fisheries Commission for the Mediterranean (GFCM) has highlighted the importance of reversing the observed declining trend of fish stocks. In this complex context, it is important to obtain reliable biomass estimates to support scientifically sound advice for sustainable management of marine resources. This paper presents a machine learning methodology for the classification of pelagic species schools from acoustic and environmental data. In particular, the methodology was tuned for the recognition of an…


Pelagic species identification by using a PNN neural network and echo-sounder data

For several years, a group of CNR researchers has conducted acoustic surveys in the Sicily Channel to estimate the biomass of small pelagic species, their geographical distribution and their variations over time. The instrument used for these surveys is the scientific echo sounder, operating at different frequencies. Processing the backscattered signals from the investigated water volume yields the abundance of the species. These data are then correlated with the biological data of experimental catches to attribute the composition of the various fish schools investigated. Accurate recognition of the fish schools helps to produce very good results, that is very clo…


Simulated Annealing Technique for Fast Learning of SOM Networks

The Self-Organizing Map (SOM) is a popular unsupervised neural network able to provide effective clustering and data visualization for multidimensional input datasets. In this paper, we present an application of the simulated annealing procedure to the SOM learning algorithm, with the aim of obtaining faster learning and better performance in terms of quantization error. The proposed learning algorithm, called Fast Learning Self-Organized Map, preserves the simplicity of the standard SOM learning algorithm. It also improves the quality of the resulting maps, providing better clustering quality and topology preservation of input multi-dimensi…
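The abstract does not give the FLSOM update rule. As a reference point only, the following minimal sketch shows plain online SOM training with an exponentially decaying ("cooling") learning rate and neighbourhood radius, together with the quantization error used to compare maps. The schedule constants (lr0=0.5, decay factor 3.0) and the toy data are illustrative assumptions, not values from the paper.

```python
import numpy as np


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Online SOM training with exponentially decaying learning rate and
    neighbourhood radius (a generic cooling schedule, not the FLSOM rule)."""
    rng = np.random.default_rng(seed)
    n, _ = data.shape
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Lattice coordinates of every neuron, shape (rows * cols, 2).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the weights with randomly drawn input samples.
    weights = data[rng.integers(0, n, rows * cols)].astype(float)
    total, step = epochs * n, 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = step / total
            lr = lr0 * np.exp(-3.0 * frac)        # learning rate "cools" over time
            sigma = sigma0 * np.exp(-3.0 * frac)  # neighbourhood shrinks over time
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the lattice
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights, grid


def quantization_error(data, weights):
    """Mean Euclidean distance between each sample and its best-matching unit."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return float(d.min(axis=1).mean())


# Toy data: two well-separated Gaussian blobs in 2-D.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(3.0, 0.1, (50, 2))])
weights, grid = train_som(data, 4, 4)
qe = quantization_error(data, weights)
```

A simulated-annealing variant would replace the fixed exponential schedule with parameters adapted during training; how FLSOM does this is not stated here.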


Fast Training of Self Organizing Maps for the Visual Exploration of Molecular Compounds

Visual exploration of scientific data in the life science area is a growing research field due to the large amount of available data. Kohonen’s Self-Organizing Map (SOM) is a widely used tool for the visualization of multidimensional data. In this paper we present a fast learning algorithm for SOMs that uses a simulated annealing method to adapt the learning parameters. The algorithm has been adopted in a data analysis framework for the generation of similarity maps. Such maps provide an effective tool for the visual exploration of large and multi-dimensional input spaces. The approach has been applied to data generated during the High Throughput Screening of mo…


An Intelligent System for Building Bioinformatics Workflows

In this paper a new intelligent system designed to support researchers in developing workflows for bioinformatics experiments is presented. The proposed system is capable of suggesting one or more strategies to solve the selected problem and of supporting the user in assembling a workflow for complex experiments, using a knowledge base, representing the expertise about the application domain, and a rule-based system for decision making. Moreover, the system can represent this workflow at different abstraction layers, freeing the user from implementation details and assisting him in the correct configuration of the algorithms. A sample workflow for protein c…


Context-Aware Visual Exploration of Molecular Datab

Facilitating the visual exploration of scientific data has received increasing attention in the past decade or so. Especially in life-science application areas, the amount of available data has grown at a breathtaking pace. In this paper we describe an approach that allows for visual inspection of large collections of molecular compounds. In contrast to classical visualizations of such spaces, we incorporate a specific focus of analysis, for example the outcome of a biological experiment such as high-throughput screening results. The presented method uses this experimental data to select molecular fragments of the underlying molecules that have interesting properties and uses the res…


Learning Path Generation by Domain Ontology Transformation

An approach to automated learning path generation inside a domain ontology supporting a web tutoring system is presented. Even though a terminological ontology definition is needed in real systems to enable reasoning and/or planning techniques and to take modern learning theories into account, applying a planner to such an ontology is very hard: the definition of actions, along with their preconditions and effects, has to take into account the semantics of the relations among concepts, which amounts to building an ontology of learning. The proposed methodology is inspired by Knowledge Space Theory and proposes some heuristics to transform the original ontology into a weig…


Deep learning architectures for prediction of nucleosome positioning from sequences data

Background: Nucleosomes are DNA-histone complexes, each wrapping about 150 base pairs of double-stranded DNA. Their function is fundamental for one of the primary functions of chromatin, i.e. packing the DNA into the nucleus of eukaryotic cells. Several biological studies have shown that nucleosome positioning influences the regulation of cell type-specific gene activities. Moreover, computational studies have shown evidence of sequence specificity in the DNA fragments wrapped into nucleosomes, clearly underlined by the organization of particular DNA substrings. As a consequence, the identification of nucleosomes on a genomic scale has been successfully performed by com…


Identification of Key miRNAs in Regulation of PPI Networks

In this paper, we explore the interaction between miRNAs and deregulated proteins in some pathologies. Assuming that miRNAs can influence mRNAs and consequently protein regulation, we explore this connection using an interaction matrix derived from miRNA-target data and PPI network interactions. From this interaction matrix and the set of deregulated proteins, we search for the miRNA subset that influences the deregulated proteins with minimal impact on the non-deregulated ones. This regulation problem can be formulated as a complex optimization problem, which we have tried to solve with a genetic algorithm heuristic. As the main result, we have found a set of miRNA…


A framework for sign language sentence recognition by common sense context

This correspondence proposes a complete framework for sign language recognition that integrates a commonsense engine to deal with sentence recognition. The proposed system is based on a multilevel architecture that allows the knowledge of the recognition process to be modeled and managed in a simple and robust way. The final abstraction level of this architecture introduces the semantic context and the analysis of the correctness of a sentence given a sequence of recognized signs. Experiments are presented using a set of signs from the Italian Sign Language (LIS) for domotic applications. The implemented system maintains a high recognition rate when the set of signs grows, c…


Deep Metric Learning for Transparent Classification of Covid-19 X-Ray Images

This work proposes an interpretable classifier for automatic Covid-19 classification of chest X-ray images. It is based on a deep learning model, in particular a triplet network, devoted to finding an effective image embedding. Such an embedding is a non-linear projection of the images into a space of reduced dimension, where the homogeneity and separation of the classes, measured by a predefined metric, are improved. A K-Nearest Neighbor classifier is the interpretable model used for the final classification. Results on public datasets show that the proposed methodology can reach results comparable with the state of the art in terms of accuracy, with the advantage of providing interpretability to t…
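The two ingredients named above can be sketched independently of the network architecture, which the abstract does not describe. This minimal sketch assumes the standard hinge triplet loss on squared Euclidean distances (margin 1.0 is an assumed value) and a plain majority-vote k-NN on already computed embeddings; the toy 2-D embeddings stand in for the output of a trained triplet network.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss on squared Euclidean distances, averaged over a batch.
    It is zero once each negative is farther than its positive by at least `margin`."""
    d_ap = np.sum((anchor - positive) ** 2, axis=1)
    d_an = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))


def knn_predict(train_emb, train_labels, query_emb, k=3):
    """Majority vote among the k nearest training embeddings.
    The returned neighbour indices are what make each decision inspectable.
    Ties in the vote go to the smallest label (np.unique returns sorted labels)."""
    d = np.linalg.norm(query_emb[:, None, :] - train_emb[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    preds = []
    for row in idx:
        labels, counts = np.unique(train_labels[row], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds), idx


# Hypothetical embeddings of four training images with labels 0 and 1.
train_emb = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
train_labels = np.array([0, 0, 1, 1])
queries = np.array([[0.2, 0.3], [5.0, 5.2]])
preds, neighbours = knn_predict(train_emb, train_labels, queries, k=3)
```

Returning the neighbour indices alongside the prediction lets a user inspect which training images drove each decision, which is the kind of transparency a k-NN on a learned embedding offers.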


An Application of Spike-Timing-Dependent Plasticity to Readout Circuit for Liquid State Machine

Liquid state machine (LSM) is a neural system based on spiking neurons that implements a mapping between functions of time. A typical application of the LSM is the classification of time functions, obtained by observing the state of the liquid with a memoryless readout circuit, usually implemented by a linear perceptron. Due to the high number of neurons in the liquid, training the readout is difficult. In this paper we show that, using Spike-Timing-Dependent Plasticity (STDP), a single neuron with a short training session can be used to recognize the state of the liquid due to an input signal. Using STDP it is possible to identify the spike timing of the neurons in the liquid and this allow…
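The abstract does not state which STDP formulation is used. A common choice is the pair-based additive rule with exponential timing windows, sketched below for one synapse: a presynaptic spike shortly before a postsynaptic one strengthens the synapse, the reverse order weakens it. All amplitudes, time constants and weight bounds are assumed values for illustration.

```python
import numpy as np


def stdp_update(w, pre_spikes, post_spikes, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_min=0.0, w_max=1.0):
    """All-pairs additive STDP for a single synapse (spike times in ms)."""
    dw = 0.0
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            dt = t_post - t_pre
            if dt > 0:    # pre fires before post: potentiation
                dw += a_plus * np.exp(-dt / tau_plus)
            elif dt < 0:  # post fires before pre: depression
                dw -= a_minus * np.exp(dt / tau_minus)
    # Hard bounds keep the weight in [w_min, w_max].
    return float(np.clip(w + dw, w_min, w_max))


w_ltp = stdp_update(0.5, pre_spikes=[10.0], post_spikes=[15.0])  # pre leads post
w_ltd = stdp_update(0.5, pre_spikes=[15.0], post_spikes=[10.0])  # post leads pre
```

Applied to every liquid-to-readout synapse, a rule of this kind makes the readout weights reflect which liquid neurons tend to fire just before the readout neuron.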


Artificial neural networks for fault tolerance of an air-pressure sensor network

A meteorological tsunami, commonly called meteotsunami, is a tsunami-like wave originated by rapid changes in barometric pressure that displace a body of water. This phenomenon recurs in the coastal area of Mazara del Vallo (Sicily, Italy), in particular in the internal part of the seaport canal, sometimes putting the local population at risk. The Institute for Coastal Marine Environment (IAMC) of the National Research Council of Italy (CNR) has already conducted several studies on the meteotsunami phenomenon. One of these projects involved the creation of a sensor network composed of micro-barometric sensors, located in 4 different stations close to the seaport …


Classification of Sequences with Deep Artificial Neural Networks: Representation and Architectural Issues

DNA sequences are the basic data type processed in biological data analysis studies. One key component of biological analysis is sequence classification, a methodology widely used to analyze sequential data of different nature. However, its application to DNA sequences requires a proper representation of such sequences, which is still an open research problem. Machine Learning (ML) methodologies have made a fundamental contribution to the solution of this problem. Among them, Deep Neural Network (DNN) models have recently shown strongly encouraging results. In this chapter, we deal with specific classification problems related to t…


An Ontology Design Methodology for Knowledge-Based Systems with Application to Bioinformatics

Ontologies are formal knowledge representation models. Knowledge organization is a fundamental requirement for developing Knowledge-Based systems. In this paper we present the Data-Problem-Solver (DPS) approach, a new ontological paradigm that allows the knowledge designer to model and represent a Knowledge Base (KB) for expert systems. Our approach clearly distinguishes among the knowledge about a problem to solve (answering the "what to do" question), the solver method to solve it (answering the "how to do" question) and the type of input data required (answering the "what I need" question). The main purpose of the proposed paradigm is to facilitate the generalization of the application do…


Evolving Tree Algorithm Modifications

There are many variants of the original self-organizing neural map algorithm proposed by Kohonen. One of the most recent is the Evolving Tree, a tree-shaped self-organizing network with many interesting characteristics. This network builds a tree structure by splitting the input dataset during learning. This paper presents a speed-up modification of the original training algorithm, useful when the Evolving Tree network is used with complex data such as images or video. After measuring the effectiveness of the modification, an application of the modified algorithm to image segmentation is presented.
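The speed-up itself is not described above. For context, the baseline best-matching-unit search of the Evolving Tree, as we understand the original formulation, descends from the root and follows the closest child at each level until a leaf is reached, so its cost grows with tree depth rather than with the number of leaves. The minimal sketch below uses a hypothetical `Node` class and a hand-built toy tree.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Node:
    """Hypothetical tree node: a weight vector plus (possibly empty) children."""
    weight: np.ndarray
    children: list = field(default_factory=list)


def find_bmu(root, x):
    """Greedy top-down search: at each level follow the child whose weight is
    closest to x; the leaf reached is returned as the best-matching unit.
    Being greedy, it is not guaranteed to return the globally nearest leaf."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: float(np.sum((c.weight - x) ** 2)))
    return node


# Toy 1-D tree: root -> {leaf_a, inner_b}, inner_b -> {leaf_b1, leaf_b2}.
leaf_a = Node(np.array([-1.0]))
leaf_b1 = Node(np.array([0.5]))
leaf_b2 = Node(np.array([2.0]))
inner_b = Node(np.array([1.0]), [leaf_b1, leaf_b2])
root = Node(np.array([0.0]), [leaf_a, inner_b])
bmu = find_bmu(root, np.array([1.8]))
```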


Discovering learning paths on a domain ontology using natural language interaction

The present work investigates the problem of determining a learning path inside a suitable domain ontology. The proposed approach enables the user of a web learning application to browse the ontology by interacting with the system in natural language. The course-related knowledge is arranged as a three-level hierarchy: a content level, a symbolic level, and a conceptual level bridging the previous two. The implementation of the ontological, interaction, and presentation components inside the TutorJ system is explained, and the first results are presented.


Clustering Quality and Topology Preservation in Fast Learning SOMs

The Self-Organizing Map (SOM) is a popular unsupervised neural network able to provide effective clustering and data visualization for data represented in multidimensional input spaces. In this paper, we describe Fast Learning SOM (FLSOM), which adopts a learning algorithm that improves on the standard SOM with respect to convergence time in the training phase. We show that FLSOM also improves the quality of the map, providing better clustering quality and topology preservation of multidimensional input data. Several tests carried out on different multidimensional datasets demonstrate better performance of the algorithm in comparison with the original …
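The abstract does not say which topology-preservation measure is used. One common choice is the topographic error: the fraction of samples whose first and second best-matching units are not adjacent on the map lattice. The minimal sketch below assumes 8-neighbourhood adjacency on a rectangular lattice (Chebyshev distance 1); other papers use the 4-neighbourhood instead.

```python
import numpy as np


def topographic_error(data, weights, grid):
    """Fraction of samples whose first and second best-matching units are
    not adjacent on the lattice (8-neighbourhood, rectangular grid)."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    first, second = order[:, 0], order[:, 1]
    # Chebyshev distance between lattice positions; 1 means adjacent.
    gap = np.max(np.abs(grid[first] - grid[second]), axis=1)
    return float(np.mean(gap > 1))


grid = np.array([[0, 0], [0, 1], [0, 2]], dtype=float)  # a 1 x 3 map
ordered = np.array([[0.0], [1.0], [2.0]])  # lattice order matches input order
twisted = np.array([[0.0], [2.0], [1.0]])  # neighbouring neurons hold distant weights
sample = np.array([[0.4]])
te_ordered = topographic_error(sample, ordered, grid)
te_twisted = topographic_error(sample, twisted, grid)
```

On the twisted map, the two weights closest to 0.4 (0.0 and 1.0) sit at opposite ends of the lattice, so the sample counts as a topological defect.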


Pattern Classification from Multi-beam Acoustic Data Acquired in Kongsfjorden

Climate change is causing a structural change in Arctic ecosystems, reducing the effectiveness of the polar regions in cooling water masses, with inevitable repercussions on the climate and an impact on marine biodiversity. The Svalbard islands under study are an area strongly influenced by Atlantic waters. This area is undergoing changes that are modifying the composition and distribution of the species present. The aim of this work is to provide a method for classifying acoustic patterns acquired with multibeam technology in Kongsfjorden, Svalbard, in the Arctic Circle. The general objective is therefore the implementation of a methodology useful for identifying the a…


BITS2019: the sixteenth annual meeting of the Italian society of bioinformatics.

The 16th Annual Meeting of the Bioinformatics Italian Society was held in Palermo, Italy, on June 26-28, 2019. More than 80 scientific contributions were presented, including 4 keynote lectures, 31 oral communications and 49 posters. In addition, three workshops were organised before and during the meeting. Full papers from some of the works presented in Palermo were submitted to this Supplement of BMC Bioinformatics. Here, we provide an overview of the meeting's aims and scope. We also briefly introduce the selected papers that have been accepted for publication in this Supplement, for a complete presentation of the outcomes of the meeting.


A Comparison between Habituation and Conscience mechanism in Self–Organizing Maps

In this letter, a preliminary study of habituation in self-organizing networks is reported. The implemented habituation model yields a faster learning process and better clustering performance. The habituable neuron is a generalization of the typical neuron and can be used in many self-organizing network models. The habituation mechanism is implemented in a SOM, and the clustering performance of the network is compared to that of the conscience learning mechanism, which follows roughly the same principle but is less sophisticated.
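The habituation model itself is not detailed in the abstract. For comparison, the minimal sketch below shows the conscience mechanism in DeSieno's formulation: each neuron keeps a running estimate of how often it wins, and neurons that win more than their fair share (1/n) receive a bias that handicaps them in the competition. The values beta=1e-4 and gamma=10 are commonly quoted defaults, assumed here.

```python
import numpy as np


def conscience_winner(x, weights, p, beta=1e-4, gamma=10.0):
    """Return the neuron to update under the conscience mechanism.
    p holds each neuron's estimated win frequency and is updated in place."""
    n = len(weights)
    dist = np.sum((weights - x) ** 2, axis=1)
    # 1) Update win frequencies using the plain (unbiased) competition.
    y = np.zeros(n)
    y[np.argmin(dist)] = 1.0
    p += beta * (y - p)
    # 2) Neurons that win more often than 1/n get a negative bias.
    bias = gamma * (1.0 / n - p)
    # 3) The neuron actually updated is chosen with the biased distance.
    return int(np.argmin(dist - bias))


weights = np.array([[0.0], [1.0]])
x = np.array([0.4])
# Neuron 0 is nearest, but it has been winning 90% of the time.
w_plain = conscience_winner(x, weights, np.array([0.9, 0.1]), gamma=0.0)
w_consc = conscience_winner(x, weights, np.array([0.9, 0.1]), gamma=10.0)
```

With gamma=0 the mechanism reduces to ordinary winner-take-all. With the bias switched on, the over-used neuron 0 loses to neuron 1, which is the equalising effect that both conscience and habituation aim for.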


Automatic classification of acoustically detected krill aggregations: A case study from Southern Ocean

Acoustic surveys represent the standard methodology to assess the spatial distribution and abundance of pelagic organisms characterized by aggregative behaviour. The species identification of acoustically observed aggregations is usually performed by taking into account the biological sampling and according to expert-based knowledge. The precision of survey estimates, such as total abundance and spatial distribution, strongly depends on the efficiency of acoustic and biological sampling as well as on the species identification. In this context, the automatic identification of specific groups based on energetic and morphological features could improve the species identification process, allo…


Knowledge organization for modelling workflows in Taverna environment

Today, Workflow Management Systems (WFMS) such as Taverna and Kepler have a very important place in the everyday work of scientists. These tools support access to computational resources and act as an interface for building complex data processing chains. The next step is to support the researcher's decisions by autonomously developing parts of the workflow, guided by the scientist's requirements, while she/he works on the high-level goal of the experiment. To this end, an ontology is needed to store the knowledge related to the experiments and tools used, and to make this knowledge available not only to the scientist but also to a suitable artificial intelligent system. In this pape…


Unsupervised Classification of Acoustic Echoes from Two Krill Species in the Southern Ocean (Ross Sea)

This work presents a computational methodology able to automatically classify the echoes of two krill species recorded in the Ross Sea with a scientific echo sounder at three different frequencies (38, 120 and 200 kHz). Classifying these gregarious species is a time-consuming task, usually accomplished by using differences and/or thresholds estimated on the energy features of the insonified targets. Conversely, our methodology takes into account the energy, morphological and depth features of echo data acquired at different frequencies. Internal clustering validation indices were used to verify the ability of the clustering to recognize the correct number of species. Th…


A KST-BASED SYSTEM FOR STUDENT TUTORING

A novel assessment procedure based on knowledge space theory (KST) is presented, along with a complete implementation of an intelligent tutoring system (ITS) that has been used to test our theoretical findings. The key idea is that correct assessment of the student's knowledge is strictly related to the structure of the domain ontology. Suitable relationships between the concepts must be present to allow the creation of a reverse path from the "knowledge state" representing the student's goal to the one that contains her actual knowledge about the topic. Knowledge space theory is a very good framework to guide the process of building the ontology used by the artificial tutor. The sys…
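To make the notion of "knowledge state" concrete: in KST, a prerequisite relation between concepts induces a knowledge structure whose states are exactly the concept subsets that contain all prerequisites of their members, and the outer fringe of a state lists what the student is ready to learn next. The minimal sketch below illustrates both notions on a hypothetical three-concept Java fragment; it is not the assessment procedure of the paper, and the brute-force enumeration is only practical for small domains.

```python
from itertools import combinations


def knowledge_states(items, prereq):
    """Enumerate the states induced by a prerequisite map: a subset is a
    state iff it contains every prerequisite of each of its items."""
    states = []
    for r in range(len(items) + 1):
        for subset in combinations(sorted(items), r):
            s = frozenset(subset)
            if all(prereq.get(q, set()) <= s for q in s):
                states.append(s)
    return states


def outer_fringe(state, items, prereq):
    """Items not yet mastered whose prerequisites all lie in the state."""
    return {q for q in items if q not in state and prereq.get(q, set()) <= state}


# Hypothetical concepts: loops require variables, arrays require loops.
items = {"variables", "loops", "arrays"}
prereq = {"loops": {"variables"}, "arrays": {"loops"}}
states = knowledge_states(items, prereq)
fringe = outer_fringe(frozenset({"variables"}), items, prereq)
```

Here the prerequisites form a chain, so only four of the eight subsets are feasible states. Walking backwards from a goal state through such states is one way to picture the "reverse path" mentioned above.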


A Proposed Knowledge Based Approach for Solving Proteomics Issues

In this paper we present a novel knowledge-based approach that aims at helping scientists face and solve a large number of proteomics problems. The system architecture is based on an ontology that models the knowledge base; a reasoner that, starting from the user's request and a set of rules, builds the workflow of tasks to be done; and an executor that runs the algorithms and software scheduled by the reasoner. The system can interact with the user, showing him intermediate results and several options to refine the workflow and supporting him in choosing among different forks. Thanks to the presence of the knowledge base and the modularity provided by the ontology, the system can be e…


Deep learning network for exploiting positional information in nucleosome related sequences

A nucleosome is a DNA-histone complex, wrapping about 150 base pairs of double-stranded DNA. The role of nucleosomes is to pack the DNA into the nucleus of eukaryotic cells to form chromatin. Genome-wide nucleosome positioning plays an important role in the regulation of cell type-specific gene activities. Several biological studies have shown the sequence specificity of nucleosome presence, clearly underlined by the organization of precise nucleotide substrings. Taking these advances into consideration, the identification of nucleosomes on a genomic scale has been successfully performed by DNA sequence feature representation and classical supervised classification methods such as Support Vec…


A system for sign language sentence recognition based on common sense context

The paper proposes a complete framework for sign language recognition that integrates common sense in order to deal with sentences. The proposed system is based on a cognitive architecture that allows the knowledge of the recognition process to be modeled and managed in a simple and robust way. The final abstraction level of this architecture introduces the semantic context and the analysis of the correctness of a sentence given a sequence of recognized signs. Experiments are presented using the Italian Sign Language (LIS) and show that the system maintains a high recognition rate as the set of signs grows, correcting erroneously recognized single signs using the context.


Improved SOM Learning using Simulated Annealing

The Self-Organizing Map (SOM) algorithm has been extensively used for analysis and classification problems. For this kind of problem, datasets are becoming larger and larger, and it is necessary to speed up SOM learning. In this paper we present an application of the Simulated Annealing (SA) procedure to the SOM learning algorithm. The goal of the algorithm is to obtain fast learning and better performance in terms of matching of input data and regularity of the obtained map. An advantage of the proposed technique is that it preserves the simplicity of the basic algorithm. Several tests, carried out on different large datasets, demonstrate the effectiveness of the proposed algorithm in comparis…


Clustering Bacteria Species Using Neural Gas: Preliminary Study

In this work a method for clustering and visualization of bacteria taxonomy is presented. A modified version of the Batch Median Neural Gas (BNG) algorithm is proposed. The BNG algorithm is able to manage non-vectorial data given as a dissimilarity matrix. We tested the modified BNG on the dissimilarity matrix obtained by aligning sequences and computing distances using bacteria genotype information from the 16S rRNA housekeeping gene, which represents a stable part of the bacterial genome. The dataset used for the experiments was obtained from the Ribosomal Database Project II and consists of 5159 sequences of 16S rRNA genes. Preliminary results of the experiments show a promising abil…
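The modification proposed in the paper is not described in the abstract. The minimal sketch below shows only the standard batch median Neural Gas step that works directly on a dissimilarity matrix. Prototypes are restricted to data points, every point ranks the prototypes by dissimilarity, and each prototype moves to the data point that minimises the rank-weighted sum of dissimilarities. The geometric annealing schedule for the neighbourhood range `lam` and the toy matrix are assumptions.

```python
import numpy as np


def batch_median_ng(D, n_protos, n_iter=30, lam0=None, seed=0):
    """Batch median Neural Gas on an (n x n) dissimilarity matrix D.
    Returns prototype indices into D and the label of each data point."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    protos = rng.choice(n, size=n_protos, replace=False)
    if lam0 is None:
        lam0 = n_protos / 2.0
    lam_end = 0.01
    for t in range(n_iter):
        # Geometric annealing of the neighbourhood range from lam0 to lam_end.
        lam = lam0 * (lam_end / lam0) ** (t / max(n_iter - 1, 1))
        dist = D[:, protos]                                  # (n, n_protos)
        ranks = np.argsort(np.argsort(dist, axis=1), axis=1)  # rank of each prototype
        h = np.exp(-ranks / lam)                             # neighbourhood weights
        # Prototype j moves to the point l minimising sum_i h[i, j] * D[i, l].
        cost = h.T @ D                                       # (n_protos, n)
        protos = np.argmin(cost, axis=1)
    labels = np.argmin(D[:, protos], axis=1)
    return protos, labels


# Toy dissimilarities: two well-separated groups of three points on a line.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
D = np.abs(x[:, None] - x[None, :])
protos, labels = batch_median_ng(D, n_protos=2)
```

Only D is ever used, never coordinates, which is why the method suits alignment-based distances between 16S sequences. When `lam` becomes small, the update reduces to a k-medoids step.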


A Deep Learning Model for Epigenomic Studies

Epigenetics is the study of heritable changes in gene expression that do not involve changes to the underlying DNA sequence, i.e. a change in phenotype not caused by a change in genotype. At least three main factors seem responsible for epigenetic change, namely DNA methylation, histone modification and non-coding RNA, each sharing the property of affecting the dynamics of the chromatin structure by acting on nucleosome position. A nucleosome is a DNA-histone complex around which about 150 base pairs of double-stranded DNA are wrapped. The role of nucleosomes is to pack the DNA into the nucleus of eukaryotic cells to form chromatin. Nucleosome positioning plays an imp…


A knowledge-based decision support system in bioinformatics: An application to protein complex extraction

Background: We introduce a Knowledge-based Decision Support System (KDSS) to address the Protein Complex Extraction issue. Using a Knowledge Base (KB) that encodes the expertise about the proposed scenario, our KDSS is able to suggest both strategies and tools according to the features of the input dataset. Our system provides a navigable workflow for the current experiment and also offers support in the configuration and running of every processing component of that workflow. This last feature makes our system a crossover between classical DSS and Workflow Management Systems. Results: We briefly present the KDSS' architecture and basic concepts used in the design of the knowl…


A pattern recognition approach to identify biological clusters acquired by acoustic multi-beam in Kongsfjorden

Svalbard is one of the most intensively studied marine regions in the Arctic; here the composition and distribution of marine assemblages are changing under the effect of global change, and marine communities are monitored in order to understand the long-term effects on marine biodiversity. In the present work, acoustic data collected in Kongsfjorden using multi-beam technology were analyzed to develop a methodology for identifying and classifying 3D acoustic patterns related to fish aggregations. In particular, morphological, energetic and depth features were taken into account to develop a multivariate classification procedure able to discriminate fish species. The results obta…


Recurrent Deep Neural Networks for Nucleosome Classification

Nucleosomes are the fundamental repeating unit of chromatin. A nucleosome is a complex of eight histone proteins to which approximately 147–150 DNA base pairs bind. Several biological studies have clearly stated that the regulation of cell type-specific gene activities is influenced by nucleosome positioning. Bioinformatics studies have improved on those results, showing evidence of sequence specificity in nucleosomal DNA fragments. In this work, we present a recurrent neural network that uses a sequence feature representation of nucleosomes for their classification. In particular, we implement an architecture which stacks convolutional and long short-term memory layers, with the main purpose to avoid t…


Experiences with CiceRobot, a Museum Guide Cognitive Robot

The paper describes CiceRobot, a robot based on a cognitive architecture for robot vision and action. The aim of the architecture is to integrate visual perception and actions with knowledge representation, in order to let the robot generate a deep inner understanding of its environment. The principled integration of perception, action and symbolic knowledge is based on the introduction of an intermediate representation based on Gärdenfors' conceptual spaces. The architecture has been tested on an RWI B21 autonomous robot on tasks related to guided tours in the Archaeological Museum of Agrigento. Experimental results are presented.


Normalised compression distance and evolutionary distance of genomic sequences: comparison of clustering results

Genomic sequences are usually compared using evolutionary distance, a procedure that requires aligning the sequences. Aligning long sequences is time consuming, and the resulting dissimilarity is not a metric. Recently, the normalised compression distance was introduced as a method to compute the distance between two generic digital objects, and it seems a suitable way to compare genomic strings. In this paper, the clustering and the non-linear mapping obtained using the evolutionary distance and the compression distance are compared, in order to understand whether the two sets of distances are similar.
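The normalised compression distance has a simple closed form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The minimal sketch below uses zlib as the compressor, which is an assumption; the paper may use a different one. The random sequences are synthetic stand-ins for genomic strings. Because zlib only looks back 32 KiB, this particular compressor sees shared content only between sequences well below that length.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Synthetic stand-ins for genomic strings.
rng = random.Random(0)
seq_a = bytes(rng.choice(b"ACGT") for _ in range(2000))
seq_b = bytes(rng.choice(b"ACGT") for _ in range(2000))
d_same = ncd(seq_a, seq_a)   # expected to be small: the copy compresses almost for free
d_diff = ncd(seq_a, seq_b)   # expected to be near 1: nothing is shared
```

No alignment is required, and the resulting pairwise matrix can feed the same clustering or non-linear mapping used with evolutionary distances.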


An expert system hybrid architecture to support experiment management

Specific expert systems are used to support, speed up and add precision to in silico experimentation in many domains. In particular, many experimentalists show a growing interest in workflow management systems for building pipelines of experiments. Unfortunately, these systems do not integrate a systematic approach or a support component for workflow composition/reuse. For this reason, in this paper we propose a knowledge-based hybrid architecture for designing expert systems that are able to support experiment management. This architecture defines a reference cognitive space and a proper ontology that describe the state of a problem by means of three different per…


An ontological-based knowledge organization for bioinformatics workflow management system

Motivation and Objectives: In the field of Computer Science, ontologies represent formal structures to define and organize knowledge of a specific application domain (Chandrasekaran et al., 1999). An ontology is composed of entities, called classes, and relationships among them. Classes are characterized by features, called attributes, and they can be arranged into a hierarchical organization. Ontologies are a fundamental instrument in Artificial Intelligence for the development of Knowledge-Based Systems (KBS). With its formal and well-defined structure, an ontology provides a machine-understandable language that allows automatic reasoning for problem solving. Typical KBS are E…


The BioDICE Taverna plugin for clustering and visualization of biological data: a workflow for molecular compounds exploration

Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support data-driven scientific discovery in complex and explorative bioinformatics applications. Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a …


A New Linear Initialization in SOM for Biomolecular Data

In the past decade, the amount of data in the biological field has become larger and larger; bio-techniques for the analysis of biological data have been developed and new tools have been introduced. Several computational methods are based on unsupervised neural network algorithms that are widely used for multiple purposes, including clustering and visualization, e.g. Self-Organizing Maps (SOM). Unfortunately, even though this method is unsupervised, its performance in terms of result quality and learning speed strongly depends on the initialization of the neuron weights. In this paper we present a new initialization technique based on a totally connected undirected graph, that report relat…
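The graph-based technique is not detailed in the abstract. For context, the minimal sketch below shows the classic linear SOM initialization, a common baseline. It spreads the neuron weights as a regular grid over the plane spanned by the two leading principal components, scaled by the data's standard deviation along each. The scaling choice (one standard deviation to each side of the mean) is an assumption, and the data must have at least two features.

```python
import numpy as np


def linear_init(data, rows, cols):
    """Classic linear SOM initialisation on the plane of the two leading
    principal components (requires at least 2 samples and 2 features)."""
    mean = data.mean(axis=0)
    centered = data - mean
    # Principal directions from the SVD of the centred data.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    std = s[:2] / np.sqrt(len(data) - 1)  # standard deviation along PC1 and PC2
    u = np.linspace(-1.0, 1.0, rows)
    v = np.linspace(-1.0, 1.0, cols)
    # Row index follows PC1, column index follows PC2.
    return np.array([mean + a * std[0] * vt[0] + b * std[1] * vt[1]
                     for a in u for b in v])


# Toy data varying only along the first axis.
data = np.column_stack([np.linspace(-10.0, 10.0, 21), np.zeros(21)])
w0 = linear_init(data, rows=3, cols=4)
```

Because the initial map is already ordered along the directions of largest variance, the early ordering phase of training can be shortened, which is the motivation that initialization schemes share.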

research product

Deep learning models for bacteria taxonomic classification of metagenomic data

Background: An open challenge in translational bioinformatics is the analysis of sequenced metagenomes from various environmental samples. Several studies have demonstrated that 16S ribosomal RNA can be considered a barcode for bacteria classification at the genus level, but it is still hard to identify the correct composition of metagenomic data from RNA-seq short-read data. 16S short-read data are generated using two next-generation sequencing technologies, namely whole genome shotgun (WGS) and amplicon (AMP); typically, the former is filtered to obtain 16S shotgun (SG) short reads, whereas the latter covers only some specific 16S hypervariable regions.…

research product

A New SOM Initialization Algorithm for Nonvectorial Data

Self-Organizing Maps (SOMs) are a widely used family of mapping and clustering algorithms. It is also well known that the performance of the maps in terms of result quality and learning speed strongly depends on the initialization of the neuron weights. This drawback is common to all SOM algorithms, and it is critical for a new SOM algorithm, the Median SOM (M-SOM), developed to map datasets characterized by a dissimilarity matrix. In this paper an initialization technique for the M-SOM is proposed and compared with the initialization techniques proposed in the original paper. The results show that the proposed initialization technique assures faster learning and better performance in terms…
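Because a Median SOM works directly on a dissimilarity matrix, each unit's prototype cannot be an averaged vector; instead it is chosen among the data items as a set median (the item with the smallest summed dissimilarity to a group of items). The sketch below illustrates that basic operation together with one simplified batch step on a 1-D map. It assumes a precomputed dissimilarity matrix, a hard neighborhood radius and an initial prototype assignment; the function names are hypothetical, and it does not reproduce the initialization technique proposed in the paper.

```python
def set_median(members, dissim):
    """Return the member index with the smallest summed dissimilarity to all members."""
    return min(members, key=lambda i: sum(dissim[i][j] for j in members))


def median_som_step(dissim, bmu, n_units, radius):
    """One simplified batch update of a 1-D median SOM (hypothetical helper).

    dissim: n x n dissimilarity matrix (list of lists)
    bmu: bmu[i] is the index of the unit currently winning item i
    radius: items won by units within this grid distance contribute to a unit
    Returns the new prototype (an item index, or None if empty) for every unit.
    """
    prototypes = []
    for unit in range(n_units):
        members = [i for i, w in enumerate(bmu) if abs(w - unit) <= radius]
        prototypes.append(set_median(members, dissim) if members else None)
    return prototypes


def assign_bmu(dissim, prototypes):
    """Assign every item to the unit whose prototype item is least dissimilar to it."""
    valid = [u for u, p in enumerate(prototypes) if p is not None]
    return [min(valid, key=lambda u: dissim[i][prototypes[u]]) for i in range(len(dissim))]
```

Alternating `assign_bmu` and `median_som_step` while shrinking `radius` gives the usual batch training loop; how the prototypes are chosen before the first step is exactly the initialization problem the paper addresses.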

research product

A Knowledge Based Decision Support System for Bioinformatics and System Biology

In this paper, we present a new Decision Support System for Bioinformatics and System Biology problems. Our system is based on a Knowledge Base, representing the expertise about the application domain, and a Reasoner. By consulting the Knowledge Base according to the user's request, the Reasoner can suggest one or more strategies to solve the selected problem. Moreover, the system can build, at different abstraction layers, a workflow for the current problem based on the user's choices, freeing the user from implementation details and assisting them in correctly configuring the algorithms. Two possible application scenarios will be introduced: the analysis of …

research product

Variable Ranking Feature Selection for the Identification of Nucleosome Related Sequences

Several recent works have shown that the k-mer representation of a DNA sequence can be used to classify or identify sequences related to nucleosome positioning. This representation becomes computationally expensive as k grows, because the feature space has exponential dimension (4^k possible k-mers over the DNA alphabet). This issue significantly affects the classification task performed by a general-purpose machine learning algorithm. In this paper, we investigate the advantage offered by the so-called Variable Ranking Feature Selection method in selecting the most informative k-mers associated with a set of DNA sequences, with the final purpose of nucleosome/linker classifi…
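To make the exponential growth concrete, here is a minimal sketch of a k-mer count vector, assuming the plain {A, C, G, T} alphabet, lexicographic feature order and overlapping windows; the function name is illustrative and the feature selection step studied in the paper is not shown.

```python
from itertools import product


def kmer_vector(seq, k):
    """Count overlapping k-mers of a DNA string; the vector always has 4**k entries."""
    alphabet = "ACGT"
    # Map each possible k-mer to a fixed position in lexicographic order.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k].upper()
        if kmer in index:  # skip windows containing ambiguous bases such as N
            counts[index[kmer]] += 1
    return counts
```

Moving from k = 3 (64 features) to k = 6 already yields 4096 features, most of them sparse, which is why ranking and keeping only the most informative k-mers can pay off.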

research product

Deep Metric Learning for Histopathological Image Classification

Neural networks have proven effective in multiple classification tasks, with performance similar to human capabilities. Nevertheless, applying this kind of tool in real cases requires that the provided results can be interpreted, so that the human operator can make decisions based on the information provided. This aspect is even more evident when the field of application concerns people's health, as in biomedical image classification. We propose, for the classification of histopathological images, a convolutional neural network that, through metric learning, learns a representation that gathers in homogeneous clusters the …

research product

CORENup: a combination of convolutional and recurrent deep neural networks for nucleosome positioning identification

Background: Nucleosomes wrap DNA in the nucleus of eukaryotic cells and regulate its transcription. Several studies indicate that nucleosome positioning is determined by the combined effects of several factors, including DNA sequence organization. Interestingly, nucleosomes have been successfully identified on a genomic scale by computational methods that use the DNA sequence as input data. Results: In this work, we propose CORENup, a deep learning model for nucleosome identification. CORENup takes a one-hot representation of the input DNA sequence and combines, in parallel, a fully convolutional neural network and a recurrent layer. These two parallel …
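The one-hot input representation mentioned above can be sketched as follows. The column order (A, C, G, T), the all-zero row for unknown bases and the optional fixed-length padding are assumptions made for illustration, not details taken from CORENup.

```python
BASES = "ACGT"  # assumed column order


def one_hot(seq, length=None):
    """Encode a DNA string as a list of 4-element 0/1 rows (columns A, C, G, T).

    Unknown bases (e.g. N) become all-zero rows. If `length` is given, the
    sequence is truncated or zero-padded so every input has the same shape.
    """
    seq = seq.upper()
    if length is not None:
        seq = seq[:length]
    rows = [[1 if base == b else 0 for b in BASES] for base in seq]
    if length is not None:
        rows.extend([0, 0, 0, 0] for _ in range(length - len(rows)))
    return rows
```

The resulting L x 4 matrix is the kind of input that both a 1-D convolutional branch and a recurrent branch can consume position by position.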

research product

Bacteria classification using minimal absent words

Bacteria classification has been extensively investigated with different tools for many purposes, such as early diagnosis, metagenomics and phylogenetics. Classification methods based on ribosomal DNA sequences are considered a reference in this area. We present a new classifier for bacteria species based on a dissimilarity measure of a purely combinatorial nature. This measure is based on the notion of Minimal Absent Words, a combinatorial concept that has recently found applications in bioinformatics. We incorporate this measure into a probabilistic neural network in order to classify bacteria species. Our approach is motivated by the fact that there is a vast literature on the com…
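A word w is a minimal absent word (MAW) of a sequence s if w does not occur in s while every proper factor of w does; for |w| ≥ 2 this means w = a·u·b with a·u and u·b occurring in s but a·u·b not occurring. The brute-force sketch below follows that definition directly and is only suitable for short strings (it stores every substring); linear-time algorithms based on suffix trees or suffix arrays exist. The dissimilarity measure built on top of MAWs in the paper is not reproduced here.

```python
def minimal_absent_words(s, alphabet="ACGT"):
    """Return the set of minimal absent words of s over `alphabet` (brute force)."""
    # All factors (substrings) of s, including the empty string.
    factors = {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}
    # Length-1 MAWs: letters of the alphabet that never occur in s.
    maws = {c for c in alphabet if c not in factors}
    # Longer MAWs: a + u + b is absent, but both a + u and u + b occur in s.
    for u in factors:
        for a in alphabet:
            if a + u not in factors:
                continue
            for b in alphabet:
                w = a + u + b
                if u + b in factors and w not in factors:
                    maws.add(w)
    return maws
```

For example, over the alphabet {A, B} the string AAB has the minimal absent words AAA, BA and BB.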

research product

Soft Topographic Map for Clustering and Classification of Bacteria

In this work a new method for clustering and building a topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different type strains can be visually inspected and verified. Two well-known DNA alignment algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaproteobacteria class by means of the 16S rRNA…

research product

Comparison of genomic sequences clustering using Normalized Compression Distance and Evolutionary Distance

Genomic sequences are usually compared using evolutionary distance, a procedure that requires aligning the sequences. Aligning long sequences is time-consuming, and the resulting dissimilarity is not a metric. Recently, the normalized compression distance was introduced as a method to compute the distance between two generic digital objects, and it appears to be a suitable way to compare genomic strings. In this paper, the clusterings and mappings obtained with a SOM using the traditional evolutionary distance and the compression distance are compared, in order to understand whether the two sets of distances are similar. The first results indicate that the two distances catch differen…
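The normalized compression distance replaces alignment with a compressor C: NCD(x, y) = (C(xy) − min{C(x), C(y)}) / max{C(x), C(y)}, where C(·) is the compressed length. The sketch below uses Python's `zlib` as C, which is an assumption for illustration; the compressor used in the paper may differ, and real compressors can push the value slightly above 1.

```python
import zlib


def compressed_size(data):
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x, y):
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Values near 0 mean that one sequence adds almost no new information to the other; values near 1 mean they share little. A full NCD matrix over a set of sequences can then feed a dissimilarity-based SOM in the same way as a matrix of evolutionary distances.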

research product

Topographic maps for clustering and fast identification of bacteria using 16S housekeeping gene

In microbial identification, the standard method to attribute a specific name to a bacterial isolate relies on the comparison of morphological and phenotypic characters with those described for type or typical strains. In recent years, a new standard for identifying bacteria using genotypic information has begun to be developed. In this new approach, phylogenetic relationships among bacteria can be determined by comparing a stable part of the bacterial genetic code, the so-called "housekeeping genes". The gene most commonly used for taxonomic purposes in bacteria is the 16S rRNA. The goal of this chapter is to show that genotypic features can be used to build a topographic map for clustering of a lar…

research product

Additional file 1 of Deep learning models for bacteria taxonomic classification of metagenomic data

Preliminary classification results obtained by training a model on one type of input data (e.g., SG) and testing it on the other type (e.g., AMP). (XLSX 9.52 kb)

research product