AUTHOR
Antonino Fiannaca
"DEVELOPMENT OF A DECISION SUPPORT SYSTEM FOR BIOINFORMATICS. EXTRACTION OF PROTEIN COMPLEXES FROM A PROTEIN-PROTEIN INTERACTION NETWORK: A CASE STUDY"
Decision Support Systems and Workflow Management Systems have become essential tools in several business and scientific fields. This thesis proposes a new hybrid architecture for problem-solving expertise and the decision-making process that aims to support high-quality research in bioinformatics and systems biology. The first part of the dissertation introduces the project to which this thesis belongs, i.e. the “Bioinformatics Organized Resources - an Intelligent System” (BORIS) project of ICAR-CNR; the main goal of BORIS is to provide helpful and effective support to researchers or experimentalists who are not familiar with tools and techniques to solve computational p…
Pelagic species identification by using a PNN neural network and echo-sounder data
For several years, a group of CNR researchers has conducted acoustic surveys in the Sicily Channel to estimate the biomass of small pelagic species, their geographical distribution, and their variation over time. The instrument used for these surveys is the scientific echo-sounder, operated at different frequencies. Processing the backscattered signals from the volume of water under investigation determines the abundance of the species. These data are then correlated with the biological data of experimental catches to attribute a composition to the various fish schools investigated. Recognizing the fish schools therefore helps to produce very good results, that is very clo…
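The abstract names a PNN (probabilistic neural network) classifier but does not list the echo-sounder features it uses. The sketch below shows the standard PNN decision rule (one Gaussian Parzen kernel per training pattern, averaged per class, then argmax) on hypothetical two-feature vectors; the feature choice, the species labels and the smoothing parameter `sigma` are all assumptions, not values from the study.

```python
import math
from collections import defaultdict


def pnn_classify(train, x, sigma=1.0):
    """Classify feature vector x with a Probabilistic Neural Network (PNN).

    train: list of (feature_vector, label) pairs (the pattern layer).
    sigma: Gaussian smoothing parameter; the default is an arbitrary assumption.
    """
    kernel_sum = defaultdict(float)  # summation layer: one unit per class
    n_patterns = defaultdict(int)
    for features, label in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(features, x))
        kernel_sum[label] += math.exp(-sq_dist / (2 * sigma ** 2))
        n_patterns[label] += 1
    # Output layer: class with the highest average kernel response (Parzen estimate)
    return max(kernel_sum, key=lambda c: kernel_sum[c] / n_patterns[c])


# Hypothetical school descriptors, e.g. mean backscatter (dB) at two frequencies
train = [
    ([-60.0, -58.0], "species_A"),
    ([-61.0, -57.5], "species_A"),
    ([-50.0, -52.0], "species_B"),
    ([-49.5, -51.0], "species_B"),
]
print(pnn_classify(train, [-59.5, -58.2]))  # prints species_A
```

A small `sigma` makes the rule behave like nearest-neighbour classification, while a large one smooths the class densities, so in practice it would be tuned on held-out survey data.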
Simulated Annealing Technique for Fast Learning of SOM Networks
The Self-Organizing Map (SOM) is a popular unsupervised neural network that provides effective clustering and data visualization for multidimensional input datasets. In this paper, we present an application of the simulated annealing procedure to the SOM learning algorithm, with the aim of obtaining fast learning and better performance in terms of quantization error. The proposed learning algorithm, called Fast Learning Self-Organized Map, preserves the simplicity of the standard SOM learning algorithm. It also improves the quality of the resulting maps by providing better clustering quality and topology preservation of input multi-dimensi…
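The abstract does not spell out how simulated annealing drives the FLSOM updates, so the sketch below is only a baseline for comparison: standard online SOM training in which the learning rate and neighbourhood radius "cool" exponentially, like an annealing temperature schedule, together with the quantization error used as the quality measure. The decay constant 3, the Gaussian neighbourhood and all function names are assumptions; this is not the FLSOM algorithm.

```python
import math
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def best_matching_unit(weights, x):
    """Grid coordinates of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda node: _sq_dist(weights[node], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Online SOM training with exponentially cooling learning rate and radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(v[d] for v in data) for d in range(dim)]
    hi = [max(v[d] for v in data) for d in range(dim)]
    # Random initialization inside the bounding box of the data
    weights = {(r, c): [rng.uniform(lo[d], hi[d]) for d in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * math.exp(-3 * frac)          # learning-rate "temperature"
            radius = radius0 * math.exp(-3 * frac)  # neighbourhood width
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))  # Gaussian neighbourhood
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


def quantization_error(weights, data):
    """Mean Euclidean distance between each input and its BMU weight vector."""
    return sum(math.sqrt(_sq_dist(weights[best_matching_unit(weights, x)], x))
               for x in data) / len(data)


data = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [4.8, 5.1], [5.2, 4.9]]
som = train_som(data, 2, 2, epochs=50, seed=1)
print(quantization_error(som, data))
```

An annealing-based variant would replace the fixed exponential schedule with an adaptive rule for `lr` and `radius`; the quantization error above is the kind of quantity such a rule could monitor.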
Fast Training of Self Organizing Maps for the Visual Exploration of Molecular Compounds
Visual exploration of scientific data in the life science area is a growing research field due to the large amount of available data. Kohonen’s Self Organizing Map (SOM) is a widely used tool for the visualization of multidimensional data. In this paper, we present a fast learning algorithm for SOMs that uses a simulated annealing method to adapt the learning parameters. The algorithm has been adopted in a data analysis framework for the generation of similarity maps. Such maps provide an effective tool for the visual exploration of large and multi-dimensional input spaces. The approach has been applied to data generated during the High Throughput Screening of mo…
An Intelligent System for Building Bioinformatics Workflows
In this paper, we present a new intelligent system designed to support researchers in developing workflows for bioinformatics experiments. The proposed system can suggest one or more strategies to solve the selected problem and support the user in assembling a workflow for complex experiments, using a Knowledge base that represents expertise about the application domain and a Rule-Based system for decision making. Moreover, the system can represent this workflow at different abstraction layers, freeing the user from implementation details and assisting them in correctly configuring the algorithms. A sample workflow for protein c…
Context-Aware Visual Exploration of Molecular Datab
Facilitating the visual exploration of scientific data has received increasing attention in the past decade or so. Especially in life-science application areas, the amount of available data has grown at a breathtaking pace. In this paper, we describe an approach that allows visual inspection of large collections of molecular compounds. In contrast to classical visualizations of such spaces, we incorporate a specific focus of analysis, for example the outcome of a biological experiment such as high-throughput screening results. The presented method uses this experimental data to select molecular fragments of the underlying molecules that have interesting properties and uses the res…
Identification of Key miRNAs in Regulation of PPI Networks
In this paper, we explore the interaction between miRNAs and deregulated proteins in some pathologies. Assuming that miRNAs can influence mRNAs and, consequently, protein regulation, we explore this connection by using an interaction matrix derived from miRNA-target data and PPI network interactions. From this interaction matrix and the set of deregulated proteins, we search for the subset of miRNAs that influences the deregulated proteins with minimum impact on the non-deregulated ones. This regulation problem can be formulated as a complex optimization problem, which we have tried to solve with a genetic algorithm heuristic. As the main result, we have found a set of miRNA…
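The truncated abstract does not give the objective function, so the sketch below assumes one: count the deregulated proteins reached by the selected miRNAs and subtract a penalty `alpha` for every non-deregulated protein reached. A candidate miRNA subset is encoded as a bit string and evolved with tournament selection, uniform crossover, bit-flip mutation and elitism. The matrix, the fitness function and all parameter values are hypothetical.

```python
import random


def fitness(chrom, matrix, deregulated, alpha=0.5):
    """Score a miRNA subset: deregulated proteins hit minus alpha * other proteins hit.

    chrom[i] is True if miRNA i is selected; matrix[i][j] is 1 if miRNA i
    (directly or through the PPI network) influences protein j.
    """
    hit = set()
    for i, selected in enumerate(chrom):
        if selected:
            hit.update(j for j, v in enumerate(matrix[i]) if v)
    return len(hit & deregulated) - alpha * len(hit - deregulated)


def _tournament(pop, score, rng):
    """Binary tournament selection: the fitter of two random individuals."""
    a, b = rng.sample(pop, 2)
    return a if score(a) >= score(b) else b


def genetic_search(matrix, deregulated, pop_size=30, generations=60, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    n = len(matrix)

    def score(c):
        return fitness(c, matrix, deregulated)

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = [max(pop, key=score)[:]]  # elitism: carry over the best subset
        while len(new_pop) < pop_size:
            p1, p2 = _tournament(pop, score, rng), _tournament(pop, score, rng)
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]  # uniform crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]     # bit-flip mutation
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=score)


# Toy example: 4 miRNAs x 6 proteins; proteins 0, 1, 2 are deregulated
matrix = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0],
]
print(genetic_search(matrix, {0, 1, 2}))
```

In this toy matrix, selecting only miRNAs 0 and 1 covers all three deregulated proteins with no off-target hits, so it is the subset the search should converge to; on real data, `alpha` controls the trade-off between coverage and side effects.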
Use of Soft Topographic Maps for Clustering Bacteria on the Basis of their 16S rRNA Gene Sequence
Classification of Sequences with Deep Artificial Neural Networks: Representation and Architectural Issues
DNA sequences are the basic data type processed in generic biological data analysis studies. One key component of biological analysis is sequence classification, a methodology widely used to analyze sequential data of different kinds. However, its application to DNA sequences requires a proper representation of such sequences, which is still an open research problem. Machine Learning (ML) methodologies have made a fundamental contribution to solving this problem; among them, Deep Neural Network (DNN) models have recently shown strongly encouraging results. In this chapter, we deal with specific classification problems related to t…
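The chapter's own representations are not visible in this excerpt. One common choice that turns a DNA string into a matrix suitable for convolutional or recurrent networks is one-hot encoding; the sketch below assumes the A, C, G, T column order and maps ambiguous bases such as N to an all-zero row.

```python
def one_hot_dna(seq):
    """Encode a DNA string as a list of 4-dim one-hot vectors (column order A, C, G, T).

    Ambiguous bases such as N become an all-zero vector (an assumption).
    """
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    encoded = []
    for base in seq.upper():
        vec = [0, 0, 0, 0]
        if base in index:
            vec[index[base]] = 1
        encoded.append(vec)
    return encoded


print(one_hot_dna("ACgN"))  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
```

The result is a length-by-4 matrix, so sequences of equal length can be stacked directly into a batch for a DNN.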
Impact of the flame retardant 2,2'4,4'-tetrabromodiphenyl ether (PBDE-47) in THP-1 macrophage-like cell function via small extracellular vesicles
2,2’,4,4’-tetrabromodiphenyl ether (PBDE-47) is one of the most widespread brominated flame-retardant congeners in the environment and has also been detected in animal and human tissues. Several studies have reported the effects of PBDEs on different health issues, including neurobehavioral and developmental disorders, reproductive health, and alterations of thyroid function. Much less is known about its immunotoxicity. The aim of our study was to investigate the effects that treatment of THP-1 macrophage-like cells with PBDE-47 could have on the microRNA (miRNA) cargo of small extracellular vesicles (sEVs) and their downstream effects on bystander macrophages. To achieve this, we puri…
A Decision Support System for Reverse Engineering Gene Regulatory Networks
In this paper, we present a knowledge-based system that aims to help scientists in the reverse engineering of gene regulatory networks. The main motivation of the proposed approach is to support scientists in choosing among the wide variety of algorithms and methods currently used in the literature to infer Gene Regulatory Networks from gene expression measured with microarray technology. The Decision Support System (DSS) architecture is based on an ontology that models the knowledge base, a logical reasoner that builds the workflow of tasks from the user’s request and a set of rules, and, finally, an agenda that runs the algorithms and software schedu…
Clustering Quality and Topology Preservation in Fast Learning SOMs
The Self-Organizing Map (SOM) is a popular unsupervised neural network that provides effective clustering and data visualization for data represented in multidimensional input spaces. In this paper, we describe Fast Learning SOM (FLSOM), which adopts a learning algorithm that improves on the standard SOM with respect to convergence time in the training phase. We show that FLSOM also improves the quality of the map by providing better clustering quality and topology preservation of multidimensional input data. Several tests have been carried out on different multidimensional datasets, demonstrating better performance of the algorithm in comparison with the original …
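Topology preservation is often measured with Kiviluoto's topographic error: the fraction of inputs whose best- and second-best-matching units are not neighbours on the map grid. The abstract does not say which measures FLSOM reports, so the sketch below is a generic one, assuming a rectangular grid with 4-neighbourhood adjacency and weights stored as a dict keyed by `(row, col)`.

```python
def topographic_error(weights, data):
    """Topographic error of a rectangular SOM.

    weights: dict mapping grid coordinates (row, col) to weight vectors.
    Returns the fraction of inputs whose best- and second-best-matching units
    are not adjacent on the grid (4-neighbourhood, an assumption).
    """
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    errors = 0
    for x in data:
        first, second = sorted(weights, key=lambda node: sq_dist(weights[node], x))[:2]
        grid_steps = abs(first[0] - second[0]) + abs(first[1] - second[1])
        if grid_steps != 1:
            errors += 1
    return errors / len(data)


# A 1x3 map whose weights follow the grid order vs. one that is "twisted"
ordered = {(0, 0): [0.0], (0, 1): [1.0], (0, 2): [2.0]}
twisted = {(0, 0): [0.0], (0, 1): [2.0], (0, 2): [1.0]}
print(topographic_error(ordered, [[0.4], [1.6]]))  # 0.0
print(topographic_error(twisted, [[0.4], [1.6]]))  # 0.5
```

A value of 0 means every input's two closest prototypes sit next to each other on the map; a map folded over itself, like `twisted`, scores higher.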
BITS2019: the sixteenth annual meeting of the Italian society of bioinformatics.
The 16th Annual Meeting of the Bioinformatics Italian Society was held in Palermo, Italy, on June 26-28, 2019. More than 80 scientific contributions were presented, including 4 keynote lectures, 31 oral communications and 49 posters. In addition, three workshops were organised before and during the meeting. Full papers from some of the works presented in Palermo were submitted to this Supplement of BMC Bioinformatics. Here, we provide an overview of the meeting's aims and scope. We also briefly introduce the selected papers that have been accepted for publication in this Supplement, for a complete presentation of the outcomes of the meeting.
Knowledge organization for modelling workflows in Taverna environment
Today, Workflow Management Systems (WFMS) such as Taverna and Kepler have a very important place in the everyday work of scientists. These tools provide access to computational resources and act as an interface for building complex data processing chains. The next step is to support the researcher's decisions by autonomously developing parts of the workflow, guided by the scientist's requirements, while she/he is working on the high-level goal of the experiment. To this end, an ontology is needed to store the knowledge related to the experiments and tools used, and to make this knowledge available not only to the scientist but also to a suitable artificial intelligence system. In this pape…
A Proposed Knowledge Based Approach for Solving Proteomics Issues
In this paper, we present a novel knowledge-based approach that aims to help scientists address and solve a large number of proteomics problems. The system architecture is based on an ontology that models the knowledge base, a reasoner that builds the workflow of tasks from the user's request and a set of rules, and an executor that runs the algorithms and software scheduled by the reasoner. The system can interact with the user by showing intermediate results and several options, so that the user can refine the workflow and choose among different forks. Thanks to the knowledge base and the modularity provided by the ontology, the system can be e…
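The reasoner/executor split described above can be illustrated with a toy forward-chaining planner: each rule states which data types a task consumes and which it produces, and rules fire until the requested result is available. This is a hypothetical sketch, not the system's ontology or rule language; the task and data-type names are invented, loosely following the protein complex extraction case study, and plain forward chaining may also schedule tasks that are not strictly needed for the goal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    task: str            # hypothetical task/tool name
    requires: frozenset  # data types the task consumes
    produces: str        # data type the task outputs


def build_workflow(available, goal, rules):
    """Forward-chain over rules until the goal data type is produced.

    Returns the ordered list of tasks, or None if the goal is unreachable.
    """
    known = set(available)
    plan = []
    changed = True
    while goal not in known and changed:
        changed = False
        for rule in rules:
            if rule.produces not in known and rule.requires <= known:
                plan.append(rule.task)
                known.add(rule.produces)
                changed = True
    return plan if goal in known else None


rules = [
    Rule("filter_interactions", frozenset({"ppi_network"}), "filtered_network"),
    Rule("cluster_network", frozenset({"filtered_network"}), "protein_complexes"),
    Rule("annotate_complexes", frozenset({"protein_complexes", "go_annotations"}), "annotated_complexes"),
]
print(build_workflow({"ppi_network"}, "protein_complexes", rules))
# ['filter_interactions', 'cluster_network']
```

An executor would then run the returned tasks in order; returning `None` when no rule chain reaches the goal is where an interactive system could ask the user for missing inputs.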
Direct RNA nanopore sequencing of SARS-CoV-2 extracted from critical material from swabs
Background: In consideration of the increasing prevalence of COVID-19 cases in several countries and the resulting demand for unbiased sequencing approaches, we performed a direct RNA sequencing experiment using critical oropharyngeal swab samples collected from Italian patients infected with SARS-CoV-2 in the Palermo region of Sicily. Methods: Here, we identified SARS-CoV-2 sequences directly in RNA extracted from critical samples using the Oxford Nanopore MinION technology without prior cDNA retrotranscription. Results: Using an appropriate bioinformatics pipeline, we could identify mutations in the nucleocapsid (N) gene, which have been reported previously in studies conducted in…
Improved SOM Learning using Simulated Annealing
The Self-Organizing Map (SOM) algorithm has been extensively used for analysis and classification problems. For these problems, datasets are becoming larger and larger, and it is necessary to speed up SOM learning. In this paper, we present an application of the Simulated Annealing (SA) procedure to the SOM learning algorithm. The goal of the algorithm is to obtain fast learning and better performance in terms of matching of input data and regularity of the obtained map. An advantage of the proposed technique is that it preserves the simplicity of the basic algorithm. Several tests, carried out on different large datasets, demonstrate the effectiveness of the proposed algorithm in comparis…
T Cells Expressing Receptor Recombination/Revision Machinery Are Detected in the Tumor Microenvironment and Expanded in Genomically Over-unstable Models
Tumors undergo dynamic immunoediting as part of a process that balances immunologic sensing of emerging neoantigens and evasion from immune responses. Tumor-infiltrating lymphocytes (TIL) comprise heterogeneous subsets of peripheral T cells characterized by diverse functional differentiation states and dependence on T-cell receptor (TCR) specificity gained through recombination events during their development. We hypothesized that within the tumor microenvironment (TME), an antigenic milieu and immunologic interface, tumor-infiltrating peripheral T cells could reexpress key elements of the TCR recombination machinery, namely, Rag1 and Rag2 recombinases and Tdt polymerase, as a poten…
A Deep Learning Model for Epigenomic Studies
Epigenetics is the study of heritable changes in gene expression that do not involve changes to the underlying DNA sequence, i.e. a change in phenotype not caused by a change in genotype. At least three main factors seem responsible for epigenetic change, namely DNA methylation, histone modification and non-coding RNA, all of which can affect the dynamics of chromatin structure by acting on nucleosome position. A nucleosome is a DNA-histone complex in which about 150 base pairs of double-stranded DNA are wrapped around a histone core. The role of nucleosomes is to pack the DNA into the nucleus of eukaryotic cells to form chromatin. Nucleosome positioning plays an imp…
A knowledge-based decision support system in bioinformatics: An application to protein complex extraction
Background: We introduce a Knowledge-based Decision Support System (KDSS) to address the Protein Complex Extraction issue. Using a Knowledge Base (KB) that encodes expertise about the proposed scenario, our KDSS can suggest both strategies and tools according to the features of the input dataset. Our system provides a navigable workflow for the current experiment and also offers support in configuring and running every processing component of that workflow. This last feature makes our system a crossover between classical DSS and Workflow Management Systems. Results: We briefly present the KDSS' architecture and the basic concepts used in the design of the knowl…
An expert system hybrid architecture to support experiment management
Specific expert systems are used to support, speed up and add precision to in silico experimentation in many domains. In particular, many experimentalists show a growing interest in workflow management systems for building pipelines of experiments. Unfortunately, these types of systems do not integrate a systematic approach or a support component for workflow composition/reuse. For this reason, in this paper we propose a knowledge-based hybrid architecture for designing expert systems that can support experiment management. This architecture defines a reference cognitive space and a proper ontology that describe the state of a problem by means of three different per…
An ontological-based knowledge organization for bioinformatics workflow management system
Motivation and Objectives: In Computer Science, ontologies are formal structures that define and organize the knowledge of a specific application domain (Chandrasekaran et al., 1999). An ontology is composed of entities, called classes, and relationships among them. Classes are characterized by features, called attributes, and can be arranged in a hierarchy. Ontologies are a fundamental instrument in Artificial Intelligence for the development of Knowledge-Based Systems (KBS). Thanks to its formal and well-defined structure, an ontology provides a machine-understandable language that allows automatic reasoning for problem solving. Typical KBS are E…
The BioDICE Taverna plugin for clustering and visualization of biological data: a workflow for molecular compounds exploration
Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system used to design and execute scientific workflows and to aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support data-driven scientific discovery in complex and exploratory bioinformatics applications. Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a …
ceRNA Network Regulation of TGF-β, WNT, FOXO, Hedgehog Pathways in the Pharynx of Ciona robusta
The transforming growth factor-β (TGF-β) family of cytokines performs multifunctional signaling, which is integrated and coordinated in a signaling network that involves other pathways, such as Wingless (WNT), Forkhead box-O (FOXO) and Hedgehog, and regulates pivotal functions related to cell fate in all tissues. In the hematopoietic system, TGF-β signaling controls a wide spectrum of biological processes, from immune system homeostasis to the quiescence and self-renewal of hematopoietic stem cells (HSCs). Recently, an important role in post-transcriptional regulation has been attributed to two types of ncRNAs: microRNAs and pseudogenes. Ciona robusta, due to its phylogenetic position close to verte…
A New Linear Initialization in SOM for Biomolecular Data
In the past decade, the amount of data in the biological field has grown larger and larger; techniques for the analysis of biological data have been developed and new tools have been introduced. Several computational methods are based on unsupervised neural network algorithms that are widely used for multiple purposes, including clustering and visualization, e.g. Self Organizing Maps (SOM). Unfortunately, even though this method is unsupervised, its performance in terms of quality of results and learning speed strongly depends on the initialization of the neuron weights. In this paper, we present a new initialization technique based on a totally connected undirected graph, that report relat…
Transcriptomic Analyses Reveal 2 and 4 Family Members of Cytochromes P450 (CYP) Involved in LPS Inflammatory Response in Pharynx of Ciona robusta
Cytochromes P450 (CYP) are enzymes responsible for the biotransformation of most endogenous and exogenous agents. The expression of each CYP is influenced by a unique combination of mechanisms and factors, including genetic polymorphisms, induction by xenobiotics, and regulation by cytokines and hormones. In recent years, Ciona robusta, one of the closest living relatives of vertebrates, has become a model in various fields of biology, in particular for studying the inflammatory response. Using an in vivo LPS exposure strategy, next-generation sequencing (NGS) and qRT-PCR combined with bioinformatics and in silico analyses, we compared whole-pharynx transcripts from naïve and LPS-exposed C. robusta…
Deep learning models for bacteria taxonomic classification of metagenomic data.
Background: An open challenge in translational bioinformatics is the analysis of sequenced metagenomes from various environmental samples. Several studies have demonstrated that 16S ribosomal RNA can be considered a barcode for bacteria classification at the genus level, but it is still hard to identify the correct composition of metagenomic data from RNA-seq short-read data. 16S short-read data are generated using two next-generation sequencing technologies, i.e. whole genome shotgun (WGS) and amplicon (AMP); typically, the former is filtered to obtain short reads belonging to a 16S shotgun (SG), whereas the latter takes into account only some specific 16S hypervariable regions.…
A New SOM Initialization Algorithm for Nonvectorial Data
Self Organizing Maps (SOMs) are a widely used family of mapping and clustering algorithms. It is also well known that the performance of the maps in terms of quality of results and learning speed strongly depends on the initialization of the neuron weights. This drawback is common to all SOM algorithms and is critical for a new SOM algorithm, the Median SOM (M-SOM), developed to map datasets characterized by a dissimilarity matrix. In this paper, an initialization technique for M-SOM is proposed and compared to the initialization techniques proposed in the original paper. The results show that the proposed initialization technique ensures faster learning and better performance in terms…
A Knowledge Based Decision Support System for Bioinformatics and System Biology
In this paper, we present a new Decision Support System for Bioinformatics and System Biology issues. Our system is based on a Knowledge base, representing the expertise about the application domain, and a Reasoner. The Reasoner, consulting the Knowledge base and according to the user’s request, is able to suggest one or more strategies in order to resolve the selected problem. Moreover, the system can build, at different abstraction layers, a workflow for the current problem on the basis of the user’s choices, freeing the user from implementation details and assisting him in the correct configuration of the algorithms. Two possible application scenarios will be introduced: the analysis of …
Variable Ranking Feature Selection for the Identification of Nucleosome Related Sequences
Several recent works have shown that the k-mer representation of a DNA sequence can be used to classify or identify sequences related to nucleosome positioning. This representation becomes computationally expensive as k grows, because the number of possible k-mers, and hence the dimension of the feature space, grows exponentially (4^k). This issue significantly affects the classification task performed by a generic machine learning algorithm used for sequence classification. In this paper, we investigate the advantage offered by the so-called Variable Ranking Feature Selection method in selecting the most informative k-mers associated with a set of DNA sequences, for the final purpose of nucleosome/linker classifi…
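To make the dimensionality issue concrete: with four bases there are 4^k possible k-mers, so k = 6 already gives 4096 features. The sketch below builds k-mer count vectors and ranks features by a simple univariate score, the absolute difference between the class means. The abstract does not state which ranking criterion the paper uses, so the score, the 1 = nucleosome / 0 = linker labelling and the function names are assumptions.

```python
from itertools import product


def kmer_vector(seq, k):
    """Count every k-mer over {A, C, G, T}; the vector has 4**k entries."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if km in index:  # skip windows containing ambiguous bases
            counts[index[km]] += 1
    return kmers, counts


def rank_features(X, y):
    """Rank feature indices by |mean(class 1) - mean(class 0)|, highest first."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    return sorted(range(len(X[0])), key=lambda j: scores[j], reverse=True)


kmers, counts = kmer_vector("ACGTA", 2)
print(len(kmers), sum(counts))  # 16 4
print(rank_features([[3, 0, 1], [2, 0, 1], [0, 1, 1], [0, 2, 1]], [1, 1, 0, 0]))  # [0, 1, 2]
```

Keeping only the first m indices returned by `rank_features` reduces the 4^k-dimensional input to m columns before training the classifier; the choice of m would be tuned by validation.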
Transcriptomic and Bioinformatic Analyses Identifying a Central Mif-Cop9-Nf-kB Signaling Network in Innate Immunity Response of Ciona robusta
The ascidian C. robusta is a powerful model for studying innate immunity. LPS induction activates inflammatory-like reactions in the pharynx and the expression of several innate immune genes in granulocyte hemocytes, such as cytokines, for instance macrophage migration inhibitory factors (CrMifs). This leads to intracellular signaling involving the Nf-kB signaling cascade, which triggers downstream pro-inflammatory gene expression. In mammals, the COP9 (Constitutive photomorphogenesis 9) signalosome (CSN) complex also leads to the activation of the NF-kB pathway. It is a highly conserved complex in vertebrates, mainly engaged in proteasome degradation, which is essential for maintaining proc…
Direct RNA Nanopore Sequencing of SARS-CoV-2 Extracted from Critical Material from Swabs
In consideration of the increasing prevalence of COVID-19 cases in several countries and the resulting demand for unbiased sequencing approaches, we performed a direct RNA sequencing (direct RNA seq.) experiment using critical oropharyngeal swab samples collected from Italian patients infected with SARS-CoV-2 in the Palermo region of Sicily. Here, we identified SARS-CoV-2 sequences directly in RNA extracted from critical samples using the Oxford Nanopore MinION technology without prior cDNA retrotranscription. Using an appropriate bioinformatics pipeline, we could identify mutations in the nucleocapsid (N) gene, which have been reported previously in studies conducted in other countri…
Additional file 1 of Deep learning models for bacteria taxonomic classification of metagenomic data
Preliminary classification results. Preliminary classification results obtained by training a model on one type of input data (e.g. SG) and testing it on the other type (e.g. AMP). (XLSX 9.52 kb)