Search results for "biological data"
showing 10 items of 53 documents
New Trends in Graph Mining
2010
Searching for repeated features characterizing biological data is fundamental in computational biology. When biological networks are under analysis, the presence of repeated modules across the same network (or several distinct ones) is shown to be very relevant. Indeed, several studies prove that biological networks can often be understood in terms of coalitions of basic repeated building blocks, often referred to as network motifs. This work provides a review of the main techniques proposed for motif extraction from biological networks. In particular, the main intrinsic difficulties related to the problem are pointed out, along with solutions proposed in the literature to overcome them. Open ch…
Machine learning predictions of trophic status indicators and plankton dynamic in coastal lagoons
2018
Multivariate trophic indices provide an efficient way to assess and classify the eutrophication level and ecological status of a given water body, but their computation requires the availability of experimental information on many parameters, including biological data, that might not always be available. Here we show that machine learning techniques – once trained against a full data set – can be used to infer plankton biomass information from chemical and physical parameters only, so that the trophic index can then be computed without using additional biological data. More specifically, we reconstruct plankton information from chemical and physical data, and this information together w…
The BioDICE Taverna plugin for clustering and visualization of biological data: a workflow for molecular compounds exploration
2014
Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support data-driven scientific discovery in complex and explorative bioinformatics applications. Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a …
Multivariate analysis in the identification of biological targets for designed molecular structures: The BIOTA protocol
2013
In this work, the new protocol BIOlogical Target Assignation (BIOTA) for the prediction of the biological target from molecular structures is proposed. BIOTA is based on applying Principal Component Analysis (PCA) to a matrix of ligands versus molecular descriptors. The application of BIOTA could make it possible to hypothesize the mechanism of action of a candidate drug prior to its biological evaluation or to repurpose old drugs. The protocol can be fine-tuned by choosing appropriate targets (biological or not) and molecular descriptors, and it can be useful in every field in which it is possible to collect sets of compounds with known properties. The robustness of the protocol depends on diff…
Spatial Distribution of Fungal Communities in an Arable Soil.
2015
Fungi are prominent drivers of ecological processes in soils, so that fungal communities across different soil ecosystems have been well investigated. However, for arable soils taxonomically resolved fine-scale studies including vertical itemization of fungal communities are still missing. Here, we combined a cloning/Sanger sequencing approach of the ITS/LSU region as marker for general fungi and of the partial SSU region for arbuscular mycorrhizal fungi (AMF) to characterize the microbiome in different maize soil habitats. Four compartments were analyzed over two annual cycles 2009 and 2010: a) ploughed soil in 0-10 cm, b) rooted soil in 40-50 cm, c) root-free soil in 60-70 cm soil depth a…
A New Linear Initialization in SOM for Biomolecular Data
2009
In the past decade, the amount of data in the biological field has grown steadily; techniques for the analysis of biological data have been developed and new tools have been introduced. Several computational methods are based on unsupervised neural network algorithms that are widely used for multiple purposes including clustering and visualization, such as Self-Organizing Maps (SOM). Unfortunately, even though this method is unsupervised, its performance in terms of quality of results and learning speed is strongly dependent on the initialization of the neuron weights. In this paper we present a new initialization technique based on a totally connected undirected graph, that report relat…
Genetic Diversity of O-Antigens in Hafnia alvei and the Development of a Suspension Array for Serotype Detection.
2016
Hafnia alvei is a facultative and rod-shaped gram-negative bacterium that belongs to the Enterobacteriaceae family. Although it has been more than 50 years since the genus was identified, very little is known about variations among Hafnia species. Diversity in O-antigens (O-polysaccharide, OPS) is thought to be a major factor in bacterial adaptation to different hosts and situations and variability in the environment. Antigenic variation is also an important factor in pathogenicity that has been used to define clones within a number of species. The genes that are required to synthesize OPS are always clustered within the bacterial chromosome. A serotyping scheme including 39 O-serotypes has…
A summary of genomic databases: overview and discussion
2009
In the last few years, both the amount of electronically stored biological data and the number of biological data repositories have grown significantly (today, more than eight hundred such repositories can be counted). In spite of the enormous amount of available resources, a user may be disoriented when searching for specific data. Thus, an accurate analysis of biological data and repositories turns out to be useful to obtain a systematic view of biological database structures, tools and contents and, eventually, to facilitate the access and retrieval of such data. In this chapter, we propose an analysis of genomic databases, which are databases of fundamental importance for the research in bioinfo…
Big Data in metagenomics: Apache Spark vs MPI.
2020
The progress of next-generation sequencing has led to the availability of massive data sets used by a wide range of applications in biology and medicine. This has sparked significant interest in using modern Big Data technologies to process this large amount of information in distributed memory clusters of commodity hardware. Several approaches based on solutions such as Apache Hadoop or Apache Spark have been proposed. These solutions allow developers to focus on the problem while the need to deal with low level details, such as data distribution schemes or communication patterns among processing nodes, can be ignored. However, performance and scalability are also of high importance when…
Comparative Mitogenomics of Leeches (Annelida: Clitellata): Genome Conservation and Placobdella-Specific trnD Gene Duplication.
2015
Mitochondrial DNA sequences, often in combination with nuclear markers and morphological data, are frequently used to unravel the phylogenetic relationships, population dynamics and biogeographic histories of a plethora of organisms. The information provided by examining complete mitochondrial genomes also enables investigation of other evolutionary events such as gene rearrangements, gene duplication and gene loss. Despite efforts to generate information to represent most of the currently recognized groups, some taxa are underrepresented in mitochondrial genomic databases. One such group is leeches (Annelida: Hirudinea: Clitellata). Herein, we expand our knowledge concerning leech mitochon…