Search results for "Data type"
showing 10 items of 1183 documents
Dry selection and wet evaluation for the rational discovery of new anthelmintics
2017
Helminth infections remain a major problem in medicine and public health. In this report, atom-based 2D bilinear indices, a TOMOCOMD-CARDD (QuBiLs-MAS module) molecular descriptor family, and linear discriminant analysis (LDA) were used to find models that differentiate between anthelmintic and non-anthelmintic compounds. Two classification models, obtained using non-stochastic and stochastic 2D bilinear indices, correctly classified 86.64% and 84.66% of the training set, respectively. Equation 1(2) correctly classified 141(135) out of 165 [85.45%(81.82%)] compounds in the external validation set. Further LDA models were built in order to get the most likely mechanism of action of anthelmin…
EFMviz
2020
Elementary Flux Modes (EFMs) are a tool for constraint-based modeling and metabolic network analysis. However, systematic and automated visualization of EFMs, capable of integrating various data types, is still a challenge. In this study, we developed an extension for the widely adopted COBRA Toolbox, EFMviz, for analysis and graphical visualization of EFMs as networks of reactions, metabolites and genes. The analysis workflow offers a platform for EFM visualization to improve EFM interpretability by connecting the COBRA Toolbox with the network analysis and visualization software Cytoscape. The biological applicability of EFMviz is demonstrated in two use cases on medium (Escherichia coli, iAF1…
Reactome pathway analysis: a high-performance in-memory approach
2016
Reactome aims to provide bioinformatics tools for visualisation, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modelling, systems biology and education. Pathway analysis methods have a broad range of applications in physiological and biomedical research; one of the main problems, in terms of the performance of the analysis methods, is the constantly increasing size of the data samples. Here, we present a new high-performance in-memory implementation of the well-established over-representation analysis method. To achieve this, the over-representation analysis method is divided into four different steps and, for each of them, specific data st…
Reactome graph database: Efficient access to complex pathway data
2018
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high-quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its qu…
A deeper look into natural sciences with physics-based and data-driven measures
2021
Summary: With the development of machine learning in recent years, it is possible to glean much more information from an experimental data set to study matter. In this perspective, we discuss some state-of-the-art data-driven tools to analyze latent effects in data and explain their applicability in natural science, focusing on two recently introduced, physics-motivated, computationally cheap tools: latent entropy and latent dimension. We illustrate their capabilities by applying them to several examples in the natural sciences and show that they reveal previously unobserved features such as, for example, a gradient in a magnetic measurement and a latent network of glymphatic channels from the mous…
Graph Theoretical Framework of Brain Networks in Multiple Sclerosis: A Review of Concepts.
2019
Abstract: Network science provides powerful access to essential organizational principles of the human brain. It has been applied in combination with graph theory to characterize brain connectivity patterns. In multiple sclerosis (MS), analysis of the brain networks derived from either structural or functional imaging provides new insights into pathological processes within the gray and white matter. Beyond focal lesions and diffuse tissue damage, network connectivity patterns could be important for closely tracking and predicting the disease course. In this review, we describe concepts of graph theory, highlight novel issues of tissue reorganization in acute and chronic neuroinflammation an…
Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms
2018
Abstract. Motivation: Information-theoretic and compositional/linguistic analyses of genomes have a central role in bioinformatics, even more so since the associated methodologies are also becoming very valuable for epigenomic and meta-genomic studies. The kernel of those methods is the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}^k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data now available in applications demands resorting to parallel and distributed computing. Indeed, those types of algorithms have been developed to collect k-mer statistics in…
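The k-mer statistics mentioned in this abstract can be illustrated with a minimal single-machine sketch. This is not the Hadoop-based method of the paper; the function name `count_kmers` and the choice to skip windows containing characters outside {A,C,G,T} are assumptions made for illustration only.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count how many times each k-mer over {A,C,G,T} occurs in a sequence.

    Hypothetical helper for illustration: windows that contain any other
    character (e.g. N) are skipped, which is an assumption, not a rule
    taken from the abstract.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    alphabet = set("ACGT")
    counts = Counter()
    # Slide a window of width k over the sequence, one position at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    return counts


if __name__ == "__main__":
    for kmer, n in sorted(count_kmers("ACGTACGT", 2).items()):
        print(kmer, n)
```

A sequence of length n yields at most n - k + 1 windows, so the counting itself is linear in the input; the need for distributed computing that the abstract points to comes from the volume of data, not from the complexity of the counting step.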
Lost Strings in Genomes: What Sense Do They Make?
2017
We studied the sets of avoided strings observed over a family of genomes. It was found that the length of the minimal avoided string rarely exceeds 9 nucleotides, regardless of the phylogeny of the genome under consideration. The lists of the avoided strings observed over the sets of (related) genomes have been analyzed. Very low correlation between the phylogeny and the set of those strings has been found.
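As a rough illustration of what a minimal avoided string is, the sketch below searches a single sequence for the shortest length k at which some string over {A,C,G,T} does not occur. It is a brute-force sketch under stated assumptions, not the authors' procedure: the function name `minimal_avoided_strings`, the `max_k` cutoff, and the single-strand treatment (no reverse complement) are all hypothetical choices.

```python
from itertools import product


def minimal_avoided_strings(sequence: str, max_k: int = 10):
    """Return (k, absent) for the smallest k with at least one absent k-mer.

    Hypothetical helper for illustration. Returns None if every string up
    to length max_k occurs in the sequence. Memory grows as 4**k, so max_k
    should stay small.
    """
    seq = sequence.upper()
    for k in range(1, max_k + 1):
        # All substrings of length k that actually occur in the sequence.
        observed = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        # All 4**k candidate strings, keeping those never observed.
        absent = [
            "".join(p)
            for p in product("ACGT", repeat=k)
            if "".join(p) not in observed
        ]
        if absent:
            return k, absent
    return None


if __name__ == "__main__":
    k, absent = minimal_avoided_strings("ACGTTGCAACGT")
    print(k, len(absent))
```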
CLOVE: classification of genomic fusions into structural variation events
2017
Background: A precise understanding of structural variants (SVs) in DNA is important in the study of cancer and population diversity. Many methods have been designed to identify SVs from DNA sequencing data. However, the problem remains challenging because existing approaches suffer from low sensitivity, precision, and positional accuracy. Furthermore, many existing tools only identify breakpoints, and so do not collect related breakpoints and classify them as a particular type of SV. Due to the rapidly increasing usage of high-throughput sequencing technologies in this area, there is an urgent need for algorithms that can accurately classify complex genomic rearrangements (involving more than …
The Metabolic Building Blocks of a Minimal Cell
2020
This article belongs to the Section Evolutionary Biology.