Search results for "Informatica"
showing 10 items of 978 documents
Deep learning network for exploiting positional information in nucleosome related sequences
2017
A nucleosome is a DNA-histone complex that wraps about 150 base pairs of double-stranded DNA. Nucleosomes pack DNA into the nucleus of eukaryotic cells to form chromatin. Genome-wide nucleosome positioning plays an important role in the regulation of cell type-specific gene activities. Several biological studies have shown that nucleosome presence is sequence-specific, as clearly indicated by the organization of particular nucleotide substrings. Building on these advances, nucleosomes have been successfully identified on a genomic scale using DNA sequence feature representations and classical supervised classification methods such as Support Vec…
A REST-based framework to support non-invasive and early coeliac disease diagnosis
2019
The health sector has traditionally been one of the early adopters of databases, from the simplest Electronic Health Record (formerly Computer-Based Patient Record) systems in use in general practice, hospitals and intensive care units to big data, multidata-based systems used to support diagnosis and care decisions. In this paper we present a framework to support non-invasive and early diagnosis of coeliac disease. The proposed framework makes use of well-known technologies and techniques, both hardware and software, put together in a novel way. The main goals of our framework are: (1) to provide users with a reliable and fast repository of a large amount of data; (2) to make such reposi…
Strategies for structuring interdisciplinary education in Systems Biology: an European perspective
2016
Systems Biology is an approach to biology and medicine that has the potential to lead to a better understanding of how biological properties emerge from the interaction of genes, proteins, molecules, cells and organisms. The approach aims at elucidating how these interactions govern biological function by employing experimental data, mathematical models and computational simulations. As Systems Biology is inherently multidisciplinary, education within this field meets numerous hurdles including departmental barriers, availability of all required expertise locally, appropriate teaching material and example curricula. As university education at the Bachelor’s level is traditionally built upon…
FASTdoop: A versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications
2017
MapReduce Hadoop bioinformatics applications require the availability of special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide any built-in support for the most popular sequence file formats like FASTA or BAM. Moreover, the development of these routines is not easy, both because of the diversity of these formats and the need to efficiently manage sequence datasets that may count up to billions of characters. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files. We show that, with respect to analogous input management routines that have appeared in the literature, it offers…
Detecting mutations by eBWT
2018
In this paper we develop a theory describing how the extended Burrows-Wheeler Transform (eBWT) of a collection of DNA fragments tends to cluster together the copies of nucleotides sequenced from a genome G. Our theory accurately predicts how many copies of any nucleotide are expected inside each such cluster, and how an elegant and precise LCP-array-based procedure can locate these clusters in the eBWT. Our findings are very general and can be applied to a wide range of different problems. In this paper, we consider the case of alignment-free and reference-free SNP discovery in multiple collections of reads. We note that, in accordance with our theoretical results, SNPs are clustered in th…
Alignment-free sequence comparison using absent words
2018
Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not oc…
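As an illustration of the composition-based measures mentioned in this abstract, the following is a minimal sketch of a q-gram distance, assuming the common definition as the L1 distance between q-gram count profiles. The function names `qgram_profile` and `qgram_distance` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def qgram_profile(seq: str, q: int) -> Counter:
    """Count every length-q substring (q-gram) of seq in one linear pass."""
    return Counter(seq[i:i + q] for i in range(len(seq) - q + 1))


def qgram_distance(x: str, y: str, q: int) -> int:
    """L1 distance between the q-gram count profiles of x and y.

    Assumption: this is the usual textbook definition, not necessarily
    the exact measure used by the authors.
    """
    px, py = qgram_profile(x, q), qgram_profile(y, q)
    # A Counter returns 0 for q-grams that are missing from one profile.
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())
```

Both profiles are built in a single pass over each sequence, which is consistent with the linear-time behaviour the abstract describes for such measures.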
Analysis of low-correlated spatial gene expression patterns: A clustering approach in the mouse brain data hosted in the Allen Brain Atlas
2018
The Allen Brain Atlas (ABA) provides a similar gene expression dataset by genome-scale mapping of the C57BL/6J mouse brain. In this study, the authors describe a method to extract the spatial information of gene expression patterns across a set of 1047 genes. The genes were chosen from among the 4104 genes having the lowest Pearson correlation coefficient used to compare the expression patterns across voxels in a single hemisphere for available coronal and sagittal volumes. The set of genes analysed in this study is the one discarded in the article by Bohland et al., which was considered to be of a lower consistency, not a reliable dataset. Following a normalisation task with a global and …
Alignment Free Dissimilarities for Nucleosome Classification
2016
Epigenetic mechanisms such as nucleosome positioning, histone modifications and DNA methylation play an important role in the regulation of cell type-specific gene activities, yet how epigenetic patterns are established and maintained remains poorly understood. Recent studies have shown a role of DNA sequences in the recruitment of epigenetic regulators. For this reason, the use of more suitable similarity or dissimilarity measures between DNA sequences could help in the context of epigenetic studies. In particular, alignment-free dissimilarities have already been successfully applied to identify distinct sequence features that are associated with epigenetic patterns and to predict epigenomic profiles…
On the structural connectivity of large-scale models of brain networks at cellular level
2021
The brain’s structural connectivity plays a fundamental role in determining how neuron networks generate, process, and transfer information within and between brain regions. The underlying mechanisms are extremely difficult to study experimentally and, in many cases, large-scale model networks are of great help. However, the implementation of these models relies on experimental findings that are often sparse and limited. Their predictive power ultimately depends on how closely a model’s connectivity represents the real system. Here we argue that the data-driven probabilistic rules, widely used to build neuronal network models, may not be appropriate to represent the dynamics of the …
Variable Ranking Feature Selection for the Identification of Nucleosome Related Sequences
2018
Several recent works have shown that the k-mer representation of a DNA sequence can be used for classification or identification of nucleosome positioning related sequences. This representation becomes computationally expensive as k grows, because the dimension of the feature space is exponential in k. This issue significantly affects the classification task carried out by a general machine learning algorithm for sequence classification. In this paper, we investigate the advantage offered by the so-called Variable Ranking Feature Selection method to select the most informative k-mers associated with a set of DNA sequences, for the final purpose of nucleosome/linker classifi…
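To make the dimensionality concern in this abstract concrete, the following is a minimal sketch of a k-mer count representation, assuming the DNA alphabet ACGT and plain occurrence counts. The function name `kmer_vector` is hypothetical, and the paper's feature selection step is not reproduced here.

```python
from itertools import product


def kmer_vector(seq: str, k: int, alphabet: str = "ACGT") -> list[int]:
    """Map seq to a vector of k-mer counts with len(alphabet)**k entries.

    Assumption: windows containing symbols outside the alphabet
    (for example N) are simply skipped.
    """
    # Enumerate all possible k-mers in lexicographic order: AA, AC, AG, ...
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vec = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            vec[j] += 1
    return vec
```

The vector length is 4^k for DNA (16 for k = 2, 1,048,576 for k = 10), which is the kind of growth that motivates selecting only the most informative k-mers.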