Search results for "Data"
showing 10 items of 12992 documents
Predictive shelf life model based on RF technology for improving the management of food supply chain: A case study
2016
The aim of this paper was the development of a Smart Logistic Unit (SLU) based on RF technology to support the management of the food supply chain, in order to guarantee the shelf life of products in agreement with logistic efficiency and system sustainability. For this purpose, the main parameters that influence the quality of perishable products were determined and a shelf life equation based on Volatile Organic Compounds (VOCs) was modelled. The levels of VOCs were gathered by the sensors located inside the SLU, which acts as the remote element of a system for identification and data transmission. The proposed model was then validated through an experimental test, simulating the …
Integration of animal health and public health surveillance sources to exhaustively inform the risk of zoonosis: An application to echinococcosis in …
2020
The analysis of zoonotic disease risk requires the consideration of both human and animal geo-referenced disease incidence data. Here we show an application of joint Bayesian analyses to the study of echinococcosis granulosus (EG) in the province of Rio Negro, Argentina. We focus on merging passive and active surveillance data sources of animal and human EG cases using joint Bayesian spatial and spatio-temporal models. While similar spatial clustering and temporal trending were apparent, there appears to be limited lagged dependence between animal and human outcomes. Beyond the data quality issues relating to missingness at different times, we were able to identify relations between dog and …
Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM
2019
Single-cell transcriptomic assays have enabled the de novo reconstruction of lineage differentiation trajectories, along with the characterization of cellular heterogeneity and state transitions. Several methods have been developed for reconstructing developmental trajectories from single-cell transcriptomic data, but efforts on analyzing single-cell epigenomic data and on trajectory visualization remain limited. Here we present STREAM, an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. We have tested STREAM on several synthetic and real datasets generated with different single-cell techno…
Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms
2018
Motivation: Information-theoretic and compositional/linguistic analyses of genomes have a central role in bioinformatics, even more so since the associated methodologies are also becoming very valuable for epigenomic and meta-genomic studies. The kernel of those methods is the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}^k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data now available in applications demands a resort to parallel and distributed computing. Indeed, those types of algorithms have been developed to collect k-mer statistics in…
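The k-mer statistics described in this abstract can be illustrated with a minimal single-machine sketch. It is not the Hadoop algorithm from the paper. The function name `count_kmers` and the choice to skip windows containing ambiguous bases are assumptions made for illustration only.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer over the alphabet {A,C,G,T}.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    alphabet = set("ACGT")
    counts: Counter = Counter()
    # Slide a window of length k over the sequence, one position at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    return counts


if __name__ == "__main__":
    print(count_kmers("ACGTACGT", 2))
```

A distributed version would typically split the input into chunks, count each chunk independently, and sum the per-chunk counters, which is the map/reduce pattern the abstract alludes to.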
Diversification of spatiotemporal expression and copy number variation of the echinoid hbox12/pmar1/micro1 multigene family
2017
Changes occurring during evolution in the cis-regulatory landscapes of individual members of multigene families might impart diversification in their spatiotemporal expression and function. The archetypal member of the echinoid hbox12/pmar1/micro1 family is hbox12-a, a homeobox-containing gene expressed exclusively by dorsal blastomeres, where it governs the dorsal/ventral gene regulatory network during embryogenesis of the sea urchin Paracentrotus lividus. Here we describe the inventory of the hbox12/pmar1/micro1 genes in P. lividus, highlighting that gene copy number variation occurs across individual sea urchins of the same species. We show that the various hbox12/pmar1/micro1 genes grou…
iDamIDseq and iDEAR: an improved method and computational pipeline to profile chromatin-binding proteins
2016
DNA adenine methyltransferase identification (DamID) has emerged as an alternative method to profile protein-DNA interactions; however, critical issues limit its widespread applicability. Here, we present iDamIDseq, a protocol that improves specificity and sensitivity by inverting the steps DpnI-DpnII and adding steps that involve a phosphatase and exonuclease. To determine genome-wide protein-DNA interactions efficiently, we present the analysis tool iDEAR (iDamIDseq Enrichment Analysis with R). The combination of DamID and iDEAR permits the establishment of consistent profiles for transcription factors, even in transient assays, as we exemplify using the small teleost medaka (Oryzias lati…
FASTdoop: A versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications
2017
Summary: MapReduce Hadoop bioinformatics applications require the availability of special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide any built-in support for the most popular sequence file formats like FASTA or BAM. Moreover, the development of these routines is not easy, both because of the diversity of these formats and the need for efficiently managing sequence datasets that may count up to billions of characters. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files. We show that, with respect to analogous input management routines that have appeared in the literature, it offers…
Detecting mutations by eBWT
2018
In this paper we develop a theory describing how the extended Burrows-Wheeler Transform (eBWT) of a collection of DNA fragments tends to cluster together the copies of nucleotides sequenced from a genome G. Our theory accurately predicts how many copies of any nucleotide are expected inside each such cluster, and how an elegant and precise LCP-array-based procedure can locate these clusters in the eBWT. Our findings are very general and can be applied to a wide range of different problems. In this paper, we consider the case of alignment-free and reference-free SNP discovery in multiple collections of reads. We note that, in accordance with our theoretical results, SNPs are clustered in th…
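The clustering effect described in this abstract can be seen on a toy example. The sketch below is a naive construction under stated assumptions, not the paper's procedure: each read is terminated by an end-marker `$`, equal suffixes are ordered by read index, and the name `naive_ebwt` is hypothetical. Practical tools build the transform far more efficiently.

```python
from typing import List


def naive_ebwt(reads: List[str]) -> str:
    """Naive multi-string BWT of a small read collection.

    Assumptions (illustrative only): every read gets a '$' end-marker,
    and identical suffixes are ordered by the index of their read.
    """
    entries = []
    for idx, read in enumerate(reads):
        text = read + "$"
        for j in range(len(text)):
            # Character that precedes this suffix (wraps to '$' at j == 0).
            preceding = text[j - 1] if j > 0 else "$"
            entries.append((text[j:], idx, preceding))
    # Sort by suffix, then by read index; '$' sorts before A, C, G, T.
    entries.sort(key=lambda e: (e[0], e[1]))
    return "".join(preceding for _, _, preceding in entries)


if __name__ == "__main__":
    # Two reads that differ only in the base preceding the context "AT".
    print(naive_ebwt(["GAT", "CAT"]))  # prints TTGC$$AA
```

In the output, the adjacent `G` and `C` are the two bases that precede the shared context `AT`. A run of differing symbols sharing one right context is the kind of cluster that a SNP would produce.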
The colored longest common prefix array computed via sequential scans
2018
Due to the increased availability of large datasets of biological sequences, the tools for sequence comparison are now relying on efficient alignment-free approaches to a greater extent. Most of the alignment-free approaches require the computation of statistics of the sequences in the dataset. Such computations become impractical in internal memory when very large collections of long sequences are considered. In this paper, we present a new conceptual data structure, the colored longest common prefix array (cLCP), that allows several problems to be tackled efficiently with an alignment-free approach. In fact, we show that such a data structure can be computed via sequential scans in semi-exter…
Q-nexus: a comprehensive and efficient analysis pipeline designed for ChIP-nexus
2016
Background: ChIP-nexus, an extension of the ChIP-exo protocol, can be used to map the borders of protein-bound DNA sequences at nucleotide resolution, requires less input DNA and enables selective PCR duplicate removal using random barcodes. However, the use of random barcodes requires additional preprocessing of the mapping data, which complicates the computational analysis. To date, only a very limited number of software packages are available for the analysis of ChIP-exo data, which have not yet been systematically tested and compared on ChIP-nexus data. Results: Here, we present a comprehensive software package for ChIP-nexus data that exploits the random barcodes for selective removal …
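The barcode-based duplicate removal mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the Q-nexus implementation. It assumes that two reads are PCR duplicates when they share chromosome, position, strand and random barcode. The `MappedRead` type and the function name are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class MappedRead(NamedTuple):
    """Hypothetical minimal record for a mapped, barcoded read."""
    chrom: str
    pos: int
    strand: str
    barcode: str


def remove_pcr_duplicates(reads: Iterable[MappedRead]) -> List[MappedRead]:
    """Keep the first read seen for each (chrom, pos, strand, barcode)."""
    seen = set()
    unique = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.barcode)
        # Same position but a different barcode is treated as an
        # independent fragment and therefore kept.
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


if __name__ == "__main__":
    reads = [
        MappedRead("chr1", 100, "+", "ACGTA"),
        MappedRead("chr1", 100, "+", "ACGTA"),  # duplicate of the first
        MappedRead("chr1", 100, "+", "TTGCA"),  # same position, new barcode
    ]
    print(len(remove_pcr_duplicates(reads)))
```

Without the barcode in the key, all reads at the same position would collapse into one, which is why barcodes make the removal selective.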