Search results for "database."
showing 10 items of 2119 documents
Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics
2019
Background: Distributed approaches based on the MapReduce programming paradigm have started to be proposed in the Bioinformatics domain, due to the large amount of data produced by the next-generation sequencing techniques. However, the use of MapReduce and related Big Data technologies and frameworks (e.g., Apache Hadoop and Spark) does not necessarily produce satisfactory results, in terms of both efficiency and effectiveness. We discuss how the development of distributed and Big Data management technologies has affected the analysis of large datasets of biological sequences. Moreover, we show how the choice of different parameter configurations and the careful engineering of the …
Preventive strategies and factors associated with surgically treated necrotising enterocolitis in extremely preterm infants: an international unit su…
2019
Objectives: To compare necrotising enterocolitis (NEC) prevention practices and NEC-associated factors between units from eight countries of the International Network for Evaluation of Outcomes of Neonates, and to assess their association with surgical NEC rates. Design: Prospective unit-level survey combined with retrospective cohort study. Setting: Neonatal intensive care units in Australia/New Zealand, Canada, Finland, Israel, Spain, Sweden, Switzerland and Tuscany (Italy). Patients: Extremely preterm infants born between 24+0 and 28+6 weeks’ gestation, with birth weights <1500 g, and admitted between 2014–2015. Exposures: NEC prevention practices (probiotics, feeding, donor milk) using responses of an o…
Modeling crowd dynamics through coarse-grained data analysis
2018
Understanding and predicting the collective behaviour of crowds is essential to improve the efficiency of pedestrian flows in urban areas and minimize the risks of accidents at mass events. We advocate for the development of crowd traffic management systems, whereby observations of crowds can be coupled to fast and reliable models to produce rapid predictions of the crowd movement and eventually help crowd managers choose between tailored optimization strategies. Here, we propose a Bi-directional Macroscopic (BM) model as the core of such a system. Its key input is the fundamental diagram for bi-directional flows, i.e. the relation between the pedestrian fluxes and d…
IMI – Oral biopharmaceutics tools project – Evaluation of bottom-up PBPK prediction success part 4: Prediction accuracy and software comparisons with…
2020
Oral drug absorption is a complex process depending on many factors, including the physicochemical properties of the drug, formulation characteristics and their interplay with gastrointestinal physiology and biology. Physiological-based pharmacokinetic (PBPK) models integrate all available information on gastro-intestinal system with drug and formulation data to predict oral drug absorption. The latter together with in vitro-in vivo extrapolation and other preclinical data on drug disposition can be used to predict plasma concentration-time profiles in silico. Despite recent successes of PBPK in many areas of drug development, an improvement in their utility for evaluating oral absorption i…
Probabilistic techniques for bridging the semantic gap in schema alignment
Connecting pieces of information from heterogeneous sources sharing the same domain is an open challenge in the Semantic Web, Big Data and business communities. The main problem in this research area is to bridge the expressiveness gap between relational databases and ontologies. In general, an ontology is more expressive and captures more semantic information behind data than a relational database does. On the other hand, databases are the most commonly used persistent storage systems and they grant benefits such as security and data integrity, but they need to be managed by expert users. The problem is quite significant above all when enterprise or corporate ontologies are used to share information…
Network reconstruction for trans acting genetic loci using multi-omics data and prior information.
2022
Background: Molecular measurements of the genome, the transcriptome, and the epigenome, often termed multi-omics data, provide an in-depth view of biological systems, and their integration is crucial for gaining insights into complex regulatory processes. These data can be used to explain disease-related genetic variants by linking them to intermediate molecular traits (quantitative trait loci, QTL). Molecular networks regulating cellular processes leave footprints in QTL results as so-called trans-QTL hotspots. Reconstructing these networks is a complex endeavor, and the use of biological prior information can improve network inference. However, previous efforts were limited in the types of priors…
BioTIME: A database of biodiversity time series for the Anthropocene
2018
Motivation: The BioTIME database contains raw data on species identities and abundances in ecological assemblages through time. These data enable users to calculate temporal trends in biodiversity within and amongst assemblages using a broad range of metrics. BioTIME is being developed as a community-led open-source database of biodiversity time series. Our goal is to accelerate and facilitate quantitative analysis of temporal patterns of biodiversity in the Anthropocene. Main types of variables included: The database contains 8,777,413 species abundance records, from assemblages consistently sampled for a minimum of 2 years, which need not necessarily be consecutive. In addition, th…
Controlling false match rates in record linkage using extreme value theory
2011
Cleansing data from synonyms and homonyms is a relevant task in fields where high quality of data is crucial, for example in disease registries and medical research networks. Record linkage provides methods for minimizing synonym and homonym errors, thereby improving data quality. We focus our attention on the case of homonym errors (in the following denoted as ‘false matches’), in which records belonging to different entities are wrongly classified as equal. Synonym errors (‘false non-matches’) occur when a single entity maps to multiple records in the linkage result. They are not considered in this study because in our application domain they are not as crucial as false matches. Fa…
Metadata to Support Data Warehouse Evolution
2009
The focus of this chapter is the metadata necessary to support data warehouse evolution. We present a data warehouse framework that is able to track the evolution process and adapt data warehouse schemata and data extraction, transformation, and loading (ETL) processes. We discuss a significant part of the framework, the metadata repository, which stores information about the data warehouse and its logical and physical schemata and their versions. We propose a physical implementation of a multiversion data warehouse in a relational DBMS. For each modification of a data warehouse schema, we outline the changes that need to be made to the repository metadata and in the database.
Spatio-temporal Schema Integration with Validation: A Practical Approach
2005
We propose to enhance a schema integration process with a validation phase employing logic-based data models. In our methodology, we validate the source schemas against the data model; the inter-schema mappings are validated against the semantics of the data model and the syntax of the correspondence language. In this paper, we focus on how to employ a reasoning engine to validate spatio-temporal schemas and describe where the reasoning engine is plugged into our integration methodology. The validation phase distinguishes our integration methodology from other approaches. We shift the emphasis on automation from the a priori discovery to the a posteriori checking of the inter-schema mapping…