Search results for "Database"
Showing 10 of 2136 documents
Metadata to Support Data Warehouse Evolution
2009
The focus of this chapter is the metadata necessary to support data warehouse evolution. We present a data warehouse framework that can track the evolution process and adapt data warehouse schemata and data extraction, transformation, and loading (ETL) processes. We discuss a key part of the framework: the metadata repository, which stores information about the data warehouse, its logical and physical schemata, and their versions. We propose a physical implementation of a multiversion data warehouse in a relational DBMS. For each modification of a data warehouse schema, we outline the changes that must be made to the repository metadata and to the database.
Spatio-temporal Schema Integration with Validation: A Practical Approach
2005
We propose to enhance a schema integration process with a validation phase that employs logic-based data models. In our methodology, we validate the source schemas against the data model, and we validate the inter-schema mappings against the semantics of the data model and the syntax of the correspondence language. In this paper, we focus on how to employ a reasoning engine to validate spatio-temporal schemas and describe where the reasoning engine is plugged into our integration methodology. The validation phase distinguishes our integration methodology from other approaches. We shift the emphasis of automation from a priori discovery to a posteriori checking of the inter-schema mapping…
A new fast and fault-tolerant identification algorithm for spectral databases
1995
A new method for automatic, computer- and database-driven identification of UV/VIS spectra is described. It is shown that an identification algorithm must consider both the differences between spectra and their common features. The described method allows identification even if the spectra are distorted or shifted.
Population geocoding for healthcare management. Technical challenges and quality issues
2015
The present work aims to describe the main issues related to population geocoding for healthcare management. Some of the available procedures for geocoding multiple addresses are described, and an indicator of the quality of the geocoded addresses is proposed. As a case study, the geocoding of population addresses in a set of nine Sicilian municipalities is described, and the results obtained with two different methods are compared in terms of quality. Finally, some potential applications of population geocoding in healthcare management are discussed.
Streamlining distributed Deep Learning I/O with ad hoc file systems
2021
With evolving techniques to parallelize Deep Learning (DL) and the growing amount of training data and model complexity, High-Performance Computing (HPC) has become increasingly important for machine learning engineers. Although many compute clusters already use learning accelerators or GPUs, HPC storage systems are not suited to the I/O requirements of DL workflows. Therefore, users typically copy the whole training dataset to the worker nodes or distribute partitions of it. Because DL depends on randomized input data, prior work reported that partitioning affects DL accuracy. Those solutions focused mainly on training I/O performance over a high-speed network but did not cover the data stage-in pro…
VegItaly: Technical features, crucial issues and some solutions
2012
VegItaly is at present the largest Italian vegetation database. It is the result of a collaborative project that aims to become a major reference for Italian vegetation scientists. The paper emphasizes its benefits for phytosociological data management and describes the solutions adopted to solve several technical problems, such as the treatment of different vegetation stratification systems, the conversion of vegetation cover values, taxonomic and syntaxonomic issues, and data import and access. The structure of the taxonomic list produced to support data storage is described. It allows easy management of synonymic relationships and is constantly updated according to new publicati…
Quantitative approaches for evaluating the influence of films using the IMDb database
2016
Why do certain films remain influential throughout film history? The purpose of this paper is to attempt to answer this question. To do so, we adopt quantitative approaches that facilitate an objective interpretation of the data. The data source chosen for this study is the Internet Movie Database (IMDb), and in particular one of its sections, called "Connections", which lists references made to a film in subsequent movies and references made in the film itself to previous ones. The extraction and analysis of these citation networks allow us to draw some conclusions about the most influential movies in film history, identifying their distinguishing features, an…
Nationwide evaluation of day-to-day clinical pharmacists' interventions in German hospitals.
2015
Study Objective: To describe and evaluate the extent and diversity of nationwide data on clinical pharmacists' interventions (PIs) in German hospitals. Design: Retrospective analysis. Data Source: The ADKA-DokuPIK German database, a national, anonymous, self-reported, Internet-based documentation system for routine PIs and for medication errors reported by German hospital pharmacists. Measurements and Main Results: Data sets entered in ADKA-DokuPIK between January 2009 and December 2012 were analyzed descriptively. A total of 27,610 PIs were entered, mainly by ward-based clinical pharmacists (82.5%). Most of the PIs were performed on surgical wards (37.8%), followed by anesthesiology/int…
Distributed Real-Time Sentiment Analysis for Big Data Social Streams
2014
The big data trend has forced data-centric systems to handle continuous, fast data streams. In recent years, real-time analytics on stream data has emerged as a new research field, which aims to answer queries about "what-is-happening-now" with negligible delay. The real challenge of real-time stream processing is that it is impossible to store all data instances; therefore, online analytical algorithms are used. To perform real-time analytics, the data must be pre-processed so that only a short summary of the stream is stored in main memory. In addition, because of the high arrival speed, the average processing time for each data instance should be such that…
Additional file 1 of Ethnobotany of dye plants in Southern Italy, Mediterranean Basin: floristic catalog and two centuries of analysis of traditional…
2020
Additional file 1: Supplementary File 1. A wider and more complete database and the currently available data.