Search results for "database."

showing 10 items of 2119 documents

Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics

2019

Abstract. Background: Distributed approaches based on the MapReduce programming paradigm have started to be proposed in the Bioinformatics domain, due to the large amount of data produced by the next-generation sequencing techniques. However, the use of MapReduce and related Big Data technologies and frameworks (e.g., Apache Hadoop and Spark) does not necessarily produce satisfactory results, in terms of both efficiency and effectiveness. We discuss how the development of distributed and Big Data management technologies has affected the analysis of large datasets of biological sequences. Moreover, we show how the choice of different parameter configurations and the careful engineering of the …

Data AnalysisFOS: Computer and information sciencesTime FactorsTime FactorComputer scienceStatistics as TopicBig dataApache Spark; distributed computing; performance evaluation; k-mer countinglcsh:Computer applications to medicine. Medical informaticsBiochemistryDomain (software engineering)Databases03 medical and health sciences0302 clinical medicineStructural BiologyComputer clusterStatisticsSpark (mathematics)Molecular Biologylcsh:QH301-705.5030304 developmental biology0303 health sciencesGenomeSettore INF/01 - InformaticaBase SequenceNucleic AcidApache Sparkbusiness.industryResearchApache Spark; Distributed computing; k-mer counting; Performance evaluation; Algorithms; Base Sequence; Software; Time Factors; Data Analysis; Databases Nucleic Acid; Genome; Statistics as TopicApplied Mathematicsk-mer countingDistributed computingComputer Science ApplicationsAlgorithmData AnalysiComputer Science - Distributed Parallel and Cluster Computinglcsh:Biology (General)030220 oncology & carcinogenesisScalabilityPerformance evaluationlcsh:R858-859.7Algorithm designDistributed Parallel and Cluster Computing (cs.DC)Databases Nucleic AcidbusinessAlgorithmsSoftware

researchProduct
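
As a point of reference for the entry above, k-mer statistics boil down to counting every length-k substring of each sequence. The following is a minimal single-machine sketch in Python, not the distributed Spark implementation the paper evaluates; the function name `count_kmers` and the sample reads are hypothetical.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every length-k substring (k-mer) across the given sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k over the sequence, one position at a time.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Hypothetical toy input: two short reads.
counts = count_kmers(["ACGTAC", "GTACG"], k=3)
```

In a MapReduce-style setting, the inner loop would correspond to the map step (emit each k-mer with a count of 1) and the `Counter` update to the reduce step (sum the counts per k-mer).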

Preventive strategies and factors associated with surgically treated necrotising enterocolitis in extremely preterm infants: an international unit su…

2019

Objectives: To compare necrotising enterocolitis (NEC) prevention practices and NEC associated factors between units from eight countries of the International Network for Evaluation of Outcomes of Neonates, and to assess their association with surgical NEC rates. Design: Prospective unit-level survey combined with retrospective cohort study. Setting: Neonatal intensive care units in Australia/New Zealand, Canada, Finland, Israel, Spain, Sweden, Switzerland and Tuscany (Italy). Patients: Extremely preterm infants born between 24+0 to 28+6 weeks’ gestation, with birth weights <1500 g, and admitted between 2014–2015. Exposures: NEC prevention practices (probiotics, feeding, donor milk) using responses of an o…

Data AnalysisMalePediatricsInternationalityDatabases FactualInfant Premature Diseases2700 General Medicinepaediatric gastroenterologyCohort Studiesperinatology0302 clinical medicineRisk FactorsCause of DeathSurveys and QuestionnairesHospital Mortality1506030212 general & internal medicineOriginal ResearchIncidence (epidemiology)General MedicinePrognosis3. Good healthPrimary PreventionTreatment OutcomeInfant Extremely PrematureCohortGestationFemale1719Cohort studymedicine.medical_specialty610 Medicine & healthneonatologySepsis03 medical and health sciencesEnterocolitis NecrotizingIntensive Care Units Neonatal030225 pediatricsIntensive caremedicineHumansNeonatologyRetrospective Studiesbusiness.industryProbioticsInfant NewbornPaediatricsRetrospective cohort studymedicine.disease10027 Clinic for NeonatologySurvival Analysisdigestive system diseasesbusiness

researchProduct

Modeling crowd dynamics through coarse-grained data analysis

2018

Understanding and predicting the collective behaviour of crowds is essential to improve the efficiency of pedestrian flows in urban areas and minimize the risks of accidents at mass events. We advocate for the development of crowd traffic management systems, whereby observations of crowds can be coupled to fast and reliable models to produce rapid predictions of the crowd movement and eventually help crowd managers choose between tailored optimization strategies. Here, we propose a Bi-directional Macroscopic (BM) model as the core of such a system. Its key input is the fundamental diagram for bi-directional flows, i.e. the relation between the pedestrian fluxes and d…

Data AnalysisOperations researchComputer scienceFLOW[INFO.INFO-GR] Computer Science [cs]/Graphics [cs.GR]macroscopic model0904 Chemical EngineeringTransportation02 engineering and technologycomputer.software_genre01 natural sciences010305 fluids & plasmas[SHS]Humanities and Social Sciences[SCCO]Cognitive scienceCrowds0903 Biomedical Engineering0102 Applied Mathematics11. Sustainability0202 electrical engineering electronic engineering information engineeringCluster AnalysisApplied Mathematicsbi-directional fluxcollective behaviourGeneral Medicine[INFO.INFO-GR]Computer Science [cs]/Graphics [cs.GR]Computational MathematicsCore (game theory)Modeling and Simulation[SCCO.PSYC]Cognitive science/Psychology020201 artificial intelligence & image processingGeneral Agricultural and Biological SciencesLife Sciences & BiomedicineBEHAVIORCrowd dynamicsRelation (database)Bioinformatics[MATH.MATH-DS]Mathematics [math]/Dynamical Systems [math.DS]BioengineeringPedestrianModels PsychologicalMachine learningAdvanced Traffic Management SystemPedestrian traffic0103 physical sciencesHumansComputer Simulation[NLIN.NLIN-AO]Nonlinear Sciences [physics]/Adaptation and Self-Organizing Systems [nlin.AO]Block (data storage)Science & Technologybusiness.industryMathematical ConceptsSIMULATIONSdata-based modelingCrowdingKey (cryptography)Artificial intelligenceMathematical & Computational Biologybusinesscomputer

researchProduct

IMI – Oral biopharmaceutics tools project – Evaluation of bottom-up PBPK prediction success part 4: Prediction accuracy and software comparisons with…

2020

Oral drug absorption is a complex process depending on many factors, including the physicochemical properties of the drug, formulation characteristics and their interplay with gastrointestinal physiology and biology. Physiological-based pharmacokinetic (PBPK) models integrate all available information on gastro-intestinal system with drug and formulation data to predict oral drug absorption. The latter together with in vitro-in vivo extrapolation and other preclinical data on drug disposition can be used to predict plasma concentration-time profiles in silico. Despite recent successes of PBPK in many areas of drug development, an improvement in their utility for evaluating oral absorption i…

Data Analysis; Physiologically based pharmacokinetic modelling; Databases Factual; Administration Oral; Pharmaceutical Science; 02 engineering and technology; Machine learning; computer.software_genre; Models Biological; 030226 pharmacology & pharmacy; Biopharmaceutics; Pharmaceutical Sciences; 03 medical and health sciences; 0302 clinical medicine; Software; Pharmacokinetics; Humans; Clinical Trials as Topic; business.industry; Compound specific; Biopharmaceutics; General Medicine; Farmaceutiska vetenskaper; 021001 nanoscience & nanotechnology; Bioavailability; Intestinal Absorption; Pharmaceutical Preparations; Drug development; Performance indicator; Artificial intelligence; 0210 nano-technology; business; computer; Software; Forecasting; Biotechnology; European Journal of Pharmaceutics and Biopharmaceutics

researchProduct

Probabilistic techniques for bridging the semantic gap in schema alignment

Connecting pieces of information from heterogeneous sources sharing the same domain is an open challenge in the Semantic Web, Big Data and business communities. The main problem in this research area is to bridge the expressiveness gap between relational databases and ontologies. In general, an ontology is more expressive and captures more semantic information behind data than a relational database does. On the other hand, databases are the most commonly used persistent storage systems and they grant benefits such as security and data integrity, but they need to be managed by expert users. The problem is quite significant, above all when enterprise or corporate ontologies are used to share information…

Data Integration; OWL Ontology; Database; Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Schema Matching; Entity-Relation Diagram; Hidden Markov Model

researchProduct

Network reconstruction for trans acting genetic loci using multi-omics data and prior information.

2022

Background: Molecular measurements of the genome, the transcriptome, and the epigenome, often termed multi-omics data, provide an in-depth view of biological systems, and their integration is crucial for gaining insights into complex regulatory processes. These data can be used to explain disease-related genetic variants by linking them to intermediate molecular traits (quantitative trait loci, QTL). Molecular networks regulating cellular processes leave footprints in QTL results as so-called trans-QTL hotspots. Reconstructing these networks is a complex endeavor, and the use of biological prior information can improve network inference. However, previous efforts were limited in the types of priors…

Data Integrationeducation.field_of_studyComputer scienceScale (chemistry)Bayesian probabilityPopulationQuantitative Trait LociBiological databaseInferenceData Integration ; Machine Learning ; Multi-omics ; Network Inference ; Personalized Medicine ; Prior Information ; Simulation ; Systems BiologyComputational biologyQuantitative trait locusReplication (computing)Machine LearningPrior probabilityCohortGeneticsMolecular MedicineHumans:Medicine [Science]Gene Regulatory NetworkseducationTranscriptomeMolecular BiologyGenetics (clinical)Genome medicine

researchProduct

BioTIME: A database of biodiversity time series for the Anthropocene

2018

Abstract. Motivation: The BioTIME database contains raw data on species identities and abundances in ecological assemblages through time. These data enable users to calculate temporal trends in biodiversity within and amongst assemblages using a broad range of metrics. BioTIME is being developed as a community-led open-source database of biodiversity time series. Our goal is to accelerate and facilitate quantitative analysis of temporal patterns of biodiversity in the Anthropocene. Main types of variables included: The database contains 8,777,413 species abundance records, from assemblages consistently sampled for a minimum of 2 years, which need not necessarily be consecutive. In addition, th…

Data Papers0106 biological sciencesRange (biology)QH301 BiologytemporalNERCBiodiversity:Matematikk og Naturvitenskap: 400::Zoologiske og botaniske fag: 480 [VDP]BIALOWIEZA NATIONAL-PARKspecialcomputer.software_genre[SDV.BID.SPT]Life Sciences [q-bio]/Biodiversity/Systematics Phylogenetics and taxonomy01 natural sciencesspecies richnessSDG 15 - Life on LandbiodiversityGlobal and Planetary ChangeB003-ecologyDatabaseEcologySampling (statistics)SIMULATED HERBIVORYsupporting technologiesLAND-BRIDGE ISLANDS[SDV.BV.BOT]Life Sciences [q-bio]/Vegetal Biology/BotanicsPE&RCglobal/dk/atira/pure/thematic/inbo_th_00032PRIMEVAL TEMPERATE FORESTGeographyPOPULATION TRENDS/dk/atira/pure/discipline/B000/B003biodiversity; global; special; species richness; temporal; turnoverData PaperSECONDARY FORESTEvolutionESTUARINE COASTAL LAGOON010603 evolutionary biology/dk/atira/pure/sustainabledevelopmentgoals/life_below_waterQH301[SDV.EE.ECO]Life Sciences [q-bio]/Ecology environment/EcosystemsBehavior and SystematicsAnthropocenebiodiversity; global; spatial; species richness; temporal; turnover; Global and Planetary Change; Ecology Evolution Behavior and Systematics; EcologyVDP::Mathematics and natural science: 400::Zoology and botany: 480species richne14. Life underwaterSDG 14 - Life Below WaterNE/L002531/1ZA4450Relative species abundanceEcology Evolution Behavior and SystematicsZA4450 Databases010604 marine biology & hydrobiologyturnoverRCUKBiology and Life SciencesDAS/dk/atira/pure/technological/ondersteunende_technieken15. Life on landDECIDUOUS FORESTspatialTaxonFish13. Climate actionMCPWildlife Ecology and ConservationLONG-TERM CHANGESpecies richness[SDE.BE]Environmental Sciences/Biodiversity and EcologycomputerGlobal and Planetary ChangeBIRD COMMUNITY DYNAMICSVDP::Matematikk og Naturvitenskap: 400::Zoologiske og botaniske fag: 480

researchProduct
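
To illustrate the kind of temporal metric the entry above mentions, the sketch below computes species richness per sampling year from abundance records. It is an assumption-laden toy: the `(year, species, abundance)` tuple layout, the function name `richness_by_year`, and the sample records are hypothetical and do not reflect BioTIME's actual schema or tooling.

```python
from collections import defaultdict


def richness_by_year(records):
    """Return {year: number of distinct species with abundance > 0}."""
    species_seen = defaultdict(set)
    for year, species, abundance in records:
        # A zero abundance means the species was not observed that year.
        if abundance > 0:
            species_seen[year].add(species)
    return {year: len(names) for year, names in sorted(species_seen.items())}


# Hypothetical records for one assemblage; sampled years need not be consecutive.
records = [
    (2001, "Parus major", 4),
    (2001, "Sitta europaea", 1),
    (2003, "Parus major", 2),
    (2003, "Sitta europaea", 0),
]
trend = richness_by_year(records)
```

Richness is only one of many possible metrics; abundance-weighted measures would need the counts themselves rather than presence alone.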

Controlling false match rates in record linkage using extreme value theory

2011

Abstract. Cleansing data from synonyms and homonyms is a relevant task in fields where high quality of data is crucial, for example in disease registries and medical research networks. Record linkage provides methods for minimizing synonym and homonym errors, thereby improving data quality. We focus our attention on the case of homonym errors (in the following denoted as ‘false matches’), in which records belonging to different entities are wrongly classified as equal. Synonym errors (‘false non-matches’) occur when a single entity maps to multiple records in the linkage result. They are not considered in this study because in our application domain they are not as crucial as false matches. Fa…

Data cleansing; Data cleansing; Biomedical Research; Databases Factual; Calibration (statistics); Computer science; Health Informatics; computer.software_genre; Plot (graphics); Mean excess plot; Statistics; Registries; Extreme value theory; Linkage (software); Models Statistical; Computational Biology; Fellegi–Sunter model; Mixture model; Generalized Pareto distribution; Computer Science Applications; Data quality; Statistics of extreme values; Database Management Systems; Medical Record Linkage; Data mining; computer; Algorithms; Medical Informatics; Record linkage; Journal of Biomedical Informatics

researchProduct

Metadata to Support Data Warehouse Evolution

2009

The focus of this chapter is the metadata necessary to support data warehouse evolution. We present a data warehouse framework that is able to track the evolution process and adapt data warehouse schemata and data extraction, transformation, and loading (ETL) processes. We discuss a significant part of the framework, the metadata repository, which stores information about the data warehouse, logical and physical schemata, and their versions. We propose a physical implementation of a multiversion data warehouse in a relational DBMS. For each modification of a data warehouse schema, we outline the changes that need to be made to the repository metadata and in the database.

Data element; Information retrieval; Database; Computer science; InformationSystems_DATABASEMANAGEMENT; computer.software_genre; Data warehouse; Metadata repository; Schema evolution; Metadata; Relational database management system; Data extraction; Schema (psychology); computer

researchProduct
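
The idea in the entry above, recording each schema modification in a metadata repository alongside the change to the database itself, can be sketched with Python's built-in `sqlite3` module. This is only an illustrative toy under stated assumptions: the `schema_version` table, the `record_change` helper, and the `sales_fact` example are hypothetical and are not the chapter's actual repository design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical metadata repository: one row per schema modification.
conn.execute(
    """CREATE TABLE schema_version (
           version_id INTEGER PRIMARY KEY,
           table_name TEXT NOT NULL,
           change_description TEXT NOT NULL
       )"""
)


def record_change(conn, table_name, description):
    """Log a schema change in the repository and return its version id."""
    cur = conn.execute(
        "INSERT INTO schema_version (table_name, change_description) VALUES (?, ?)",
        (table_name, description),
    )
    conn.commit()
    return cur.lastrowid


# Each modification touches both the repository metadata and the database.
v1 = record_change(conn, "sales_fact", "initial schema")
conn.execute("CREATE TABLE sales_fact (sale_id INTEGER, amount REAL)")

v2 = record_change(conn, "sales_fact", "add column region")
conn.execute("ALTER TABLE sales_fact ADD COLUMN region TEXT")

history = conn.execute(
    "SELECT version_id, change_description FROM schema_version "
    "WHERE table_name = ? ORDER BY version_id",
    ("sales_fact",),
).fetchall()
```

A fuller design would presumably also version the ETL processes and distinguish logical from physical schemata, as the abstract indicates; the sketch tracks only a flat change log.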

Spatio-temporal Schema Integration with Validation: A Practical Approach

2005

We propose to enhance a schema integration process with a validation phase employing logic-based data models. In our methodology, we validate the source schemas against the data model; the inter-schema mappings are validated against the semantics of the data model and the syntax of the correspondence language. In this paper, we focus on how to employ a reasoning engine to validate spatio-temporal schemas and describe where the reasoning engine is plugged into our integration methodology. The validation phase distinguishes our integration methodology from other approaches. We shift the emphasis on automation from the a priori discovery to the a posteriori checking of the inter-schema mapping…

Data model; Description logic; Computer science; Data integrity; Schema (psychology); InformationSystems_DATABASEMANAGEMENT; Semantic reasoner; Data mining; Logic model; computer.software_genre; computer; Computer Science::Databases; Data modeling

researchProduct