Search results for "Big data"

showing 10 items of 311 documents

Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics

2019

Background: Distributed approaches based on the MapReduce programming paradigm have started to be proposed in the Bioinformatics domain, due to the large amount of data produced by the next-generation sequencing techniques. However, the use of MapReduce and related Big Data technologies and frameworks (e.g., Apache Hadoop and Spark) does not necessarily produce satisfactory results, in terms of both efficiency and effectiveness. We discuss how the development of distributed and Big Data management technologies has affected the analysis of large datasets of biological sequences. Moreover, we show how the choice of different parameter configurations and the careful engineering of the …

Data Analysis; FOS: Computer and information sciences; Time Factors; Computer science; Statistics as Topic; Big data; Apache Spark; Distributed computing; Performance evaluation; k-mer counting; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; Domain (software engineering); Databases; 03 medical and health sciences; 0302 clinical medicine; Structural Biology; Computer cluster; Statistics; Spark (mathematics); Molecular Biology; lcsh:QH301-705.5; 030304 developmental biology; 0303 health sciences; Genome; Settore INF/01 - Informatica; Base Sequence; Nucleic Acid; business.industry; Research; Algorithms; Software; Databases Nucleic Acid; Applied Mathematics; Computer Science Applications; Algorithm; Computer Science - Distributed Parallel and Cluster Computing; lcsh:Biology (General); 030220 oncology & carcinogenesis; Scalability; lcsh:R858-859.7; Algorithm design; Distributed Parallel and Cluster Computing (cs.DC); business
researchProduct

Distributed Real-Time Sentiment Analysis for Big Data Social Streams

2014

The big data trend has forced data-centric systems to handle continuous, fast data streams. In recent years, real-time analytics on stream data has become a research field of its own, which aims to answer queries about "what-is-happening-now" with negligible delay. The real challenge in real-time stream data processing is that it is impossible to store every instance of the data, so online analytical algorithms are used instead. To perform real-time analytics, the data should be pre-processed so that only a short summary of the stream is stored in main memory. In addition, due to high speed of arrival, average processing time for each instance of data should be in such a way that…

Data stream; FOS: Computer and information sciences; Computer Science - Computation and Language; Computer science; business.industry; Data stream mining; Sentiment analysis; Big data; Machine Learning (stat.ML); Databases (cs.DB); Data structure; computer.software_genre; Field (computer science); Computer Science - Information Retrieval; Tree (data structure); Computer Science - Databases; Computer Science - Distributed Parallel and Cluster Computing; Analytics; Statistics - Machine Learning; Data mining; Distributed Parallel and Cluster Computing (cs.DC); business; computer; Computation and Language (cs.CL); Information Retrieval (cs.IR)
researchProduct

How Big Data Informs Us About the Population Health Status: endophthalmitis after ophthalmologic procedures

2022

The use of Big Data, in the form of almost exhaustive French medico-administrative databases, has made it possible to address several issues that could not otherwise have been addressed. First, it made it possible to define the incidence of endophthalmitis after ophthalmologic procedures without the bias introduced by specific recruitment modalities of respondents (tertiary centers, questionnaires ...). Thus, local observations of a change in prevalence trends of causative procedures were confirmed at the national level. The reliable description of the incidence of endophthalmitis will then make it possible to identify critical situations in which cases resurge. The knowledge of the delay of occurrence after the proced…

Database; French National Health Insurance; Big Data; Ophthalmology; [SDV.MHEP] Life Sciences [q-bio]/Human health and pathology; Ophtalmologie; Base de donnée; Sniiram
researchProduct

Advanced Topics in Intelligent Information and Database Systems

2017

This book presents recent research in intelligent information and database systems. The carefully selected contributions were initially accepted for presentation as posters at the 9th Asian Conference on Intelligent Information and Database Systems (ACIIDS 2017) held from to 5 April 2017 in Kanazawa, Japan. While the contributions are of an advanced scientific level, several are accessible to non-expert readers. The book brings together 47 chapters divided into six main parts: • Part I. From Machine Learning to Data Mining, • Part II. Big Data and Collaborative Decision Support Systems, • Part III. Computer Vision Analysis, Detection, Tracking and Recognition, • Part IV. Data-Intensive Text P…

Decision support system; Database; Computer science; business.industry; Big data; Computational intelligence; computer.software_genre; Data science; Resource (project management); Text processing; Decision management; Collaboration; The Internet; business; computer
researchProduct

Big Data as a Driver for Clinical Decision Support Systems: A Learning Health Systems Perspective

2018

Big data technologies now provide health care with powerful instruments to gather and analyze large volumes of heterogeneous data collected for different purposes, including clinical care, administration, and research. This makes it possible to design IT infrastructures that favor the implementation of the so-called “Learning Healthcare System Cycle,” where healthcare practice and research are part of a single, synergistic process. In this paper we highlight how “Big Data enabled” integrated data collections may support clinical decision-making together with biomedical research. Two effective implementations are reported, concerning decision support in Diabetes and in Inherited Arrh…

Decision support system; Process (engineering); Computer science; Big data; computer.software_genre; 01 natural sciences; Clinical decision support system; lcsh:QA75.5-76.95; 03 medical and health sciences; 0302 clinical medicine; lcsh:AZ20-999; Health care; 030212 general & internal medicine; 0101 mathematics; data analytics; data integration; Implementation; business.industry; 010102 general mathematics; learning health care cycle; lcsh:History of scholarship and learning. The humanities; Data science; Data warehouse; data warehouses; lcsh:Electronic computers. Computer science; business; computer; Frontiers in Digital Humanities
researchProduct

Information Requirements for Big Data Projects: A Review of State-of-the-Art Approaches

2018

Big data technologies are rapidly gaining popularity and becoming widely used, which makes the choice of development methodologies, including approaches to requirements analysis, more acute. There is a position that, in the context of Data Warehousing (DW), as with other Decision Support Systems (DSS) technologies, defining information requirements (IR) can increase the chances that a project succeeds and achieves its goals. It is therefore important to examine this subject in the context of Big data, given the lack of research in the field of Big data requirements analysis. This paper gives an overview of the existing methods associated with Big data technologies and requir…

Decision support system; Requirements engineering; Computer science; business.industry; Big data; Context (language use); business; Data science; Popularity; Requirements analysis; Field (computer science); Data warehouse
researchProduct

The Challenges for Regulation and Control in an Environment of Rapid Technological Innovations

2019

Currently, the increased use of ICT technologies and digitalization in almost all industries has changed the value and significance of information. The use of these new technologies offers tremendous opportunities for innovation and development, but at the same time calls for regulation and control policies to ensure appropriate storage and use of information and to avoid illicit use of data. Moreover, the use of innovative technologies such as blockchain-based technology, artificial intelligence, cloud technology, and others has complicated and disrupted the landscape of the financial services providers and their ancillary service providers such as auditors, underwriters, advisors, actua…

Dilemma; Risk analysis (engineering); business.industry; Emerging technologies; Big data; Cloud computing; Audit; business; Financial services; Utility model; Underwriting
researchProduct

Accelerating data queries on Hadoop framework by using compact data formats

2016

Massive amounts of data are generated from IoT, online transactions, click streams, emails, logs, posts, social networking interactions, sensors, mobile phones and their applications, etc. The question is where and how to store these data in order to provide faster data access. Understanding and handling Big Data is a big challenge. Research on Big Data projects that use Hadoop technology, MapReduce-style frameworks and compact data formats such as RCFile, SequenceFile, ORC, Avro and Parquet shows that only two data formats (Avro and Parquet) support both schema evolution and compression, which reduces the storage space used. In this paper, file formats like Avro and Parquet are c…

Distributed database; Database; Plain text; Computer science; business.industry; Big data; computer.file_format; computer.software_genre; File format; Column (database); Schema evolution; Data access; Binary data; business; computer; 2016 IEEE 4th Workshop on Advances in Information, Electronic and Electrical Engineering (AIEEE)
researchProduct

Topic 5: Parallel and Distributed Data Management

2013

We are now facing an exponential growth of new data that is overwhelming the capabilities of companies, institutions and society in general to manage and use it properly. Ever-increasing investments in Big Data, cutting-edge technologies and the latest advances in both application development and underlying storage systems can help deal with data of such magnitude. Parallel and distributed approaches in particular will enable new data management solutions that operate effectively at large scale.

Distributed design patterns; business.industry; Distributed algorithm; Computer science; Scale (chemistry); Data management; Big data; Enhanced Data Rates for GSM Evolution; business; Data science
researchProduct

Programming languages for data-Intensive HPC applications: A systematic mapping study

2020

This work is a result of activities from COST Action 10406 High-Performance Modelling and Simulation for Big Data Applications (cHiPSet), funded by the European Cooperation in Science and Technology. FCT, Portugal for grants: NOVA LINCS Research Laboratory (Ref. UID/CEC/04516/2019); INESC-ID Ref. UID/CEC/50021/2019; BioISI Ref. UID/MULTI/04046/2103; LASIGE Research Unit Ref. UID/CEC/00408/2019. A major challenge in modelling and simulation is the need to combine expertise in both software technologies and a given scientific domain. When High-Performance Computing (HPC) is required to solve a scientific problem, software development becomes a problematic issue. Considering the complexity…

Domain-Specific language (DSL); High performance computing (HPC); Computer science; Computer Networks and Communications; Big data; Data-intensive applications; General-Purpose language (GPL); Programming languages; Systematic mapping study (SMS); Context (language use); computer.software_genre; Theoretical Computer Science; Software portability; Software; Artificial Intelligence; business.industry; Programming language; Software development; Usability; Digital library; Computer Graphics and Computer-Aided Design; Hardware and Architecture; business; computer
researchProduct