Search results for "Big data"
showing 10 items of 311 documents
Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics
2019
Background: Distributed approaches based on the MapReduce programming paradigm have started to be proposed in the Bioinformatics domain, due to the large amount of data produced by next-generation sequencing techniques. However, the use of MapReduce and related Big Data technologies and frameworks (e.g., Apache Hadoop and Spark) does not necessarily produce satisfactory results, in terms of both efficiency and effectiveness. We discuss how the development of distributed and Big Data management technologies has affected the analysis of large datasets of biological sequences. Moreover, we show how the choice of different parameter configurations and the careful engineering of the …
Distributed Real-Time Sentiment Analysis for Big Data Social Streams
2014
The big data trend has forced data-centric systems to handle continuous, fast data streams. In recent years, real-time analytics on stream data has emerged as a new research field, which aims to answer queries about "what-is-happening-now" with negligible delay. The real challenge with real-time stream data processing is that it is impossible to store instances of data, and therefore online analytical algorithms are used. To perform real-time analytics, pre-processing of data should be performed in a way that only a short summary of the stream is stored in main memory. In addition, due to the high speed of arrival, average processing time for each instance of data should be in such a way that…
How Big Data Informs Us About the Population Health Status: endophthalmitis after ophthalmologic procedures
2022
The use of Big Data, in the form of almost exhaustive French medico-administrative databases, has made it possible to address several issues that could not otherwise have been addressed. First, to define the incidence of endophthalmitis after ophthalmologic procedures without bias from specific recruitment modalities of respondents (tertiary centers, questionnaires ...). Thus, local observations of a change in prevalence trends of causative procedures were confirmed at the national level. The reliable description of the incidence of endophthalmitis will then make it possible to identify critical situations of recrudescence of cases. The knowledge of the delay of occurrence after the proced…
Advanced Topics in Intelligent Information and Database Systems
2017
This book presents recent research in intelligent information and database systems. The carefully selected contributions were initially accepted for presentation as posters at the 9th Asian Conference on Intelligent Information and Database Systems (ACIIDS 2017) held from to 5 April 2017 in Kanazawa, Japan. While the contributions are of an advanced scientific level, several are accessible for non-expert readers. The book brings together 47 chapters divided into six main parts: • Part I. From Machine Learning to Data Mining, • Part II. Big Data and Collaborative Decision Support Systems, • Part III. Computer Vision Analysis, Detection, Tracking and Recognition, • Part IV. Data-Intensive Text P…
Big Data as a Driver for Clinical Decision Support Systems: A Learning Health Systems Perspective
2018
Big data technologies now provide health care with powerful instruments to gather and analyze large volumes of heterogeneous data collected for different purposes, including clinical care, administration, and research. This makes it possible to design IT infrastructures that favor the implementation of the so-called “Learning Healthcare System Cycle,” where healthcare practice and research are part of a single, synergistic process. In this paper we highlight how “Big Data enabled” integrated data collections may support clinical decision-making together with biomedical research. Two effective implementations are reported, concerning decision support in Diabetes and in Inherited Arrh…
Information Requirements for Big Data Projects: A Review of State-of-the-Art Approaches
2018
Big data technologies are rapidly gaining popularity and becoming widely used, which makes the choice of development methodologies, including approaches to requirements analysis, more pressing. One view holds that in the context of Data Warehousing (DW), as with other Decision Support Systems (DSS) technologies, defining information requirements (IR) can increase a project's chances of achieving its goals. It is therefore important to examine this subject in the context of Big data, given the lack of research in the field of Big data requirements analysis. This paper gives an overview of the existing methods associated with Big data technologies and requir…
The Challenges for Regulation and Control in an Environment of Rapid Technological Innovations
2019
The growing use of ICT technologies and digitalization in almost all industries has changed the value and significance of information. The use of these new technologies offers tremendous opportunities for innovation and development, but at the same time calls for regulation and control policies to ensure appropriate storage and use of information and to avoid illicit use of data. Moreover, the use of innovative technologies such as blockchain-based technology, artificial intelligence, cloud technology, and others has complicated and disrupted the landscape of the financial services providers and their ancillary service providers such as auditors, underwriters, advisors, actua…
Accelerating data queries on Hadoop framework by using compact data formats
2016
Massive amounts of data are generated from IoT, online transactions, click streams, emails, logs, posts, social networking interactions, sensors, mobile phones and their applications, etc. The question is where and how to store these data in order to provide faster data access. Understanding and handling Big Data is a big challenge. Research on Big Data projects using Hadoop technology, MapReduce-style frameworks, and compact data formats such as RCFile, SequenceFile, ORC, Avro, and Parquet shows that only two data formats (Avro and Parquet) support schema evolution and compression in order to use less storage space. In this paper, file formats like Avro and Parquet are c…
Topic 5: Parallel and Distributed Data Management
2013
We are facing an exponential growth of new data that is overwhelming the capabilities of companies, institutions, and society in general to manage and use it properly. Ever-increasing investments in Big Data, cutting-edge technologies, and the latest advances in both application development and underlying storage systems can help deal with data of such magnitude. In particular, parallel and distributed approaches will enable new data management solutions that operate effectively at large scale.
Programming languages for data-Intensive HPC applications: A systematic mapping study
2020
This work is a result of activities from COST Action 10406 High-Performance Modelling and Simulation for Big Data Applications (cHiPSet), funded by the European Cooperation in Science and Technology. FCT, Portugal, for grants: NOVA LINCS Research Laboratory (Ref. UID/CEC/04516/2019); INESC-ID (Ref. UID/CEC/50021/2019); BioISI (Ref. UID/MULTI/04046/2103); LASIGE Research Unit (Ref. UID/CEC/00408/2019). A major challenge in modelling and simulation is the need to combine expertise in both software technologies and a given scientific domain. When High-Performance Computing (HPC) is required to solve a scientific problem, software development becomes a problematic issue. Considering the complexity…