Search results for "big data."
showing 10 items of 310 documents
Next-generation sequencing: big data meets high performance computing
2017
The progress of next-generation sequencing has a major impact on medical and genomic research. This high-throughput technology can now produce billions of short DNA or RNA fragments in excess of a few terabytes of data in a single run. This leads to massive datasets used by a wide range of applications including personalized cancer treatment and precision medicine. In addition to the hugely increased throughput, the cost of using high-throughput technologies has been dramatically decreasing. A low sequencing cost of around US$1000 per genome has now rendered large population-scale projects feasible. However, to make effective use of the produced data, the design of big data algorithms and t…
Principal components analysis: theory and application to gene expression data analysis
2018
Advances in computational power have enabled research to generate significant amounts of data related to complex biological problems. Consequently, applying appropriate data analysis techniques has become paramount to tackle this complexity. However, theoretical understanding of statistical methods is necessary to ensure that the correct method is used and that sound inferences are made based on the analysis. In this article, we elaborate on the theory behind principal components analysis (PCA), which has become a favoured multivariate statistical tool in the field of omics-data analysis. We discuss the necessary prerequisites and steps to produce statistically valid results and provide gui…
A REST-based framework to support non-invasive and early coeliac disease diagnosis
2019
The health sector has traditionally been one of the early adopters of databases, from the most simple Electronic Health Record (formerly Computer-Based Patient Record) systems in use in general practice, hospitals and intensive care units to big data, multidata based systems used to support diagnosis and care decisions. In this paper we present a framework to support non-invasive and early diagnosis of coeliac disease. The proposed framework makes use of well-known technologies and techniques, both hardware and software, put together in a novel way. The main goals of our framework are: (1) providing users with a reliable and fast repository of a large amount of data; (2) to make such reposi…
Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms
2018
Motivation: Information theoretic and compositional/linguistic analysis of genomes have a central role in bioinformatics, even more so since the associated methodologies are becoming very valuable also for epigenomic and meta-genomic studies. The kernel of those methods is based on the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}^k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data available now in applications demands to resort to parallel and distributed computing. Indeed, those types of algorithms have been developed to collect k-mer statistics in…
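The k-mer statistics mentioned in this abstract can be illustrated with a minimal single-machine sketch. This is not the Hadoop-based method of the paper; the function name `count_kmers` and the choice to skip windows containing characters outside A, C, G, T are assumptions made only for illustration.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count how many times each k-mer over {A,C,G,T} occurs in a sequence.

    Hypothetical helper for illustration; overlapping windows are counted,
    and windows containing any other character (e.g. N) are skipped.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    alphabet = set("ACGT")
    counts = Counter()
    # Slide a window of width k across the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    return counts


if __name__ == "__main__":
    # Print the 2-mer counts of a short toy sequence.
    for kmer, n in sorted(count_kmers("ACGTACGT", 2).items()):
        print(kmer, n)
```

A distributed version would typically split the input into chunks, count each chunk independently, and sum the per-chunk counters; that step is omitted here.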
Exploiting Helminth–Host Interactomes through Big Data
2017
Helminths facilitate their parasitic existence through the production and secretion of different molecules, including proteins. Some helminth proteins can manipulate the host's immune system, a phenomenon that is now being exploited with a view to developing therapeutics for inflammatory diseases. In recent years, hundreds of helminth genomes have been sequenced, but as a community we are still taking baby steps when it comes to identifying proteins that govern host-helminth interactions. The information generated from genomic, immunomic, and proteomic studies, as well as from cutting-edge approaches such as proteogenomics, is leading to a substantial volume of big data that can be utilised…
Coupling News Sentiment with Web Browsing Data Improves Prediction of Intra-Day Price Dynamics
2015
The new digital revolution of big data is deeply changing our capability of understanding society and forecasting the outcome of many social and economic systems. Unfortunately, information can be very heterogeneous in the importance, relevance, and surprise it conveys, severely affecting the predictive power of semantic and statistical methods. Here we show that the aggregation of web users' behavior can be elicited to overcome this problem in a hard-to-predict complex system, namely the financial market. Specifically, our in-sample analysis shows that the combined use of sentiment analysis of news and browsing activity of users of Yahoo! Finance greatly helps forecasting intra-day and dai…
Opportunities and challenges for drug development: public-private partnerships, adaptive designs and big data
2016
Drug development faces the double challenge of increasing costs and increasing pressure on pricing. To prevent a lack of perceived commercial perspective from leaving existing medical needs unmet, pharmaceutical companies and many other stakeholders are discussing ways to improve the efficiency of drug research and development. Based on an international symposium organized by the Medical School of the University of Duisburg-Essen (Germany) and held in January 2016, we discuss the opportunities and challenges of three specific areas, i.e., public-private partnerships, adaptive designs and big data. Public-private partnerships come in many different forms with regard to scope, duration and typ…
An Integrative Framework for the Construction of Big Functional Networks
2018
We present a methodology for biological data integration, aiming at building and analysing large functional networks which model complex genotype-phenotype associations. A functional network is a graph where nodes represent cellular components (e.g., genes, proteins, mRNA, etc.) and edges represent associations among such molecules. Different types of components may coexist in the same network, and associations may be related to physical/biochemical interactions or functional/phenotypic relationships. Due to both the large amount of involved information and the computational complexity typical of the problems in this domain, the proposed framework is based on big data technologies (Spark a…
Harnessing Big Data for Communicable Tropical and Sub-Tropical Disorders: Implications From a Systematic Review of the Literature
2018
Aim: According to the World Health Organization (WHO), communicable tropical and sub-tropical diseases occur solely or mainly in the tropics, thriving in hot and humid conditions. Some of these disorders, termed neglected tropical diseases, are particularly overlooked. Communicable tropical/sub-tropical diseases represent a diverse group of communicable disorders occurring in 149 countries, favored by tropical and sub-tropical conditions, affecting more than one billion people and imposing a dramatic societal and economic burden. Methods: A systematic review of the extant scholarly literature was carried out, searching in PubMed/MEDLINE and Scopus. The search string used included prope…