Search results for "Distributed Computing"
Showing 10 of 87 documents
On the Use of Binary Trees for DNA Hydroxymethylation Analysis
2017
DNA methylation (mC) and hydroxymethylation (hmC) can have a significant effect on normal human development, health and disease status. Hydroxymethylation studies require specific treatment of DNA, as well as software tools for their analysis. In this paper, we propose a parallel software tool for analyzing DNA hydroxymethylation data obtained by TAB-seq. The software uses binary trees to search for the different occurrences of methylation and hydroxymethylation in DNA samples. The binary trees allow efficient storage of, and access to, the methylation information for each methylated/hydroxymethylated cytosine in the samples. Evaluation results show that the perf…
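The abstract states only that binary trees index methylation information per cytosine. The sketch below illustrates that idea under assumptions that are not taken from the paper: each node is keyed by a hypothetical genomic position and holds counts of reads supporting mC and hmC at that site; `SiteTree`, `record` and `lookup` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One cytosine site, keyed by a (hypothetical) genomic position."""
    position: int
    mc_reads: int = 0    # reads supporting methylation (mC) at this site
    hmc_reads: int = 0   # reads supporting hydroxymethylation (hmC)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class SiteTree:
    """Unbalanced binary search tree of cytosine sites (illustrative only)."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def record(self, position: int, kind: str) -> None:
        """Add one read call ('mC' or 'hmC') at a position, creating the node if needed."""
        if kind not in ("mC", "hmC"):
            raise ValueError(f"unknown call: {kind}")
        if self.root is None:
            self.root = Node(position)
        node = self.root
        # Walk down the tree, creating the missing node when we fall off a leaf.
        while node.position != position:
            if position < node.position:
                if node.left is None:
                    node.left = Node(position)
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(position)
                node = node.right
        if kind == "mC":
            node.mc_reads += 1
        else:
            node.hmc_reads += 1

    def lookup(self, position: int) -> Optional[Node]:
        """Return the node for a position, or None if no read covered it."""
        node = self.root
        while node is not None and node.position != position:
            node = node.left if position < node.position else node.right
        return node


tree = SiteTree()
for pos, call in [(1050, "mC"), (980, "hmC"), (1050, "hmC"), (1200, "mC")]:
    tree.record(pos, call)
```

Because this tree is unbalanced, inserting positions in sorted order (as when reading a coordinate-sorted file) degrades it into a linked list; a self-balancing tree such as a red-black tree keeps lookups at O(log n).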
Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms
2018
Motivation: Information-theoretic and compositional/linguistic analysis of genomes has a central role in bioinformatics, even more so since the associated methodologies are becoming very valuable also for epigenomic and metagenomic studies. The kernel of those methods is the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}^k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data now available in applications makes it necessary to resort to parallel and distributed computing. Indeed, algorithms of this type have been developed to collect k-mer statistics in…
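Collecting k-mer statistics as defined above can be sketched sequentially in a few lines. This is a single-machine illustration, not the paper's Hadoop algorithm; skipping windows that contain symbols outside {A,C,G,T} (such as N) is an assumption of the sketch, and `kmer_counts` and `full_spectrum` are hypothetical names.

```python
from collections import Counter
from itertools import product

VALID = set("ACGT")


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer; windows with non-ACGT symbols are skipped."""
    seq = sequence.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if set(kmer) <= VALID:
            counts[kmer] += 1
    return counts


def full_spectrum(counts: Counter, k: int) -> dict:
    """Expand the counts to all 4**k k-mers in {A,C,G,T}^k, filling in zeros."""
    return {"".join(p): counts.get("".join(p), 0) for p in product("ACGT", repeat=k)}


counts = kmer_counts("ACGTACGTNAC", 2)
spectrum = full_spectrum(counts, 2)  # 16 entries for k = 2
```

The spectrum has 4^k entries, so it grows quickly with k; keeping only observed k-mers in a `Counter` is the usual way to stay sparse.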
2016
The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of both alignment quality and execution speed. Some available aligners have been shown to obtain high-quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is highly important to research, since the availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the CUSHAW3 alig…
An effective extension of the applicability of alignment-free biological sequence comparison algorithms with Hadoop
2016
Alignment-free methods are one of the mainstays of biological sequence comparison, i.e., the assessment of how similar two biological sequences are to each other, a fundamental and routine task in computational biology and bioinformatics. They have gained popularity because, even on standard desktop machines, they are faster than alignment-based methods. However, with the advent of Next-Generation Sequencing Technologies, datasets whose size (i.e., number of sequences and their total length) challenges the execution of alignment-free methods on such standard machines have become quite common. Here, we propose the first paradigm for the computation of k-mer-based alignment-free methods for…
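To show what "alignment-free" means in practice, the sketch below reduces each sequence to a k-mer frequency profile and compares profiles without computing any alignment. The Euclidean distance is used only as one common example of a k-mer-based measure; it is an assumption here, not necessarily one of the measures the paper targets, and the function names are hypothetical.

```python
import math
from collections import Counter


def kmer_profile(sequence: str, k: int) -> Counter:
    """Overlapping k-mer counts of one sequence; no alignment is computed."""
    if len(sequence) < k:
        raise ValueError("sequence is shorter than k")
    return Counter(sequence[i : i + k] for i in range(len(sequence) - k + 1))


def kmer_euclidean_distance(a: str, b: str, k: int) -> float:
    """Euclidean distance between the k-mer frequency vectors of a and b.

    Counts are divided by the number of k-mers in each sequence so that
    sequences of different lengths can be compared.
    """
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    na, nb = sum(pa.values()), sum(pb.values())
    # Only k-mers present in at least one sequence contribute a non-zero term.
    all_kmers = set(pa) | set(pb)
    return math.sqrt(sum((pa[w] / na - pb[w] / nb) ** 2 for w in all_kmers))


d_same = kmer_euclidean_distance("ACGTACGT", "ACGTACGT", 2)
d_diff = kmer_euclidean_distance("AAAA", "CCCC", 2)
```

Each profile costs time linear in the sequence length, which is why such methods beat quadratic alignment on a single machine; the cost that still explodes is the number of pairwise comparisons across a large dataset.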
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters
2016
Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases establishes the need for efficient parallel implementations of this operation on modern accelerators. This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm, and to the first stage of progressive multiple sequence alignment based on the ClustalW heuristic, on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data par…
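The Smith-Waterman recurrence behind database scanning can be sketched in score-only form. This is a sequential illustration, not the paper's Xeon Phi implementation; the linear gap penalty and the scoring values (match +2, mismatch -1, gap -2) are assumptions, whereas production implementations often use affine gaps and substitution matrices such as BLOSUM62. `smith_waterman_score` and `scan_database` are hypothetical names.

```python
from typing import Dict, List, Tuple


def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a vs b, keeping only two DP rows in memory."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap in a
            # The 0 floor is what makes the alignment local rather than global.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def scan_database(query: str, database: Dict[str, str]) -> List[Tuple[int, str]]:
    """Score the query against every database sequence, best hits first.

    Each sequence is scored independently, which is what allows a database
    scan to be split across nodes, cores or vector lanes.
    """
    hits = [(smith_waterman_score(query, seq), name) for name, seq in database.items()]
    return sorted(hits, reverse=True)


ranking = scan_database("ACGT", {"s1": "TTACGTTT", "s2": "GGGG"})
```

The independence of database sequences is the natural source of coarse-grained data parallelism; finer levels usually vectorize the inner loop over cells of the DP matrix.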
Near field improvements of stochastic collaborative beamforming in wireless sensor networks
2020
Wireless sensor networks (WSNs) are groups of small devices, each containing a microcontroller to which a large number of sensors can be attached. They transmit data and communicate with each other in the ISM band using the IEEE 802.15.4 standard, exchanging packets through multi-hop routing. These devices are called motes and form the nodes of the WSN. They are very simple and easy to program, and are powered by 1.5 V batteries (AA and AAA). The nodes are autonomous elements that can be deployed to implement any type of network. In a typical deployment the nodes communicate with each other and with a master node or Base Station (BS), which in turn transmits the information to an external server, which collects the e…
Big Data Processing in the ATLAS Experiment: Use Cases and Experience
2015
The physics goals of the next Large Hadron Collider run include high-precision tests of the Standard Model and searches for new physics. These goals require detailed comparison of data with computational models simulating the expected data behavior. To highlight the role that modeling and simulation play in future scientific discovery, we report on use cases and experience with a unified system built to process both real and simulated data of growing volume and variety.
MapReduce in computational biology: a synopsis
2017
In the past 20 years, the Life Sciences have witnessed a paradigm shift in the way research is performed. Indeed, the computational part of biological and clinical studies has become, or is becoming, central. Correspondingly, the amount of data that must be processed, compared and analyzed has grown exponentially. As a consequence, High Performance Computing (HPC) is being used intensively, in particular on multi-core architectures. Recently, however, thanks to advances in the processing of other scientific and commercial data, Distributed Computing is also being considered for Bioinformatics applications. In particular, the MapReduce paradigm, to…
MapReduce in computational biology via Hadoop and Spark
2017
Bioinformatics has a long history of software solutions developed on multi-core computing systems for solving computationally intensive problems. This option suffers from some issues that can be solved by shifting to Distributed Systems. In particular, the MapReduce computing paradigm and its implementations, Hadoop and Spark, are becoming increasingly popular in the Bioinformatics field because they allow virtually unlimited horizontal scalability while being easy to use. Here we provide a qualitative evaluation of some of the most significant MapReduce bioinformatics applications. We also focus on one of these applications to show the importance of correctly engineering an application to fully exploi…
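The MapReduce paradigm mentioned above can be illustrated with a plain-Python simulation of its three phases applied to k-mer counting. This does not use the Hadoop or Spark APIs; the function names are hypothetical, and the shuffle step stands in for the grouping that the framework performs across the cluster.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(read: str, k: int) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (k-mer, 1) for every overlapping k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i : i + k], 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the partial counts for one k-mer."""
    return key, sum(values)


def mapreduce_kmer_count(reads: List[str], k: int) -> Dict[str, int]:
    """Chain the three phases sequentially on one machine."""
    intermediate = chain.from_iterable(map_phase(r, k) for r in reads)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, vals) for key, vals in grouped.items())


counts = mapreduce_kmer_count(["ACGT", "CGTA"], 3)
```

Each mapper call depends on a single read and each reducer on a single key, so both phases can be spread over as many nodes as are available; this independence is the source of the horizontal scalability the abstract describes. In Spark the same pipeline is typically written as an RDD `flatMap` followed by `reduceByKey`.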
Data offloading and task allocation for cloudlet-assisted ad hoc mobile clouds
2016
Although the data-processing capabilities of modern mobile devices are improving rapidly, their resources are still limited in terms of processing capacity and battery lifetime. Some applications, in particular computationally intensive ones such as multimedia and gaming, often require more computational resources than a mobile device can provide. One way to address this problem is for the mobile device to offload those tasks to a centralized cloud with data centers, a nearby cloudlet, or an ad hoc mobile cloud. In this paper, we propose a data offloading and task allocation scheme for a cloudlet-assisted ad hoc mobile cloud in which the master device (MD) who ha…