Search results for "Distributed Computing"

showing 10 items of 87 documents

On the Use of Binary Trees for DNA Hydroxymethylation Analysis

2017

DNA methylation (mC) and hydroxymethylation (hmC) can have a significant effect on normal human development, health and disease status. Hydroxymethylation studies require specific treatment of DNA, as well as software tools for their analysis. In this paper, we propose a parallel software tool for analyzing the DNA hydroxymethylation data obtained by TAB-seq. The software is based on the use of binary trees for searching the different occurrences of methylation and hydroxymethylation in DNA samples. The binary trees make it possible to efficiently store and access the methylation information for each methylated/hydroxymethylated cytosine in the samples. Evaluation results show that the perf…

0301 basic medicine; DNA Hydroxymethylation; 020203 distributed computing; Binary tree; business.industry; Computer science; 02 engineering and technology; Methylation; Computational biology; Supercomputer; 03 medical and health sciences; chemistry.chemical_compound; 030104 developmental biology; Software; Parallel software; chemistry; DNA methylation; 0202 electrical engineering electronic engineering information engineering; heterocyclic compounds; business; DNA

researchProduct

Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms

2018

Motivation: Information-theoretic and compositional/linguistic analysis of genomes has a central role in bioinformatics, even more so since the associated methodologies are also becoming very valuable for epigenomic and metagenomic studies. The kernel of those methods is the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}^k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data now available in applications demands resorting to parallel and distributed computing. Indeed, those types of algorithms have been developed to collect k-mer statistics in…

0301 basic medicine; Epigenomics; genomic analysis; hadoop; distributed computing; Statistics and Probability; Computer science; Big data; Sequence assembly; Genome; Biochemistry; Domain (software engineering); Set (abstract data type); 03 medical and health sciences; Software; Computational Theory and Mathematics; Animals; Cluster Analysis; Humans; A-DNA; k-mer counting; map reduce; Molecular Biology; Bacteria; business.industry; Eukaryota; Linguistics; Computer Science Applications; 1707 Computer Vision and Pattern Recognition; Genomics; Sequence Analysis DNA; Computational Mathematics; 030104 developmental biology; Distributed algorithm; Kernel (statistics); Metagenome; business; Algorithm; Algorithms

researchProduct
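
The k-mer statistics described in the abstract above can be illustrated with a small map/reduce-style sketch in plain Python. This is not the paper's Hadoop implementation; the function names (`count_kmers`, `merge_counts`, `kmer_statistics`) are hypothetical, and each chunk is treated as an independent sequence, so k-mers that would span a chunk boundary are not counted.

```python
from collections import Counter
from functools import reduce


def count_kmers(sequence, k):
    """Map step (illustrative): count every length-k window of one chunk."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Skip windows containing symbols outside {A, C, G, T}, e.g. 'N'.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def merge_counts(left, right):
    """Reduce step (illustrative): add two partial count tables."""
    return left + right


def kmer_statistics(chunks, k):
    """Count k-mers per chunk, then merge the partial tables."""
    partials = (count_kmers(chunk, k) for chunk in chunks)
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    print(kmer_statistics(["ACGTAC", "GTAC"], 2))
```

In a distributed setting the map step would run on separate workers and the reduce step would combine their tables; here both run in one process purely to show the data flow.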

2016

The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the CUSHAW3 alig…

0301 basic medicine; Physics; 020203 distributed computing; Multi-core processor; Distributed shared memory; Multidisciplinary; Source code; media_common.quotation_subject; Node (networking); 02 engineering and technology; Dynamic priority scheduling; Parallel computing; Bioinformatics; 03 medical and health sciences; 030104 developmental biology; Scalability; 0202 electrical engineering electronic engineering information engineering; Programming paradigm; Partitioned global address space; media_common; PLOS ONE

researchProduct

An effective extension of the applicability of alignment-free biological sequence comparison algorithms with Hadoop

2016

Alignment-free methods are one of the mainstays of biological sequence comparison, i.e., the assessment of how similar two biological sequences are to each other, a fundamental and routine task in computational biology and bioinformatics. They have gained popularity since, even on standard desktop machines, they are faster than methods based on alignments. However, with the advent of Next-Generation Sequencing Technologies, datasets whose size, i.e., number of sequences and their total length, is a challenge to the execution of alignment-free methods on those standard machines are quite common. Here, we propose the first paradigm for the computation of k-mer-based alignment-free methods for…

0301 basic medicine; Theoretical computer science; 030102 biochemistry & molecular biology; Settore INF/01 - Informatica; Computer science; Computation; Extension (predicate logic); Information Systems; Hash table; Distributed computing; Task (project management); 03 medical and health sciences; 030104 developmental biology; Alignment-free sequence comparison and analysis; Hadoop; Hardware and Architecture; MapReduce; Software; Sequence comparison

researchProduct

Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters

2016

Computing alignments between two or more sequences is a common operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data par…

0301 basic medicine; Xeon Phi clusters; Computer science; Data parallelism; Parallel algorithm; 02 engineering and technology; Dynamic programming; Biochemistry; Pairwise sequence alignment; Computational science; 03 medical and health sciences; Structural Biology; Computer cluster; 0202 electrical engineering electronic engineering information engineering; Amino Acid Sequence; Databases Protein; Molecular Biology; 020203 distributed computing; Research; Applied Mathematics; Computational Biology; Proteins; Smith-Waterman; Computer Science Applications; 030104 developmental biology; Multiple sequence alignment; Databases Nucleic Acid; Sequence Alignment; Algorithms; Software; Xeon Phi; BMC Bioinformatics

researchProduct
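
For readers unfamiliar with the Smith-Waterman algorithm named in the abstract above, the following is a minimal sequential sketch of its local-alignment scoring recurrence. It is not the paper's Xeon Phi implementation: the function name `smith_waterman_score`, the linear gap penalty, and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are all assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (assumed linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                           # start a new local alignment
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    query = "ACGT"
    database = ["ACGT", "TTTT", "GGACGTGG"]
    # Database scanning: score the query against each entry independently.
    print([smith_waterman_score(query, entry) for entry in database])
```

Because each query/database pair is scored independently, a database scan of this kind can be split across nodes, which is the sort of data parallelism the abstract refers to.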

Near field improvements of stochastic collaborative beamforming in wireless sensor networks

2020

Wireless sensor networks (WSN) are groups of small devices that contain a microcontroller to which a large number of sensors can be added. They transmit data and communicate with each other in the ISM band under the IEEE 802.15.4 standard, exchanging packets using multi-hop routing. These devices are named motes and are the nodes of the WSN. They are very simple and easy to program, and are powered by 1.5 V batteries (AA and AAA). The nodes are autonomous elements that can be deployed to implement any type of network. In a typical deployment the nodes communicate with each other and with a master node or Base Station (BS), which in turn transmits the information to an external server, which collects the e…

Beamforming; 020203 distributed computing; Network packet; business.industry; Computer science; Node (networking); ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS; 02 engineering and technology; Synchronization; Base station; Transmission (telecommunications); 0202 electrical engineering electronic engineering information engineering; ComputerSystemsOrganization_SPECIAL-PURPOSEANDAPPLICATION-BASEDSYSTEMS; 020201 artificial intelligence & image processing; business; Wireless sensor network; ISM band; Computer network; Proceedings of the 10th Euro-American Conference on Telematics and Information Systems

researchProduct

Big Data Processing in the ATLAS Experiment: Use Cases and Experience

2015

The physics goals of the next Large Hadron Collider run include high precision tests of the Standard Model and searches for new physics. These goals require detailed comparison of data with computational models simulating the expected data behavior. To highlight the role that modeling and simulation play in future scientific discovery, we report on use cases and experience with a unified system built to process both real and simulated data of growing volume and variety.

Big Data; Computational model; Large Hadron Collider; Computer science; business.industry; Physics beyond the Standard Model; Data management; ATLAS experiment; computer.software_genre; Data science; Standard Model; Modeling and simulation; Parallel and Distributed Computing; Grid-based Simulation and Computing; Grid computing; Large Scale Scientific Instruments; General Earth and Planetary Sciences; Use case; business; computer; General Environmental Science; Procedia Computer Science

researchProduct

MapReduce in computational biology - A synopsis

2017

In the past 20 years, the Life Sciences have witnessed a paradigm shift in the way research is performed. Indeed, the computational part of biological and clinical studies has become central or is becoming so. Correspondingly, the amount of data that one needs to process, compare and analyze has grown exponentially. As a consequence, High Performance Computing (HPC, for short) is being used intensively, in particular in terms of multi-core architectures. However, recently and thanks to the advances in the processing of other scientific and commercial data, Distributed Computing is also being considered for Bioinformatics applications. In particular, the MapReduce paradigm, to…

Bioinformatics; Spark; 0301 basic medicine; Settore INF/01 - Informatica; Process (engineering); Computer science; Computer Science (all); Computational biology; distributed computing; hadoop; MapReduce; Supercomputer; computer.software_genre; 03 medical and health sciences; 030104 developmental biology; Exponential growth; Paradigm shift; Middleware (distributed applications); Spark (mathematics); computer

researchProduct

MapReduce in computational biology via Hadoop and Spark

2017

Bioinformatics has a long history of software solutions developed on multi-core computing systems for solving computationally intensive problems. This option suffers from some issues that can be solved by shifting to Distributed Systems. In particular, the MapReduce computing paradigm, with its implementations Hadoop and Spark, is becoming increasingly popular in the Bioinformatics field because it allows for virtually unlimited horizontal scalability while being easy to use. Here we provide a qualitative evaluation of some of the most significant MapReduce bioinformatics applications. We also focus on one of these applications to show the importance of correctly engineering an application to fully exploi…

Bioinformatics; Spark; Settore INF/01 - Informatica; Exploit; business.industry; Computer science; Distributed computing; Scalability; Algorithm engineering; Field (computer science); Software; Hadoop; MapReduce; Spark (mathematics); Data-intensive computing; business; Implementation

researchProduct

Data offloading and task allocation for cloudlet-assisted ad hoc mobile clouds

2016

Although the data processing capabilities of modern mobile devices are improving rapidly, their resources are still limited in terms of processing capacity and battery lifetime. Some applications, in particular computationally intensive ones such as multimedia and gaming, often require more computational resources than a mobile device can afford. One way to address this problem is for the mobile device to offload those tasks to the centralized cloud with data centers, the nearby cloudlet or an ad hoc mobile cloud. In this paper, we propose a data offloading and task allocation scheme for a cloudlet-assisted ad hoc mobile cloud in which the master device (MD) who ha…

Computer Networks and Communications; Computer science; mobile cloud computing; Distributed computing; Mobile computing; Cloud computing; 02 engineering and technology; ad hoc mobile cloud; offloading; 0202 electrical engineering electronic engineering information engineering; Stackelberg competition; Cloudlet; Electrical and Electronic Engineering; 020203 distributed computing; business.industry; 020206 networking & telecommunications; Energy consumption; Mobile ad hoc network; stackelberg game; Task (computing); business; Mobile device; Information Systems; Computer network

researchProduct