RESEARCH PRODUCT

MapReduce in Computational Biology via Hadoop and Spark

Giuseppe Cattaneo; Umberto Ferraro Petrillo; Raffaele Giancarlo; Gianluca Roscigno

subject

Algorithm engineering; Bioinformatics; Computer science; Data-intensive computing; Distributed computing; Hadoop; Implementation; MapReduce; Scalability; Software; Spark; Settore INF/01 - Informatica

description

Bioinformatics has a long history of software solutions developed on multi-core computing systems for solving computationally intensive problems. This approach suffers from some issues that can be solved by shifting to distributed systems. In particular, the MapReduce computing paradigm, together with its implementations Hadoop and Spark, is becoming increasingly popular in the bioinformatics field because it allows for virtually unlimited horizontal scalability while being easy to use. Here we provide a qualitative evaluation of some of the most significant MapReduce bioinformatics applications. We also focus on one of these applications to show the importance of correctly engineering an application to fully exploit the potential of distributed systems.

10.1016/b978-0-12-809633-8.20371-3
http://hdl.handle.net/11386/4775173