6533b82ffe1ef96bd12951fc

RESEARCH PRODUCT

MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems

Jorge González-domínguezBertil SchmidtJuan TouriñoYongchao Liu

subject

0301 basic medicineStatistics and ProbabilitySource codeComputer sciencemedia_common.quotation_subject02 engineering and technologyParallel computingcomputer.software_genreBiochemistryExecution time03 medical and health sciences0202 electrical engineering electronic engineering information engineeringCluster (physics)Point (geometry)Amino Acid SequenceMolecular Biologymedia_commonSequenceMultiple sequence alignmentProtein multiple sequenceComputational BiologyProteinsMarkov ChainsComputer Science ApplicationsComputational Mathematics030104 developmental biologyComputational Theory and MathematicsDistributed memory systemsMSAProbs020201 artificial intelligence & image processingMPIData miningSequence AlignmentcomputerAlgorithmsSoftware

description

This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of recordJorge González-Domínguez, Yongchao Liu, Juan Touriño, Bertil Schmidt; MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems, Bioinformatics, Volume 32, Issue 24, 15 December 2016, Pages 3826–3828, https://doi.org/10.1093/bioinformatics/btw558is available online at: https://doi.org/10.1093/bioinformatics/btw558 [Abstracts] MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively.

10.1093/bioinformatics/btw558http://hdl.handle.net/2183/20963