RESEARCH PRODUCT

UPC++ for bioinformatics: A case study using genome-wide association studies

Lars Wienbrandt, Jorge González-Domínguez, Bertil Schmidt, Jan Christian Kässens

subject

Object-oriented programming, ComputingMethodologies_PATTERNRECOGNITION, Computer science, Computation, Single-core, Genome-wide association study, Partitioned global address space, Parallel computing, Bioinformatics, Supercomputer

description

Modern genotyping technologies can obtain up to a few million genetic markers (such as SNPs) of an individual within a few minutes. Detecting epistasis, such as SNP-SNP interactions, in Genome-Wide Association Studies (GWAS) is an important but time-consuming operation, since statistical computations have to be performed for each pair of measured markers. A variety of HPC architectures have therefore been used to accelerate these studies. In this work we present a parallel approach for multi-core clusters that is implemented with UPC++ and takes advantage of features of the Partitioned Global Address Space (PGAS) and Object-Oriented Programming models. Our solution is based on a well-known regression model (used by the popular BOOST tool) to test SNP-pair interactions. Experimental results show that UPC++ is suitable for parallelizing data-intensive bioinformatics applications on clusters. For instance, it reduces the time to analyze a real-world dataset with more than 500,000 SNPs and 5,000 individuals from several days on a single core to less than one minute on 512 nodes (12,288 cores) of a Cray XC30 supercomputer.
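To give a sense of why the pairwise test dominates the runtime, the following minimal Python sketch counts the SNP pairs for a given number of markers and splits them across parallel workers. It is an illustration only: the function names `pair_count` and `pairs_for_rank` are hypothetical, the round-robin assignment is an assumed scheme rather than the distribution used in the paper, and 500,000 is taken as a round lower bound on the dataset size quoted above. It uses plain Python and no UPC++ calls.

```python
from itertools import combinations


def pair_count(n_snps: int) -> int:
    """Number of unordered SNP pairs (i, j) with i < j."""
    return n_snps * (n_snps - 1) // 2


def pairs_for_rank(n_snps: int, rank: int, n_ranks: int):
    """Yield the pairs assigned to one worker.

    Assumption: pairs are numbered in lexicographic order and dealt out
    round-robin, so worker `rank` gets every pair whose index k satisfies
    k % n_ranks == rank. This is a sketch, not the paper's scheme.
    """
    for k, pair in enumerate(combinations(range(n_snps), 2)):
        if k % n_ranks == rank:
            yield pair


if __name__ == "__main__":
    # Pairs to test for 500,000 SNPs (illustrative lower bound).
    print(pair_count(500_000))

    # Tiny example: 5 SNPs shared between 2 workers.
    for rank in range(2):
        print(rank, list(pairs_for_rank(5, rank, 2)))
```

With 500,000 SNPs the count is roughly 1.25 × 10^11 pairs, and each pair requires its own statistical test over all individuals. That is why the work benefits from being spread over thousands of cores. Enumerating every pair on every worker, as this sketch does, is only practical for small inputs; a real implementation would compute each worker's range directly.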

https://doi.org/10.1109/cluster.2014.6968770