RESEARCH PRODUCT
UPC++ for bioinformatics: A case study using genome-wide association studies
Lars Wienbrandt, Jorge González-Domínguez, Bertil Schmidt, Jan Christian Kässens
subject
Object-oriented programming, ComputingMethodologies_PATTERNRECOGNITION, Computer science, Computation, Single-core, Genome-wide association study, Partitioned global address space, Parallel computing, Bioinformatics, Supercomputer
description
Modern genotyping technologies can obtain up to a few million genetic markers (such as SNPs) of an individual within a few minutes. Detecting epistasis, such as SNP-SNP interactions, in Genome-Wide Association Studies is an important but time-consuming operation, since statistical computations have to be performed for each pair of measured markers. Therefore, a variety of HPC architectures have been used to accelerate these studies. In this work we present a parallel approach for multi-core clusters, which is implemented with UPC++ and takes advantage of the features available in the Partitioned Global Address Space and Object-Oriented Programming models. Our solution is based on a well-known regression model (used by the popular BOOST tool) to test SNP-pair interactions. Experimental results show that UPC++ is suitable for parallelizing data-intensive bioinformatics applications on clusters. For instance, it reduces the time to analyze a real-world dataset with more than 500,000 SNPs and 5,000 individuals from several days on a single core to less than one minute on 512 nodes (12,288 cores) of a Cray XC30 supercomputer.
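Because a test runs for every pair of markers, n SNPs give n(n-1)/2 tests, roughly 1.25 × 10^11 for 500,000 SNPs. The C++ sketch below shows one simple way such a pair space could be split across parallel ranks. It is not the authors' UPC++ implementation: it uses no UPC++ calls, and the contiguous block distribution and the helper names (`pair_count`, `pair_range`, `pair_from_index`, `pairs_for_rank`) are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Number of unordered SNP pairs (i, j) with i < j among n markers.
std::uint64_t pair_count(std::uint64_t n) {
    return n < 2 ? 0 : n * (n - 1) / 2;
}

// Half-open range [begin, end) of linear pair indices given to `rank` out of
// `nranks` (hypothetical contiguous block distribution). Assumes
// pair_count(n) * nranks fits in 64 bits, which holds for 500,000 SNPs
// and 12,288 ranks.
std::pair<std::uint64_t, std::uint64_t> pair_range(std::uint64_t n,
                                                    std::uint64_t rank,
                                                    std::uint64_t nranks) {
    assert(nranks > 0 && rank < nranks);
    const std::uint64_t total = pair_count(n);
    return {total * rank / nranks, total * (rank + 1) / nranks};
}

// Map linear index k (0 <= k < pair_count(n)) to the pair (i, j), i < j,
// enumerated row by row: (0,1), (0,2), ..., (0,n-1), (1,2), ...
std::pair<std::uint64_t, std::uint64_t> pair_from_index(std::uint64_t n,
                                                         std::uint64_t k) {
    assert(k < pair_count(n));
    std::uint64_t i = 0;
    while (k >= n - 1 - i) {  // row i holds n-1-i pairs
        k -= n - 1 - i;
        ++i;
    }
    return {i, i + 1 + k};
}

// Collect the SNP pairs one rank would test. A real run would pass each
// pair to the statistical test instead of storing it.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
pairs_for_rank(std::uint64_t n, std::uint64_t rank, std::uint64_t nranks) {
    auto [begin, end] = pair_range(n, rank, nranks);
    std::vector<std::pair<std::uint64_t, std::uint64_t>> out;
    if (begin == end) return out;
    auto [i, j] = pair_from_index(n, begin);  // locate the first pair once
    for (std::uint64_t k = begin; k < end; ++k) {
        out.emplace_back(i, j);
        if (++j == n) {  // end of row i: continue at (i+1, i+2)
            ++i;
            j = i + 1;
        }
    }
    return out;
}
```

Each rank locates its first pair once and then walks forward incrementally, so no rank needs a global pair list; with 10 SNPs and 7 ranks, for example, every rank receives 6 or 7 of the 45 pairs.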
date | venue | country | edition | language |
---|---|---|---|---|
2014-09-01 | 2014 IEEE International Conference on Cluster Computing (CLUSTER) | | | |