Empirical Autotuning of Two-level Parallel Linear Algebra Routines on Large cc-NUMA Systems
Javier Cuenca, Jesus Cámara, Antonio M. Vidal, Domingo Giménez
subject
Task (computing), Selection (relational algebra), Memory hierarchy, Computer science, Multithreading, Linear algebra, Parallelism (grammar), Parallel computing, Temporal multithreading, Matrix multiplication
description
In large cc-NUMA systems, using the different levels of the memory hierarchy efficiently is not an easy task, and the performance of multithreaded implementations of the libraries decreases as the number of cores used increases, causing a significant loss of efficiency. To alleviate this problem, routines with multilevel parallelism can be developed by combining OpenMP and BLAS parallelism. Higher performance can be achieved in this way, but an autotuning technique is needed to select the appropriate number of threads to use at each level. The selection can be made through theoretical models of the execution time or through an installation methodology. This work analyses several installation techniques for a two-level matrix multiplication routine, with the aim of developing a methodology that is also valid for other linear algebra routines on large cc-NUMA systems. The basic ideas of the two-level parallelisation and the installation methodology are discussed, and some experimental results are presented.
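As a rough illustration of what an installation-time selection of thread counts could look like, the sketch below tries every combination of outer (OpenMP-level) and inner (BLAS-level) thread counts that fits in the available cores and keeps the fastest one for each problem size. It is a minimal exhaustive search under stated assumptions, not the installation methodology analysed in the work: the names `candidate_pairs`, `install` and `synthetic_measure` are hypothetical, and the cost function is a made-up stand-in for actually timing the two-level matrix multiplication.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[int, int]  # (outer OpenMP-level threads, inner BLAS-level threads)


def candidate_pairs(max_cores: int) -> List[Pair]:
    """List every (outer, inner) pair whose product fits in max_cores."""
    return [
        (outer, inner)
        for outer in range(1, max_cores + 1)
        for inner in range(1, max_cores // outer + 1)
    ]


def install(
    measure: Callable[[int, int, int], float],
    sizes: Iterable[int],
    max_cores: int,
) -> Dict[int, Pair]:
    """For each problem size, keep the pair with the lowest measured time."""
    pairs = candidate_pairs(max_cores)
    table: Dict[int, Pair] = {}
    for n in sizes:
        table[n] = min(pairs, key=lambda pair: measure(n, pair[0], pair[1]))
    return table


def synthetic_measure(n: int, outer: int, inner: int) -> float:
    """Made-up cost model standing in for a real timed run (assumption)."""
    return 1000.0 / (outer * inner) + 10.0 * inner + 1.0 * outer


if __name__ == "__main__":
    # On a real system, `measure` would run and time the two-level routine.
    print(install(synthetic_measure, sizes=[1000, 2000], max_cores=4))
```

At run time, the routine would look up the stored pair for the installed size closest to the actual problem size. An exhaustive search grows quickly with the number of cores, so on a large cc-NUMA system a restricted or guided search would probably be preferred.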
year | journal | country | edition | language |
---|---|---|---|---|
2012-07-01 | 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications | | | |