Extending PluTo for Multiple Devices by Integrating OpenACC
For many years now, processor vendors increased the performance of their devices by adding more cores and wider vectorization units to their CPUs instead of scaling up the processors' clock frequency. Moreover, GPUs became popular for solving problems with even more parallel compute power. To exploit the full potential of modern compute devices, specific codes are necessary which are often coded in a hardware-specific manner. Usually, the codes for CPUs are not usable for GPUs and vice versa. The programming API OpenACC tries to close this gap by enabling one code-base to be suitable and optimized for many devices. Nevertheless, OpenACC is rarely used by `standard programmers' and while dif…