Tuned math libraries are an easy and dependable way to extract the ultimate performance from your HPC system. However, for long-lived applications or those that need to run on a variety of platforms, adapting library calls for each vendor or library version can be a maintenance nightmare. A compiler that can automatically generate calls to tuned math libraries gives you the best of both…
]]>The CUDA Fortran compiler from PGI now supports programming Tensor Cores with NVIDIA’s Volta V100 and Turing GPUs. This enables scientific programmers using Fortran to take advantage of FP16 matrix operations accelerated by Tensor Cores. Let’s take a look at how Fortran supports Tensor Cores. Tensor Cores offer substantial performance gains over typical CUDA GPU core programming on Tesla V100…
]]>