• <xmp id="om0om">
    Scaling the Transformer Model Implementation in PyTorch Across Multiple Nodes

    Mohammad Zulfiqar, NVIDIA | Robert Knight, NVIDIA

    GTC 2020

    We'll dive deep behind the scenes into the Transformer model implementation in PyTorch to understand its performance weaknesses and work to make it scale across multiple nodes. We'll describe an analysis of system-level profiling data from an example Transformer workload spanning multiple DGX-2 systems. We'll present the tools, collection methods, and data-analytics recipes used to evaluate massive amounts of data and pinpoint the GPU/step of the algorithm causing issues. The described methodology can, in general, be applied to iterative DL and HPC workloads to achieve significant scaling gains.
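
As a rough illustration of what "pinpointing the GPU/step causing issues" can look like, the sketch below flags any GPU whose time for a given training step exceeds that step's median across GPUs by more than a relative tolerance. This is not the speakers' recipe: the function name `find_stragglers`, the input layout (step index mapped to a list of per-GPU durations in seconds), the 20% tolerance, and the sample timings are all assumptions made for the example.

```python
from statistics import median


def find_stragglers(step_times, tolerance=0.2):
    """Return (step, rank, duration) for GPUs that lag behind their peers.

    step_times: hypothetical layout, dict of step index -> list of per-GPU
    durations in seconds, where the list position is the GPU rank.
    tolerance: assumed relative slack over the per-step median.
    """
    stragglers = []
    for step in sorted(step_times):
        durations = step_times[step]
        med = median(durations)
        for rank, duration in enumerate(durations):
            # Flag a rank only when it is clearly slower than the typical GPU.
            if duration > med * (1.0 + tolerance):
                stragglers.append((step, rank, duration))
    return stragglers


# Made-up timings for 3 steps on 4 GPUs; rank 2 is slow in step 1.
timings = {
    0: [0.50, 0.51, 0.49, 0.50],
    1: [0.50, 0.52, 0.90, 0.51],
    2: [0.49, 0.50, 0.51, 0.50],
}
print(find_stragglers(timings))
```

Comparing against the per-step median rather than the mean keeps a single slow GPU from shifting the baseline it is measured against. In a real analysis the timings would come from exported profiler traces rather than a hand-written dictionary.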
