• <xmp id="om0om">
  • <table id="om0om"><noscript id="om0om"></noscript></table>
  • After clicking “Watch Now” you will be prompted to login or join.


    WATCH NOW



     
    Click “Watch Now” to login or join the NVIDIA Developer Program.

    WATCH NOW

    Microsoft Open Sources Breakthrough Optimizations for Large-Scale BERT Models

    Emma Ning, Microsoft | Nathan Yan, Microsoft

    GTC 2020

    We'll talk about BERT (Bidirectional Encoder Representations from Transformers), which is a popular deep-learning model used for natural language processing. Inferencing BERT at high scale is traditionally extremely costly, though, and may not even be possible with strict latency constraints. We'll share examples of how we've scaled BERT with NVIDIA GPUs, such as Bing improving BERT inference for its real-time service needs, serving more than 1 million BERT inferences per second within Bing's latency limits. We will discuss open source, enhanced versions of these optimizations that are now available in ONNX Runtime and have been extended to work on both GPU and CPU. With ONNX Runtime, AI developers can now easily produce large transformer models with high performance across both CPU and GPU hardware, using the same technology Microsoft uses to serve their customers.




    View More GTC 2020 Content

    人人超碰97caoporen国产