Vinod Grover – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-06-13T18:47:35Z http://www.open-lab.net/blog/feed/ Vinod Grover <![CDATA[Run High-Performance LLM Inference Kernels from NVIDIA Using FlashInfer]]> http://www.open-lab.net/blog/?p=102153 2025-06-13T18:47:35Z 2025-06-13T18:45:53Z Best-in-class LLM Inference requires two key elements: speed and developer velocity. Speed refers to maximizing the efficiency of the underlying hardware by...]]>

Best-in-class LLM inference requires two key elements: speed and developer velocity. Speed refers to maximizing the efficiency of the underlying hardware by using highly optimized compute kernels and algorithms. Developer velocity refers to the ability to quickly adopt these new kernels and accelerate new models, algorithms, and hardware. Ultimately, this velocity is underpinned by the quick…

]]>
Vinod Grover <![CDATA[Using CUDA Warp-Level Primitives]]> http://www.open-lab.net/blog/?p=9333 2022-08-21T23:38:40Z 2018-01-16T02:01:05Z Figure 1: The Tesla V100 Accelerator with Volta GV100 GPU. SXM2 Form Factor. NVIDIA GPUs...]]>

NVIDIA GPUs execute groups of threads known as warps in SIMT (Single Instruction, Multiple Thread) fashion. Many CUDA programs achieve high performance by taking advantage of warp execution. In this blog post, we show how to use primitives introduced in CUDA 9 to make your warp-level programming safe and effective. NVIDIA GPUs and the CUDA programming model employ an execution model called SIMT…

]]>
Vinod Grover <![CDATA[New Compiler Features in CUDA 8]]> http://www.open-lab.net/blog/parallelforall/?p=7346 2022-08-21T23:38:01Z 2016-11-08T07:14:00Z CUDA 8 is one of the most significant updates in the history of the CUDA platform. In addition to Unified Memory and the many new API and library features in...]]>

]]>