Omri Almog – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins

Introducing NVFP4 for Efficient and Accurate Low-Precision Inference
By Omri Almog | Published 2025-06-24, updated 2025-06-26 | http://www.open-lab.net/blog/?p=102000


To get the most out of AI, optimizations are critical. When developers think about optimizing AI models for inference, model compression techniques—such as quantization, distillation, and pruning—typically come to mind. The most common of the three, without a doubt, is quantization. This is typically due to its task-specific accuracy after optimization and its broad choice of supported…
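The excerpt names quantization without showing what it involves. The following is a minimal sketch, not the NVFP4 implementation described in the full post, of block-scaled quantization onto the 4-bit E2M1 floating-point grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6). The block size of 16, the per-block scale kept in full precision, and the helper names `quantize_block`, `dequantize_block`, and `quantize` are assumptions chosen for illustration.

```python
# Hypothetical sketch of block-scaled 4-bit float (E2M1) quantization.
# Assumptions: block size 16 and one full-precision scale per block;
# a production format may encode the scale in lower precision.

# Non-negative values representable in FP4 E2M1 (sign stored separately).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0


def quantize_block(block):
    """Scale a block so its largest magnitude maps to 6.0, then round each
    value to the nearest E2M1 grid point (ties go to the smaller magnitude)."""
    amax = max(abs(x) for x in block)
    scale = amax / E2M1_MAX if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        nearest = min(E2M1_MAGNITUDES, key=lambda g: abs(g - mag))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate values by multiplying each code by the block scale."""
    return [c * scale for c in codes]


def quantize(values, block_size=16):
    """Split a flat list into blocks and quantize each one independently."""
    return [
        quantize_block(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


if __name__ == "__main__":
    scale, codes = quantize_block([12.0, 5.4])
    print(scale, codes, dequantize_block(scale, codes))
```

Keeping one scale per small block, rather than one per tensor, limits how far a single outlier can stretch the grid, which is the general motivation behind block-scaled low-precision formats.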

NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance
By Omri Almog | Published 2025-03-18, updated 2025-04-23 | http://www.open-lab.net/blog/?p=97352


NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025. A single NVIDIA DGX system with eight NVIDIA Blackwell GPUs can achieve over 250 tokens per second per user or a maximum throughput of over 30,000 tokens per second on the massive, state-of-the-art 671-billion-parameter DeepSeek-R1 model. These rapid advancements in performance at both ends of the performance…
