Karin Sevegnani – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-07-24T18:33:25Z http://www.open-lab.net/blog/feed/ Karin Sevegnani <![CDATA[Per-Tensor and Per-Block Scaling Strategies for Effective FP8 Training]]> http://www.open-lab.net/blog/?p=102820 2025-07-24T18:33:25Z 2025-07-01T18:13:50Z In this blog post, we’ll break down the main FP8 scaling strategies—per-tensor scaling, delayed and current scaling, and per-block scaling (including the...]]>

In this blog post, we’ll break down the main FP8 scaling strategies—per-tensor scaling, delayed and current scaling, and per-block scaling (including the Blackwell-backed MXFP8 format)—and explain why each is essential for maintaining numerical stability and accuracy during low-precision training. Understanding these approaches will help you choose the right recipe for your own FP8 workflows.
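As a rough illustration of the per-tensor, current-scaling idea, the sketch below quantizes a tensor to FP8 (E4M3) using a scale derived from its current absolute maximum. The constant, function names, and use of PyTorch's float8_e4m3fn dtype are illustrative assumptions for this sketch, not the production recipes discussed in the post.

```python
import torch

# Per-tensor "current" scaling, sketched: derive the scale from this
# tensor's absolute maximum right now, so its dynamic range maps onto
# the FP8 (E4M3) representable range. Delayed scaling would instead derive
# the scale from a history of amax values seen in earlier iterations, and
# per-block formats such as MXFP8 keep one scale per small block of
# elements rather than one per tensor.
FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in E4M3

def quantize_per_tensor_fp8(x: torch.Tensor):
    amax = x.abs().max().clamp(min=1e-12)
    scale = FP8_E4M3_MAX / amax
    x_fp8 = (x * scale).to(torch.float8_e4m3fn)
    return x_fp8, scale

def dequantize_fp8(x_fp8: torch.Tensor, scale: torch.Tensor):
    # Undo the scaling to recover an approximation of the original values.
    return x_fp8.to(torch.float32) / scale

x = torch.randn(1024, 1024) * 3.0
x_fp8, scale = quantize_per_tensor_fp8(x)
x_hat = dequantize_fp8(x_fp8, scale)
print("max abs error:", (x - x_hat).abs().max().item())
```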

Source

]]>
Karin Sevegnani <![CDATA[Floating-Point 8: An Introduction to Efficient, Lower-Precision AI Training]]> http://www.open-lab.net/blog/?p=101197 2025-06-12T18:50:43Z 2025-06-04T16:27:30Z With the growth of large language models (LLMs), deep learning is advancing both model architecture design and computational efficiency. Mixed precision...]]>

With the growth of large language models (LLMs), deep learning is advancing both model architecture design and computational efficiency. Mixed precision training, which strategically employs lower precision formats like brain floating point 16 (BF16) for computationally intensive operations while retaining the stability of 32-bit floating-point (FP32) where needed, has been a key strategy for…
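For context, a minimal sketch of what BF16 mixed precision can look like in PyTorch is shown below. The model, data, and hyperparameters are placeholders, and torch.autocast is assumed as the mechanism; the post itself is not tied to this particular setup.

```python
import torch
from torch import nn

# Minimal BF16 mixed precision sketch: matmul-heavy ops run in BF16 inside
# autocast, while optimizer state and numerically sensitive ops stay in FP32.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # FP32 master weights/state
loss_fn = nn.MSELoss()

x = torch.randn(32, 1024, device=device)
target = torch.randn(32, 1024, device=device)

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        loss = loss_fn(model(x), target)
    loss.backward()   # gradients and the update itself run outside autocast
    optimizer.step()
```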

Source

]]>
Karin Sevegnani <![CDATA[Advanced Optimization Strategies for LLM Training on NVIDIA Grace Hopper]]> http://www.open-lab.net/blog/?p=100702 2025-06-12T18:50:59Z 2025-05-27T17:31:00Z In the previous post, Profiling LLM Training Workflows on NVIDIA Grace Hopper, we explored the importance of profiling large language model (LLM) training...]]>

In the previous post, Profiling LLM Training Workflows on NVIDIA Grace Hopper, we explored the importance of profiling large language model (LLM) training workflows and analyzed bottlenecks using NVIDIA Nsight Systems. We also discussed how the NVIDIA GH200 Grace Hopper Superchip enables efficient training processes. While profiling helps identify inefficiencies…
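As a hedged sketch of one common profiling pattern, the snippet below wraps the phases of a training step in NVTX ranges so they show up as labeled spans on an Nsight Systems timeline; the function and range names are assumptions for illustration, not the exact workflow described in the posts.

```python
import torch

# Annotate training-step phases with NVTX ranges for Nsight Systems.
# A typical capture command is, for example:
#   nsys profile -t cuda,nvtx -o llm_train python train_step.py

def train_step(model, batch, optimizer, loss_fn):
    torch.cuda.nvtx.range_push("forward")
    loss = loss_fn(model(batch["x"]), batch["y"])
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("backward")
    loss.backward()
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("optimizer")
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    torch.cuda.nvtx.range_pop()
    return loss
```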

Source

]]>
Karin Sevegnani <![CDATA[Profiling LLM Training Workflows on NVIDIA Grace Hopper]]> http://www.open-lab.net/blog/?p=100669 2025-06-12T18:51:00Z 2025-05-27T17:30:00Z The rapid advancements in AI have resulted in an era of exponential growth in model sizes, particularly in the domain of large language models (LLMs). These...]]>

The rapid advancements in AI have resulted in an era of exponential growth in model sizes, particularly in the domain of large language models (LLMs). These models, with their transformative capabilities, are driving innovation across industries. However, the increasing complexity and computational demands of training such models necessitate a meticulous approach to optimization and profiling.

Source

]]>
Karin Sevegnani <![CDATA[Benchmarking Agentic LLM and VLM Reasoning for Gaming with NVIDIA NIM]]> http://www.open-lab.net/blog/?p=99202 2025-05-15T19:08:40Z 2025-04-24T17:00:00Z This is the first post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM....]]>

This is the first post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM. Researchers from the University College London (UCL) Deciding, Acting, and Reasoning with Knowledge (DARK) Lab leverage NVIDIA NIM microservices in their new game-based benchmark suite, Benchmarking Agentic LLM and VLM Reasoning On Games…
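As a hedged illustration of what such benchmarking measures, the sketch below sends a single streaming request to an assumed local NIM deployment through its OpenAI-compatible API and reports a rough time-to-first-token and token throughput. The base_url, model name, and prompt are placeholders; GenAI-Perf, used in the post, automates and generalizes this kind of measurement.

```python
import time
from openai import OpenAI

# Tiny latency probe against a NIM endpoint (OpenAI-compatible API).
# base_url and model are assumptions for a local deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

prompt = "List three classic strategies for the game of Minesweeper."
start = time.perf_counter()
first_token_time = None
completion_chunks = 0

stream = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # assumed NIM model identifier
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_time is None:
            first_token_time = time.perf_counter()
        completion_chunks += 1  # rough proxy: one streamed chunk per token

elapsed = time.perf_counter() - start
print(f"time to first token: {first_token_time - start:.3f}s")
print(f"approx tokens/s: {completion_chunks / elapsed:.1f}")
```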

Source

]]>