Neelay Shah – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins
Feed: http://www.open-lab.net/blog/feed/ (updated 2025-05-15T19:07:45Z)

LLM Inference Benchmarking Guide: NVIDIA GenAI-Perf and NIM
Neelay Shah. http://www.open-lab.net/blog/?p=99180 (updated 2025-05-15T19:07:45Z, published 2025-05-06T17:35:39Z)

This is the second post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM. When building LLM-based applications, it is critical to understand the performance characteristics of these models on the target hardware. This serves multiple purposes: As a client-side LLM-focused benchmarking tool…

LLM Inference Benchmarking: Fundamental Concepts
Neelay Shah. http://www.open-lab.net/blog/?p=98215 (updated 2025-05-09T18:23:04Z, published 2025-04-02T17:00:00Z)

This is the first post in the large language model latency-throughput benchmarking series, which aims to teach developers the common metrics used for LLM benchmarking, the fundamental concepts behind them, and how to benchmark LLM applications. The past few years have witnessed the rise in popularity of generative AI and large language models (LLMs), as part of a broad AI revolution.

Introducing NVIDIA Dynamo, A Low-Latency Distributed Inference Framework for Scaling Reasoning AI Models
Neelay Shah. http://www.open-lab.net/blog/?p=95274 (updated 2025-04-23T00:15:55Z, published 2025-03-18T17:50:00Z)

NVIDIA announced the release of NVIDIA Dynamo today at GTC 2025. NVIDIA Dynamo is a high-throughput, low-latency open-source inference serving framework for deploying generative AI and reasoning models in large-scale distributed environments. The framework boosts the number of requests served by up to 30x when running the open-source DeepSeek-R1 models on NVIDIA Blackwell.

Generate Stunning Images with Stable Diffusion XL on the NVIDIA AI Inference Platform
Neelay Shah. http://www.open-lab.net/blog/?p=78388 (updated 2025-03-18T18:31:44Z, published 2024-03-07T19:05:46Z)

As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. Diffusion models are transforming creative workflows across industries. These models generate stunning images based on simple text or image inputs by iteratively shaping random noise into AI-generated art through denoising diffusion techniques. This can be applied to many enterprise use cases such as creating personalized…
