Neal Vaidya – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2024-11-20T23:03:22Z http://www.open-lab.net/blog/feed/ Neal Vaidya <![CDATA[Seamlessly Deploying a Swarm of LoRA Adapters with NVIDIA NIM]]> http://www.open-lab.net/blog/?p=83606 2024-06-13T19:06:00Z 2024-06-07T16:00:00Z The latest state-of-the-art foundation large language models (LLMs) have billions of parameters and are pretrained on trillions of tokens of input text. They...]]>

The latest state-of-the-art foundation large language models (LLMs) have billions of parameters and are pretrained on trillions of tokens of input text. They often achieve striking results on a wide variety of use cases without any need for customization. Despite this, studies have shown that the best accuracy on downstream tasks can be achieved by adapting LLMs with high-quality…

Source

]]>
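The post above describes serving many LoRA adapters behind a single NIM deployment. As a rough illustration of the client side, the sketch below builds an OpenAI-style completions payload in which the adapter is selected by the `model` field. The endpoint URL and adapter name are placeholders, not values from the post.

```python
import json

# Hypothetical NIM endpoint and LoRA adapter name -- replace with your own.
NIM_URL = "http://localhost:8000/v1/completions"
LORA_ADAPTER = "llama3-8b-instruct-lora-example"

def build_request(prompt: str, adapter: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style completions payload; the serving layer routes
    the request to the LoRA adapter named in the 'model' field."""
    return {
        "model": adapter,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }

payload = build_request("Summarize LoRA in one sentence.", LORA_ADAPTER)
print(json.dumps(payload, indent=2))

# To actually send it (requires a running NIM container):
# import urllib.request
# req = urllib.request.Request(NIM_URL, data=json.dumps(payload).encode(),
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```

Because only the `model` field changes per request, the same deployment can serve a swarm of adapters without separate endpoints per fine-tune.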
Neal Vaidya <![CDATA[Turbocharging Meta Llama 3 Performance with NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server]]> http://www.open-lab.net/blog/?p=81223 2024-11-14T15:54:32Z 2024-04-28T18:07:15Z We're excited to announce support for the Meta Llama 3 family of models in NVIDIA TensorRT-LLM, accelerating and optimizing your LLM inference performance. You...]]>


We’re excited to announce support for the Meta Llama 3 family of models in NVIDIA TensorRT-LLM, accelerating and optimizing your LLM inference performance. You can immediately try Llama 3 8B and Llama 3 70B, the first models in the series, through a browser user interface, or through API endpoints running on a fully accelerated NVIDIA stack from the NVIDIA API catalog, where Llama 3 is packaged as…

Source

]]>
Neal Vaidya <![CDATA[NVIDIA NIM Offers Optimized Inference Microservices for Deploying AI Models at Scale]]> http://www.open-lab.net/blog/?p=79467 2024-06-03T15:44:17Z 2024-03-18T22:00:00Z The rise in generative AI adoption has been remarkable. Catalyzed by the launch of OpenAI’s ChatGPT in 2022, the new technology amassed over 100M users within...]]>

The rise in generative AI adoption has been remarkable. Catalyzed by the launch of OpenAI’s ChatGPT in 2022, the new technology amassed over 100M users within months and drove a surge of development activities across almost every industry. By 2023, developers began POCs using APIs and open-source community models from Meta, Mistral, Stability, and more. Entering 2024…

Source

]]>
Neal Vaidya <![CDATA[Mastering LLM Techniques: Inference Optimization]]> http://www.open-lab.net/blog/?p=73739 2024-01-25T18:57:32Z 2023-11-17T15:00:00Z Stacking transformer layers to create large models results in better accuracies, few-shot learning capabilities, and even near-human emergent abilities on a...]]>

Stacking transformer layers to create large models results in better accuracies, few-shot learning capabilities, and even near-human emergent abilities on a wide range of language tasks. These foundation models are expensive to train, and they can be memory- and compute-intensive during inference (a recurring cost). The most popular large language models (LLMs) today can reach tens to hundreds of…

Source

]]>
Neal Vaidya <![CDATA[NVIDIA AI Foundation Models: Build Custom Enterprise Chatbots and Co-Pilots with Production-Ready LLMs]]> http://www.open-lab.net/blog/?p=73296 2024-11-20T23:03:22Z 2023-11-15T16:00:00Z Large language models (LLMs) are revolutionizing data science, enabling advanced capabilities in natural language understanding, AI, and machine learning....]]>

Large language models (LLMs) are revolutionizing data science, enabling advanced capabilities in natural language understanding, AI, and machine learning. Custom LLMs, tailored for domain-specific insights, are finding increased traction in enterprise applications. The NVIDIA Nemotron-3 8B family of foundation models is a powerful new tool for building production-ready generative AI…

Source

]]>
Neal Vaidya <![CDATA[Optimizing Inference on Large Language Models with NVIDIA TensorRT-LLM, Now Publicly Available]]> http://www.open-lab.net/blog/?p=71648 2024-04-19T15:19:08Z 2023-10-19T16:00:00Z Today, NVIDIA announces the public release of TensorRT-LLM to accelerate and optimize inference performance for the latest LLMs on NVIDIA GPUs. This open-source...]]>

Today, NVIDIA announces the public release of TensorRT-LLM to accelerate and optimize inference performance for the latest LLMs on NVIDIA GPUs. This open-source library is now available for free on the /NVIDIA/TensorRT-LLM GitHub repo and as part of the NVIDIA NeMo framework. Large language models (LLMs) have revolutionized the field of artificial intelligence and created entirely new ways of…

Source

]]>
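Among the optimizations TensorRT-LLM ships is in-flight (continuous) batching, where a finished sequence's slot is refilled immediately instead of the whole batch waiting for its longest member. The toy scheduler below, a simplified illustration rather than TensorRT-LLM's actual implementation, compares total decoding steps under the two policies for sequences of uneven length.

```python
import heapq

# Compare static batching (the whole batch runs until its longest sequence
# finishes) with in-flight batching (a freed slot is refilled immediately).

def static_steps(lengths, batch_size):
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])  # batch waits for its longest
    return steps

def inflight_steps(lengths, batch_size):
    slots = []  # min-heap of finish times for currently running sequences
    for n in lengths:
        if len(slots) < batch_size:
            heapq.heappush(slots, n)        # free slot: start immediately
        else:
            t0 = heapq.heappop(slots)       # wait for the earliest finisher
            heapq.heappush(slots, t0 + n)
    return max(slots)

lengths = [100, 20, 20, 20]
print(static_steps(lengths, 2))    # 120: [100,20] then [20,20]
print(inflight_steps(lengths, 2))  # 100: short sequences share one slot
```

With uneven sequence lengths, the static batches idle behind the 100-step sequence, while in-flight batching keeps both slots busy and finishes in the time of the longest sequence alone.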
Neal Vaidya <![CDATA[NVIDIA TensorRT-LLM Supercharges Large Language Model Inference on NVIDIA H100 GPUs]]> http://www.open-lab.net/blog/?p=70549 2023-11-07T22:27:14Z 2023-09-09T17:00:00Z Large language models (LLMs) offer incredible new capabilities, expanding the frontier of what is possible with AI. However, their large size and unique...]]>

Large language models (LLMs) offer incredible new capabilities, expanding the frontier of what is possible with AI. However, their large size and unique execution characteristics can make them difficult to use in cost-effective ways. NVIDIA has been working closely with leading companies, including Meta, Anyscale, Cohere, Deci, Grammarly, Mistral AI, MosaicML (now a part of Databricks)…

Source

]]>
Neal Vaidya <![CDATA[Autoscaling NVIDIA Riva Deployment with Kubernetes for Speech AI in Production]]> http://www.open-lab.net/blog/?p=59514 2023-10-20T18:16:30Z 2023-01-12T17:30:00Z Speech AI applications, from call centers to virtual assistants, rely heavily on automatic speech recognition (ASR) and text-to-speech (TTS). ASR can process...]]>

Speech AI applications, from call centers to virtual assistants, rely heavily on automatic speech recognition (ASR) and text-to-speech (TTS). ASR processes an audio signal and transcribes it to text. Speech synthesis, or TTS, generates high-quality, natural-sounding audio from text in real time. The challenge of speech AI is to achieve high accuracy and meet the latency requirements…

Source

]]>
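The autoscaling the post describes is typically driven by the Kubernetes Horizontal Pod Autoscaler, whose documented scaling rule is desired = ceil(current × metric / target). The sketch below applies that rule in Python; the utilization numbers are illustrative, and a real Riva deployment would export a metric such as GPU utilization or queue depth for the HPA to consume.

```python
import math

# Kubernetes HPA scaling rule: desired = ceil(current * metric / target),
# clamped to configured minimum and maximum replica counts.

def desired_replicas(current: int, metric: float, target: float,
                     min_r: int = 1, max_r: int = 10) -> int:
    desired = math.ceil(current * metric / target)
    return max(min_r, min(max_r, desired))

# 3 pods at 90% average utilization against a 60% target -> scale up to 5.
print(desired_replicas(current=3, metric=90, target=60))  # 5

# Load drops to 20% average utilization -> scale down to 2.
print(desired_replicas(current=5, metric=20, target=60))  # 2
```

The min/max clamp mirrors the HPA's `minReplicas`/`maxReplicas` fields, which keep a traffic spike from scaling a speech service beyond the GPUs available.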
Neal Vaidya <![CDATA[Solving AI Inference Challenges with NVIDIA Triton]]> http://www.open-lab.net/blog/?p=54906 2023-03-22T01:21:27Z 2022-09-21T16:00:00Z Deploying AI models in production to meet the performance and scalability requirements of the AI-driven application while keeping the infrastructure costs low...]]>

Deploying AI models in production to meet the performance and scalability requirements of an AI-driven application while keeping infrastructure costs low is a daunting task. Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. This post provides you with a high-level overview of AI…

Source

]]>
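One of the inference challenges Triton addresses is dynamic batching: individual requests arriving within a short delay window are grouped into one batch before hitting the model. The toy grouper below captures the idea in plain Python; it is a conceptual sketch, not Triton's scheduler, and the window and batch-size values are made up.

```python
# Toy dynamic batching: requests arriving within a delay window after the
# first request of a batch are grouped together, up to a max batch size.

def dynamic_batches(arrivals, max_batch: int, window: float):
    batches, current, opened = [], [], None
    for t in sorted(arrivals):
        if current and (t - opened > window or len(current) == max_batch):
            batches.append(current)       # window expired or batch full: flush
            current, opened = [], None
        if not current:
            opened = t                    # this request opens a new window
        current.append(t)
    if current:
        batches.append(current)
    return batches

arrivals = [0.0, 0.002, 0.004, 0.050, 0.051]
print(dynamic_batches(arrivals, max_batch=8, window=0.01))
# [[0.0, 0.002, 0.004], [0.05, 0.051]]
```

Trading a few milliseconds of queueing delay for larger batches is what lets a single GPU serve many concurrent clients efficiently; Triton exposes the equivalent knobs as the dynamic batcher's preferred batch sizes and max queue delay.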