The latest state-of-the-art foundation large language models (LLMs) have billions of parameters and are pretrained on trillions of tokens of input text. They often achieve striking results on a wide variety of use cases without any need for customization. Despite this, studies have shown that the best accuracy on downstream tasks can be achieved by adapting LLMs with high-quality…
]]>We’re excited to announce support for the Meta Llama 3 family of models in NVIDIA TensorRT-LLM, accelerating and optimizing your LLM inference performance. You can immediately try Llama 3 8B and Llama 3 70B—the first models in the series—through a browser user interface. Or, through API endpoints running on a fully accelerated NVIDIA stack from the NVIDIA API catalog, where Llama 3 is packaged as…
]]>The rise in generative AI adoption has been remarkable. Catalyzed by the launch of OpenAI’s ChatGPT in 2022, the new technology amassed over 100M users within months and drove a surge of development activities across almost every industry. By 2023, developers began POCs using APIs and open-source community models from Meta, Mistral, Stability, and more. Entering 2024…
]]>Stacking transformer layers to create large models results in better accuracies, few-shot learning capabilities, and even near-human emergent abilities on a wide range of language tasks. These foundation models are expensive to train, and they can be memory- and compute-intensive during inference (a recurring cost). The most popular large language models (LLMs) today can reach tens to hundreds of…
]]>Large language models (LLMs) are revolutionizing data science, enabling advanced capabilities in natural language understanding, AI, and machine learning. Custom LLMs, tailored for domain-specific insights, are finding increased traction in enterprise applications. The NVIDIA Nemotron-3 8B family of foundation models is a powerful new tool for building production-ready generative AI…
]]>Today, NVIDIA announces the public release of TensorRT-LLM to accelerate and optimize inference performance for the latest LLMs on NVIDIA GPUs. This open-source library is now available for free on the /NVIDIA/TensorRT-LLM GitHub repo and as part of the NVIDIA NeMo framework. Large language models (LLMs) have revolutionized the field of artificial intelligence and created entirely new ways of…
]]>Large language models (LLMs) offer incredible new capabilities, expanding the frontier of what is possible with AI. However, their large size and unique execution characteristics can make them difficult to use in cost-effective ways. NVIDIA has been working closely with leading companies, including Meta, Anyscale, Cohere, Deci, Grammarly, Mistral AI, MosaicML (now a part of Databricks)…
]]>Speech AI applications, from call centers to virtual assistants, rely heavily on automatic speech recognition (ASR) and text-to-speech (TTS). ASR can process the audio signal and transcribe the audio to text. Speech synthesis or TTS can generate high-quality, natural-sounding audio from the text in real time. The challenge of Speech AI is to achieve high accuracy and meet the latency requirements…
]]>Deploying AI models in production to meet the performance and scalability requirements of the AI-driven application while keeping the infrastructure costs low is a daunting task. Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. This post provides you with a high-level overview of AI…
]]>