ChatGPT has made quite an impression. Users are excited to use the AI chatbot to ask questions, write poems, give it a persona for interaction, act as a personal assistant, and more. Large language models (LLMs) power ChatGPT, and these models are the topic of this post. Before considering LLMs more carefully, we would first like to establish what a language model does. A language model gives…
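To make the idea concrete before the post picks it up: a language model assigns probabilities to token sequences, typically factored so that each token is predicted from the ones before it. A minimal statement of that definition:

```latex
% A language model scores a sequence w_1, ..., w_n by chaining
% next-token predictions:
P(w_1, \dots, w_n) = \prod_{t=1}^{n} P(w_t \mid w_1, \dots, w_{t-1})
```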
The most exciting computing applications currently rely on training and running inference on complex AI models, often in demanding, real-time deployment scenarios. High-performance, accelerated AI platforms are needed to meet the demands of these applications and deliver the best user experiences. New AI models are constantly being invented to enable new capabilities…
The Dataiku platform for everyday AI simplifies deep learning. Use cases are far-reaching, from image classification to object detection and natural language processing (NLP). Dataiku helps you with labeling, model training, explainability, model deployment, and centralized management of code and code environments. This post dives into high-level Dataiku and NVIDIA integrations for image…
MLPerf benchmarks are developed by a consortium of AI leaders across industry, academia, and research labs, with the aim of providing standardized, fair, and useful measures of deep learning performance. MLPerf training focuses on measuring time to train a range of commonly used neural networks across several tasks. Lower training times are important to speed time to deployment…
Sign up for the latest Speech AI news from NVIDIA. There is a high chance that you have asked your smart speaker a question like, "How tall is Mount Everest?" If you did, it probably said, "Mount Everest is 29,032 feet above sea level." Have you ever wondered how it found an answer for you? Question answering (QA) is loosely defined as a system consisting of information retrieval (IR)…
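To illustrate the "reader" half of such a pipeline, here is a minimal extractive QA sketch using the Hugging Face transformers pipeline. This is illustrative only, not the NVIDIA speech AI stack the post covers, and the hard-coded context stands in for what an IR stage would normally retrieve.

```python
# Illustrative only: a minimal extractive QA "reader" step.
# In a full system, an information-retrieval (IR) stage would first fetch
# the context passage; here it is hard-coded for the example question.
from transformers import pipeline

reader = pipeline("question-answering")  # downloads a default SQuAD-tuned model

context = "Mount Everest is 29,032 feet above sea level."
result = reader(question="How tall is Mount Everest?", context=context)

print(result["answer"], result["score"])  # span extracted from the context, plus a confidence score
```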
This post was originally published in August 2019 and has been updated for NVIDIA TensorRT 8.0. Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. Large-scale language models (LSLMs) such as BERT, GPT-2, and XL-Net have brought exciting leaps in accuracy for many natural language processing…
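As a hedged sketch of one common on-ramp to TensorRT (not necessarily the exact workflow in the post): export a BERT model to ONNX, then build an engine from it, for example with the trtexec CLI (`trtexec --onnx=bert.onnx --fp16`). The model name and shapes below are illustrative assumptions.

```python
# Sketch: export a Hugging Face BERT model to ONNX so it can later be
# compiled into a TensorRT engine (e.g., with trtexec). Model name, opset,
# and dynamic axes here are assumptions, not the post's exact settings.
import torch
from transformers import BertModel, BertTokenizer

model = BertModel.from_pretrained("bert-base-uncased").eval()
model.config.return_dict = False  # return plain tuples, which export more cleanly
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

inputs = tokenizer("TensorRT accelerates BERT inference.", return_tensors="pt")

torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "bert.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state", "pooler_output"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "attention_mask": {0: "batch", 1: "seq"}},
    opset_version=13,
)
```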
Predictive maintenance is used for early fault detection, diagnosis, and prediction of when maintenance is needed in various industries, including oil and gas, manufacturing, and transportation. Equipment is continuously monitored to measure signals such as sound, vibration, and temperature in order to alert and report potential issues. To accomplish this computationally, the first step is to determine the root cause…
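As a toy illustration of the monitoring idea (not the approach described in the post): flag a potential fault when a new sensor reading deviates sharply from a rolling baseline of recent measurements. The window values and threshold below are assumptions.

```python
# Toy sketch: flag potential equipment faults by checking whether a new
# sensor reading deviates strongly from a rolling baseline of recent values.
import numpy as np

def is_anomalous(history: np.ndarray, reading: float, z_threshold: float = 3.0) -> bool:
    """Return True if `reading` is more than `z_threshold` standard
    deviations away from the mean of the recent `history` window."""
    mean, std = history.mean(), history.std()
    if std == 0:
        return False
    return abs(reading - mean) / std > z_threshold

# Example: vibration readings with a sudden spike worth alerting on
window = np.array([0.51, 0.49, 0.50, 0.52, 0.48, 0.50])
print(is_anomalous(window, 0.95))  # True
```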
Speech and natural language processing (NLP) have become the foundation for most of the AI development in the enterprise today, as textual data represents a significant portion of unstructured content. As consumer internet companies continue to improve the accuracy of conversational AI, search, and recommendation systems, there is an increasing need for processing rich text data efficiently and…
Conversational AI solutions such as chatbots are now deployed in the data center, on the cloud, and at the edge to deliver lower latency and high quality of service while meeting an ever-increasing demand. The strategic decision to run AI inference on any or all these compute platforms varies not only by the use case but also evolves over time with the business. Hence…
Large language models such as Megatron and GPT-3 are transforming AI. We are excited about applications that can take advantage of these models to create better conversational AI. One main problem that generative language models have in conversational AI applications is their lack of controllability and consistency with real-world facts. In this work, we try to address this by making our large…
This is the third post in this series about distilling BERT with multimetric Bayesian optimization. Part 1 discusses the background for the experiment and Part 2 discusses the setup for the Bayesian optimization. In my previous posts, I discussed the importance of BERT for transfer learning in NLP, and established the foundations of this experiment's design. In this post, I discuss the model…
This is the second post in this series about distilling BERT with multimetric Bayesian optimization. Part 1 discusses the background for the experiment and Part 3 discusses the results. In my previous post, I discussed the importance of the BERT architecture in making transfer learning accessible in NLP. BERT allows a variety of problems to share off-the-shelf, pretrained models and moves NLP…
This is the first post in a series about distilling BERT with multimetric Bayesian optimization. Part 2 discusses the setup for the Bayesian experiment, and Part 3 discusses the results. You've all heard of BERT: Ernie's partner in crime. Just kidding! I mean the natural language processing (NLP) architecture developed by Google in 2018. That's much less exciting, I know. However…
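For readers new to distillation, the standard teacher-student objective looks roughly like the sketch below. This is the generic technique, not the exact training setup used in this series; the temperature and mixing weight are illustrative defaults.

```python
# Generic knowledge-distillation loss: the student is trained on a mix of
# hard-label cross-entropy and the KL divergence to the teacher's
# temperature-softened output distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    # Hard-label term: ordinary cross-entropy against the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label term: match the teacher's softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * hard + (1.0 - alpha) * soft
```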
AI is going mainstream and is quickly becoming pervasive in every industry, from autonomous vehicles to drug discovery. However, developing and deploying AI applications is a challenging endeavor. The process requires building a scalable infrastructure by combining hardware, software, and intricate workflows, which can be time-consuming as well as error-prone. To accelerate the end-to-end AI…
MLPerf is an industry-wide AI consortium that has developed a suite of performance benchmarks covering a range of leading AI workloads that are widely in use today. The latest MLPerf v0.7 training submission includes vision, language, recommenders, and reinforcement learning. NVIDIA submitted MLPerf v0.7 training results for all eight tests and the NVIDIA platform set records in all…
The MLPerf consortium mission is to "build fair and useful benchmarks" to provide an unbiased training and inference performance reference for ML hardware, software, and services. MLPerf Training v0.7 is the third instantiation for training and continues to evolve to stay on the cutting edge. This round consists of eight different workloads that cover a broad diversity of use cases…
Imagine an AI program that can understand language better than humans can. Imagine building your own personal Siri or Google Search for a customized domain or application. Google BERT (Bidirectional Encoder Representations from Transformers) provides a game-changing twist to the field of natural language processing (NLP). BERT runs on supercomputers powered by NVIDIA GPUs to train its…
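A quick, illustrative way to see BERT's bidirectional masked-language-model objective in action (not part of the post itself) is the Hugging Face fill-mask pipeline; the model name and prompt below are assumptions for the example.

```python
# Illustrative only: BERT fills in a masked token using context from both
# directions, which is the core of its pretraining objective.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill("BERT runs on supercomputers powered by NVIDIA [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```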
Recent work has demonstrated that larger language models dramatically advance the state of the art in natural language processing (NLP) applications such as question-answering, dialog systems, summarization, and article completion. However, during training, large models do not fit in the available memory of a single accelerator, requiring model parallelism to split the parameters across multiple…
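The core idea can be sketched in a few lines. The toy example below splits one linear layer's weight matrix column-wise across two devices, so each device holds and applies only its shard; it is only an illustration of tensor (intra-layer) model parallelism, not the Megatron-LM implementation, and the device names and sizes are assumptions.

```python
# Toy illustration of tensor (intra-layer) model parallelism: split a large
# linear layer across two devices and concatenate the partial outputs.
# On a single-GPU machine, both shards could live on the same device.
import torch

hidden, out_features = 1024, 4096
x = torch.randn(8, hidden)

# Full weight, then two column shards of the output held on different devices.
w = torch.randn(out_features, hidden)
w0 = w[: out_features // 2].to("cuda:0")
w1 = w[out_features // 2 :].to("cuda:1")

# Each device applies its shard to a copy of the input.
y0 = x.to("cuda:0") @ w0.t()
y1 = x.to("cuda:1") @ w1.t()

# Gather the partial results to reconstruct the full layer output.
y = torch.cat([y0.cpu(), y1.cpu()], dim=-1)   # shape: (8, 4096)
```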