The newest generation of the popular Llama AI models is here with Llama 4 Scout and Llama 4 Maverick. Accelerated by NVIDIA open-source software, they can achieve over 40K output tokens per second on NVIDIA Blackwell B200 GPUs, and are available to try as NVIDIA NIM microservices. The Llama 4 models are now natively multimodal and multilingual using a mixture-of-experts (MoE) architecture.
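A mixture-of-experts layer routes each token through only a small subset of expert sub-networks chosen by a learned router, rather than through every parameter. As a rough, dependency-free sketch of that routing rule (the expert count, sizes, and router here are illustrative toys, not Llama 4's actual configuration):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_layer(token, router_weights, experts, top_k=2):
    """Route a token through the top-k experts by router score.

    token          : list[float], the token's hidden vector
    router_weights : one score vector per expert (dot-product router)
    experts        : list of callables, each mapping a vector to a vector
    """
    scores = [sum(t * w for t, w in zip(token, wv)) for wv in router_weights]
    gates = softmax(scores)
    # Keep only the top-k experts and renormalize their gate weights.
    top = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in top)
    out = [0.0] * len(token)
    for i in top:
        expert_out = experts[i](token)
        out = [o + (gates[i] / norm) * e for o, e in zip(out, expert_out)]
    return out

# Toy usage: 4 "experts" that just scale the input by different factors.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
router = [[0.1, 0.0], [0.9, 0.1], [0.0, 0.2], [0.2, 0.8]]
print(moe_layer([1.0, 1.0], router, experts, top_k=2))  # → [3.0, 3.0]
```

Because only `top_k` experts run per token, total parameter count can grow with the number of experts while per-token compute stays roughly constant.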
Building AI systems with foundation models requires a delicate balance of resources such as memory, latency, storage, compute, and more. One size does not fit all for developers managing cost and user experience when bringing generative AI capability to the rapidly growing ecosystem of AI-powered applications. You need options for high-quality, customizable models that can support large…
Mathstral, an advanced AI model developed from the ground up, can deliver superior performance for enhanced learning of math, engineering, and science.
Domain-adaptive pretraining (DAPT) of large language models (LLMs) is an important step towards building domain-specific models. These models demonstrate greater capabilities in domain-specific tasks compared to their off-the-shelf open or commercial counterparts. Recently, NVIDIA published a paper about ChipNeMo, a family of foundation models that are geared toward industrial chip design…
NVIDIA TAO is a framework designed to simplify and accelerate the development and deployment of AI models. It enables you to use pretrained models, fine-tune them with your own data, and optimize the models for specific use cases without needing deep AI expertise. TAO integrates seamlessly with the NVIDIA hardware and software ecosystem, providing tools for efficient AI model training…
NVIDIA NeMo is an end-to-end platform for the development of multimodal generative AI models at scale anywhere: on any cloud and on-premises. The NeMo team just released Canary, a multilingual model that transcribes speech in English, Spanish, German, and French with punctuation and capitalization. Canary also provides bi-directional translation between English and the three other supported…
NVIDIA NeMo, an end-to-end platform for developing multimodal generative AI models at scale anywhere, on any cloud and on-premises, recently released Parakeet-TDT. This new addition to the NeMo ASR Parakeet model family boasts better accuracy and 64% greater speed over the previously best model, Parakeet-RNNT-1.1B. This post explains Parakeet-TDT and how to use it to generate highly accurate…
Generative AI is transforming computing, paving new avenues for humans to interact with computers in natural, intuitive ways. For enterprises, the prospect of generative AI is vast. Businesses can tap into their rich datasets to streamline time-consuming tasks, from text summarization and translation to insight prediction and content generation. But they must also navigate adoption challenges.
Speech and translation AI models developed at NVIDIA are pushing the boundaries of performance and innovation. The NVIDIA Parakeet automatic speech recognition (ASR) family of models and the NVIDIA Canary multilingual, multitask ASR and translation model currently top the Hugging Face Open ASR Leaderboard. In addition, a multilingual P-Flow-based text-to-speech (TTS) model won the LIMMITS ’24…
Breaking barriers in speech recognition, NVIDIA NeMo proudly presents pretrained models tailored for Dutch and Persian, languages often overlooked in the AI landscape. These models leverage the recently introduced FastConformer architecture and were trained simultaneously with CTC and transducer objectives to maximize each model’s accuracy. Automatic speech recognition (ASR) is a…
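A CTC-trained model like these emits one label (or a special blank) per audio frame; greedy decoding then collapses consecutive repeats and drops blanks. A minimal illustration of that decoding rule (the toy vocabulary and frame outputs below are made up for demonstration):

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse repeated labels, then remove blanks (CTC greedy decoding)."""
    out = []
    prev = None
    for i in frame_ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

# Toy vocabulary: 0 = blank, 1 = 'h', 2 = 'e', 3 = 'l', 4 = 'o'
frames = [1, 1, 0, 2, 2, 3, 3, 0, 3, 4, 0]
ids = ctc_greedy_decode(frames)
vocab = {1: "h", 2: "e", 3: "l", 4: "o"}
print("".join(vocab[i] for i in ids))  # → hello
```

Note how the blank between the two runs of `3` is what lets the decoder recover the double "l"; without blanks, repeated characters would always collapse into one.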
Deep neural networks (DNNs) are the go-to model for learning functions from data, such as image classifiers or language models. In recent years, deep models have become popular for representing the data samples themselves. For example, a deep model can be trained to represent an image, a 3D object, or a scene, an approach called Implicit Neural Representations. (See also Neural Radiance Fields and…
Optical Character Detection (OCD) and Optical Character Recognition (OCR) are computer vision techniques used to extract text from images. Use cases vary across industries and include extracting data from scanned documents or forms with handwritten texts, automatically recognizing license plates, sorting boxes or objects in a fulfillment center based on serial numbers…
NVIDIA Triton Inference Server streamlines and standardizes AI inference by enabling teams to deploy, run, and scale trained ML or DL models from any framework on any GPU- or CPU-based infrastructure. It helps developers deliver high-performance inference across cloud, on-premises, edge, and embedded devices. The nvOCDR library is integrated into Triton for inference.
For more information about NVIDIA NeMo, see Develop Custom Enterprise Generative AI with NVIDIA NeMo. Generative AI has introduced a new era in computing, one promising to revolutionize human-computer interaction. At the forefront of this technological marvel are large language models (LLMs), empowering enterprises to recognize, summarize, translate, predict, and generate content using large…
NVIDIA TAO Toolkit provides a low-code AI framework to accelerate vision AI model development suitable for all skill levels, from novices to expert data scientists. With the TAO Toolkit, developers can use the power and efficiency of transfer learning to achieve state-of-the-art accuracy and production-class throughput in record time with adaptation and optimization.
The analysis of 3D medical images is crucial for advancing clinical responses, disease tracking, and overall patient survival. Deep learning models form the backbone of modern 3D medical representation learning, enabling precise spatial context measurements that are essential for clinical decision-making. These 3D representations are highly sensitive to the physiological properties of medical…
NVIDIA PhysicsNeMo is a framework for building, training, and fine-tuning deep learning models for physical systems, otherwise known as physics-informed machine learning (physics-ML) models. PhysicsNeMo is available as OSS (Apache 2.0 license) to support the growing physics-ML community. The latest PhysicsNeMo software update, version 23.05, brings together new capabilities…
Training AI models requires mountains of data. Acquiring large sets of training data can be difficult, time-consuming, and expensive. Also, the data collected may not be able to cover various corner cases, preventing the AI model from accurately predicting a wide variety of scenarios. Synthetic data offers an alternative to real-world data, enabling AI researchers and engineers to bootstrap…
AI is impacting every industry, from improving customer service and streamlining supply chains to accelerating cancer research. As enterprises invest in AI to stay ahead of the competition, they often struggle with finding the strategy and infrastructure for success. Many AI projects are rapidly evolving, which makes production at scale especially challenging. We believe in developing…
Have you ever tried to fine-tune a speech recognition system on your accent only to find that, while it recognizes your voice well, it fails to detect words spoken by others? This is common in speech recognition systems that have been trained on hundreds of thousands of hours of speech. In large-scale automatic speech recognition (ASR), a system may perform well in many but not all scenarios.
Multilingual automatic speech recognition (ASR) models have gained significant interest because of their ability to transcribe speech in more than one language. This is fueled by the growing multilingual communities as well as by the need to reduce complexity. You only need one model to handle multiple languages. This post explains how to use pretrained multilingual NeMo ASR models from the…
Leveraging image classification, object detection, automatic speech recognition (ASR), and other forms of AI can fuel massive transformation within companies and business sectors. However, building AI and deep learning models from scratch is a daunting task. A common prerequisite for building these models is having a large amount of high-quality training data and the right expertise to…
Retail shrinkage is on the rise with industry losses totaling $100B in 2021 and growing, due to inflationary pressures. To help software developers accelerate the development of retail loss prevention solutions, NVIDIA is releasing a suite of microservices as part of NVIDIA Metropolis, along with retail AI workflows. These AI workflows deliver pretrained AI models along with the applications…
A pretrained AI model is a deep learning model that’s trained on large datasets to accomplish a specific task, and it can be used as is or customized to suit application requirements across multiple industries.
2022 has thus far been a momentous, thrilling, and overwhelming year for AI aficionados. Get3D is pushing the boundaries of generative 3D modeling, an AI model can now diagnose breast cancer from MRIs as accurately as board-certified radiologists, and state-of-the-art speech AI models have widened their horizons to extended reality. Pretrained models from NVIDIA have redefined…
Large language models (LLMs) are some of the most advanced deep learning algorithms that are capable of understanding written language. Many modern LLMs are built using the transformer network introduced by Google in 2017 in the Attention Is All You Need research paper. NVIDIA NeMo framework is an end-to-end GPU-accelerated framework for training and deploying transformer-based LLMs up to a…
New SDKs are available in the NGC catalog, a hub of GPU-optimized deep learning, machine learning, and HPC applications. With highly performant software containers, pretrained models, industry-specific SDKs, and Jupyter notebooks available, AI developers and data scientists can simplify and reduce complexities in their end-to-end workflows. This post provides an overview of new and updated…
Accurate, fast object detection is an important task in robotic navigation and collision avoidance. Autonomous agents need a clear map of their surroundings to navigate to their destination while avoiding collisions. For example, in warehouses that use autonomous mobile robots (AMRs) to transport objects, avoiding hazardous machines that could potentially damage robots has become a challenging…
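Detection quality in pipelines like these is conventionally scored with intersection-over-union (IoU) between predicted and ground-truth bounding boxes. A minimal sketch of that standard computation, assuming the common `(x1, y1, x2, y2)` corner format:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 region: IoU = 1 / 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A predicted box is typically counted as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5.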
Automatic speech recognition (ASR) research generally focuses on high-resource languages such as English, which is supported by hundreds of thousands of hours of speech. Recent literature has renewed focus on more complex languages, such as Japanese. Like other Asian languages, Japanese has a vast base character set (upwards of 3,000 unique characters are used in common vernacular)…
No one likes standing around and waiting for the bus to arrive, especially when you need to be somewhere on time. Wouldn’t it be great if you could predict when the next bus is due to arrive? At the beginning of this year, Armenian developer Edgar Gomtsyan had some time to spare, and he puzzled over this very question. Rather than waiting for a government entity to implement a solution…
They say “imitation is the sincerest form of flattery.” Well, in the case of a robotics project by Polish-based developer Tomasz Tomanek, imitation, or mimicry, is the goal of his robot named Mariola. In this latest Jetson Project of the Month, Tomanek has developed a funky little robot using pretrained machine learning models to make human-robot interactions come to life.
NVIDIA Metropolis partner Sighthound, formerly Boulder AI, is helping cities improve traffic management and pedestrian safety with software and hardware that bring cloud-native solutions to edge data intelligence. To design efficient, equitable, and sustainable infrastructure, city planners rely on accurate roadway usage data. Sighthound builds edge-enabled…
Join NVIDIA experts and Metropolis partners on Sept. 22 for webinars exploring developer SDKs, GPUs, go-to-market opportunities, and more. All three sessions, each with unique speakers and content, will be recorded and available for on-demand viewing later. Register Now >> Wednesday, September 22, 2021, 1 PM PDT…
The NVIDIA NGC team is hosting a webinar with live Q&A to dive into this Jupyter notebook available from the NGC catalog. Learn how to use these resources to kickstart your AI journey. Register now: NVIDIA NGC Jupyter Notebook Day: Medical Imaging Segmentation. Image segmentation partitions a digital image into multiple segments by changing the representation into something more meaningful…
Today, NVIDIA announced new pretrained models and the general availability of TAO Toolkit 3.0, a core component of the NVIDIA Train, Adapt, and Optimize (TAO) platform-guided workflow for creating AI. The new release includes a variety of highly accurate and performant pretrained models in computer vision and conversational AI, as well as a set of powerful productivity features that boost AI…
You can save time and produce a more accurate result when processing audio data with automated speech recognition (ASR) models from NVIDIA NeMo and Label Studio. NVIDIA NeMo provides reusable neural modules that make it easy to create new neural network architectures, including prebuilt modules and ready-to-use models for ASR. With the power of NVIDIA NeMo, you can get audio transcriptions…
One of the main challenges and goals when creating an AI application is producing a robust model that is performant with high accuracy. Building such a deep learning model is time consuming. It can take weeks or months of retraining, fine-tuning, and optimizing until the model satisfies the necessary requirements. For many developers, building a deep learning AI pipeline from scratch is not a…
Automatic license plate recognition (ALPR) on stationary to fast-moving vehicles is one of the common intelligent video analytics applications for smart cities. Some of the common use cases include parking assistance systems, automated toll booths, vehicle registration and identification for delivery and logistics at ports, and medical supply transporting warehouses. Being able to do this in real…
Intelligent vision and speech-enabled services have now become mainstream, impacting almost every aspect of our everyday life. AI-enabled video and audio analytics are enhancing applications from consumer products to enterprise services. Smart speakers at home. Smart kiosks or chatbots in retail stores. Interactive robots on factory floors. Intelligent patient monitoring systems at hospitals.
AI workflows are complex. Building an AI application is no trivial task, as it takes various stakeholders with domain expertise to develop and deploy the application at scale. Data scientists and developers need easy access to software building blocks, such as models and containers, that are not only secure and highly performant, but which have the necessary underlying architecture to build their…
This is an updated version of Neural Modules for Fast Development of Speech and Language Models. This post upgrades the NeMo diagram with PyTorch and PyTorch Lightning support and updates the tutorial with the new code base. As a researcher building state-of-the-art speech and language models, you must be able to quickly experiment with novel network architectures.
With the advent of new deep learning approaches based on transformer architecture, natural language processing (NLP) techniques have undergone a revolution in performance and capabilities. Cutting-edge NLP models are becoming the core of modern search engines, voice assistants, chatbots, and more. Modern NLP models can synthesize human-like text and answer questions posed in natural language.
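At the heart of the transformer architecture is scaled dot-product attention, softmax(QKᵀ/√d)·V. A dependency-free sketch for a single head (toy dimensions, no batching or masking, matrices as lists of rows):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query against every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Weighted mix of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))  # output leans toward the first value row
```

Because the query aligns with the first key, the output mixes the value rows with more weight on the first; stacking many such heads plus feed-forward layers is what these NLP models are built from.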
Deep neural network (DNN) models are routinely used in applications requiring analysis of video stream content. These may include object detection, classification, and segmentation. Typically, these models are trained on servers with high-end GPUs, either on stand-alone servers, such as NVIDIA DGX-1, or on servers available in data centers or private or public clouds. Such systems often use…
Deep neural networks (DNNs) have been successfully applied to volume segmentation and other medical imaging tasks. They are capable of achieving state-of-the-art accuracy and can augment the medical imaging workflow with AI-powered insights. However, training robust AI models for medical imaging analysis is time-consuming and tedious and requires iterative experimentation with parameter…
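Segmentation models like these are commonly evaluated (and often trained) with the Dice coefficient, which measures the overlap between a predicted mask and the reference mask. A minimal sketch over flattened binary masks, with a small epsilon to keep empty masks well-defined:

```python
def dice(pred, truth, eps=1e-7):
    """Dice coefficient 2*|A ∩ B| / (|A| + |B|) for flat binary masks."""
    assert len(pred) == len(truth)
    inter = sum(p * t for p, t in zip(pred, truth))
    return (2.0 * inter + eps) / (sum(pred) + sum(truth) + eps)

# One of two predicted foreground voxels matches the single true one: ~0.667.
print(dice([1, 1, 0, 0], [1, 0, 0, 0]))
```

A score of 1.0 means perfect overlap and 0.0 means none; in volumetric segmentation the masks would simply be flattened 3D label arrays.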
The process of building an AI-powered solution from start to finish can be daunting. First, datasets must be curated and pre-processed. Next, models need to be trained and tested for inference performance, and then finally deployed into a usable, customer-facing application. At each step along the way, developers constantly face time-consuming challenges, such as building efficient…