Pretrained AI Models
Accelerate AI development with world-class customizable pretrained models from NVIDIA.
Streamline AI Application Development
NVIDIA pretrained AI models are a collection of 600+ highly accurate models built by NVIDIA researchers and engineers using representative public and proprietary datasets for domain-specific tasks. The models enable developers to build AI applications quickly and efficiently.
These models are optimized for GPUs, cloud, embedded, and edge, delivering high performance for preferred production environments.
A Model for Every Task
Get started today with highly accurate models that span diverse use cases and domains, including computer vision, speech, language understanding, molecule generation, and more, and that can be customized for specific tasks.
NVIDIA is at the forefront of generative AI research, launching groundbreaking models like StyleGAN, GauGAN, eDiff-I, and many more. These generative models are pretrained for efficient enterprise application development.
StyleGAN3 is a cutting-edge generative model for high-quality image synthesis and enables generation of photorealistic training data. It offers unparalleled control over image style and content, making it ideal for creative and enterprise applications.
EG3D, short for Efficient Geometry-Aware 3D, is a generative adversarial network-based pretrained model that produces high-quality 3D geometry in complex environments with improved computational efficiency. EG3D is a powerful tool for developers seeking to generate multi-view-consistent images in real time and 3D geometry for creative AI applications.
Megatron 530B LLM
The Megatron-Turing NLG 530B model is a generative language model developed by NVIDIA and Microsoft that uses DeepSpeed and Megatron to train one of the largest and most powerful models of its kind. With 530 billion parameters, it can generate high-quality text for a variety of tasks such as translation, question answering, and summarization.
Language models are revolutionary and facilitate application development for natural language downstream tasks like text generation, summarization, chatbots, question answering, translation, and more. These models use innovative architectures and frameworks to achieve high accuracy across a wide range of complexities in each task type.
Megatron-LM Based Models
Built upon the Megatron architecture developed by the Applied Deep Learning Research team at NVIDIA, this is a series of language models trained in the style of GPT, BERT, and T5. These models deliver improved performance on downstream tasks like question answering and summarization, and also excel at complex tasks like generating fluent, consistent, and coherent stories with controlled pretraining.
BERT-based models are pretrained on massive amounts of text data using the bidirectional transformer architecture, which allows them to capture context from both left and right directions. This pretraining enables these models to perform well on various natural language processing tasks such as sentiment analysis and sentence prediction, without requiring task-specific architecture modifications or extensive fine-tuning.
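As a toy illustration of bidirectional context (not BERT itself, which learns this with a transformer), the sketch below predicts a masked word by looking at both its left and right neighbors in a small corpus. The corpus and function name are made up for this example.

```python
# Toy illustration of bidirectional context: predict a masked word from
# both its left and right neighbors. A bidirectional model like BERT
# conditions on both directions at once; here we just count matches.
from collections import Counter

def predict_masked(sentences, left, right):
    """Vote for the word that appears between `left` and `right` in the corpus."""
    votes = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i in range(1, len(words) - 1):
            if words[i - 1] == left and words[i + 1] == right:
                votes[words[i]] += 1
    return votes.most_common(1)[0][0] if votes else None

corpus = [
    "the bank approved the loan",
    "the bank raised the rate",
    "the river bank was muddy",
]
print(predict_masked(corpus, "the", "approved"))  # -> bank
```

The right-hand neighbor ("approved") is what disambiguates the prediction here; a left-to-right model would not see it, which is the intuition behind bidirectional pretraining.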
This is a transformer-based language model that leverages a novel ELECTRA pretraining method to achieve state-of-the-art performance on a range of natural language processing tasks. Smaller model size and faster training time than comparable models make it ideal for a range of applications, including chatbots, virtual assistants, and more.
Computer Vision Models
With computer vision, devices can understand the world through images and videos. Computer vision models can be used for image classification, object detection and tracking, object recognition, semantic segmentation, and instance segmentation.
PeopleNet is a computer vision model developed using NVIDIA TAO for real-time pedestrian detection and tracking in urban environments, with high accuracy and low latency. It divides the image into a grid and predicts the location of objects in the picture based on this grid. Its high performance and optimization make it ideal for use in smart cities, autonomous vehicles, and intelligent video analytics systems.
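The grid-based prediction can be sketched as follows. This is a minimal illustration of decoding per-cell detections into image-space boxes, with made-up scores and offsets; it is not the actual PeopleNet or TAO output format.

```python
# Minimal sketch of grid-cell detection decoding (illustrative only, not
# the actual PeopleNet/TAO format). Each grid cell predicts a confidence
# score and a box (cx, cy, w, h) expressed relative to the cell.

def decode_grid(scores, boxes, cell_size, threshold=0.5):
    """Turn per-cell predictions into absolute pixel-space boxes.

    scores: dict mapping (row, col) -> confidence in [0, 1]
    boxes:  dict mapping (row, col) -> (cx, cy, w, h) in cell units
    """
    detections = []
    for (row, col), score in scores.items():
        if score < threshold:
            continue  # suppress low-confidence cells
        cx, cy, w, h = boxes[(row, col)]
        # Shift the cell-relative center into image coordinates.
        x = (col + cx) * cell_size
        y = (row + cy) * cell_size
        detections.append((x, y, w * cell_size, h * cell_size, score))
    return detections

scores = {(0, 0): 0.9, (0, 1): 0.2, (1, 1): 0.7}
boxes = {(0, 0): (0.5, 0.5, 1.0, 2.0),
         (0, 1): (0.5, 0.5, 1.0, 1.0),
         (1, 1): (0.0, 0.0, 1.0, 1.0)}
print(decode_grid(scores, boxes, cell_size=16))
```

The low-confidence cell at (0, 1) is dropped, while the two confident cells become pixel-space detections.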
Bi3D Proximity Segmentation
Bi3D is a binary depth classification network used to classify the depth of objects at a given distance. The idea behind Bi3D is that it is faster and easier to classify an object as being closer or farther than a certain distance, rather than to regress its actual distance accurately. This is an ideal model for building collision avoidance applications, similar to those used in current industrial autonomous mobile robot (AMR) systems.
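The closer-or-farther idea can be sketched numerically. The code below is a toy reconstruction, not the real Bi3D network: given binary "is the object closer than plane d?" probabilities at a few depth planes, it accumulates an expected depth and applies a collision-avoidance check. Plane positions and probabilities are invented for illustration.

```python
# Toy sketch of the Bi3D idea (illustrative, not the real network):
# instead of regressing depth directly, ask a series of binary questions
# "is the object closer than plane d_i?" and combine the answers.

def estimate_depth(planes, p_closer):
    """Estimate depth from per-plane binary classifications.

    planes:   sorted candidate depth-plane distances (meters)
    p_closer: probability the object lies closer than each plane
    Each gap between planes contributes to the depth estimate in
    proportion to the probability the object lies beyond it.
    """
    depth, prev = 0.0, 0.0
    for d, p in zip(planes, p_closer):
        depth += (d - prev) * (1.0 - p)  # chance the object is beyond this gap
        prev = d
    return depth

def too_close(planes, p_closer, safety_distance):
    # Collision-avoidance style check, as in AMR systems:
    # alarm when the estimated depth falls inside the safety envelope.
    return estimate_depth(planes, p_closer) < safety_distance

planes = [1.0, 2.0, 3.0, 4.0]
p_closer = [0.0, 0.1, 0.9, 1.0]   # object is probably about 2 m away
print(round(estimate_depth(planes, p_closer), 2))  # -> 2.0
```

Because each plane is only a binary decision, the per-plane classifiers can be cheap, which is where the speed advantage over full depth regression comes from.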
Based on NVIDIA research's SegFormer paper, these fine-tuned models use a transformer encoder with a segmentation decoder to accurately identify and segment objects of interest in images or videos for semantic segmentation tasks. They are suitable for applications that require precise visual understanding, such as autonomous driving, medical imaging analysis, and surveillance.
Speech AI deals with recognizing and transcribing audio into text and synthesizing speech from text. It includes speech synthesis and automatic speech recognition (ASR).
Conformer stands for convolution-augmented transformer and is used for ASR tasks. These models are based on a combination of transformer and CNN architectures and achieve high accuracy on speech benchmarks. The pretrained models span 10+ languages, including German, Italian, Japanese, Kinyarwanda, and more, making them ideal for customized speech applications like live captioning, digital human services, voice assistance, and more.
Time Delay Neural Network Model
ECAPA TDNN is a highly accurate, time delay neural network-based pretrained model for speaker identification and verification. It provides robust speaker embeddings under both close-talking and distant-talking conditions to identify a speaker based on how the speech is spoken. This model is used for speaker diarization in speech AI applications for scenarios like understanding medical conversations, video captioning, and many more.
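A common way to use speaker embeddings like the ones such a model produces is cosine-similarity verification, sketched below. The embedding vectors here are made up; a real system would extract them from audio with the pretrained model.

```python
# Minimal sketch of speaker verification on top of speaker embeddings
# (the vectors below are invented; an ECAPA-TDNN-style model would
# produce them from audio). Two utterances match when their embeddings
# point in roughly the same direction.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def same_speaker(emb_a, emb_b, threshold=0.7):
    # Verification decision: accept above a similarity threshold.
    return cosine_similarity(emb_a, emb_b) >= threshold

enrolled = [0.9, 0.1, 0.4]     # embedding from an enrollment utterance
test_match = [0.8, 0.2, 0.5]   # same speaker, different utterance
test_other = [-0.3, 0.9, 0.1]  # a different speaker

print(same_speaker(enrolled, test_match))   # -> True
print(same_speaker(enrolled, test_other))   # -> False
```

Diarization builds on the same primitive: cluster the embeddings of short audio segments by similarity to decide who spoke when.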
FastPitch and HiFiGAN
The combination of FastPitch and HiFiGAN delivers end-to-end speech synthesis, where the FastPitch model produces a mel spectrogram from raw text, and HiFiGAN can generate audio from a mel spectrogram. Collectively, these pretrained models are ideal for a wide range of text-to-speech (TTS) applications such as audiobooks, voice cloning, and music generation.
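The two-stage structure of that pipeline can be sketched as plain function composition. The stages below are stand-in callables, not the real FastPitch or HiFiGAN models; only the shape of the pipeline is taken from the description above.

```python
# Structural sketch of two-stage TTS: a spectrogram generator
# (FastPitch's role) feeds a vocoder (HiFiGAN's role). The stages here
# are toy stand-ins so the pipeline runs end to end.

def synthesize(text, spectrogram_model, vocoder):
    """Text -> mel spectrogram -> waveform, mirroring FastPitch + HiFiGAN."""
    mel = spectrogram_model(text)   # stage 1: text to mel spectrogram
    audio = vocoder(mel)            # stage 2: mel spectrogram to audio
    return audio

def fake_fastpitch(text):
    # Stand-in: one "mel frame" per character.
    return [[float(ord(c) % 8)] for c in text]

def fake_hifigan(mel):
    # Stand-in: flatten frames into "samples".
    return [v for frame in mel for v in frame]

samples = synthesize("hi", fake_fastpitch, fake_hifigan)
print(samples)
```

Keeping the two stages separate is what makes the combination flexible: either model can be swapped or fine-tuned (e.g. for voice cloning) without retraining the other.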
Pharma companies, biotech startups, and pioneering biology researchers are developing AI applications for medical imaging, drug discovery, predicting biomolecular data, generating new molecules, and expanding the horizons of healthcare innovation using SOTA pretrained models.
MegaMolBART is a model that understands chemistry and can be used for a variety of cheminformatics applications in drug discovery. This model is ideal for reaction prediction, molecular optimization, and de novo molecular generation.
BioBERT is a pretrained language model designed for biomedical text mining and natural language processing tasks. Based on the popular BERT architecture, it is fine-tuned on high-quality biomedical datasets, allowing for accurate identification of chemical and protein entities in text. The model can be used for various applications in the biomedical field and clinical research around chemical-protein interactions.
Recommender systems predict the "rating" or "preference" a user would give to an item. These models are used in ecommerce and retail for personalized merchandising, in media and entertainment for personalized content, and in banking and financial services for personalization.
The Deep Learning Recommendation Model (DLRM) is designed to make use of both categorical and numerical inputs. The model is designed from two primary perspectives—recommendation systems and predictive analytics—to deliver accurate results for advertisements, ad-click through rates, ad ranking, and personalization.
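The dual-input idea can be illustrated with a toy version of DLRM-style feature interaction. This is a hand-written sketch, not the real model: real DLRM learns its embedding tables and MLPs, whereas the tables and values below are invented.

```python
# Toy sketch of DLRM-style feature interaction (illustrative only).
# Categorical features are looked up in embedding tables, numerical
# features are treated as one more vector of the same width, and
# pairwise dot products capture feature interactions.

def dlrm_interactions(categorical, numerical, tables, dim=2):
    # One embedding vector per categorical feature.
    vectors = [tables[name][value] for name, value in categorical.items()]
    # Represent the dense (numerical) features as one more vector.
    vectors.append(numerical[:dim])
    # Pairwise dot products between all feature vectors.
    feats = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            feats.append(sum(a * b for a, b in zip(vectors[i], vectors[j])))
    return feats

tables = {
    "user": {"u1": [1.0, 0.0], "u2": [0.0, 1.0]},
    "ad":   {"sports": [1.0, 1.0], "news": [0.5, 0.5]},
}
print(dlrm_interactions({"user": "u1", "ad": "sports"}, [0.2, 0.4], tables))
```

In the full model, these interaction terms are concatenated with the dense features and fed to a top MLP that outputs the click-through probability.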
The Search-based Interest Model (SIM) is a system that predicts user behavior based on sequences of previous interactions. The original model has a cascaded two-stage search mechanism that enhances SIM's ability to model lifelong sequential behavior data in both scalability and accuracy. The model is further pretrained to improve predictions, enabling development of effective real-time recommenders and advertising systems.
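The cascaded two-stage search can be sketched as a cheap filter followed by precise scoring. The sketch below is simplified: the real SIM scores behaviors with learned attention, whereas here recency stands in for the learned score, and the history records are invented.

```python
# Sketch of SIM's cascaded two-stage search over a long behavior history
# (simplified; learned attention is replaced by recency).

def two_stage_search(history, candidate_category, top_k=3):
    # Stage 1 (general search): a cheap filter keeps only behaviors in
    # the candidate item's category, so stage 2 sees a short list even
    # when the full history covers years of interactions.
    related = [b for b in history if b["category"] == candidate_category]
    # Stage 2 (exact search): precise scoring on the filtered subset;
    # recency here stands in for learned attention weights.
    related.sort(key=lambda b: b["timestamp"], reverse=True)
    return related[:top_k]

history = [
    {"item": "boots", "category": "shoes", "timestamp": 1},
    {"item": "novel", "category": "books", "timestamp": 2},
    {"item": "sneakers", "category": "shoes", "timestamp": 3},
]
print([b["item"] for b in two_stage_search(history, "shoes")])
```

The cascade is what makes lifelong behavior data tractable: the expensive stage only ever runs on the small, relevant subset.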
Pretrained Models from the NGC Catalog
Build AI applications faster with production-ready pretrained AI models from the NGC™ catalog.
Transparent Model Resumes
Just like a resume provides a snapshot of a candidate's skills and experience, model credentials do the same for a model. Many pretrained models include critical parameters such as batch size, training epochs, and accuracy, providing you with the necessary transparency and confidence to pick the right model for your use case.
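Those credentials lend themselves to programmatic model selection, sketched below. The field names and catalog entries are made up for illustration; they are not the actual NGC metadata schema.

```python
# Sketch of using model credentials to pick a model: filter catalog
# entries by the parameters they report. Fields and values here are
# invented for illustration, not actual NGC metadata.

def pick_model(models, min_accuracy):
    """Return the most accurate model meeting the accuracy bar, or None."""
    eligible = [m for m in models if m["accuracy"] >= min_accuracy]
    return max(eligible, key=lambda m: m["accuracy"]) if eligible else None

catalog = [
    {"name": "detector-a", "accuracy": 0.84, "epochs": 80, "batch_size": 16},
    {"name": "detector-b", "accuracy": 0.91, "epochs": 120, "batch_size": 32},
]
best = pick_model(catalog, min_accuracy=0.85)
print(best["name"])  # -> detector-b
```

The same pattern extends to any published credential, such as filtering by training epochs or batch size to match your deployment constraints.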
Customize and Adapt Models Faster With NVIDIA SDKs
NVIDIA Train, Adapt, and Optimize (TAO) is an AI-model-adaptation platform that simplifies and accelerates the creation of production-ready models for AI applications. By fine-tuning pretrained models with custom data, developers can produce highly accurate computer vision and language understanding models in hours rather than months, eliminating the need for large training runs and deep AI expertise.
NVIDIA NeMo™ is an open-source framework for developers to build and train state-of-the-art conversational AI models.
Supercharge Your Production AI
NVIDIA AI Enterprise, an end-to-end, secure, cloud-native suite of AI software, includes access to unencrypted NVIDIA pretrained models and the model weights for a wide range of use cases. Developers can view the weights and biases of a model, which can aid model explainability and help in understanding model bias. In addition, unencrypted models are easier to debug and integrate into custom AI apps.
Enterprise support is included with NVIDIA AI Enterprise to ensure business continuity and keep AI projects on track.
Accelerate your AI development with pretrained models from the NGC catalog.