Computer Vision

Empower your devices to perceive and understand the world around us with software that’s masterful, scalable, and tested.

NVIDIA software enables the end-to-end computer vision (CV) workflow, from model development to deployment, for individual developers, higher education and research institutions, and enterprises.

Computer vision is a field of technology that enables devices like smart cameras to acquire, process, analyze, and interpret images and videos. It can be categorized by type (traditional or AI-based) and by technique.

Traditional computer vision, also referred to as non-deep-learning-based computer vision or image processing, performs a specific task based on hard-coded instructions. For instance, image processing might be used to mirror an image or reduce noise in a video.

AI-based computer vision, or vision AI, relies on algorithms that have been trained on visual data to accomplish a specific task. For example, the driver assistance system on an autonomous vehicle designed with CV algorithms uses cameras and other sensors not only to display what is in front of and behind the vehicle, but also to perceive it, identifying and classifying regions or points of interest within each image frame. In this case, computer vision has a safety application: helping the vehicle operator navigate around road debris, other vehicles, animals, and people. Similarly, farmers might rely on CV-enabled devices to automatically identify weeds and see where crops are growing well across a large field, helping to increase yield. CV tasks like these are based on artificial intelligence and, more specifically, deep learning, a type of machine learning loosely inspired by the structure of the brain.
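The traditional, hard-coded operations mentioned above (mirroring an image and reducing noise) involve no training at all. The minimal sketch below assumes a grayscale image stored as a plain list of pixel rows (0 to 255); it uses a 3x3 median filter, one common choice for removing isolated noisy pixels, and is an illustration rather than an NVIDIA API.

```python
from statistics import median

# A grayscale image as a list of rows of pixel intensities (0-255).
Image = list[list[int]]

def mirror(image: Image) -> Image:
    """Flip an image horizontally by reversing every row."""
    return [list(reversed(row)) for row in image]

def median_filter(image: Image) -> Image:
    """Reduce salt-and-pepper noise with a 3x3 median filter.

    Border pixels use only the neighbors that fall inside the image;
    the median is truncated to an int when it lands between two values.
    """
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(int(median(window)))
        out.append(row)
    return out

# A single bright "noise" pixel in a flat gray patch is removed by the filter.
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
print(median_filter(noisy))
print(mirror([[1, 2, 3]]))
```

Every step is a fixed rule written by a programmer, which is what distinguishes this kind of processing from the trained models described next.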

Regardless of type, computer vision models let devices perform tasks in real time that mimic human vision.

Computer vision techniques

Most techniques begin with a model, a mathematical algorithm that has been trained on large volumes of data to accomplish a specific task. Some of the common techniques include:

Computer Vision classification


Classification involves identifying what object is in an image or video frame. Classification models are usually trained on a large dataset to identify common objects such as dogs, cats, and chairs, or very specific ones such as the types of vehicles in a road scene. The quality of the classification output depends on the training data used: generally, the larger and more diverse the training data, the more accurate the model.
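A classifier typically produces one raw score (logit) per class; a common way to turn those scores into a single answer is to apply softmax and pick the most probable class. The sketch below assumes a hypothetical three-class model whose scores for one image are already available; the labels and scores are made up for illustration.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits: list[float], labels: list[str]) -> tuple[str, float]:
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical scores from a classifier for one image.
labels = ["dog", "cat", "chair"]
label, confidence = classify([4.0, 1.5, 0.2], labels)
print(label, round(confidence, 3))
```

The probability is only as trustworthy as the training data behind the model, which is why dataset size and diversity matter.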

Computer Vision detection


Detection involves identifying and locating one or more objects within an image or video frame. The algorithm outputs a rectangular bounding box around each detected object to indicate its location in the image. Object detectors may be trained to detect cars, road signs, people, or other objects of interest.
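Detectors often propose several overlapping boxes for the same object. Two widely used building blocks for handling this are intersection over union (IoU), which measures how much two boxes overlap, and greedy non-maximum suppression (NMS), which keeps only the highest-scoring box in each cluster. This is a generic sketch, assuming boxes given as (x1, y1, x2, y2) corner coordinates; it is not the output format of any particular NVIDIA model.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2, y1 < y2

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes: list[Box], scores: list[float],
                        threshold: float = 0.5) -> list[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    A box is dropped if it overlaps an already-kept box by more than `threshold` IoU.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: list[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= threshold for k in kept):
            kept.append(i)
    return kept

# Two near-duplicate detections of one dog plus a separate detection elsewhere.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
print(non_max_suppression(boxes, [0.9, 0.8, 0.7]))
```

The 0.5 threshold is a common default, not a universal one; stricter or looser values trade duplicate boxes against missed neighboring objects.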

Computer Vision segmentation


Segmentation locates objects or regions of interest precisely by assigning a label to every pixel in an image. Pixels with the same label share similar characteristics, such as color or texture. Segmentation models are commonly used in medical imaging, for example to automatically detect tumors in magnetic resonance imaging (MRI) scans.
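To make "a label for every pixel" concrete, the sketch below swaps in the simplest possible segmenter, an intensity threshold (a traditional technique, not a trained model), and scores its mask against a reference mask with the Dice coefficient, a metric often reported for medical image segmentation. The tiny image, threshold, and reference mask are hypothetical.

```python
Mask = list[list[int]]

def threshold_segment(image: list[list[int]], threshold: int) -> Mask:
    """Assign label 1 to pixels brighter than `threshold`, label 0 otherwise."""
    return [[1 if p > threshold else 0 for p in row] for row in image]

def dice(pred: Mask, truth: Mask) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for the foreground label (1)."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    return 2 * inter / total if total else 1.0  # two empty masks agree perfectly

image = [[0, 200], [50, 220]]
pred = threshold_segment(image, threshold=100)
truth = [[0, 1], [1, 1]]  # hypothetical expert annotation
print(pred, dice(pred, truth))
```

A learned segmentation model replaces the threshold rule with per-pixel predictions, but its output has the same shape: one label per pixel.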

Image Synthesis

Neural Radiance Fields (NeRF)

NeRF creates three-dimensional (3D) content by inferring it from two or more two-dimensional (2D) input images. Given a set of images of a scene, it can synthesize novel views and reconstruct the 3D scene. Like Generative Adversarial Networks (GANs), NeRF networks can be used to generate synthetic data.
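In the original NeRF formulation, a neural network predicts a volume density and a color at sample points along each camera ray, and the pixel color is obtained by compositing those samples front to back. The sketch below shows only that compositing step, assuming the densities, colors, and sample spacings are already given (the network itself is omitted).

```python
import math

def render_ray(densities: list[float],
               colors: list[tuple[float, float, float]],
               deltas: list[float]) -> tuple[float, float, float]:
    """Composite samples along one camera ray (front to back), NeRF-style.

    densities: volume density sigma_i at each sample
    colors:    RGB color c_i predicted at each sample
    deltas:    distance delta_i from sample i to sample i + 1
    """
    transmittance = 1.0  # T_i: fraction of light reaching sample i unoccluded
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha          # contribution of this sample
        for ch in range(3):
            rgb[ch] += weight * color[ch]
        transmittance *= 1.0 - alpha            # light left after this segment
    return tuple(rgb)

# A very dense red sample in front hides the blue sample behind it.
print(render_ray([1000.0, 1000.0], [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [1.0, 1.0]))
```

Because every step is differentiable, the rendered colors can be compared with the input photos and the error used to train the network, which is how NeRF learns a scene from 2D images.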

The artificial intelligence-based computer vision workflow

The computer vision workflow is highly dependent on the task, model, and data. A typical, simplified artificial intelligence (AI)-based end-to-end CV workflow involves three key stages: Model and Data Selection, Training and Testing/Evaluation, and Deployment and Execution.

Let’s walk through these stages using object detection to find a dog (classification- and segmentation-based techniques follow a similar workflow).

Finding Fido: Developing an AI-based object-detection CV workflow

Challenge: You want to build software for a monitoring system that automatically detects when your dog arrives at or leaves through the back door.

Three-stage solution

Model and Data Selection

Select an object-detection model. Collect photos of your dog (let's call him Fido) that you can use to train and fine-tune your model to recognize him.

Training and Testing/Evaluation

Train your model on one set of photos of Fido, then test it on different photos it has not seen to verify how accurately it detects him.
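The key idea in this stage is that evaluation photos must be kept separate from training photos. A minimal sketch, assuming hypothetical photo file names and simplifying detection to a yes/no "is Fido in this photo?" answer per image:

```python
import random

def train_test_split(items: list[str], test_fraction: float = 0.2,
                     seed: int = 0) -> tuple[list[str], list[str]]:
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predictions: list[bool], labels: list[bool]) -> float:
    """Fraction of test photos where the prediction matches the label."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical file names for photos of Fido.
photos = [f"fido_{i:03d}.jpg" for i in range(50)]
train_photos, test_photos = train_test_split(photos)
print(len(train_photos), len(test_photos))
```

A real detector would be scored with detection metrics that also check box positions, but the principle of holding out unseen photos is the same.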

Deployment and Execution

Deploy the trained model to hardware connected to a camera installed at the back door, so that it can detect the next time Fido leaves the house. Below, a high-level diagram summarizes the AI-based CV solution.
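Once deployed, the model's per-frame detections still need to be turned into "arrived" and "left" events. One simple approach, sketched below, is to debounce the detector output so that a single misdetected frame does not trigger an alert. The sketch assumes a hypothetical detector that reports, for each video frame, whether Fido is visible; whether "entered view" means arriving or leaving depends on where the camera points.

```python
from collections.abc import Iterable, Iterator

def door_events(detections: Iterable[bool],
                min_frames: int = 3) -> Iterator[tuple[int, str]]:
    """Turn per-frame "Fido detected" flags into view-change events.

    A change is reported only after `min_frames` consecutive frames agree,
    which filters out single-frame detector glitches.
    Yields (frame_index, "entered_view" | "left_view").
    """
    present = False  # current debounced state
    streak = 0       # consecutive frames disagreeing with `present`
    for i, detected in enumerate(detections):
        if detected != present:
            streak += 1
            if streak >= min_frames:
                present = detected
                streak = 0
                yield i, ("entered_view" if present else "left_view")
        else:
            streak = 0

# Frame 1 is a one-frame glitch; Fido really appears at frame 3 and goes at frame 7.
frames = [False, True, False, True, True, True, True, False, False, False]
print(list(door_events(frames)))
```

Raising `min_frames` suppresses more false alarms at the cost of reporting each event a few frames later.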

NVIDIA enables every stage of computer vision development.

NVIDIA enables the end-to-end CV workflow, providing not only AI-based pretrained models, but also tools for training and testing/evaluation and software application frameworks for deployment and execution. Learn more below about how NVIDIA enables every stage of CV development.

Get started with pretrained models for computer vision

Developing models for these techniques on your own would require a lot of training data, time, and expertise. Here’s the good news: you don’t have to be an expert to get started. NVIDIA hosts a number of pretrained models, already built and ready to use, to help you start developing your own CV solutions. Start with NGC, our hub for GPU-accelerated software, to learn about computer vision models and resources, as well as models and application frameworks for other deep learning use cases such as speech and natural language processing.

Explore pretrained models with NGC catalog

Develop an end-to-end computer vision workflow

Start with synthetic image data and NVIDIA pretrained models to make the end-to-end computer vision AI development process easier.


Synthetic Data Generation

NVIDIA Omniverse Replicator

Fine-tune pretrained perception models with custom, physically accurate 3D synthetic visual data generated in minutes or hours rather than months.

Learn how to generate synthetic data

Framework for Creating Custom Models


Leverage the power of transfer learning to fine-tune pretrained models with your data to produce highly accurate computer vision AI models in hours rather than months.

Learn about the AI model framework
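Transfer learning reuses what a pretrained network has already learned. Its simplest form, sketched here, keeps the pretrained backbone frozen and trains only a small new classification head on the features it produces. This assumes the feature vectors have already been extracted by some hypothetical pretrained model; the head is plain logistic regression trained with gradient descent, not the API of NVIDIA's framework.

```python
import math

def train_head(features: list[list[float]], labels: list[int],
               epochs: int = 200, lr: float = 0.5) -> tuple[list[float], float]:
    """Fit a logistic-regression head on frozen features with gradient descent.

    Only these weights are learned; the (hypothetical) pretrained backbone
    that produced `features` stays unchanged.
    """
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - y                     # d(binary cross-entropy)/dz
            for j in range(dim):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * gj for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w: list[float], b: float, x: list[float]) -> int:
    """Return 1 if the head's logit is positive, else 0."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)

# Hypothetical 2-D features from a frozen backbone: 1 = "Fido", 0 = "not Fido".
features = [[1.0, 0.1], [0.9, 0.2], [0.1, 0.9], [0.2, 1.0]]
labels = [1, 1, 0, 0]
w, b = train_head(features, labels)
print([predict(w, b, x) for x in features])
```

Because only a few parameters are trained, this needs far less data and compute than training a full model from scratch; fuller fine-tuning also updates some or all backbone layers.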

Streaming Analytics Toolkit


Build analytics for AI-based multi-sensor processing, video, audio, and image understanding.

Learn how to build and deploy vision AI

Smart Spaces

NVIDIA Metropolis

NVIDIA Metropolis is an end-to-end application framework that brings visual data, edge computing, and multi-modal AI together for developers creating AI solutions that improve operational efficiency and safety for a broad range of physical processes and spaces.

NVIDIA Metropolis makes it easier and more cost-effective to develop, deploy, and manage AI-vision applications and services across any industry including retail, manufacturing, smart cities, agriculture, and more.

Learn about the smart spaces framework

Explore computer vision software

Learn how to develop applications using NVIDIA's industry-specific software products and platforms.




Develop computer vision models for gesture recognition, heart rate monitoring, mask detection, and body pose estimation, for example to detect falls in a hospital room. Build, manage, and deploy workflows in medical imaging, medical devices with streaming video, and smart hospitals.

Learn about the healthcare application framework



Develop end-to-end (E2E) CV solutions for autonomous vehicles (AVs) and the intelligent cockpit (IX). Collect and generate CV data and train deep neural network (DNN) models using the E2E simulation platform, DRIVE Sim.

Learn about AV development

Video Streaming

Maxine SDK

Create virtual collaboration and content creation applications with video effects, audio effects, and augmented reality.

Learn how to build video communications

Multimodal Conversation


Develop multimodal conversational AI applications by fusing vision, audio, and other sensor inputs simultaneously.

Learn how to build Conversational AI

Envision next-generation computer vision

Learn about new technologies and innovative research work on computer vision at NVIDIA.


Emerging Innovation

Learn what problems our computer vision research engineers and data scientists have been solving. Read our latest publications.

Learn about NVIDIA’s latest CV development work


NVIDIA Isaac Sim

Develop, test, train, and manage robots in virtual environments. Use computer vision for manipulation, navigation, and synthetic data generation.

Build simulation for robotics

Explore GPU-accelerated libraries and optimization platform

Learn how NVIDIA’s libraries and optimization platform accelerate computer vision on GPUs.


Open Source Library for GPU-Accelerated Pre- and Post-Processing


Increase throughput of AI-based computer vision and image processing pipelines at lower cloud-computing and energy costs.

Develop AI Computer Vision at
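Pre-processing usually means converting raw images into the exact tensor layout a model expects. The plain CPU sketch below shows what such a step computes: scaling 8-bit RGB values to [0, 1], normalizing each channel with a mean and standard deviation, and reordering from height-width-channel (HWC) to channel-height-width (CHW). The mean/std values and the CHW layout are assumptions (common conventions), not the API of the GPU library described above, which would run such steps on batches of images on the GPU.

```python
Pixel = tuple[int, int, int]  # 8-bit (R, G, B)

def preprocess(image: list[list[Pixel]],
               mean: tuple[float, float, float],
               std: tuple[float, float, float]) -> list[list[list[float]]]:
    """Scale 8-bit RGB to [0, 1], normalize per channel, and reorder HWC -> CHW."""
    height, width = len(image), len(image[0])
    return [
        [[(image[y][x][c] / 255.0 - mean[c]) / std[c] for x in range(width)]
         for y in range(height)]
        for c in range(3)
    ]

# A 1x2 image: one pure-red pixel and one pure-green pixel.
tensor = preprocess([[(255, 0, 0), (0, 255, 0)]],
                    mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
print(tensor)
```

Each output value depends on only one input pixel, which is why this kind of work parallelizes so well across GPU threads.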

Data Pipeline Accelerator

Data Loading Library (DALI)

Load and process computer vision and audio data using GPUs. Use directly in TensorFlow, PyTorch, MXNet, and PaddlePaddle models.

Learn how to load data efficiently

Embedded Computer Vision and Image Processing Library

Vision Programming Interface (VPI)

Implement asynchronous computer vision and image processing applications in real-time.

Learn about accelerated CV and IP processing

Computer Vision and Image Processing Library for Multidimensional Images


Implement computer vision and image processing operations for n-dimensional data.

Learn about n-dimensional image processing

3D Deep Learning Research Library


Generate synthetic data. Render and visualize 3D training datasets.

Learn how to visualize synthetic data

Image Decoding Libraries

nvJPEG and nvJPEG2000

Accelerate processing of JPEG and JPEG2000 images.

Learn how to accelerate JPEG image processing

Motion Flow Generation

Optical Flow SDK

Recognize, classify, and track objects and actions in a video stream by enhancing flow-vector computation between frames using GPUs.

Learn how to optimize motion generation
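A flow vector describes how a patch of one frame moves to reach the next frame. The classic way to estimate it is exhaustive block matching, sketched below on the CPU: for each small tile, try every displacement within a search radius and keep the one with the smallest sum of absolute differences (SAD). This illustrates the concept only; it is not the Optical Flow SDK's algorithm or API, and the tiny frames are hypothetical.

```python
Frame = list[list[int]]  # grayscale frame, list of pixel rows

def block_flow(prev: Frame, curr: Frame, block: int = 2,
               radius: int = 2) -> dict[tuple[int, int], tuple[int, int]]:
    """Estimate one motion vector (dx, dy) per block x block tile of `prev`.

    Returns {(tile_x, tile_y): (dx, dy)}; ties prefer zero motion.
    """
    h, w = len(prev), len(prev[0])

    def sad(bx: int, by: int, dx: int, dy: int) -> int:
        # Sum of absolute differences between the tile and its shifted match.
        return sum(abs(prev[by + y][bx + x] - curr[by + dy + y][bx + dx + x])
                   for y in range(block) for x in range(block))

    flow = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            best, best_sad = (0, 0), sad(bx, by, 0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Skip candidates whose tile would fall outside the frame.
                    if 0 <= by + dy <= h - block and 0 <= bx + dx <= w - block:
                        s = sad(bx, by, dx, dy)
                        if s < best_sad:
                            best, best_sad = (dx, dy), s
            flow[(bx, by)] = best
    return flow

# A bright 2x2 square moves two pixels to the right between frames.
prev = [[0] * 6 for _ in range(6)]
curr = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (0, 1):
        prev[y][x] = 255
    for x in (2, 3):
        curr[y][x] = 255
print(block_flow(prev, curr)[(0, 2)])
```

The search cost grows with frame size and search radius, which is why dedicated GPU hardware helps for real-time video.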

Image and Signal-Processing Library

NVIDIA Performance Primitives (NPP)

Deploy ready-to-use, domain-specific, high-performance functions for image, video, and signal processing.

Learn how to deploy accelerated primitives

Inference Optimizer and Runtime


Deliver low latency and high throughput for inference applications.

Learn about AI Inference

Inference Server

NVIDIA Triton

Deploy, run, and scale AI models with ease from any framework on GPUs and CPUs.

Learn more about AI deployment
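Triton's HTTP/REST endpoint follows the KServe v2 inference protocol, in which a client POSTs a JSON body listing named input tensors to `/v2/models/<model>/infer`. The sketch below only builds such a request with the standard library and does not send it; the model name `fido_detector`, the input name `images`, the FP32 datatype, and the default HTTP port 8000 are assumptions for illustration, so check the actual model configuration before use.

```python
import json
import urllib.request

def build_infer_request(model_name: str, input_name: str, shape: list[int],
                        data: list[float],
                        host: str = "localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a KServe v2-style inference request."""
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": shape,
                "datatype": "FP32",
                "data": data,  # tensor values flattened in row-major order
            }
        ]
    }
    return urllib.request.Request(
        f"http://{host}/v2/models/{model_name}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical model and input names.
req = build_infer_request("fido_detector", "images", [1, 4], [0.1, 0.2, 0.3, 0.4])
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running server.
```

Using a standard wire protocol means the client does not need to know which framework the model was trained in, which is what lets a single server host models from different frameworks.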

Your world, powered by computer vision

Get started with frequently asked questions

Computer vision is more than research. It delivers practical, real-world solutions that change lives. NVIDIA’s deep expertise in artificial intelligence and high-performance computing provides endless opportunities to meaningfully impact the world.

Get your CV questions answered

Learn the fundamentals of computer vision

New to computer vision? Learn the Fundamentals of Deep Learning with hands-on CV exercises in this eight-hour course offered by the Deep Learning Institute. You’ll learn how to train deep learning models from scratch and use pretrained models, experiment with different model architectures, explore deep learning tools and techniques, and work with datasets to improve model accuracy. You’ll also earn a certificate to show your accomplishment.

Learn computer vision

See what’s new in computer vision

Event: CUDA 12.2 YouTube Premiere

On July 6, join experts for a deep dive into CUDA 12.2, including support for confidential computing.

New Video: Composition and Layering with Universal Scene Description


Developers are using Universal Scene Description (OpenUSD) to push the boundaries of 3D workflows. As an ecosystem and interchange paradigm, OpenUSD models, labels, classifies, and combines a wide range of data sources into a composed ground truth. It is also highly extensible with four key features that help developers meet the demands of virtual worlds. …

ICYMI: Exploring Challenges Posed by Biased Datasets Using RAPIDS cuDF

Read about an innovative GPU solution that uses RAPIDS cuDF to address the limitations of small, biased datasets.

Experience real-world computer vision applications

No challenge is too small and no company too big for computer vision. See innovative solutions in action, from startups to global manufacturers.

What challenges are you facing with building computer vision solutions?

We want to hear about your pain points in developing computer vision solutions to see how we can enable you.

Share your computer vision challenges with NVIDIA

We’re partnering for success

Global challenges take a community. We support you in tackling challenges with powerful solutions to meet your exact needs.

BMW
King's College London
Ping An
Quantiphi
Touchcast
Verizon

The World of Computer Vision Solutions is Powered by NVIDIA.

Join us