Reinforcement Learning – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins
Feed updated: 2025-07-03T22:20:47Z | http://www.open-lab.net/blog/feed/

NVIDIA Showcases the Future of Intelligent Robots at CoRL 2024
By Nathan Horrocks | http://www.open-lab.net/blog/?p=90907 | Updated: 2024-10-31T18:37:06Z | Published: 2024-10-25T15:00:00Z

From humanoids to policy, explore the work NVIDIA is bringing to the robotics community.

Comments: 0

Fast-Track Robot Learning in Simulation Using NVIDIA Isaac Lab
By Amulya Vishwanath | http://www.open-lab.net/blog/?p=86103 | Updated: 2025-06-17T19:17:26Z | Published: 2024-07-29T20:30:00Z

Originally published on July 29, 2024, this post was updated on October 8, 2024. Robots need to be adaptable, readily learning new skills and adjusting to their surroundings. Yet traditional training methods can limit a robot’s ability to apply learned skills in new situations. This is often due to the gap between perception and action, as well as the challenges in transferring skills across...

Comments: 0

Designing Arithmetic Circuits with Deep Reinforcement Learning
By Rajarshi Roy | http://www.open-lab.net/blog/?p=49611 | Updated: 2022-07-11T20:49:10Z | Published: 2022-07-08T14:38:33Z

NVIDIA researchers use AI to design better arithmetic circuits that power our AI chips.

As Moore’s law slows down, it becomes increasingly important to develop other techniques that improve the performance of a chip at the same technology process node. Our approach uses AI to design smaller, faster, and more efficient circuits to deliver more performance with each chip generation. Vast arrays of arithmetic circuits have powered NVIDIA GPUs to achieve unprecedented acceleration...

Comments: 0

Building Generally Capable AI Agents with MineDojo
By Nathan Horrocks | http://www.open-lab.net/blog/?p=49673 | Updated: 2023-06-12T09:24:08Z | Published: 2022-07-01T13:00:00Z

A large compilation of Minecraft videos that MineDojo uses to train the AI

Using video games as a medium for training AI has become a popular method within the AI research community. These autonomous agents have had great success in Atari games, StarCraft, Dota, and Go. But while these advancements have been popular for AI research, the agents do not generalize beyond a very specific set of tasks, unlike humans, who continuously learn from open-ended tasks.

Comments: 0

The Full Stack Optimization Powering NVIDIA MLPerf Training v2.0 Performance
By Ashraf Eassa | http://www.open-lab.net/blog/?p=49597 | Updated: 2023-07-05T19:27:00Z | Published: 2022-06-30T18:00:00Z

Boosting MLPerf Training Performance with Full-Stack Optimization

MLPerf benchmarks are developed by a consortium of AI leaders across industry, academia, and research labs, with the aim of providing standardized, fair, and useful measures of deep learning performance. MLPerf training focuses on measuring time to train a range of commonly used neural networks for the following tasks: Lower training times are important to speed time to deployment...

Comments: 0

NVIDIA Research: Transferring Dexterous Manipulation from GPU Simulation to a Remote, Real-World, TriFinger Task
By Varun Lodaya | http://www.open-lab.net/blog/?p=37732 | Updated: 2023-07-11T22:58:37Z | Published: 2021-09-22T17:00:00Z

A critical question to ask when designing a machine learning–based solution is, “What’s the resource cost of developing this solution?” There are typically many factors that go into an answer: time, developer skill, and computing resources. It’s rare that a researcher can maximize all these aspects, so optimizing the solution development process is critical. This problem is further aggravated in...

Comments: 1

Discovering GPU-friendly Deep Neural Networks with Unified Neural Architecture Search
By Arash Vahdat | http://www.open-lab.net/blog/?p=21847 | Updated: 2022-08-21T23:40:45Z | Published: 2020-11-05T21:29:02Z

After the first successes of deep learning, designing neural network architectures with desirable performance criteria for a given task (for example, high accuracy or low latency) has been a challenging problem. Some call it alchemy and some intuition, but the task of discovering a novel architecture often involves a tedious and costly trial-and-error process of searching in an exponentially large...

Comments: 0

Enhancing Sample Efficiency in Reinforcement Learning with Nonparametric Methods
By Samuele Tosatto | http://www.open-lab.net/blog/?p=19948 | Updated: 2022-08-21T23:40:36Z | Published: 2020-09-01T18:50:29Z

Recent developments in artificial intelligence and autonomous learning have shown impressive results in tasks like board games and computer games. However, the applicability of learning techniques remains mainly limited to simulated environments. One of the major causes of this inapplicability to real-world scenarios is the general sample-inefficiency and inability to guarantee the safe...

Comments: 1

Powering AutoML-enabled AI Model Training with Clara Train
By Yan Cheng | http://www.open-lab.net/blog/?p=17073 | Updated: 2022-08-21T23:39:57Z | Published: 2020-04-15T21:44:00Z

Deep neural networks (DNNs) have been successfully applied to volume segmentation and other medical imaging tasks. They are capable of achieving state-of-the-art accuracy and can augment the medical imaging workflow with AI-powered insights. However, training robust AI models for medical imaging analysis is time-consuming and tedious and requires iterative experimentation with parameter...

Comments: 0

Deep Learning in a Nutshell: Reinforcement Learning
By Tim Dettmers | http://www.open-lab.net/blog/parallelforall/?p=7124 | Updated: 2022-08-21T23:37:57Z | Published: 2016-09-08T10:45:04Z

Figure 1: Value iteration constructs the value function over all states over time. Here each square is a state: S is the start state, G the goal state, T squares are traps, and black squares cannot be entered. In value iteration we initialize the rewards (traps and goal state) and then these reward values spread over time until an equilibrium is reached. Depending on the penalty value on traps and the reward value for the goal different solution patterns might emerge; the last two grids show such solution states.
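The process the caption describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind Figure 1: the 3x4 layout, the reward and penalty values, the discount factor `GAMMA`, and the helper names `step` and `value_iteration` are all hypothetical, and moves are assumed to be deterministic. Goal and trap squares are initialized with their reward values, and each sweep lets those values spread one square further until nothing changes.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]

# Hypothetical 3x4 layout (not the grid from Figure 1):
# S = start, G = goal, T = trap, # = blocked square, . = free square
GRID: List[str] = [
    "...G",
    ".#.T",
    "S...",
]
GOAL_REWARD = 1.0    # assumed reward for reaching the goal
TRAP_PENALTY = -1.0  # assumed penalty for entering a trap
GAMMA = 0.9          # assumed discount factor
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(grid: List[str], state: State, move: Tuple[int, int]) -> State:
    """Return the square reached by a move; stay put if it is blocked or off-grid."""
    row, col = state[0] + move[0], state[1] + move[1]
    inside = 0 <= row < len(grid) and 0 <= col < len(grid[0])
    if inside and grid[row][col] != "#":
        return (row, col)
    return state


def value_iteration(grid: List[str], gamma: float = GAMMA,
                    tol: float = 1e-9) -> Dict[State, float]:
    """Sweep the grid until the largest value change falls below tol."""
    # Initialize: goal and trap squares hold their reward, everything else is 0.
    fixed = {"G": GOAL_REWARD, "T": TRAP_PENALTY}
    values = {
        (r, c): fixed.get(cell, 0.0)
        for r, line in enumerate(grid)
        for c, cell in enumerate(line)
        if cell != "#"
    }
    while True:
        updated = dict(values)
        for state in values:
            if grid[state[0]][state[1]] in fixed:
                continue  # terminal squares keep their reward value
            # Value of a square = discounted value of its best reachable neighbor.
            updated[state] = max(gamma * values[step(grid, state, m)] for m in MOVES)
        change = max(abs(updated[s] - values[s]) for s in values)
        values = updated
        if change < tol:
            return values


values = value_iteration(GRID)
for r, line in enumerate(GRID):
    print(" ".join(f"{values[(r, c)]:6.3f}" if cell != "#" else "   #  "
                   for c, cell in enumerate(line)))
```

Changing `GOAL_REWARD` and `TRAP_PENALTY` in this sketch is one way to see how different reward settings lead to different value patterns, as the caption notes for the last two grids.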

This post is Part 4 of the Deep Learning in a Nutshell series, in which I’ll dive into reinforcement learning, a type of machine learning in which agents take actions in an environment aimed at maximizing their cumulative reward. Deep Learning in a Nutshell posts offer a high-level overview of essential concepts in deep learning. The posts aim to provide an understanding of each concept rather...

Comments: 4

Train Your Reinforcement Learning Agents at the OpenAI Gym
By Mark Harris | http://www.open-lab.net/blog/parallelforall/?p=6628 | Updated: 2022-08-21T23:37:51Z | Published: 2016-04-27T17:00:51Z

Today OpenAI, a non-profit artificial intelligence research company, launched OpenAI Gym, a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Go. OpenAI researcher John Schulman shared some details about his organization, and how OpenAI Gym will make it easier for AI researchers to design...

Comments: 4