What is the interest in trillion-parameter models? We know many of the use cases today, and interest is growing due to the promise of increased capacity. The benefits are great, but training and deploying large models can be computationally expensive and resource-intensive. Computationally efficient, cost-effective, and energy-efficient systems, architected to deliver real-time…
NVIDIA Jetson Orin is the best-in-class embedded platform for AI workloads. One of the key components of the Orin platform is the second-generation Deep Learning Accelerator (DLA), the dedicated deep learning inference engine that offers one-third of the AI compute on the AGX Orin platforms. This post is a deep technical dive into how embedded developers working with Orin platforms can…
For HPC clusters purpose-built for AI training, such as the NVIDIA DGX BasePOD and NVIDIA DGX SuperPOD, fine-tuning the cluster is critical to maximizing its overall performance. This includes tuning the management fabric (based on Ethernet), the storage fabric (Ethernet or InfiniBand), and the compute fabric (Ethernet or InfiniBand).
Machine learning has the promise to improve our world, and in many ways it already has. However, research and lived experiences continue to show this technology has risks. Capabilities that used to be restricted to science fiction and academia are increasingly available to the public. The responsible use and development of AI requires categorizing, assessing, and mitigating enumerated risks where…
Apache Spark is an industry-leading platform for distributed extract, transform, and load (ETL) workloads on large-scale data. However, with the advent of deep learning (DL), many Spark practitioners have sought to add DL models to their data processing pipelines across a variety of use cases like sales predictions, content recommendations, sentiment analysis, and fraud detection. Yet…
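To make the idea concrete, here is a minimal sketch (not from the original post) of batched model inference in a Spark pipeline using pyspark.ml.functions.predict_batch_udf, available in Spark 3.4 and later. The stand-in predict function, column name, and batch size are illustrative assumptions; a real pipeline would load a DL model inside make_predict_fn.

```python
import numpy as np
from pyspark.sql import SparkSession
from pyspark.ml.functions import predict_batch_udf
from pyspark.sql.types import FloatType

spark = SparkSession.builder.appName("dl-inference").getOrCreate()
df = spark.createDataFrame([(float(i),) for i in range(100)], ["x"])

def make_predict_fn():
    # In a real pipeline, load the DL model once per executor here,
    # e.g. model = torch.jit.load("model.pt")  # hypothetical model file
    def predict(x: np.ndarray) -> np.ndarray:
        # Stand-in for model inference over a whole batch at once.
        return x * 2.0
    return predict

# Spark feeds the UDF NumPy batches instead of single rows, which is what
# makes attaching a DL model to an ETL pipeline efficient.
double = predict_batch_udf(make_predict_fn, return_type=FloatType(), batch_size=64)
df.withColumn("prediction", double("x")).show(5)
```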
Data labeling and model training are consistently ranked as the most significant challenges teams face when building an AI/ML infrastructure. Both are essential steps in the ML application development process, and if not done correctly, they can lead to inaccurate results and decreased performance. See the AI Infrastructure Ecosystem of 2022 report from the AI Infrastructure Alliance for more…
Reconstructing a smooth surface from a point cloud is a fundamental step in creating digital twins of real-world objects and scenes. Algorithms for surface reconstruction appear in various applications, such as industrial simulation, video game development, architectural design, medical imaging, and robotics. Neural Kernel Surface Reconstruction (NKSR) is the new NVIDIA algorithm for…
Today's machine learning (ML) solutions are complex and rarely use just a single model. Training models effectively requires large, diverse datasets, and producing accurate predictions may require multiple models. Deploying complex multi-model ML solutions in production can also be a challenging task: a common example is when compatibility issues between different frameworks lead to delayed insights.
NVIDIA PhysicsNeMo is a framework for building, training, and fine-tuning deep learning models for physical systems, otherwise known as physics-informed machine learning (physics-ML) models. PhysicsNeMo is available as OSS (Apache 2.0 license) to support the growing physics-ML community. The latest PhysicsNeMo software update, version 23.05, brings together new capabilities…
When you see a context-relevant advertisement on a web page, it's most likely content served by a Taboola data pipeline. As the world's leading content recommendation company, Taboola faced a big challenge: the frequent need to scale Apache Spark CPU cluster capacity to address constantly growing compute and storage requirements. Data center capacity and hardware costs are always…
Wireless technology has evolved rapidly, and 5G deployments have made good progress around the world. Until recently, wireless RAN was deployed using closed-box appliance solutions from traditional RAN vendors. This closed-box approach has many shortcomings: it is not scalable, underuses the infrastructure, and does not deliver optimal RAN TCO. We have come to realize that such closed-box…
The telecom sector is transforming how communication happens. Striving to provide reliable, uninterrupted service, businesses are tackling the challenge of delivering an optimal customer experience, something many long-time customers of large telecom service providers do not have. Take Jack, for example. His call was on hold for 10 minutes…
The need for a high-fidelity multi-robot simulation environment is growing rapidly as more and more autonomous robots are being deployed in real-world scenarios. In this post, I will review what we used in the past at Cogniteam for simulating multiple robots, our current progress with NVIDIA Isaac Sim, and how Nimbus can speed up the development and maintenance of a multi-robot simulation with…
Embedded edge AI is transforming industrial environments by introducing intelligence and real-time processing to even the most challenging settings. Edge AI is increasingly being used in agriculture, construction, energy, aerospace, satellites, the public sector, and more. With the NVIDIA Jetson edge AI and robotics platform, you can deploy AI and compute for sensor fusion in these complex…
AI and its newest subdomain, generative AI, are dramatically accelerating the pace of change in scientific computing research. From pharmaceuticals and materials science to astronomy, this game-changing technology is opening up new possibilities and driving progress at an unprecedented rate. In this post, we explore some new and exciting applications of generative AI in science…
Hardware-in-the-loop (HIL) testing is a powerful tool used to validate and verify the performance of complex systems, including robotics and computer vision. This post explores how HIL testing is being used in these fields with the NVIDIA Isaac platform. The NVIDIA Isaac platform consists of NVIDIA Isaac Sim, a simulator that provides a simulated environment for testing robotics algorithms…
Data centers are an essential part of a modern enterprise, but they come with a hefty energy cost. To complicate matters, energy costs are rising and the need for data centers continues to expand, with a market size projected to grow 25% from 2023 to 2030. Globally, energy costs are already negatively affecting data centers and high-performance computing (HPC) systems. To alleviate the energy…
In the high-frequency trading world, thousands of market participants interact daily. In fact, high-frequency trading accounts for more than half of the US equity trading volume, according to the paper High-Frequency Trading Synchronizes Prices in Financial Markets. Market makers are the big players on the sell side who provide liquidity in the market. Speculators are on the buy side…
The training stage of deep learning (DL) models consists of learning numerous dense floating-point weight matrices, which results in a massive amount of floating-point computations during inference. Research has shown that many of those computations can be skipped by forcing some weights to zero, with little impact on the final accuracy. In parallel, previous posts have shown that…
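As an illustration of the general idea (not the specific method from the post), the following sketch uses PyTorch's torch.nn.utils.prune to force half of a layer's weights to zero. Note this is unstructured magnitude pruning for demonstration only; NVIDIA GPUs accelerate a specific 2:4 structured-sparsity pattern, which this example does not produce.

```python
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(1024, 1024)

# Zero out the 50% of weights with the smallest L1 magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.5)
print(f"Sparsity: {(layer.weight == 0).float().mean():.2%}")  # ~50.00%

# Make the pruning permanent: remove the mask and bake zeros into the weight.
prune.remove(layer, "weight")
```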
Accurate weather modeling is essential for companies to properly forecast renewable energy production and plan for natural disasters. Ineffective forecasting and unanticipated weather events cost an estimated $714 billion in 2022 alone. To avoid this, companies need faster, cheaper, and more accurate weather models. In a recent GTC session, Microsoft and TempoQuest detailed their work with NVIDIA to address…
Edge AI applications, whether in airports, cars, military operations, or hospitals, rely on high-powered sensor streaming applications that enable real-time processing and decision-making. With its latest v0.5 release, the NVIDIA Holoscan SDK is ushering in a new wave of sensor-processing capabilities for the next generation of AI applications at the edge. This release also coincides with the…
Linear regression is a powerful statistical tool used to model the relationship between a dependent variable and one or more independent variables (features). An important, and often forgotten, concept in regression analysis is that of interaction terms. In short, interaction terms enable you to examine whether the relationship between the target and the independent variable changes depending on…
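Here is a small, self-contained sketch of the idea using synthetic data and scikit-learn (the library choice and data are illustrative assumptions, not from the original post). Because y depends on the product x1 * x2, a model without the interaction term cannot capture the relationship, while a model with it fits almost perfectly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                      # features x1, x2
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 0] * X[:, 1]  # effect of x1 depends on x2

# interaction_only=True adds x1*x2 without the squared terms x1**2, x2**2.
X_int = PolynomialFeatures(
    degree=2, interaction_only=True, include_bias=False
).fit_transform(X)

plain = LinearRegression().fit(X, y)
inter = LinearRegression().fit(X_int, y)
print(f"R^2 without interaction: {plain.score(X, y):.3f}")
print(f"R^2 with interaction:    {inter.score(X_int, y):.3f}")  # ~1.000
```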
Transformers are one of the most influential AI model architectures today and are shaping the direction of future AI R&D. First invented as a tool for natural language processing (NLP), transformers are now used in almost every AI task, including computer vision, automatic speech recognition, molecular structure classification, and financial data processing. In Korea…
This post is part of a series about optimizing end-to-end AI. While NVIDIA hardware can process the individual operations that constitute a neural network incredibly fast, it is important to ensure that you are using the tools correctly. Using tools such as ONNX Runtime or TensorRT out of the box with ONNX usually gives you good performance, but why settle for good performance…
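As a sketch of the out-of-the-box path mentioned above (not the optimized workflow the post goes on to describe), this example runs an exported ONNX model with ONNX Runtime, preferring the TensorRT execution provider when the installed build supports it. The model file and input shape are assumptions.

```python
import numpy as np
import onnxruntime as ort

# Prefer TensorRT, then CUDA, then CPU, keeping only the providers this
# onnxruntime build actually ships with.
preferred = ["TensorrtExecutionProvider", "CUDAExecutionProvider",
             "CPUExecutionProvider"]
providers = [p for p in preferred if p in ort.get_available_providers()]

session = ort.InferenceSession("model.onnx", providers=providers)  # hypothetical model file

input_name = session.get_inputs()[0].name
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed input shape
outputs = session.run(None, {input_name: batch})
print(outputs[0].shape)
```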
Debugging is difficult. Debugging across multiple languages is especially challenging, and debugging across devices often requires a team with varying skill sets and expertise to reveal the underlying problem. Yet projects often require multiple languages to ensure high performance where necessary, a user-friendly experience, and compatibility where possible. Unfortunately…
As a data scientist, evaluating machine learning model performance is a crucial aspect of your work. To do so effectively, you have a wide range of statistical metrics at your disposal, each with its own unique strengths and weaknesses. By developing a solid understanding of these metrics, you are not only better equipped to choose the best one for optimizing your model but also to explain your…
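To illustrate why metric choice matters, here is a small sketch with synthetic, imbalanced labels (scikit-learn and the toy data are assumptions for illustration, not from the original post): accuracy looks healthy while precision, recall, and F1 tell a different story.

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # imbalanced: only 2 positives
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]  # one false positive, one false negative

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")   # 0.80, looks fine
print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # 0.50
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")     # 0.50
print(f"F1:        {f1_score(y_true, y_pred):.2f}")         # 0.50
```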
According to the American Society for Quality (ASQ), defects cost manufacturers nearly 20% of overall sales revenue. The products that we interact with on a daily basis, like phones, cars, televisions, and computers, must be manufactured with precision so that they can deliver value in varying conditions and scenarios. AI-based computer vision applications are helping to catch defects in the…
The incredible advances in accelerated computing are powered by data. The role of data in accelerating AI workloads is crucial for businesses looking to stay ahead of the curve in the current fast-paced digital environment. Speeding up access to that data is yet another way that NVIDIA accelerates entire AI workflows. NVIDIA DGX Cloud caters to a wide variety of market use cases.
Robots are increasing in complexity, with a higher degree of autonomy, a greater number and diversity of sensors, and more sensor fusion-based algorithms. Hardware acceleration is essential to run these increasingly complex workloads, enabling robotics applications that can run larger workloads with more speed and power efficiency. The mission of NVIDIA Isaac ROS has always been to empower…
MoveIt is a robotic manipulation platform that incorporates the latest advances in motion planning, manipulation, 3D perception, kinematics, control, and navigation. PickNik Robotics, the company leading the development of MoveIt, is exploring the use of NVIDIA Isaac Sim in an internal R&D project. The project goals are to improve perception for manipulation and augment with MoveIt Studio, PickNik's…
GPUs continue to get faster with each new generation, and it is often the case that each activity on the GPU (such as a kernel or memory copy) completes very quickly. In the past, each activity had to be separately scheduled (launched) by the CPU, and associated overheads could accumulate to become a performance bottleneck. The CUDA Graphs facility addresses this problem by enabling multiple GPU…
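For a concrete flavor of the facility, here is a minimal sketch using PyTorch's torch.cuda.CUDAGraph wrapper around CUDA Graphs. The PyTorch route is an illustrative choice (the post itself may use the C/C++ CUDA Graphs API directly): a chain of small kernels is captured once and then replayed with a single CPU-side launch, amortizing per-kernel launch overhead.

```python
import torch

device = torch.device("cuda")
static_x = torch.randn(1024, 1024, device=device)

# Warm up outside the capture (allocations, autotuning) on a side stream,
# as recommended by the PyTorch CUDA Graphs documentation.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        y = (static_x @ static_x).relu_()
torch.cuda.current_stream().wait_stream(s)

# Capture the chain of kernels into a graph.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_y = (static_x @ static_x).relu_()

# Replay: refresh the input in place, then relaunch the whole chain at once.
static_x.copy_(torch.randn(1024, 1024, device=device))
g.replay()
torch.cuda.synchronize()
print(static_y.sum().item())
```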