Modern high-performance computing (HPC) is enabling more than just quick calculations; it's powering AI systems that are unlocking scientific breakthroughs. HPC has gone through many iterations, each sparked by a creative repurposing of technologies. For example, early supercomputers used off-the-shelf components. Researchers later built powerful clusters from personal computers and even…
In the previous post, Profiling LLM Training Workflows on NVIDIA Grace Hopper, we explored the importance of profiling large language model (LLM) training workflows and analyzed bottlenecks using NVIDIA Nsight Systems. We also discussed how the NVIDIA GH200 Grace Hopper Superchip enables efficient training processes. While profiling helps identify inefficiencies…
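As a hedged illustration of the mechanics behind that kind of profiling, the sketch below wraps the phases of a training step in NVTX ranges so they appear as named spans on the Nsight Systems timeline. The train-step functions are hypothetical placeholders, not code from the post; the NVTX calls and the nsys command are the real library and CLI.

```cpp
// Minimal sketch: NVTX ranges around the phases of a training step so
// Nsight Systems can attribute time to each phase on its timeline.
// forward_pass/backward_pass/optimizer_step are hypothetical stand-ins.
#include <nvtx3/nvToolsExt.h>  // NVTX v3, header-only, ships with the CUDA Toolkit

void forward_pass()   { /* launch forward kernels here */ }
void backward_pass()  { /* launch backward kernels here */ }
void optimizer_step() { /* apply gradients here */ }

int main() {
    for (int step = 0; step < 100; ++step) {
        nvtxRangePushA("forward");    // named span visible in the timeline
        forward_pass();
        nvtxRangePop();

        nvtxRangePushA("backward");
        backward_pass();
        nvtxRangePop();

        nvtxRangePushA("optimizer");
        optimizer_step();
        nvtxRangePop();
    }
    return 0;
}
// Capture a timeline with CUDA and NVTX traces enabled:
//   nsys profile -t cuda,nvtx -o train_report ./train
```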
The rapid advancements in AI have resulted in an era of exponential growth in model sizes, particularly in the domain of large language models (LLMs). These models, with their transformative capabilities, are driving innovation across industries. However, the increasing complexity and computational demands of training such models necessitate a meticulous approach to optimization and profiling.
The NVIDIA Grace CPU Superchip delivers outstanding performance and best-in-class energy efficiency for CPU workloads in the data center and in the cloud. The benefits of NVIDIA Grace include high-performance Arm Neoverse V2 cores, fast NVIDIA-designed Scalable Coherency Fabric, and low-power high-bandwidth LPDDR5X memory. These features make the Grace CPU ideal for data processing with…
NVIDIA cuDSS is a first-generation sparse direct solver library designed to accelerate engineering and scientific computing. cuDSS is increasingly adopted in data centers and other environments and supports single-GPU, multi-GPU, and multi-node (MGMN) configurations. cuDSS has become a key tool for accelerating computer-aided engineering (CAE) workflows and scientific computations across…
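For orientation, the sketch below shows the phase-based flow a cuDSS solve of a sparse system Ax = b follows: symbolic analysis, numeric factorization, then triangular solves. It is reconstructed from memory of the library's getting-started sample, so treat the exact signatures and enum names as assumptions to verify against the current cuDSS documentation; the excerpt itself contains no code.

```cpp
// Hedged sketch of the cuDSS analysis/factorization/solve flow for Ax = b,
// with A in CSR form and all array arguments being device pointers.
// Signatures follow the cuDSS getting-started sample as best recalled;
// verify names and parameter order against the current docs before use.
#include <cudss.h>

void solve_spd_system(int n, int nnz,
                      int* d_row_offsets, int* d_col_indices, double* d_values,
                      double* d_b, double* d_x) {
    cudssHandle_t handle;
    cudssCreate(&handle);

    cudssConfig_t config;
    cudssData_t   data;
    cudssConfigCreate(&config);
    cudssDataCreate(handle, &data);

    // Wrap the user buffers: a CSR input matrix plus dense rhs/solution vectors.
    cudssMatrix_t A, x, b;
    cudssMatrixCreateCsr(&A, n, n, nnz, d_row_offsets, nullptr,
                         d_col_indices, d_values, CUDA_R_32I, CUDA_R_64F,
                         CUDSS_MTYPE_SPD, CUDSS_MVIEW_UPPER, CUDSS_BASE_ZERO);
    cudssMatrixCreateDn(&b, n, 1, n, d_b, CUDA_R_64F, CUDSS_LAYOUT_COL_MAJOR);
    cudssMatrixCreateDn(&x, n, 1, n, d_x, CUDA_R_64F, CUDSS_LAYOUT_COL_MAJOR);

    // The three phases of a sparse direct solve.
    cudssExecute(handle, CUDSS_PHASE_ANALYSIS,      config, data, A, x, b);
    cudssExecute(handle, CUDSS_PHASE_FACTORIZATION, config, data, A, x, b);
    cudssExecute(handle, CUDSS_PHASE_SOLVE,         config, data, A, x, b);

    cudssMatrixDestroy(A); cudssMatrixDestroy(x); cudssMatrixDestroy(b);
    cudssDataDestroy(handle, data);
    cudssConfigDestroy(config);
    cudssDestroy(handle);
}
```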
Supercomputers are the engines of groundbreaking discoveries. From predicting extreme weather to advancing disease research and designing safer, more efficient infrastructures, these machines simulate complex systems that are impractical to test in the real world due to their size, cost, and material requirements. Since the introduction of the GPU in 1999, NVIDIA has continually pushed the…
NVIDIA Enterprise Reference Architectures (Enterprise RAs) can reduce the time and cost of deploying AI infrastructure solutions. They provide a streamlined approach for building flexible and cost-effective accelerated infrastructure while ensuring compatibility and interoperability. The latest Enterprise RA details an optimized cluster configuration for systems integrated with NVIDIA GH200…
The NVIDIA Grace CPU is transforming data center design by offering a new level of power-efficient performance. Built specifically for data center scale, the Grace CPU is designed to handle demanding workloads while consuming less power. NVIDIA believes in the benefit of leveraging GPUs to accelerate every workload. However, not all workloads are accelerated. This is especially true for those…
Powered by the new GB10 Grace Blackwell Superchip, Project DIGITS can tackle large generative AI models of up to 200B parameters.
2024 was another landmark year for developers, researchers, and innovators working with NVIDIA technologies. From groundbreaking developments in AI inference to empowering open-source contributions, these blog posts highlight the breakthroughs that resonated most with our readers. NVIDIA NIM Offers Optimized Inference Microservices for Deploying AI Models at Scale Introduced in…
Accelerated computing is enabling giant leaps in performance and energy efficiency compared to traditional CPU computing. Delivering these advancements requires full-stack innovation at data-center scale, spanning chips, systems, networking, software, and algorithms. Choosing the right architecture for the right workload with the best energy efficiency is critical to maximizing the performance and…
As models grow larger and are trained on more data, they become more capable, making them more useful. To train these models quickly, more performance, delivered at data center scale, is required. The NVIDIA Blackwell platform, launched at GTC 2024 and now in full production, integrates multiple chip types: GPU, CPU, DPU, NVLink Switch, InfiniBand Switch, and Ethernet Switch.
NVIDIA designed the NVIDIA Grace CPU to be a new kind of high-performance data center CPU: one built to deliver breakthrough energy efficiency and optimized for performance at data center scale. Accelerated computing is enabling giant leaps in performance and energy efficiency compared to traditional CPU computing. To deliver these speedups, full-stack innovation at data center scale is…
Inference for generative AI and AI agents will drive the need for AI compute infrastructure to be distributed from the edge to central clouds. IDC predicts that "Business AI (consumer excluded) will contribute $19.9 trillion to the global economy and account for 3.5% of GDP by 2030." 5G networks must also evolve to serve this new incoming AI traffic. At the same time, there is an opportunity…
In the latest round of MLPerf Inference, a suite of standardized, peer-reviewed inference benchmarks, the NVIDIA platform delivered outstanding performance across the board. Among the many submissions made using the NVIDIA platform were results using the NVIDIA GH200 Grace Hopper Superchip. GH200 tightly couples an NVIDIA Grace CPU with an NVIDIA Hopper GPU using NVIDIA NVLink-C2C…
Reservoir simulation helps reservoir engineers optimize their resource exploration approach by simulating complex scenarios and comparing with real-world field data. This extends to simulation of depleted reservoirs that could be repurposed for carbon storage from operations. Reservoir simulation is crucial for energy companies aiming to enhance operational efficiency in exploration and production.
With the rapid growth of generative AI, CIOs and IT leaders are looking for ways to reclaim data center resources to accommodate new AI use cases that promise greater return on investment without impacting current operations. This is leading IT decision makers to reassess past infrastructure decisions and explore strategies to consolidate traditional workloads into fewer…
Data processing demand is growing exponentially, with global data projected to reach 175 zettabytes by 2025. This contrasts sharply with the slowing pace of CPU performance improvements. For more than a decade, semiconductor advancements have not kept up with the pace predicted by Moore's Law, leading to a pressing need for more efficient computing solutions. NVIDIA GPUs have emerged as the most efficient…
Mathematical optimization is a powerful tool that enables businesses and people to make smarter decisions and reach any number of goals, from improving operational efficiency to reducing costs to increasing customer satisfaction. Many of these are everyday use cases, such as scheduling a flight, pricing a hotel room, choosing a GPS route, routing delivery trucks, and more. However…
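To make that concrete, here is a hypothetical toy instance (not from the post): choose how many truckloads of products A and B to ship, earning profit 3 per load of A and 2 per load of B, with capacity for 4 loads total and at most 2 loads of A available:

```latex
\max_{x,y}\ 3x + 2y
\quad\text{subject to}\quad
x + y \le 4, \qquad x \le 2, \qquad x, y \ge 0
```

Checking the vertices of the feasible region, (0,0), (2,0), (2,2), and (0,4), gives objective values 0, 6, 10, and 8, so the optimum ships 2 loads of each product for a profit of 10. Real scheduling, pricing, and routing problems have this same shape, just with millions of variables and constraints.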
AI is augmenting high-performance computing (HPC) with novel approaches to data processing, simulation, and modeling. Because of the computational requirements of these new AI workloads, HPC is scaling up at a rapid pace. To enable applications to scale to multi-GPU and multi-node platforms, HPC tools and libraries must support that growth. NVIDIA provides a comprehensive ecosystem of…
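As one representative piece of that multi-GPU tooling (the excerpt is truncated before naming its libraries, so this is an illustrative choice), the hedged sketch below uses NCCL to sum a buffer across all visible GPUs in a single process; buffer sizes and setup are arbitrary for illustration:

```cpp
// Minimal single-process, multi-GPU allreduce with NCCL.
#include <nccl.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    int nDev = 0;
    cudaGetDeviceCount(&nDev);

    // One communicator per local GPU.
    std::vector<ncclComm_t> comms(nDev);
    std::vector<int> devs(nDev);
    for (int i = 0; i < nDev; ++i) devs[i] = i;
    ncclCommInitAll(comms.data(), nDev, devs.data());

    const size_t count = 1 << 20;  // arbitrary element count per GPU
    std::vector<float*> send(nDev), recv(nDev);
    std::vector<cudaStream_t> streams(nDev);
    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        cudaMalloc(&send[i], count * sizeof(float));
        cudaMalloc(&recv[i], count * sizeof(float));
        cudaMemset(send[i], 0, count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    // Group the per-GPU calls so NCCL can launch them as one collective.
    ncclGroupStart();
    for (int i = 0; i < nDev; ++i)
        ncclAllReduce(send[i], recv[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
    }
    for (int i = 0; i < nDev; ++i) ncclCommDestroy(comms[i]);
    printf("allreduce across %d GPU(s) complete\n", nDev);
    return 0;
}
```

The same code scales to multi-node runs by swapping ncclCommInitAll for rank-based communicator setup, which is exactly the growth path the paragraph above describes.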
What is driving the interest in trillion-parameter models? We know many of the use cases today, and interest is growing due to the promise of increased capacity. The benefits are great, but training and deploying large models can be computationally expensive and resource-intensive. Computationally efficient, cost-effective, and energy-efficient systems, architected to deliver real-time…
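A back-of-envelope calculation shows why, using the common rule of thumb of roughly 16 bytes of state per parameter for mixed-precision training with Adam (FP16 weights and gradients plus FP32 master weights and optimizer moments); the rule of thumb is an assumption, not a figure from the post:

```latex
\underbrace{10^{12}\ \text{params} \times 2\ \text{bytes}}_{\text{FP16 weights alone}} \approx 2\ \text{TB},
\qquad
10^{12}\ \text{params} \times 16\ \text{bytes} \approx 16\ \text{TB of training state}
```

At 80 GB of memory per GPU, holding 16 TB of state takes about 200 GPUs before a single activation is stored, which is why data-center-scale efficiency matters as much as peak FLOPS.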
As we approach the end of another exciting year at NVIDIA, it's time to look back at the most popular stories from the NVIDIA Technical Blog in 2023. Groundbreaking research and developments in fields such as generative AI, large language models (LLMs), high-performance computing (HPC), and robotics are leading the way in transformative AI solutions and capturing the interest of our readers.
At AWS re:Invent 2023, AWS and NVIDIA announced that AWS will be the first cloud provider to offer NVIDIA GH200 Grace Hopper Superchips interconnected with NVIDIA NVLink technology through NVIDIA DGX Cloud and running on Amazon Elastic Compute Cloud (Amazon EC2). This is a game-changing technology for cloud computing. The NVIDIA GH200 NVL32, a rack-scale solution within NVIDIA DGX Cloud or an…
High-performance computing (HPC) powers applications in simulation and modeling, healthcare and life sciences, industry and engineering, and more. In the modern data center, HPC synergizes with AI, harnessing data in transformative new ways. The performance and throughput demands of next-generation HPC applications call for an accelerated computing platform that can handle diverse workloads…
The new hardware developments in NVIDIA Grace Hopper Superchip systems enable some dramatic changes to the way developers approach GPU programming. Most notably, the bidirectional, high-bandwidth, and cache-coherent connection between CPU and GPU memory means that the user can develop their application for both processors while using a single, unified address space.
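A minimal sketch of what that unified address space enables, assuming a coherent platform such as GH200 where the GPU can dereference ordinary malloc pointers over NVLink-C2C (on conventional systems this pattern requires managed memory or an HMM-enabled driver):

```cpp
// Sketch: a CUDA kernel operating directly on system-allocated memory.
// On Grace Hopper, the cache-coherent CPU-GPU connection lets the GPU
// access this plain malloc pointer with no cudaMemcpy or cudaMalloc.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

__global__ void scale(double* data, double factor, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const size_t n = 1 << 20;
    // Ordinary host allocation: one pointer valid on both processors.
    double* data = static_cast<double*>(malloc(n * sizeof(double)));
    for (size_t i = 0; i < n; ++i) data[i] = 1.0;

    scale<<<(n + 255) / 256, 256>>>(data, 2.0, n);
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);  // 2.0, written by the GPU
    free(data);
    return 0;
}
```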
The latest release of the CUDA Toolkit continues to push the envelope of accelerated computing performance using the latest NVIDIA GPUs. New features of this release, version 12.3, include: … CUDA and the CUDA Toolkit continue to provide the foundation for all accelerated computing applications in data science, machine learning and deep learning, and generative AI with LLMs for both training and…
NVIDIA, working with Fujitsu and Wind River, has enabled NTT DOCOMO to launch the first GPU-accelerated commercial Open RAN 5G service in its network in Japan. This makes NTT DOCOMO the first telco in the world to deploy a GPU-accelerated commercial 5G network. The announcement is a major milestone as the telecom industry strives to address the multi-billion-dollar problem of driving…
NVIDIA is driving fast-paced innovation in 5G software and hardware across the ecosystem with its Open RAN-compatible 5G portfolio. Accelerated computing hardware and NVIDIA Aerial 5G software are delivering solutions for key industry stakeholders such as telcos, cloud service providers (CSPs), enterprises, and academic researchers. TMC recently named the NVIDIA MGX with NVIDIA Grace Hopper…
At COMPUTEX 2023, NVIDIA announced the NVIDIA DGX GH200, which marks another breakthrough in GPU-accelerated computing to power the most demanding giant AI workloads. In addition to describing critical aspects of the NVIDIA DGX GH200 architecture, this post discusses how NVIDIA Base Command enables rapid deployment, accelerates the onboarding of users, and simplifies system management.
The NVIDIA Grace CPU is the first data center CPU developed by NVIDIA. Combining NVIDIA expertise with Arm processors, on-chip fabrics, system-on-chip (SoC) design, and resilient high-bandwidth low-power memory technologies, the Grace CPU was built from the ground up to create the world's first superchip for computing. At the heart of the superchip lies the NVLink Chip-2-Chip (C2C) interconnect.
The NVIDIA Arm HPC Developer Kit is an integrated hardware and software platform for creating, evaluating, and benchmarking HPC, AI, and scientific computing applications on a heterogeneous GPU- and CPU-accelerated computing system. NVIDIA announced its availability in March 2021. The kit is designed as a stepping stone to the next-generation NVIDIA Grace Hopper Superchip for HPC and AI…
The NVIDIA Grace Hopper Superchip Architecture is the first true heterogeneous accelerated platform for high-performance computing (HPC) and AI workloads. It accelerates applications with the strengths of both GPUs and CPUs while providing the simplest and most productive distributed heterogeneous programming model to date. Scientists and engineers can focus on solving the world's most important…
NVIDIA Grace CPU is the first data center CPU developed by NVIDIA. It has been built from the ground up to create the world's first superchips. Designed to deliver excellent performance and energy efficiency to meet the demands of modern data center workloads powering digital twins, cloud gaming and graphics, AI, and high-performance computing (HPC), NVIDIA Grace CPU features 72 Armv9 CPU…
High-performance computing (HPC) has become the essential instrument of scientific discovery. Whether it is discovering new, life-saving drugs, battling climate change, or creating accurate simulations of our world, these solutions demand an enormous and rapidly growing amount of processing power. They are increasingly out of reach of traditional computing approaches.
Today during the 2022 NVIDIA GTC Keynote address, NVIDIA CEO Jensen Huang introduced the new NVIDIA H100 Tensor Core GPU based on the new NVIDIA Hopper GPU architecture. This post gives you a look inside the new H100 GPU and describes important new features of NVIDIA Hopper architecture GPUs. The NVIDIA H100 Tensor Core GPU is our ninth-generation data center GPU designed to deliver an…