
As artificial intelligence redefines the computing landscape, the network has become the critical backbone shaping the data center of the future. Large language model training performance is determined not only by compute resources but by the agility, capacity, and intelligence of the underlying network. The industry is witnessing the evolution from traditional, CPU-centric infrastructures toward…
Continuous integration and continuous delivery/deployment (CI/CD) is a set of modern software development practices for delivering code changes more reliably and more frequently. While CI/CD is widely adopted in the software world, it's becoming more relevant for network engineers, particularly as networks become automated and software-driven. In this post, I briefly introduce CI/CD…
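To make the CI idea concrete for network configuration, here is a minimal sketch of a validation step a pipeline could run on every commit; the directory layout, hostnames, and jumbo-frame policy below are illustrative assumptions, not details from the post.

# test_configs.py -- a toy check a CI pipeline might run before configs are pushed.
# Assumed layout: rendered switch configs live in configs/<hostname>.json.
import json
import pathlib
import pytest

CONFIG_DIR = pathlib.Path("configs")
REQUIRED_MTU = 9216  # assumed jumbo-frame policy for fabric-facing links

@pytest.mark.parametrize("path", sorted(CONFIG_DIR.glob("*.json")))
def test_fabric_interfaces_use_jumbo_mtu(path):
    """Fail the pipeline if any fabric-facing interface drifts from the agreed MTU."""
    config = json.loads(path.read_text())
    for name, intf in config.get("interfaces", {}).items():
        if intf.get("role") == "fabric":
            assert intf.get("mtu") == REQUIRED_MTU, f"{path.name}: {name} has wrong MTU"

Running a suite like this automatically on each merge request is the essence of CI: policy drift is caught before it reaches production hardware.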
At its core, NVIDIA Air is built for automation. Every part of your network can be coded, versioned, and set to trigger automatically. This includes creating the topology, configuring the network, and validating its setup. Automation reduces manual error, speeds up testing, and brings the same rigor to networking that modern DevOps teams apply to software development. Let's discuss the basic…
Modern AI applications increasingly rely on models that combine huge parameter counts with multi-million-token context windows. Whether it is AI agents following months of conversation, legal assistants reasoning through gigabytes of case law as big as an entire encyclopedia set, or coding copilots navigating sprawling repositories, preserving long-range context is essential for relevance and…
LMArena at the University of California, Berkeley is making it easier to see which large language models excel at specific tasks, thanks to help from NVIDIA and Nebius. Its rankings, powered by the Prompt-to-Leaderboard (P2L) model, collect votes from humans on which AI performs best in areas such as math, coding, or creative writing. "We capture user preferences across tasks and apply…
NVIDIA Air offers the unique ability to simulate anything from a small network to an entire data center. Before you start configuration, routing, or management, consider the topology first. A network topology is the layout or structure of how devices connect and communicate within a network. It describes both the physical arrangement and the logical flow of data.
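As a rough sketch of describing a topology as data (device and port names here are made up, and Graphviz DOT is used only as one common interchange format for lab topologies), a small leaf-spine fabric can be captured as a link list and rendered programmatically:

# Describe a two-spine, two-leaf fabric as plain data, then emit Graphviz DOT.
links = [
    ("spine01", "swp1", "leaf01", "swp31"),
    ("spine01", "swp2", "leaf02", "swp31"),
    ("spine02", "swp1", "leaf01", "swp32"),
    ("spine02", "swp2", "leaf02", "swp32"),
]

def to_dot(links):
    """Render the link list as an undirected DOT graph with port labels."""
    lines = ["graph fabric {"]
    for a_dev, a_port, b_dev, b_port in links:
        lines.append(f'  "{a_dev}":"{a_port}" -- "{b_dev}":"{b_port}";')
    lines.append("}")
    return "\n".join(lines)

print(to_dot(links))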
Data centers are being re-architected for efficient delivery of AI workloads. This is a hugely complicated endeavor, and NVIDIA is now delivering AI factories based on the NVIDIA rack-scale architecture. To deliver the best performance for the AI factory, many accelerators need to work together at rack-scale with maximal bandwidth and minimal latency to support the largest number of users in the…
High-performance computing and deep learning workloads are extremely sensitive to latency. Packet loss forces retransmission or stalls in the communication pipeline, which directly increases latency and disrupts the synchronization between GPUs. This can degrade the performance of collective operations such as all-reduce or broadcast, where every GPU's participation is required before progressing.
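To see why even a brief stall matters, a standard alpha-beta cost model for a ring all-reduce (a textbook approximation, not taken from the post) shows that every one of the 2(N-1) steps pays any extra per-step delay, because all GPUs must finish a step before the next begins:

# Toy alpha-beta model of ring all-reduce time; all numbers are illustrative.
def ring_allreduce_time(n_gpus, msg_bytes, alpha_s, bw_bytes_per_s, stall_s=0.0):
    """2*(N-1) steps; each moves msg/N bytes, pays latency alpha, plus any stall
    (for example, the wait for a retransmitted packet)."""
    steps = 2 * (n_gpus - 1)
    per_step = alpha_s + (msg_bytes / n_gpus) / bw_bytes_per_s + stall_s
    return steps * per_step

clean = ring_allreduce_time(64, 256e6, alpha_s=5e-6, bw_bytes_per_s=50e9)
lossy = ring_allreduce_time(64, 256e6, alpha_s=5e-6, bw_bytes_per_s=50e9, stall_s=50e-6)
print(f"clean: {clean * 1e3:.2f} ms  |  50 us stall per step: {lossy * 1e3:.2f} ms")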
NVIDIA Air enables cloud-scale efficiency by creating identical replicas of real-world data center infrastructure deployments. With NVIDIA Air, you can spin up hundreds of switches and servers and configure them with a single script. One of the many advantages of NVIDIA Air is the ability to connect your simulations with the real world. Enabling an external connection in your environment can…
Time-series data has evolved from a simple historical record into a real-time engine for critical decisions across industries. Whether it's streamlining logistics, forecasting markets, or anticipating machine failures, organizations need more sophisticated tools than traditional methods can offer. NVIDIA GPU-accelerated deep learning is enabling industries to gain real-time analytics.
As many enterprises move to running AI training or inference on their data, the data and the code need to be protected, especially for large language models (LLMs). Many customers can't risk placing their data in the cloud because of data sensitivity. Such data may contain personally identifiable information (PII) or company proprietary information, and the trained model has valuable intellectual…
The compute demands for large language model (LLM) inference are growing rapidly, fueled by the combination of growing model sizes, real-time latency requirements, and, most recently, AI reasoning. At the same time, as AI adoption grows, the ability of an AI factory to serve as many users as possible, all while maintaining good per-user experiences, is key to maximizing the value it generates.
For years, advancements in AI have followed a clear trajectory through pretraining scaling: larger models, more data, and greater computational resources lead to breakthrough capabilities. In the last 5 years, pretraining scaling has increased compute requirements by an incredible 50M times. However, building more intelligent systems is no longer just about pretraining bigger models.
NVIDIA Enterprise Reference Architectures (Enterprise RAs) can reduce the time and cost of deploying AI infrastructure solutions. They provide a streamlined approach for building flexible and cost-effective accelerated infrastructure while ensuring compatibility and interoperability. The latest Enterprise RA details an optimized cluster configuration for systems integrated with NVIDIA GH200…
In data science, operational efficiency is key to handling increasingly complex and large datasets. GPU acceleration has become essential for modern workflows, offering significant performance improvements. RAPIDS is a suite of open-source libraries and frameworks developed by NVIDIA, designed to accelerate data science pipelines using GPUs with minimal code changes.
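As a small example of the minimal-code-change idea (the column names and values here are invented), a pandas-style groupby runs on the GPU simply by using the cuDF DataFrame API:

# Minimal cuDF sketch: the API mirrors pandas but executes on the GPU.
# Requires a RAPIDS installation and a supported NVIDIA GPU.
import cudf

df = cudf.DataFrame({
    "store": ["a", "a", "b", "b", "b"],
    "sales": [10.0, 12.5, 7.0, 9.5, 11.0],
})
# Same call shape as pandas: groupby plus aggregation, GPU-accelerated.
print(df.groupby("store")["sales"].mean())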
AI factories rely on more than just compute fabrics. While the East-West network connecting the GPUs is critical to AI application performance, the storage fabric, which connects high-speed storage arrays, is equally important. Storage performance plays a key role across several stages of the AI lifecycle, including training checkpointing, inference techniques such as retrieval-augmented generation…
NVIDIA recently announced a new generation of PC GPUs, the GeForce RTX 50 Series, alongside new AI-powered SDKs and tools for developers. Powered by the NVIDIA Blackwell architecture, fifth-generation Tensor Cores, and fourth-generation RT Cores, the GeForce RTX 50 Series delivers breakthroughs in AI-driven rendering, including neural shaders, digital human technologies, geometry, and lighting.
The advent of AI has introduced a new type of data center, the AI factory, purpose-built from the ground up to handle AI workloads. AI workloads can significantly vary in scope and scale, but in every case, the network is key to ensuring high performance and faster time to value. To accelerate time to AI and offer enhanced return on investment, NVIDIA Air enables organizations to build…
Last month at the Supercomputing 2024 conference, NVIDIA announced the availability of NVIDIA H200 NVL, the latest NVIDIA Hopper platform. Optimized for enterprise workloads, NVIDIA H200 NVL is a versatile platform that delivers accelerated performance for a wide range of AI and HPC applications. With its dual-slot PCIe form factor and 600W TGP, the H200 NVL enables flexible configuration options…
Confidential and self-sovereign AI is a new approach to AI development, training, and inference where the user's data is decentralized, private, and controlled by the users themselves. This post explores how the capabilities of Confidential Computing (CC) are expanded through decentralization using blockchain technology. The problem being solved is most clearly shown through the use of…
NVIDIA technology helps organizations build and maintain secure, scalable, and high-performance network infrastructure. Advances in AI, with NVIDIA at the forefront, contribute every day to improvements in security. One way NVIDIA takes a more direct approach to network security is through a secure network operating system (NOS), a specialized type of…
AI and scientific computing applications are great examples of distributed computing problems. The problems are too large and the computations too intensive to run on a single machine. These computations are broken down into parallel tasks that are distributed across thousands of compute engines, such as CPUs and GPUs. To achieve scalable performance, the system relies on dividing workloads…
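A toy, single-machine sketch of that decomposition (purely illustrative; real systems use frameworks such as MPI or NCCL to coordinate this across thousands of engines): split the input into shards, compute partial results in parallel, then combine them.

# Toy data-parallel decomposition: divide the workload, compute in parallel, reduce.
from multiprocessing import Pool

def partial_sum_of_squares(chunk):
    """Each worker processes one shard of the data independently."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    shards = [data[i::n_workers] for i in range(n_workers)]  # divide the workload
    with Pool(n_workers) as pool:
        partials = pool.map(partial_sum_of_squares, shards)
    print(sum(partials))  # combine the partial results (a reduction)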
In software development, testing is crucial for ensuring the quality and reliability of the final product. However, creating test plans and specifications can be time-consuming and labor-intensive, especially when managing multiple requirements and diverse test types in complex systems. Many of these tasks are traditionally performed manually by test engineers. This post is part of the…
NVIDIA designed the NVIDIA Grace CPU to be a new kind of high-performance, data center CPU, one built to deliver breakthrough energy efficiency and optimized for performance at data center scale. Accelerated computing is enabling giant leaps in performance and energy efficiency compared to traditional CPU computing. To deliver these speedups, full-stack innovation at data center scale is…
Generative models have been making big waves in the past few years, from intelligent text-generating large language models (LLMs) to creative image and video-generation models. At NVIDIA, we are exploring using generative AI models to speed up the circuit design process and deliver better designs to meet the ever-increasing demands for computational power. Circuit design is a challenging…
Large language model (LLM) inference is a full-stack challenge. Powerful GPUs, high-bandwidth GPU-to-GPU interconnects, efficient acceleration libraries, and a highly optimized inference engine are required for high-throughput, low-latency inference. MLPerf Inference v4.1 is the latest version of the popular and widely recognized MLPerf Inference benchmarks, developed by the MLCommons…
In today's rapidly evolving technological landscape, staying ahead of the curve is not just a goal; it's a necessity. The surge of innovations, particularly in AI, is driving dramatic changes across the technology stack. One area witnessing profound transformation is Ethernet networking, a cornerstone of digital communication that has been foundational to enterprise and data center…
Large language models (LLMs) are getting larger, increasing the amount of compute required to process inference requests. To meet real-time latency requirements for serving today's LLMs, and do so for as many users as possible, multi-GPU compute is a must. Low latency improves the user experience. High throughput reduces the cost of service. Both are simultaneously important. Even if a large…
NVIDIA is excited to collaborate with Colfax, Together.ai, Meta, and Princeton University on their recent achievement in exploiting the Hopper GPU architecture and Tensor Cores to accelerate key Fused Attention kernels using CUTLASS 3. FlashAttention-3 incorporates key techniques to achieve 1.5-2.0x faster performance than FlashAttention-2 with FP16, up to 740 TFLOPS. With FP8…
Testing out networking infrastructure and building working PoCs for a new environment can be tricky at best and downright dreadful at worst. You may run into licensing requirements you don't meet, or pay pricey fees for advanced hypervisor software. Proprietary network systems can cost hundreds or thousands of dollars just to set up a test environment to play with. You may even be stuck testing on…
NVIDIA operates one of the largest and most complex supply chains in the world. The supercomputers we build connect tens of thousands of NVIDIA GPUs with hundreds of miles of high-speed optical cables. We rely on hundreds of partners to deliver thousands of different components to a dozen factories to build nearly three thousand products. A single disruption to our supply chain can impact our…
The latest release of the NVIDIA cuBLAS library, version 12.5, continues to deliver functionality and performance to deep learning (DL) and high-performance computing (HPC) workloads. This post provides an overview of the following updates on cuBLAS matrix multiplications (matmuls) since version 12.0, and a walkthrough: Grouped GEMM APIs can be viewed as a generalization of the batched…
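To ground the batched-versus-grouped distinction (a rough sketch using CuPy rather than the cuBLAS C API, with made-up shapes): a batched GEMM multiplies many identically shaped problems in one call, while grouped GEMM relaxes that restriction so each group can have its own shape.

# Batched GEMM: one call over many identically shaped problems (CuPy dispatches
# to cuBLAS under the hood). Grouped GEMM generalizes this to mixed shapes.
import cupy as cp

a = cp.random.rand(32, 64, 128, dtype=cp.float32)   # 32 problems of (64 x 128)
b = cp.random.rand(32, 128, 256, dtype=cp.float32)  # 32 problems of (128 x 256)
c = cp.matmul(a, b)                                  # result shape: (32, 64, 256)

# The "grouped" pattern, shown naively as a loop over differently shaped problems;
# grouped GEMM APIs exist to cover this pattern without one launch per problem.
shapes = [(64, 128, 256), (32, 512, 64), (128, 64, 32)]
grouped = [cp.matmul(cp.random.rand(m, k, dtype=cp.float32),
                     cp.random.rand(k, n, dtype=cp.float32))
           for m, k, n in shapes]
print(c.shape, [g.shape for g in grouped])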
As cyberattacks become more sophisticated, organizations must constantly adapt with cutting-edge solutions to protect their critical assets. One such solution is Cisco Secure Workload, a comprehensive security solution designed to safeguard application workloads across diverse infrastructures, locations, and form factors. Cisco recently announced version 3.9 of the Cisco Secure Workload…
The latest state-of-the-art foundation large language models (LLMs) have billions of parameters and are pretrained on trillions of tokens of input text. They often achieve striking results on a wide variety of use cases without any need for customization. Despite this, studies have shown that the best accuracy on downstream tasks can be achieved by adapting LLMs with high-quality…
In today's data-driven landscape, maximizing performance and efficiency in data processing and analytics is critical. While many Databricks users are familiar with using GPU clusters for machine learning training, there's a vast opportunity to leverage GPU acceleration for data processing and analytics tasks as well. Databricks' Data Intelligence Platform empowers users to manage both small…
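As a hedged sketch of what this looks like in practice (cluster setup details vary, and the configuration keys mentioned in the comment come from the RAPIDS Accelerator for Apache Spark, so verify them against current documentation), the DataFrame code itself stays unchanged once GPU acceleration is enabled at the cluster level:

# PySpark code is unchanged when the RAPIDS Accelerator is enabled on the cluster
# (for example via spark.plugins=com.nvidia.spark.SQLPlugin and
# spark.rapids.sql.enabled=true -- check the exact keys in the current docs).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
orders = spark.read.parquet("/data/orders")  # hypothetical dataset path
daily = orders.groupBy("order_date").agg(F.sum("amount").alias("revenue"))
daily.explain()  # with the plugin active, the plan should show Gpu* operators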
NVIDIA today announced the latest release of NVIDIA TensorRT, an ecosystem of APIs for high-performance deep learning inference. TensorRT includes inference runtimes and model optimizations that deliver low latency and high throughput for production applications. This post outlines the key features and upgrades of this release, including easier installation, increased usability…
This week's model release features DBRX, a state-of-the-art large language model (LLM) developed by Databricks. With demonstrated strength in programming and coding tasks, DBRX is adept at handling specialized topics and writing specific algorithms in languages like Python. It can also be used for text completion tasks and few-turn interactions. DBRX long-context abilities can be used in RAG…
NVIDIA NeMo, an end-to-end platform for developing multimodal generative AI models at scale anywhere, on any cloud and on-premises, recently released Parakeet-TDT. This new addition to the NeMo ASR Parakeet model family boasts better accuracy and 64% greater speed over the previously best model, Parakeet-RNNT-1.1B. This post explains Parakeet-TDT and how to use it to generate highly accurate…
Generative AI is transforming computing, paving new avenues for humans to interact with computers in natural, intuitive ways. For enterprises, the prospect of generative AI is vast. Businesses can tap into their rich datasets to streamline time-consuming tasks, from text summarization and translation to insight prediction and content generation. But they must also navigate adoption challenges.
Generative AI is unlocking new computing applications that greatly augment human capability, enabled by continued model innovation. Generative AI models, including large language models (LLMs), are used for crafting marketing copy, writing computer code, rendering detailed images, composing music, generating videos, and more. The amount of compute required by the latest models is immense and…
Learn how the NVIDIA Blackwell GPU architecture is revolutionizing AI and accelerated computing.
Migrating between major versions of software can present several challenges to infrastructure management teams. These challenges can prevent users from adopting the newer versions, so they miss out on newer, more powerful features. Effective planning and thorough testing are essential to overcoming these challenges and ensuring a smooth transition, for example between Cumulus Linux 3.7.x and 4.x.
NVIDIA Spectrum-X is swiftly gaining traction as the leading networking platform tailored for AI in hyperscale cloud infrastructures. Spectrum-X networking technologies help enterprise customers accelerate generative AI workloads. NVIDIA announced significant OEM adoption of the platform in a November 2023 press release, along with an update on the NVIDIA Israel-1 Supercomputer powered by Spectrum…
Advances in AI are rapidly transforming every industry. Join us in person or virtually to learn about the latest technologies, from retrieval-augmented generation to OpenUSD.
This week's Model Monday release features the NVIDIA-optimized Code Llama, Kosmos-2, and SeamlessM4T, which you can experience directly from your browser. With NVIDIA AI Foundation Models and Endpoints, you can access a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications. Meta's Code Llama 70B is the latest…
NVIDIA Metropolis Microservices for Jetson has been renamed to Jetson Platform Services, and is now part of NVIDIA JetPack SDK 6.0. Building vision AI applications for the edge often comes with notoriously long and costly development cycles. At the same time, quickly developing edge AI applications that are cloud-native, flexible, and secure has never been more important. Now…
A common technological misconception is that performance and complexity are directly linked. That is, the highest-performance implementation is also the most challenging to implement and manage. When considering data center networking, however, this is not the case. InfiniBand is a protocol that sounds daunting and exotic in comparison to Ethernet, but because it is built from the ground up…
Data center automation dates to the early days of the mainframe, with operational efficiency topping the list of its benefits. Over the years, technologies have changed both inside and outside the data center. As a result, tools and approaches have evolved as well. The NVIDIA NVUE Collection and Ansible aim to simplify your network automation journey by providing a comprehensive list of…
Traditional cloud data centers have served as the bedrock of computing infrastructure for over a decade, catering to a diverse range of users and applications. However, data centers have evolved in recent years to keep up with advancements in technology and the surging demand for AI-driven computing. This post explores the pivotal role that networking plays in shaping the future of data centers…
In today's data center, there are many ways to achieve system redundancy from a server connected to a fabric. Customers usually seek redundancy to increase service availability (such as achieving end-to-end AI workloads) and find system efficiency using different multihoming techniques. In this post, we discuss the pros and cons of the well-known proprietary multi-chassis link aggregation…
On Aug. 29, learn how to create efficient AI models with NVIDIA TAO Toolkit on STM32 MCUs.
Read this tutorial on how to tap into GPUs by importing cuDF instead of pandas, with only a few code changes.
We were stuck. Really stuck. With a hard delivery deadline looming, our team needed to figure out how to process a complex extract-transform-load (ETL) job on trillions of point-of-sale transaction records in a few hours. The results of this job would feed a series of downstream machine learning (ML) models that would make critical retail assortment allocation decisions for a global retailer.
The latest release of CUDA Toolkit 12.2 introduces a range of essential new features, modifications to the programming model, and enhanced support for hardware capabilities accelerating CUDA applications. Now generally available from NVIDIA, CUDA Toolkit 12.2 includes many new capabilities, both major and minor. The following post offers an overview of many of the key…
Read about an innovative GPU solution that solves the limitations of using small, biased datasets with RAPIDS cuDF.
AI is the topic of conversation around the world in 2023. It is rapidly being adopted by all industries, including media, entertainment, and broadcasting. To be successful in 2023 and beyond, companies and agencies must embrace and deploy AI more rapidly than ever before. The capabilities of new AI programs like video analytics, ChatGPT, recommenders, speech recognition, and customer service are…
Most modern digital chips integrate large numbers of macros in the form of memory blocks or analog blocks, like clock generators. These macros are often much larger than standard cells, which are the fundamental building blocks of digital designs. Macro placement has a tremendous impact on the landscape of the chip, directly affecting many design metrics, such as area and power consumption.
A SmartNIC is a programmable accelerator that makes data center networking, security and storage efficient and flexible.
PTP uses an algorithm and method for synchronizing clocks on various devices across packet-based networks to provide submicrosecond accuracy. NVIDIA Spectrum supports PTP in both one-step and two-step modes and can serve either as a boundary or a transparent clock. Here's how the switch calculates and synchronizes time in one-step mode when acting as a transparent clock. Later in this post…
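The heart of that calculation can be sketched in a few lines (a simplified illustration of the mechanism, not switch firmware): the switch timestamps the PTP event message at ingress and egress and adds the residence time to the packet's correctionField, which PTP carries in units of nanoseconds scaled by 2^16.

# Simplified one-step transparent clock: add residence time to the correctionField.
def update_correction_field(correction_field, ingress_ns, egress_ns):
    """Add the switch residence time (egress - ingress) to the PTP correctionField,
    which is expressed in scaled nanoseconds (ns * 2**16)."""
    residence_ns = egress_ns - ingress_ns
    return correction_field + (residence_ns << 16)

# Example: a packet spends 1,250 ns inside the switch.
cf = update_correction_field(correction_field=0, ingress_ns=1_000_000, egress_ns=1_001_250)
print(cf >> 16)  # about 1250 ns of accumulated correction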
Inference is where we interact with AI. Chat bots, digital assistants, recommendation engines, fraud protection services, and other applications that you use every day: all are powered by AI. Those deployed applications use inference to get you the information that you need. Given the wide array of usages for AI inference, evaluating performance poses numerous challenges for developers and…
NVIDIA is committed to making it easier for developers to deploy software from our NGC container registry. As part of that commitment, last week we announced our NGC-Ready program, which expands the places users of powerful systems with NVIDIA GPUs can deploy GPU-accelerated software with confidence. Today, we're announcing several new NGC-Ready systems from even more of the world's leading…
The internet has changed how people consume media. Rather than just watching television and movies, the combination of ubiquitous mobile devices, massive computation, and available Internet bandwidth has led to an explosion in user-created content: users are re-creating the Internet, producing exabytes of content every day. Periscope, a mobile application that lets users broadcast video…
Using NVIDIA Tesla K80s, China's Tsinghua University team and JMI University in India both took top honors at the popular student contest. At the International Supercomputing Conference (ISC) in Frankfurt, Germany, China's Tsinghua University team collected their fifth student challenge gold cup (and second ISC win). The popular student contest brings together university teams from…