NVIDIA technology helps organizations build and maintain secure, scalable, and high-performance network infrastructure. Advances in AI, with NVIDIA at the forefront, drive security improvements every day. One way NVIDIA has taken a more direct approach to network security is through a secure network operating system (NOS). A secure NOS is a specialized type of…
In the era of generative AI, accelerated networking is essential to build high-performance computing fabrics for massively distributed AI workloads. NVIDIA continues to lead in this space, offering state-of-the-art Ethernet and InfiniBand solutions that maximize the performance and efficiency of AI factories and cloud data centers. At the core of these solutions are NVIDIA SuperNICs, a new…
In today's rapidly evolving technological landscape, staying ahead of the curve is not just a goal; it's a necessity. The surge of innovations, particularly in AI, is driving dramatic changes across the technology stack. One area witnessing profound transformation is Ethernet networking, a cornerstone of digital communication that has been foundational to enterprise and data center…
Testing out networking infrastructure and building working PoCs for a new environment can be tricky at best and downright dreadful at worst. You may run into licensing requirements you don't meet, or pay pricey fees for advanced hypervisor software. Proprietary network systems can cost hundreds or thousands of dollars just to set up a test environment to play with. You may even be stuck testing on…
Accelerated networking combines CPUs, GPUs, DPUs (data processing units), or SuperNICs into an accelerated computing fabric specifically designed to optimize networking workloads. It uses specialized hardware to offload demanding tasks to enhance server capabilities. As AI and other new workloads continue to grow in complexity and scale, the need for accelerated networking becomes paramount.
Traditional cloud data centers have served as the bedrock of computing infrastructure for over a decade, catering to a diverse range of users and applications. However, data centers have evolved in recent years to keep up with advancements in technology and the surging demand for AI-driven computing. This post explores the pivotal role that networking plays in shaping the future of data centers…
In today's data center, there are many ways to achieve system redundancy from a server connected to a fabric. Customers usually seek redundancy to increase service availability (such as achieving end-to-end AI workloads) and to improve system efficiency using different multihoming techniques. In this post, we discuss the pros and cons of the well-known proprietary multi-chassis link aggregation…
As data generation continues to increase, linear performance scaling has become an absolute requirement for scale-out storage. Storage networks are like car roadway systems: if the road is not built for speed, the potential speed of a car does not matter. Even a Ferrari is slow on an unpaved dirt road full of obstacles. Scale-out storage performance can be hindered by the Ethernet fabric…
For HPC clusters purpose-built for AI training, such as NVIDIA DGX BasePOD and NVIDIA DGX SuperPOD, fine-tuning is critical to maximizing overall cluster performance. This includes tuning the management fabric (based on Ethernet), the storage fabric (Ethernet or InfiniBand), and the compute fabric (Ethernet or InfiniBand).
Wireless technology has evolved rapidly, and 5G deployments have made good progress around the world. Until recently, wireless RAN was deployed using closed-box appliance solutions from traditional RAN vendors. This closed-box approach has many shortcomings: it is not scalable, underuses the infrastructure, and does not deliver optimal RAN TCO. We have come to realize that such closed-box…
Large language models (LLMs) and AI applications such as ChatGPT and DALL-E have recently seen rapid growth. Thanks to GPUs, CPUs, DPUs, high-speed storage, and AI-optimized software innovations, AI is now widely accessible. You can even deploy AI in the cloud or on-premises. Yet AI applications can be very taxing on the network, and this growth is burdening CPU and GPU servers…
We all know that AI is changing the world. For network admins, AI can improve day-to-day operations in some amazing ways. However, AI is no replacement for the know-how of an experienced network admin. AI is meant to augment your capabilities, like a virtual assistant. So, AI may become your best friend, but generative AI is also a new data center workload that brings a new paradigm…
AI has seamlessly integrated into our lives and changed us in ways we couldn't even imagine just a few years ago. In the past, AI was perceived as something futuristic and complex. Only giant corporations used AI on their supercomputers with HPC technologies to forecast weather and make breakthrough discoveries in healthcare and science. Today, thanks to GPUs, CPUs, high-speed storage…
The most exciting computing applications currently rely on training and running inference on complex AI models, often in demanding, real-time deployment scenarios. High-performance, accelerated AI platforms are needed to meet the demands of these applications and deliver the best user experiences. New AI models are constantly being invented to enable new capabilities…
Check out this NVIDIA GTC 2023 playlist to see all the sessions on accelerated networking, sustainable data centers, Ethernet for HPC, and more.
Everyone agrees that open solutions are the best solutions, but there are few truly open operating systems for Ethernet switches. At NVIDIA, we embraced open source for our Ethernet switches. Besides supporting SONiC, we have contributed many innovations to open-source community projects. This post was originally published on the Mellanox blog in June 2018 but has been updated.
Enterprises of all sizes are increasingly leveraging virtualization and hyperconverged infrastructure (HCI). This technology delivers reliable and secure compute resources for operations while reducing data center footprint. HCI clusters rely on robust, feature-rich networking fabrics to deliver on-premises solutions that can seamlessly connect to the cloud. Microsoft Azure Stack HCI is a…
Modern data centers can run thousands of services and applications. When an issue occurs, as a network administrator, you are guilty by default: it is easy to blame the network, so you have to prove your innocence on a daily basis. It is an unfair world. Correlating application performance issues to the network is hard. You can start by checking basic connectivity using simple pings or…
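Beyond simple pings, a first step in correlating an application issue to the network is confirming whether the application endpoint even accepts connections. The sketch below is illustrative only, not from the post: a minimal Python helper that tests TCP reachability within a timeout.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    A refused, timed-out, or unroutable connection all surface as OSError,
    so any failure simply yields False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For instance, checking a hypothetical application server with `tcp_reachable("10.0.0.5", 443)` tells you whether the service accepts connections at all, which helps separate "service down" from "network problem" before the network gets blamed.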
Normally, data center networks are updated when new applications or servers are installed in the infrastructure. But independent of new server and application infrastructure forcing an update, there are other areas to consider. Three questions to ask when assessing if you need to update your network are: … Network device selection typically starts with understanding how the server…
NetQ 4.1.0 was recently released, introducing fabric-wide network latency and buffer occupancy analysis along with many other enhancements. For more information about all the new capabilities, see the NetQ 4.1.0 User Guide. This post covers the following features: For the first time, NetQ offers network-wide fabric latency and buffer occupancy analysis by using the live…
Networking simulations are essential, since the classical model of deployment, based on CLI and adventurous copy/paste-based configuration, has become inefficient for medium- and large-scale environments. NVIDIA Air provides a platform to build, simulate, and experience a modern data center powered by a modern network operating system (NOS). NVIDIA Air is a cloud-based environment…
Cumulus Linux 4.4 is the first release with the NVIDIA User Experience (NVUE), a brand-new CLI for Cumulus Linux. Being excited about a new networking CLI sounds a bit like being excited about your new 56k modem. What makes NVUE special isn't just that it's a new CLI; it's the principles it was built on that make it unique. At its core, NVUE has created a full object model of Cumulus Linux…
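To give a flavor of that object model, a typical NVUE workflow stages values on the model and then applies them as a single transaction. The fragment below is an illustrative configuration sketch, not from the post; the interface name and address are hypothetical, and it assumes a switch running Cumulus Linux 4.4 or later.

```shell
# Stage configuration in NVUE's object model (nothing changes yet)
nv set interface swp1 ip address 192.168.10.1/24

# Review the pending (staged) changes against the running config
nv config diff

# Apply all staged changes as one transaction
nv config apply
```

Because every setting lives in one object model, the staged changes can be diffed, applied, or reverted as a unit rather than command by command.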
When you see a browser ad for a new restaurant, or the perfect gift for that hard-to-please family member, you probably aren't thinking about the infrastructure used to deliver that ad. However, that infrastructure is what allows advertising companies like Criteo to provide these insights. The NVIDIA networking portfolio is essential to Criteo's technology stack. Criteo is an online advertising…
NVIDIA Cumulus Linux is the industry's most innovative open network operating system, allowing you to automate, customize, and scale your data center network like no other. The recently released Cumulus Linux 4.4 (CL 4.4) provides innovation, advanced features, and scale enhancements based on the guiding principle of simplicity. CL 4.4 includes the following notable new features…
NVIDIA NetQ 4.0.0 was recently released with many new capabilities. NVIDIA NetQ is a highly scalable, modern network operations tool that leverages fabric-wide telemetry data for real-time visibility into, and troubleshooting of, the overlay and underlay network. NetQ can be deployed on customer premises or consumed as a cloud-based service (SaaS). For more details, refer to the NetQ datasheet.
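As a quick illustration of that fabric-wide view, NetQ exposes validations and event history through its CLI. The fragment below is a hedged sketch assuming a deployed NetQ server; exact output and options vary by release.

```shell
# Validate BGP sessions across the entire fabric in one command
netq check bgp

# Review recent network events collected from fabric-wide telemetry
netq show events
```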
NVIDIA recently commissioned IDC to conduct research into the business value and technical benefits of the NVIDIA Ethernet switch solution. IDC analysts Brad Casemore and Harsh Singh interviewed IT organizations with real-world experience deploying and managing Cumulus Linux and NVIDIA Spectrum switches in mission-critical data centers over a significant time period.
This blog post was updated on 9/23/2024. NVIDIA is committed to your success when you choose SONiC (Software for Open Networking in the Cloud), the free, community-developed, Linux-based network operating system (NOS) hardened in the data centers of some of the largest cloud service providers. SONiC is an ideal choice for data centers looking for a low-cost, scalable, and fully controllable NOS…
This post was originally published on the Cumulus Networks site. EVPN multihoming (EVPN-MH) is the latest addition to the NVIDIA EVPN story. In this three-part video series, I walk you through the various design elements of EVPN-MH. EVPN-MH provides support for all-active server redundancy. In this video, I offer a feature overview and a comparison with EVPN-MLAG.
This is the third post in the Accelerating IO series, which has the goal of describing the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern data center. The first post in this series introduced the Magnum IO architecture; positioned it in the broader context of CUDA, CUDA-X, and vertical application domains; and listed the four major components of the…