Large language models (LLMs) have created unprecedented opportunities across various industries. However, moving LLMs from research and development into reliable, scalable, and maintainable production systems presents unique operational challenges. LLMOps, or large language model operations, is designed to address these challenges. Building upon the principles of traditional machine…
Imagine you're leading security for a large enterprise and your teams are eager to leverage AI for more and more projects. There's a problem, though. As with any project, you must balance the promise and returns of innovation with the hard realities of compliance, risk management, and security posture mandates. Security leaders face a crucial challenge when evaluating AI models such as those…
Note: This blog post was originally published on Oct. 28, 2024, but has been edited to reflect new updates. Fraud in financial services is a massive problem. Financial losses from worldwide credit card transaction fraud are expected to total $403.88 billion over the next 10 years, according to research firm the Nilson Report. While other types of fraud, such as identity theft and account takeover…
AI is becoming the cornerstone of innovation across industries, driving new levels of creativity and productivity and fundamentally reshaping how we live and work. And it's enabled by a new type of infrastructure, the AI factory, that manufactures intelligence at scale and creates the foundation for what many consider the next industrial revolution. AI factories represent a reset of traditional…
The integration of NVIDIA NIM microservices into Azure AI Foundry marks a major leap forward in enterprise AI development. By combining NIM microservices with Azure's scalable, secure infrastructure, organizations can now deploy powerful, ready-to-use AI models more efficiently than ever before. NIM microservices are containerized for GPU-accelerated inferencing for pretrained and customized…
The worldwide adoption of generative AI has driven massive global demand for accelerated compute hardware. In enterprises, this has sped the deployment of GPU-accelerated private cloud infrastructure. At the regional level, this demand for compute infrastructure has given rise to a new category of cloud providers who offer accelerated compute (GPU) capacity for AI workloads, also known as GPU…
Advanced AI models such as DeepSeek-R1 are proving that enterprises can now build cutting-edge AI models specialized with their own data and expertise. These models can be tailored to unique use cases, tackling diverse challenges like never before. Based on the success of early AI adopters, many organizations are shifting their focus to full-scale production AI factories. Yet the process of…
NVIDIA announced the release of NVIDIA Dynamo today at GTC 2025. NVIDIA Dynamo is a high-throughput, low-latency open-source inference serving framework for deploying generative AI and reasoning models in large-scale distributed environments. The framework boosts the number of requests served by up to 30x when running the open-source DeepSeek-R1 models on NVIDIA Blackwell.
People are increasingly probing generative AI technologies, such as large language models (LLMs), with adversarial inputs to see whether the outputs can be made to deviate from acceptable standards. This use of LLMs began in 2023 and has rapidly evolved into a common industry practice and a cornerstone of trustworthy AI. How can we standardize and define LLM red teaming?
NVIDIA AI Enterprise is the cloud-native software platform for the development and deployment of production-grade AI solutions. The latest release of the NVIDIA AI Enterprise infrastructure software collection adds support for the latest NVIDIA data center GPU, NVIDIA H200 NVL, giving your enterprise new options for powering cutting-edge use cases such as agentic and generative AI with some of the…
NVIDIA AI Workbench is a free development environment manager for developing, customizing, and prototyping AI applications on your GPUs. AI Workbench provides a frictionless experience across PCs, workstations, servers, and the cloud for AI, data science, and machine learning (ML) projects. This post provides details about the January 2025 release of NVIDIA AI Workbench…
In recent years, large language models (LLMs) have achieved extraordinary progress in areas such as reasoning, code generation, machine translation, and summarization. However, despite their advanced capabilities, foundation models have limitations in domain-specific areas such as finance and healthcare, and in capturing cultural and language nuances beyond English.
AI development has become a core part of modern software engineering, and NVIDIA is committed to finding ways to bring optimized accelerated computing to every developer who wants to start experimenting with AI. To address this, we've been working to make the accelerated computing stack more accessible with NVIDIA Launchables: preconfigured GPU computing environments that enable you to…
This white paper details our commitment to securing the NVIDIA AI Enterprise software stack. It outlines the processes and measures NVIDIA takes to ensure container security.
Large language models (LLMs) are rapidly changing the business landscape, offering new capabilities in natural language processing (NLP), content generation, and data analysis. These AI-powered tools have improved how companies operate, from streamlining customer service to enhancing decision-making processes. However, despite their impressive general knowledge, LLMs often struggle with…
As global electricity demand continues to rise, traditional sources of energy are increasingly unsustainable. Energy providers are facing pressure to reduce reliance on fossil fuels while ensuring a fully supplied and stable grid. In this context, solar energy has emerged as a vital renewable resource, being one of the most abundant clean energy sources available. However…
Last month at the Supercomputing 2024 conference, NVIDIA announced the availability of NVIDIA H200 NVL, the latest NVIDIA Hopper platform. Optimized for enterprise workloads, NVIDIA H200 NVL is a versatile platform that delivers accelerated performance for a wide range of AI and HPC applications. With its dual-slot PCIe form factor and 600W TGP, the H200 NVL enables flexible configuration options…
Generative AI is transforming every aspect of the automotive industry, including software development, testing, user experience, personalization, and safety. With the automotive industry shifting from a mechanically driven approach to a software-driven one, generative AI is unlocking a world of possibilities. Tata Consultancy Services (TCS) focuses on two major segments for leveraging…
AI agents powered by large language models (LLMs) help organizations streamline and reduce manual workloads. These agents use multilevel, iterative reasoning to analyze problems, devise solutions, and execute tasks with various tools. Unlike traditional chatbots, LLM-powered agents automate complex tasks by effectively understanding and processing information. To avoid potential risks in specific…
The exponential growth of visual data, ranging from images to PDFs to streaming videos, has made manual review and analysis virtually impossible. Organizations are struggling to transform this data into actionable insights at scale, leading to missed opportunities and increased risks. To solve this challenge, vision-language models (VLMs) are emerging as powerful tools…
AI agents are emerging as the newest way for organizations to increase efficiency, improve productivity, and accelerate innovation. These agents are more advanced than prior AI applications, with the ability to autonomously reason through tasks, call out to other tools, and incorporate both enterprise data and employee knowledge to produce valuable business outcomes. They're being embedded into…
In the rapidly evolving landscape of AI and data science, the demand for scalable, efficient, and flexible infrastructure has never been higher. Traditional infrastructure can often struggle to meet the demands of modern AI workloads, leading to bottlenecks in development and deployment processes. As organizations strive to deploy AI models and data-intensive applications at scale…
As enterprises increasingly adopt AI technologies, they face the complex challenge of efficiently developing, securing, and continuously improving AI applications to leverage their data assets. They need a unified, end-to-end solution that simplifies AI development, enhances security, and enables continuous optimization, allowing organizations to harness the full potential of their data for AI…
Inferencing for generative AI and AI agents will drive the need for AI compute infrastructure to be distributed from edge to central clouds. IDC predicts that "Business AI (consumer excluded) will contribute $19.9 trillion to the global economy and account for 3.5% of GDP by 2030." 5G networks must also evolve to serve this new incoming AI traffic. At the same time, there is an opportunity…
In the rapidly evolving field of medicine, the integration of cutting-edge technologies is crucial for enhancing patient care and advancing research. One such innovation is retrieval-augmented generation (RAG), which is transforming how medical information is processed and used. RAG combines the capabilities of large language models (LLMs) with external knowledge retrieval…
Providing customers with quality service remains a top priority for businesses across industries, from answering questions and troubleshooting issues to facilitating online orders. As businesses scale operations and expand offerings globally to compete, the demand for seamless customer service grows exponentially. Searching knowledge base articles or navigating complex phone trees can be a…
Expanding the open-source Meta Llama collection of models, the Llama 3.2 collection includes vision language models (VLMs), small language models (SLMs), and an updated Llama Guard model with support for vision. When paired with the NVIDIA accelerated computing platform, Llama 3.2 offers developers, researchers, and enterprises valuable new capabilities and optimizations to realize their…
Global energy technology company SLB has announced the next milestone in its long-standing collaboration with NVIDIA to develop and scale generative AI solutions for the energy industry. The collaboration accelerates the development and deployment of energy industry-specific generative AI foundation models across SLB global platforms, including its Delfi digital platform and SLB's new Lumi…
Equipping agentic AI applications with tools will usher in the next phase of AI. By enabling autonomous agents and other AI applications to fetch real-time data, perform actions, and interact with external systems, developers can bridge the gap to new, real-world use cases that significantly enhance productivity and the user experience. xpander AI, a member of the NVIDIA Inception program for…
The Llama 3.1 405B large language model (LLM), developed by Meta, is an open-source community model that delivers state-of-the-art performance and supports a variety of use cases. With 405 billion parameters and support for context lengths of up to 128K tokens, Llama 3.1 405B is also one of the most demanding LLMs to run. To deliver both low latency to optimize the user experience and high…
As large language models (LLMs) continue to evolve at an unprecedented pace, enterprises are looking to build generative AI-powered applications that maximize throughput to lower operational costs and minimize latency to deliver superior user experiences. This post discusses the critical performance metrics of throughput and latency for LLMs, exploring their importance and trade-offs between…
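The throughput and latency metrics described above can be made concrete with a small sketch. Everything here is illustrative: `generate_fn` stands in for whatever client call your serving stack exposes, and the toy `fake_generate` lambda exists only so the snippet runs on its own.

```python
import time

def measure_request(generate_fn, prompt):
    """Time a single generation call and derive basic serving metrics.

    `generate_fn` is any function that returns a list of output tokens
    for a prompt (hypothetical; substitute your own client call).
    """
    start = time.perf_counter()
    tokens = generate_fn(prompt)
    latency_s = time.perf_counter() - start
    # Throughput: output tokens produced per second of wall-clock time.
    throughput = len(tokens) / latency_s if latency_s > 0 else 0.0
    return {"latency_s": latency_s, "throughput_tok_per_s": throughput}

# Toy stand-in for an LLM client, for illustration only.
fake_generate = lambda prompt: prompt.split() * 4

metrics = measure_request(fake_generate, "hello world from the server")
print(metrics)
```

In a real deployment these two numbers pull against each other: larger batch sizes raise aggregate throughput but add queueing delay to each request's latency.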
Meta's Llama collection of large language models is the most popular family of foundation models in the open-source community today, supporting a variety of use cases. Millions of developers worldwide are building derivative models and integrating them into their applications. With Llama 3.1, Meta is launching a suite of large language models (LLMs) as well as a suite of trust and safety models…
The rapidly evolving field of generative AI is focused on building neural networks that can create realistic content such as text, images, audio, and synthetic data. Generative AI is revolutionizing multiple industries by enabling rapid creation of content, powering intelligent knowledge assistants, augmenting software development with coding co-pilots, and automating complex tasks across various…
The latest embedding model from NVIDIA, NV-Embed, set a new record for embedding accuracy with a score of 69.32 on the Massive Text Embedding Benchmark (MTEB), which covers 56 embedding tasks. Highly accurate and effective models like NV-Embed are key to transforming vast amounts of data into actionable insights. NVIDIA provides top-performing models through the NVIDIA API catalog.
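Embedding models like NV-Embed are typically used by comparing the vectors they emit with cosine similarity. A minimal, dependency-free sketch follows; the 4-dimensional vectors are toy values, whereas real embedding models produce vectors with thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" for a query and a candidate document.
query_vec = [0.1, 0.9, 0.0, 0.2]
doc_vec = [0.2, 0.8, 0.1, 0.1]
print(round(cosine_similarity(query_vec, doc_vec), 3))
```

Ranking documents by this score against a query embedding is the core operation behind the retrieval tasks that benchmarks like MTEB measure.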
Manufacturers face increased pressures to shorten production cycles, enhance productivity, and improve quality, all while reducing costs. To address these challenges, they're investing in industrial digitalization and AI-enabled digital twins to unlock new possibilities from planning to operations. Developers at Pegatron, an electronics manufacturer based in Taiwan, used NVIDIA AI…
NVIDIA today launched the NVIDIA RTX AI Toolkit, a collection of tools and SDKs for Windows application developers to customize, optimize, and deploy AI models for Windows applications. It's free to use, doesn't require prior experience with AI frameworks and development tools, and delivers the best AI performance for both local and cloud deployments. The wide availability of generative…
NVIDIA AI Workbench, a toolkit for AI and ML developers, is now generally available as a free download. It features automation that removes roadblocks for novice developers and makes experts more productive. Developers can experience a fast and reliable GPU environment setup and the freedom to work, manage, and collaborate across heterogeneous platforms regardless of skill level.
NVIDIA SDKs have been instrumental in accelerating AI applications across a spectrum of use cases spanning smart cities, medical, and robotics. However, achieving a production-grade AI solution that can be deployed at the edge to support human and machine collaboration safely and securely requires both high-quality hardware and software tailored for enterprise needs. NVIDIA is again accelerating…
The rise in generative AI adoption has been remarkable. Catalyzed by the launch of OpenAI's ChatGPT in 2022, the new technology amassed over 100M users within months and drove a surge of development activity across almost every industry. By 2023, developers had begun proofs of concept (POCs) using APIs and open-source community models from Meta, Mistral, Stability, and more. Entering 2024…
Generative AI has the potential to transform every industry. Human workers are already using large language models (LLMs) to explain, reason about, and solve difficult cognitive tasks. Retrieval-augmented generation (RAG) connects LLMs to data, expanding the usefulness of LLMs by giving them access to up-to-date and accurate information. Many enterprises have already started to explore how…
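The RAG pattern described above, retrieving relevant context and handing it to the LLM alongside the question, can be illustrated without any particular framework. The word-overlap ranking and prompt format below are simplifications for illustration, not any library's actual API; production systems use embedding-based retrieval instead.

```python
def retrieve(query, passages, k=2):
    """Rank passages by naive word overlap with the query; keep top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        passages,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, passages):
    """Assemble an augmented prompt: retrieved context plus the question."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages))
    return f"Use only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical knowledge-base snippets standing in for enterprise data.
kb = [
    "The warranty period for model X is 24 months.",
    "Shipping within the EU takes 3-5 business days.",
    "Returns are accepted within 30 days of delivery.",
]
print(build_prompt("How long is the warranty for model X?", kb))
```

The augmented prompt would then be sent to an LLM, which grounds its answer in the retrieved passages rather than in stale training data.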
As of March 18, 2025, NVIDIA Triton Inference Server is now part of the NVIDIA Dynamo Platform and has been renamed to NVIDIA Dynamo Triton, accordingly. Diffusion models are transforming creative workflows across industries. These models generate stunning images based on simple text or image inputs by iteratively shaping random noise into AI-generated art through denoising diffusion…
This week's model release features the NVIDIA-optimized language model Smaug 72B, which you can experience directly from your browser. NVIDIA AI Foundation Models and Endpoints are a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications. Try leading models such as Nemotron-3, Mixtral 8x7B, Gemma 7B…
This week's model release features the NVIDIA-optimized language model Phi-2, which can be used for a wide range of natural language processing (NLP) tasks. You can experience Phi-2 directly from your browser. NVIDIA AI Foundation Models and Endpoints are a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications.
This week's Model Monday release features the NVIDIA-optimized Code Llama, Kosmos-2, and SeamlessM4T models, which you can experience directly from your browser. With NVIDIA AI Foundation Models and Endpoints, you can access a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications. Meta's Code Llama 70B is the latest…
While harnessing the potential of AI is a priority for many of today's enterprises, developing and deploying an AI model involves time and effort. Often, challenges must be overcome to move a model into production, especially for mission-critical business operations. According to IDC research, only 18% of enterprises surveyed could put an AI model into production in under a month.
Following the introduction of ChatGPT, enterprises around the globe are realizing the benefits and capabilities of AI, and are racing to adopt it into their workflows. As this adoption accelerates, it becomes imperative for enterprises not only to keep pace with the rapid advancements in AI, but also to address related challenges such as optimization, scalability, and security.
Convai is a versatile developer platform for designing characters with advanced multimodal perception abilities. These characters are designed to integrate seamlessly into both the virtual and real worlds. Whether you're a creator, game designer, or developer, Convai enables you to quickly modify a non-playable character (NPC), from backstory and knowledge to voice and personality.
There are many ways to deploy ML models to production. Sometimes, a model is run once per day to refresh forecasts in a database. Sometimes, it powers a small-scale but critical decision-making dashboard or speech-to-text on a mobile device. These days, the model can also be a custom large language model (LLM) backing a novel AI-driven product experience. Often, the model is exposed to its…
Businesses rely more than ever on data and AI to innovate, offer value to customers, and stay competitive. The adoption of machine learning (ML) created a need for tools, processes, and organizational principles to manage code, data, and models that work reliably, cost-effectively, and at scale. This is broadly known as machine learning operations (MLOps). The world is venturing rapidly into…
Crossing the chasm and reaching its iPhone moment, generative AI must scale to fulfill exponentially increasing demands. Reliability and uptime are critical for building generative AI at the enterprise level, especially when AI is core to conducting business operations. NVIDIA is investing its expertise into building a solution for those enterprises ready to take the leap.
AI is transforming industries, automating processes, and opening new opportunities for innovation in the rapidly evolving technological landscape. As more businesses recognize the value of incorporating AI into their operations, they face the challenge of implementing these technologies efficiently, effectively, and reliably. Enter NVIDIA AI Enterprise, a comprehensive software suite…
AI is the topic of conversation around the world in 2023. It is rapidly being adopted by all industries, including media, entertainment, and broadcasting. To be successful in 2023 and beyond, companies and agencies must embrace and deploy AI more rapidly than ever before. The capabilities of new AI programs like video analytics, ChatGPT, recommenders, speech recognition, and customer service are…
At COMPUTEX 2023, NVIDIA announced the NVIDIA DGX GH200, which marks another breakthrough in GPU-accelerated computing to power the most demanding giant AI workloads. In addition to describing critical aspects of the NVIDIA DGX GH200 architecture, this post discusses how NVIDIA Base Command enables rapid deployment, accelerates the onboarding of users, and simplifies system management.
The incredible advances of accelerated computing are powered by data. The role of data in accelerating AI workloads is crucial for businesses looking to stay ahead of the curve in the current fast-paced digital environment. Speeding up access to that data is yet another way that NVIDIA accelerates entire AI workflows. NVIDIA DGX Cloud caters to a wide variety of market use cases.
The Dataiku platform for everyday AI simplifies deep learning. Use cases are far-reaching, from image classification to object detection and natural language processing (NLP). Dataiku helps you with labeling, model training, explainability, model deployment, and centralized management of code and code environments. This post dives into high-level Dataiku and NVIDIA integrations for image…
The business applications of GPU-accelerated computing are set to expand greatly in the coming years. One of the fastest-growing trends is the use of generative AI for creating human-like text and all types of images. Driving the explosion of market interest in generative AI are technologies such as transformer models that bring AI to everyday applications, from conversational text to…
Generative AI has marked an important milestone in the AI revolution journey. We are at a fundamental inflection point where enterprises are not only getting their feet wet but jumping into the deep end. With over 50 frameworks, pretrained models, and development tools, NVIDIA AI Enterprise, the software layer of the NVIDIA AI platform, is designed to accelerate enterprises to the leading edge…
AI is impacting every industry, from improving customer service and streamlining supply chains to accelerating cancer research. As enterprises invest in AI to stay ahead of the competition, they often struggle with finding the strategy and infrastructure for success. Many AI projects are rapidly evolving, which makes production at scale especially challenging. We believe in developing…
Discover how to build a robust MLOps practice for continuous delivery and automated deployment of AI workloads at scale.
Edge computing and edge AI are powering the digital transformation of business processes. But because the field is still growing, many questions remain about what exactly needs to be in an edge management platform. The benefits of edge computing include low latency for real-time responses, using local area networks for higher bandwidth, and storage at lower costs compared to cloud computing.
Two years ago, NVIDIA and VMware announced that they would reimagine and re-architect the data center. Hundreds of engineers across both companies have worked closely to bring this joint solution to fruition. NVIDIA announces the availability of VMware vSphere on the NVIDIA BlueField DPU, providing the ideal solution for delivering a software-defined…
AI has the power to transform every industry, but transformation takes time, and it's rarely easy. For enterprises across industries to be as successful as possible in their own transformations, they need access to AI-ready technology platforms. They also must be able to use 5G connectivity at the edge to harness valuable data and inform their AI and ML models. Sign up for the latest…
Today, NVIDIA announced general availability of NVIDIA AI Enterprise 2.1. This latest version of the end-to-end AI and data analytics software suite is optimized, certified, and supported for enterprises to deploy and scale AI applications across bare metal, virtual, container, and cloud environments. The NVIDIA AI Enterprise 2.1 release offers advanced data science with the latest…
Cybersecurity software is getting more sophisticated these days, thanks to AI and ML capabilities. It's now possible to automate security measures without direct human intervention. The value in these powerful solutions is real: stopping breaches, providing highly detailed alerts, and protecting attack surfaces. Still, it pays to be a skeptic. This interview with NVIDIA experts Bartley…
Deep learning has come to mean the most common implementation of a neural network for performing many AI tasks. Data scientists use software frameworks such as TensorFlow and PyTorch to develop and run DL algorithms. By this point, there has been a lot written about deep learning, and you can find more detailed information from many sources. For a good high-level summary, see What's the…
There is an ongoing demand for servers with the ability to transfer data from the network to a GPU at ever faster speeds. As AI models keep getting bigger, the sheer volume of data needed for training requires techniques such as multinode training to achieve results in a reasonable timeframe. Signal processing for 5G is more sophisticated than previous generations, and GPUs can help increase the…
The new year has been off to a great start with NVIDIA AI Enterprise 1.1 providing production support for container orchestration and Kubernetes cluster management using VMware vSphere with Tanzu 7.0 update 3c, delivering AI/ML workloads to every business in VMs, containers, or Kubernetes. New NVIDIA AI Enterprise labs for IT admins and MLOps are available on NVIDIA LaunchPad…
Like so many schools and universities during the COVID-19 pandemic, the University of Pisa, one of Italy's oldest universities, was challenged to find solutions for conducting research while many students had to attend classes remotely. The university's Research Computing department has a focused study in AI deep learning and machine learning applications, where they normally perform…
NVIDIA AI Enterprise is a suite of AI software, certified to run on VMware vSphere 7 Update 2 with NVIDIA-Certified volume servers. It includes key enabling technologies and software from NVIDIA for rapid deployment, management, and scaling of AI workloads in the virtualized data center running on VMware vSphere. The NVIDIA AI Enterprise suite also enables IT Administrators, Data Scientists…