Developers building powerful multimodal applications now have a new state-of-the-art model designed for enterprise-scale performance with Mistral Medium 3. Mistral Medium 3 combines high-performance, efficiency, and versatility in a compact deployment footprint. Designed for commercial and on-prem use cases, this dense model runs efficiently on NVIDIA Hopper GPUs…
]]>A gene that can be an early indicator for Alzheimer’s disease actually is a cause of the degenerative-brain disorder, said researchers at the University of California, San Diego. That finding, which they discovered using AI, could result in new treatment options. In a paper published in April in the scientific journal Cell, a team at UCSD found that the gene PHGDH—previously considered a…
]]>Synthetic data has become a standard part of large language model (LLM) post-training procedures. Using a large number of synthetically generated examples from either a single or cohort of open-source, commercially permissible LLMs, a base LLM is finetuned either with supervised finetuning or RLHF to gain instruction-following and reasoning skills. This process can be seen as a knowledge…
]]>Join us at GTC Paris on June 10th and choose from six full-day, instructor-led workshops.
]]>The integration of NVIDIA NIM microservices into Azure AI Foundry marks a major leap forward in enterprise AI development. By combining NIM microservices with Azure’s scalable, secure infrastructure, organizations can now deploy powerful, ready-to-use AI models more efficiently than ever before. NIM microservices are containerized for GPU-accelerated inferencing for pretrained and customized…
]]>As organizations strive to maximize the value of their generative AI investments, accessing the latest model developments is crucial to continued success. By using state-of-the-art models on Day-0, teams can harness these innovations efficiently, maintain relevance, and be competitive. The past year has seen a flurry of exciting model series releases in the open-source community…
]]>Scientific research in complex fields like battery innovation is often slowed by manual evaluation of materials, limiting progress to just dozens of candidates per day. In this blog post, we explore how domain-adapted large language models (LLMs), enhanced with reasoning capabilities, are transforming scientific research, especially in high-stakes, complex domains like battery innovation.
]]>NVIDIA Agent Intelligence toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents. It focuses on enabling developers to quickly build, evaluate, profile, and accelerate complex agentic AI workflows?—?systems in which multiple AI agents collaborate to perform tasks. The Agent Intelligence toolkit acts as a unifying framework that integrates existing…
]]>Realistic 3D simulation is becoming a cornerstone of modern AI and graphics, from training autonomous vehicles (AV) to powering robotics and digital twins. Neural rendering techniques like NeRFs and 3D Gaussian Splatting (3DGS) have revolutionized how 3D scenes are reconstructed and visualized from raw sensor data. In this post, we introduce the implementation of 3D Gaussian Unscented…
]]>In today’s educational landscape, generative AI tools have become both a blessing and a challenge. While these tools offer unprecedented access to information, they’ve also created new concerns about academic integrity. Increasingly, students rely on AI to generate direct answers to homework questions, often at the expense of developing critical thinking skills and mastering core concepts.
]]>Curating high-quality pretraining datasets is critical for enterprise developers aiming to train state-of-the-art large language models (LLMs). To enable developers to build highly accurate LLMs, NVIDIA previously released Nemotron-CC, a 6.3-trillion-token English language Common Crawl (CC) dataset. Today, the NVIDIA NeMo Curator team is excited to share that the pipeline used to build the…
]]>This is the second post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM. When building LLM-based applications, it is critical to understand the performance characteristics of these models on a given hardware. This serves multiple purposes: As a client-side LLM-focused benchmarking tool…
]]>Alibaba recently released Tongyi Qwen3, a family of open-source hybrid-reasoning large language models (LLMs). The Qwen3 family consists of two MoE models, 235B-A22B (235B total parameters and 22B active parameters) and 30B-A3B, and six dense models, including the 0.6B, 1.7B, 4B, 8B, 14B, 32B versions. With ultra-fast token generation, developers can efficiently integrate and deploy Qwen3…
]]>Explore the groundbreaking projects and real-world impacts of the HackAI Challenge powered by NVIDIA AI Workbench and Dell Precision.
]]>The NVIDIA CUDA-X math libraries empower developers to build accelerated applications for AI, scientific computing, data processing, and more. Two of the most important applications of CUDA-X libraries are training and inference LLMs, whether for use in everyday consumer applications or highly specialized scientific domains like drug discovery. Multiple CUDA-X libraries are indispensable…
]]>It’s 10 p.m. on a Tuesday when the phone rings at the Sapochnick Law Firm, a specialized law practice in San Diego, California. The caller, a client of the firm, is anxious as the phone rings. They received an important letter containing potentially life-changing news, and had urgent questions for their lawyer. The client quickly realizes the Sapochnick team likely left the office hours ago…
]]>When interacting with transformer-based models like large language models (LLMs) and vision-language models (VLMs), the structure of the input shapes the model’s output. But prompts are often more than a simple user query. In practice, they optimize the response by dynamically assembling data from various sources such as system instructions, context data, and user input.
]]>AI is rapidly moving beyond centralized cloud and data centers, becoming a powerful tool deployable directly on professional workstations. Thanks to advanced hardware and optimized software, you can build, run, and experiment with sophisticated AI models at your desk or on the go. Welcome to the world of local AI development! Running and developing AI locally on a workstation offers…
]]>The first release of NVIDIA NIM Operator simplified the deployment and lifecycle management of inference pipelines for NVIDIA NIM microservices, reducing the workload for MLOps, LLMOps engineers, and Kubernetes admins. It enabled easy and fast deployment, auto-scaling, and upgrading of NIM on Kubernetes clusters. Learn more about the first release. Our customers and partners have been using…
]]>The age of passive AI is over. A new era is beginning, where AI doesn’t just respond—it thinks, plans, and acts. The rapid advancement of large language models (LLMs) has unlocked the potential of agentic AI systems, enabling the automation of tedious tasks across many fields, including cybersecurity. Traditionally, AI applications in cybersecurity have focused primarily on detecting…
]]>This is the first post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM. Researchers from the University College London (UCL) Deciding, Acting, and Reasoning with Knowledge (DARK) Lab leverage NVIDIA NIM microservices in their new game-based benchmark suite, Benchmarking Agentic LLM and VLM Reasoning On Games…
]]>Large language models (LLMs) have enabled AI tools that help you write more code faster, but as we ask these tools to take on more and more complex tasks, there are limitations that become apparent. Challenges such as understanding the nuances of programming languages, complex dependencies, and adapting to codebase-specific context can lead to lower-quality code and cause bottlenecks down the line.
]]>Enterprise data is constantly changing. This presents significant challenges for maintaining AI system accuracy over time. As organizations increasingly rely on agentic AI systems to optimize business processes, keeping these systems aligned with evolving business needs and new data becomes crucial. This post dives into how to build an iteration of a data flywheel using NVIDIA NeMo…
]]>Missed GTC? This year’s training labs are now available on demand to watch anywhere, anytime.
]]>State-of-the-art image diffusion models take tens of seconds to process a single image. This makes video diffusion even more challenging, requiring significant computational resources and high costs. By leveraging the latest FP8 quantization features on NVIDIA Hopper GPUs with NVIDIA TensorRT, it’s possible to significantly reduce inference costs and serve more users with fewer GPUs.
]]>AI has become nearly synonymous with innovation. As it rushes onto the world stage, AI is seeding inspiration in creators and problem-solvers of all stripes—from artists to more traditional industrial inventors. One of the world’s leading AI-first artists, Alexander Reben, has spent his career integrating AI into different artistic mediums. His current work explores AI and robotics and…
]]>Build a high-performance agentic AI system using the open-source NVIDIA Agent Intelligence toolkit — contest runs May 12 to May 23.
]]>Large language models (LLMs) are revolutionizing how developers code and how they learn to code. For seasoned or junior developers alike, today’s state-of-the-art models can generate Python scripts, React-based websites, and more. In the future, powerful AI models will assist developers in writing high-performance GPU code. This raises an important question: How can it be determined whether an LLM…
]]>Federated learning (FL) has emerged as a promising approach for training machine learning models across distributed data sources while preserving data privacy. However, FL faces significant challenges related to communication overhead and local resource constraints when balancing model requirements and communication capabilities. Particularly in the current era of large language models…
]]>AI is no longer just about generating text or images—it’s about deep reasoning, detailed problem-solving, and powerful adaptability for real-world applications in business and in financial, customer, and healthcare services. Available today, the latest Llama Nemotron Ultra reasoning model from NVIDIA delivers leading accuracy among open-source models across intelligence and coding benchmarks…
]]>You’re invited to join the challenge. Develop and apply innovative data filtering techniques to curate datasets that enhance edge LM performance.
]]>Try NVIDIA Llama Nemotron Ultra as an NVIDIA NIM microservice. At only 253B total parameters, it offers reasoning performance that meets or beats top open reasoning models like DeepSeek-R1 while offering considerably higher throughput due to its optimized sizing, and retaining excellent tool calling capabilities.
]]>Scientific papers are highly heterogeneous, often employing diverse terminologies for the same entities, using varied methodologies to study biological phenomena, and presenting findings within distinct contexts. Extracting meaningful insights from these papers requires a profound understanding of biology, a critical evaluation of methodologies, and the ability to discern robust findings from…
]]>NVIDIA AI Workbench 2025.03.10 features streamlined onboarding and enhanced UX for multicontainer projects.
]]>This updated post was originally published on March 18, 2025. Organizations are embracing AI agents to enhance productivity and streamline operations. To maximize their impact, these agents need strong reasoning abilities to navigate complex problems, uncover hidden connections, and make logical decisions autonomously in dynamic environments. Due to their ability to tackle complex…
]]>As large language models (LLM) gain popularity in various question-answering systems, retrieval-augmented generation (RAG) pipelines have also become a focal point. RAG pipelines combine the generation power of LLMs with external data sources and retrieval mechanisms, enabling models to access domain-specific information that may not have existed during fine-tuning.
]]>Nearly 300,000 women across the globe die each year due to complications arising from pregnancy or childbirth. The number of stillborns and babies that die within their first month tops nearly 4M every year. April 7 marks World Health Day, which this year focuses on raising awareness about efforts to end preventable maternal and newborn deaths. Giving women and infants better access to…
]]>Join the hackathon to build open-source AI solutions, optimize models, enhance workflows, connect with peers, and win prizes.
]]>The newest generation of the popular Llama AI models is here with Llama 4 Scout and Llama 4 Maverick. Accelerated by NVIDIA open-source software, they can achieve over 40K output tokens per second on NVIDIA Blackwell B200 GPUs, and are available to try as NVIDIA NIM microservices. The Llama 4 models are now natively multimodal and multilingual using a mixture-of-experts (MoE) architecture.
]]>The compute demands for large language model (LLM) inference are growing rapidly, fueled by the combination of growing model sizes, real-time latency requirements, and, most recently, AI reasoning. At the same time, as AI adoption grows, the ability of an AI factory to serve as many users as possible, all while maintaining good per-user experiences, is key to maximizing the value it generates.
]]>This is the first post in the large language model latency-throughput benchmarking series, which aims to instruct developers on common metrics used for LLM benchmarking, fundamental concepts, and how to benchmark your LLM applications. The past few years have witnessed the rise in popularity of generative AI and large language models (LLMs), as part of a broad AI revolution.
]]>Since the release of ChatGPT in November 2022, the capabilities of large language models (LLMs) have surged, and the number of available models has grown exponentially. With this expansion, LLMs now vary widely in cost, performance, and specialization. For example, straightforward tasks like text summarization can be efficiently handled by smaller, general-purpose models. In contrast…
]]>From hyperlocal forecasts that guide daily operations to planet-scale models illuminating new climate insights, the world is entering a new frontier in weather and climate resilience. The combination of space-based observations and GPU-accelerated AI delivers near-instant, context-rich insights to enterprises, governments, researchers, and solution providers worldwide. It also marks a rare…
]]>Electric vehicles (EVs) are transforming transportation, but challenges such as cost, longevity, and range remain barriers to widespread adoption. At the heart of these challenges lies battery technology—specifically, the electrolyte, a critical component that enables energy storage and delivery. The electrolyte’s properties directly impact a battery’s charging speed, power output, stability…
]]>With emerging use cases such as digital humans, agents, podcasts, images, and video generation, generative AI is changing the way we interact with PCs. This paradigm shift calls for new ways of interfacing with and programming generative AI models. However, getting started can be daunting for PC developers and AI enthusiasts. Today, NVIDIA released a suite of NVIDIA NIM microservices on…
]]>Microsoft, in collaboration with NVIDIA, announced transformative performance improvements for the Meta Llama family of models on its Azure AI Foundry platform. These advancements, enabled by NVIDIA TensorRT-LLM optimizations, deliver significant gains in throughput, reduced latency, and improved cost efficiency, all while preserving the quality of model outputs. With these improvements…
]]>For years, advancements in AI have followed a clear trajectory through pretraining scaling: larger models, more data, and greater computational resources lead to breakthrough capabilities. In the last 5 years, pretraining scaling has increased compute requirements at an incredible rate of 50M times. However, building more intelligent systems is no longer just about pretraining bigger models.
]]>In the United Arab Emirates (UAE), extreme weather events disrupt daily life, delaying flights, endangering transportation, and complicating urban planning. High daytime temperatures limit human activity outdoors, while dense nighttime fog is a frequent cause of severe and often fatal car crashes. Meanwhile, 2024 saw the heaviest precipitation event in the country in 75 years…
]]>The growing volume and complexity of medical data—and the pressing need for early disease diagnosis and improved healthcare efficiency—are driving unprecedented advancements in medical AI. Among the most transformative innovations in this field are multimodal AI models that simultaneously process text, images, and video. These models offer a more comprehensive understanding of patient data than…
]]>Generative chemistry with AI has the potential to revolutionize how scientists approach drug discovery and development, health, and materials science and engineering. Instead of manually designing molecules with “chemical intuition” or screening millions of existing chemicals, researchers can train neural networks to propose novel molecular structures tailored to the desired properties.
]]>NVIDIA Parabricks is a scalable genomics analysis software suite that solves omics challenges with accelerated computing and deep learning to unlock new scientific breakthroughs. Released at NVIDIA GTC 2025, NVIDIA Parabricks v4.5 supports the growing quantity of data by including support for the latest NVIDIA GPU architectures, and improved alignment and variant calling with the…
]]>With the rise of physical AI, video content generation has surged exponentially. A single camera-equipped autonomous vehicle can generate more than 1 TB of video daily, while a robotics-powered manufacturing facility may produce 1 PB of data daily. To leverage this data for training and fine-tuning world foundation models (WFMs), you must first process it efficiently.
]]>Enterprises are generating and storing more multimodal data than ever before, yet traditional retrieval systems remain largely text-focused. While they can surface insights from written content, they aren’t extracting critical information embedded in tables, charts, and infographics—often the most information-dense elements of a document. Without a multimodal retrieval system…
]]>With the release of NVIDIA Agent Intelligence toolkit—an open-source library for connecting and optimizing teams of AI agents—developers, professionals, and researchers can create their own agentic AI applications. This tutorial shows you how to develop apps in the Agent Intelligence toolkit through an example of AI code generation. We build a test-driven coding agent using LangGraph and reasoning…
]]>As agentic AI systems evolve and become essential for optimizing business processes, it is crucial for developers to update them regularly to stay aligned with ever-changing business and user needs. Continuously refining these agents with AI and human feedback ensures that they remain effective and relevant. NVIDIA NeMo microservices is a fully accelerated, enterprise-grade solution designed…
]]>NVIDIA announced the release of NVIDIA Dynamo today at GTC 2025. NVIDIA Dynamo is a high-throughput, low-latency open-source inference serving framework for deploying generative AI and reasoning models in large-scale distributed environments. The framework boosts the number of requests served by up to 30x, when running the open-source DeepSeek-R1 models on NVIDIA Blackwell.
]]>NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025. A single NVIDIA DGX system with eight NVIDIA Blackwell GPUs can achieve over 250 tokens per second per user or a maximum throughput of over 30,000 tokens per second on the massive, state-of-the-art 671 billion parameter DeepSeek-R1 model. These rapid advancements in performance at both ends of the performance…
]]>The next generation of AI-driven robots like humanoids and autonomous vehicles depends on high-fidelity, physics-aware training data. Without diverse and representative datasets, these systems don’t get proper training and face testing risks due to poor generalization, limited exposure to real-world variations, and unpredictable behavior in edge cases. Collecting massive real-world datasets for…
]]>AI is transforming how we experience our favorite games. It is unlocking new levels of visuals, performance, and gameplay possibilities with neural rendering and generative AI-powered characters. With game development becoming more complex, AI is also playing a role in helping artists and engineers realize their creative visions. At GDC 2025, NVIDIA is building upon NVIDIA RTX Kit…
]]>Building AI systems with foundation models requires a delicate balancing of resources such as memory, latency, storage, compute, and more. One size does not fit all for developers managing cost and user experience when bringing generative AI capability to the rapidly growing ecosystem of AI-powered applications. You need options for high-quality, customizable models that can support large…
]]>With the recent advancements in generative AI and vision foundational models, VLMs present a new wave of visual computing wherein the models are capable of highly sophisticated perception and deep contextual understanding. These intelligent solutions offer a promising means of enhancing semantic comprehension in XR settings. By integrating VLMs, developers can significantly improve how XR…
]]>Large language models (LLMs) have shown remarkable generalization capabilities in natural language processing (NLP). They are used in a wide range of applications, including translation, digital assistants, recommendation systems, context analysis, code generation, cybersecurity, and more. In automotive applications, there is growing demand for LLM-based solutions for both autonomous driving and…
]]>Training AI models on massive GPU clusters presents significant challenges for model builders. Because manual intervention becomes impractical as job scale increases, automation is critical to maintaining high GPU utilization and training productivity. An exceptional training experience requires resilient systems that provide low-latency error attribution and automatic fail over based on root…
]]>Learn from and connect with leading AI developers building the next generation of AI agents.
]]>Applications requiring high-performance information retrieval span a wide range of domains, including search engines, knowledge management systems, AI agents, and AI assistants. These systems demand retrieval processes that are accurate and computationally efficient to deliver precise insights, enhance user experiences, and maintain scalability. Retrieval-augmented generation (RAG) is used to…
]]>Join these sessions to learn how accelerated computing, generative AI, and physics-based world simulation are advancing physical and embodied AI.
]]>Discover cutting-edge AI and data science innovations from top generative AI teams at NVIDIA GTC 2025.
]]>Safeguarding AI agents and other conversational AI applications to ensure safe, on-brand and reliable behavior is essential for enterprises. NVIDIA NeMo Guardrails offers robust protection with AI guardrails for content safety, topic control, jailbreak detection, and more to evaluate and optimize guardrail performance. In this post, we explore techniques for measuring and optimizing your AI…
]]>AI agents are transforming business operations by automating processes, optimizing decision-making, and streamlining actions. Their effectiveness hinges on expert reasoning, enabling smarter planning and efficient execution. Agentic AI applications could benefit from the capabilities of models such as DeepSeek-R1. Built for solving problems that require advanced AI reasoning…
]]>As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. NAVER is a popular South Korean search engine company that offers Naver Place, a geo-based service that provides detailed information about millions of businesses and points of interest across Korea. Users can search about different places, leave reviews, and place bookings or orders in real time.
]]>Large language models (LLMs) have permeated every industry and changed the potential of technology. However, due to their massive size they are not practical for the current resource constraints that many companies have. The rise of small language models (SLMs) bridge quality and cost by creating models with a smaller resource footprint. SLMs are a subset of language models that tend to…
]]>A well-crafted systematic review is often the initial step for researchers exploring a scientific field. For scientists new to this field, it provides a structured overview of the domain. For experts, it refines their understanding and sparks new ideas. In 2024 alone, 218,650 review articles were indexed in the Web of Science database, highlighting the importance of these resources in research.
]]>In today’s data-driven world, the ability to retrieve accurate information from even modest amounts of data is vital for developers seeking streamlined, effective solutions for quick deployments, prototyping, or experimentation. One of the key challenges in information retrieval is managing the diverse modalities in unstructured datasets, including text, PDFs, images, tables, audio, video…
]]>Vision language models (VLMs) are evolving at a breakneck speed. In 2020, the first VLMs revolutionized the generative AI landscape by bringing visual understanding to large language models (LLMs) through the use of a vision encoder. These initial VLMs were limited in their abilities, only able to understand text and single image inputs. Fast-forward a few years and VLMs are now capable of…
]]>Chip and hardware design presents numerous challenges stemming from its complexity and advancing technologies. These challenges result in longer turn-around time (TAT) for optimizing performance, power, area, and cost (PPAC) during synthesis, verification, physical design, and reliability loops. Large language models (LLMs) have shown a remarkable capacity to comprehend and generate natural…
]]>There is an activity where people provide inputs to generative AI technologies, such as large language models (LLMs), to see if the outputs can be made to deviate from acceptable standards. This use of LLMs began in 2023 and has rapidly evolved to become a common industry practice and a cornerstone of trustworthy AI. How can we standardize and define LLM red teaming?
]]>Agentic workflows are the next evolution in AI-powered tools. They enable developers to chain multiple AI models together to perform complex activities, enable AI models to use tools to access additional data or automate user actions, and enable AI models to operate autonomously, analyzing and performing complex tasks with a minimum of human involvement or interaction. Because of their power…
]]>Generative AI, powered by advanced machine learning models and deep neural networks, is revolutionizing industries by generating novel content and driving innovation in fields like healthcare, finance, and entertainment. NVIDIA is leading this transformation with its cutting-edge GPU architectures and software ecosystems, such as the H100 Tensor Core GPU and CUDA platform…
]]>NVIDIA AI Enterprise is the cloud-native software platform for the development and deployment of production-grade AI solutions. The latest release of the NVIDIA AI Enterprise infrastructure software collection adds support for the latest NVIDIA data center GPU, NVIDIA H200 NVL, giving your enterprise new options for powering cutting-edge use cases such as agentic and generative AI with some of the…
]]>Traditional design and engineering workflows in the manufacturing industry have long been characterized by a sequential, iterative approach that is often time-consuming and resource intensive. These conventional methods typically involve stages such as requirement gathering, conceptual design, detailed design, analysis, prototyping, and testing, with each phase dependent on the results of previous…
]]>NVIDIA has consistently developed automatic speech recognition (ASR) models that set the benchmark in the industry. Earlier versions of NVIDIA Riva, a collection of GPU-accelerated speech and translation AI microservices for ASR, TTS, and NMT, support English-Spanish and English-Japanese code-switching ASR models based on the Conformer architecture, along with a model supporting multiple…
]]>Join us on February 27 to learn how to transform PDFs into AI podcasts using the NVIDIA AI Blueprint.
]]>Generative AI, especially with breakthroughs like AlphaFold and RosettaFold, is transforming drug discovery and how biotech companies and research laboratories study protein structures, unlocking groundbreaking insights into protein interactions. Proteins are dynamic entities. It has been postulated that a protein’s native state is known by its sequence of amino acids alone…
]]>AI has evolved from an experimental curiosity to a driving force within biological research. The convergence of deep learning algorithms, massive omics datasets, and automated laboratory workflows has allowed scientists to tackle problems once thought intractable—from rapid protein structure prediction to generative drug design, increasing the need for AI literacy among scientists.
]]>Learn from researchers, scientists, and industry leaders across a variety of topics including AI, robotics, and Data Science.
]]>Large language models (LLMs) that specialize in coding have been steadily adopted into developer workflows. From pair programming to self-improving AI agents, these models assist developers with various tasks, including enhancing code, fixing bugs, generating tests, and writing documentation. To promote the development of open-source LLMs, the Qwen team recently released Qwen2.5-Coder…
]]>Master prompt engineering, fine-tuning, and customization to build video analytics AI agents.
]]>As AI models extend their capabilities to solve more sophisticated challenges, a new scaling law known as test-time scaling or inference-time scaling is emerging. Also known as AI reasoning or long-thinking, this technique improves model performance by allocating additional computational resources during inference to evaluate multiple possible outcomes and then selecting the best one…
]]>Model pruning and knowledge distillation are powerful cost-effective strategies for obtaining smaller language models from an initial larger sibling. The How to Prune and Distill Llama-3.1 8B to an NVIDIA Llama-3.1-Minitron 4B Model post discussed the best practices of using large language models (LLMs) that combine depth, width, attention, and MLP pruning with knowledge distillation…
]]>In the rapidly evolving landscape of AI systems and workloads, achieving optimal model training performance extends far beyond chip speed. It requires a comprehensive evaluation of the entire stack, from compute to networking to model framework. Navigating the complexities of AI system performance can be difficult. There are many application changes that you can make…
]]>Explore the latest advancements in academia, including advanced research, innovative teaching methods, and the future of learning and technology.
]]>Translation plays an essential role in enabling companies to expand across borders, with requirements varying significantly in terms of tone, accuracy, and technical terminology handling. The emergence of sovereign AI has highlighted critical challenges in large language models (LLMs), particularly their struggle to capture nuanced cultural and linguistic contexts beyond English-dominant…
]]>NVIDIA AI Workbench is a free development environment manager to develop, customize, and prototype AI applications on your GPUs. AI Workbench provides a frictionless experience across PCs, workstations, servers, and cloud for AI, data science, and machine learning (ML) projects. The user experience includes: This post provides details about the January 2025 release of NVIDIA AI Workbench…
]]>Matrix multiplication and attention mechanisms are the computational backbone of modern AI workloads. While libraries like NVIDIA cuDNN provide highly optimized implementations, and frameworks such as CUTLASS offer deep customization, many developers and researchers need a middle ground that combines performance with programmability. The open-source Triton compiler on the NVIDIA Blackwell…
]]>Connect AI applications to enterprise data using embedding and reranking models for information retrieval.
]]>NVIDIA DLSS 4 is the latest iteration of DLSS introduced with the NVIDIA GeForce RTX 50 Series GPUs. It includes several new features: Here’s how you can get started with DLSS 4 in your integrations. This post focuses on the Streamline SDK, which provides a plug-and-play framework for simplified plugin integration. The NVIDIA Streamline SDK is an open-source framework that…
]]>NVIDIA recently announced a new generation of PC GPUs—the GeForce RTX 50 Series—alongside new AI-powered SDKs and tools for developers. Powered by the NVIDIA Blackwell architecture, fifth-generation Tensor Cores and fourth-generation RT Cores, the GeForce RTX 50 Series delivers breakthroughs in AI-driven rendering, including neural shaders, digital human technologies, geometry and lighting.
]]>Evaluating large language models (LLMs) and retrieval-augmented generation (RAG) systems is a complex and nuanced process, reflecting the sophisticated and multifaceted nature of these systems. Unlike traditional machine learning (ML) models, LLMs generate a wide range of diverse and often unpredictable outputs, making standard evaluation metrics insufficient. Key challenges include the…
]]>Despite the success of large language models (LLMs) as general-purpose AI tools, their high demand for computational resources make their deployment challenging in many real-world scenarios. The sizes of the model and conversation state are limited by the available high-bandwidth memory, limiting the number of users that can be served and the maximum conversation length. At present…
]]>