The newest generation of the popular Llama AI models is here with Llama 4 Scout and Llama 4 Maverick. Accelerated by NVIDIA open-source software, they can achieve over 40K output tokens per second on NVIDIA Blackwell B200 GPUs, and are available to try as NVIDIA NIM microservices. The Llama 4 models are now natively multimodal and multilingual using a mixture-of-experts (MoE) architecture.
]]>Building AI systems with foundation models requires a delicate balancing of resources such as memory, latency, storage, compute, and more. One size does not fit all for developers managing cost and user experience when bringing generative AI capability to the rapidly growing ecosystem of AI-powered applications. You need options for high-quality, customizable models that can support large…
]]>Large language models (LLMs) have permeated every industry and changed the potential of technology. However, due to their massive size they are not practical for the current resource constraints that many companies have. The rise of small language models (SLMs) bridge quality and cost by creating models with a smaller resource footprint. SLMs are a subset of language models that tend to…
]]>Meta’s Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only instruction-tuned model. Llama 3.3 provides enhanced performance respective to the older Llama 3.1 70B model and can even match the capabilities of the larger, more computationally expensive Llama 3.1 405B model on several tasks including math, reasoning, coding…
]]>