Blackwell Breaks the 1,000 TPS/User Barrier With Meta's Llama 4 Maverick
NVIDIA Technical Blog: News and tutorials for developers, data scientists, and IT admins
By Yilin Fan. Published 2025-05-23T00:09:02Z; updated 2025-06-12T18:51:04Z.
Post: http://www.open-lab.net/blog/?p=100729
Feed: http://www.open-lab.net/blog/feed/ (feed updated 2025-08-08T23:13:07Z)

NVIDIA has achieved a world-record large language model (LLM) inference speed. A single NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs can achieve over 1,000 tokens per second (TPS) per user on the 400-billion-parameter Llama 4 Maverick model, the largest and most powerful model available in the Llama 4 collection. This speed was independently measured by the AI benchmarking service…
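To put the per-user figure in perspective, the short Python sketch below converts a TPS/user rate into the average time between output tokens and the time to stream a response. The function names and the 500-token response length are hypothetical, chosen only for illustration. The calculation treats the rate as constant and ignores time to first token, so it is a rough estimate rather than a description of how the benchmark was run.

```python
def ms_per_token(tps_per_user: float) -> float:
    """Average gap between output tokens for one user, in milliseconds."""
    if tps_per_user <= 0:
        raise ValueError("tps_per_user must be positive")
    return 1000.0 / tps_per_user


def generation_seconds(num_tokens: int, tps_per_user: float) -> float:
    """Seconds to stream num_tokens at a constant per-user rate.

    Assumption: ignores time to first token and any rate variation.
    """
    if tps_per_user <= 0:
        raise ValueError("tps_per_user must be positive")
    return num_tokens / tps_per_user


if __name__ == "__main__":
    rate = 1000.0  # tokens per second per user, the threshold cited in the post
    print(f"{ms_per_token(rate):.1f} ms per token")
    # 500 tokens is a hypothetical response length, not a figure from the post.
    print(f"{generation_seconds(500, rate):.2f} s for a 500-token response")
```

At 1,000 TPS/user, each token arrives roughly every millisecond, so a response of a few hundred tokens would stream in well under a second once generation begins.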
