As of March 18, 2025, NVIDIA Triton Inference Server is part of the NVIDIA Dynamo Platform and has accordingly been renamed NVIDIA Dynamo Triton. NVIDIA NIM microservices are model inference containers that can be deployed on Kubernetes. In a production environment, it’s important to understand the compute and memory profile of these microservices to set up a successful autoscaling…
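
As a hedged illustration of the kind of autoscaling setup this introduction points toward, the sketch below attaches a CPU-utilization HorizontalPodAutoscaler to a NIM Deployment using the official Kubernetes Python client. The Deployment name `nim-llm`, the namespace `nim`, the replica bounds, and the 70% utilization target are placeholder assumptions rather than values from the original text, and production NIM deployments often scale on GPU or request-level metrics instead of CPU alone.

```python
# Minimal sketch: attach a CPU-based HorizontalPodAutoscaler to an existing
# NIM Deployment. Assumes the "kubernetes" Python client is installed, a
# Deployment named "nim-llm" exists in namespace "nim" (both hypothetical
# names), and metrics-server is running in the cluster.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running in-cluster

hpa = client.V2HorizontalPodAutoscaler(
    api_version="autoscaling/v2",
    kind="HorizontalPodAutoscaler",
    metadata=client.V1ObjectMeta(name="nim-llm-hpa", namespace="nim"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="nim-llm"
        ),
        min_replicas=1,
        max_replicas=4,  # placeholder bounds; size these to the measured load profile
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(
                        type="Utilization", average_utilization=70
                    ),
                ),
            )
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="nim", body=hpa
)
```

The same object could equally be expressed as a YAML manifest applied with kubectl; the point of the sketch is only that the scaling thresholds should come from measuring the microservice's compute and memory profile first, which is what the rest of the discussion concerns.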