Posts by Harry Kim
AI Platforms / Deployment
May 21, 2025
NVIDIA Dynamo Accelerates llm-d Community Initiatives for Advancing Large-Scale Distributed Inference
The introduction of the llm-d community at Red Hat Summit 2025 marks a significant step forward in accelerating generative AI inference innovation for the open...
5 MIN READ
Data Center / Cloud
May 20, 2025
NVIDIA Dynamo Adds GPU Autoscaling, Kubernetes Automation, and Networking Optimizations
At NVIDIA GTC 2025, we announced NVIDIA Dynamo, a high-throughput, low-latency open-source inference serving framework for deploying generative AI and reasoning...
7 MIN READ
Development & Optimization
Mar 18, 2025
Introducing NVIDIA Dynamo, A Low-Latency Distributed Inference Framework for Scaling Reasoning AI Models
NVIDIA announced the release of NVIDIA Dynamo today at GTC 2025. NVIDIA Dynamo is a high-throughput, low-latency open-source inference serving framework for...
14 MIN READ
Generative AI
Aug 01, 2024
Measuring Generative AI Model Performance Using NVIDIA GenAI-Perf and an OpenAI-Compatible API
NVIDIA offers tools like Perf Analyzer and Model Analyzer to assist machine learning engineers with measuring and balancing the trade-off between latency and...
6 MIN READ
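The GenAI-Perf post above centers on measuring latency and throughput against an OpenAI-compatible endpoint. As a rough illustration of that kind of measurement (a hand-rolled sketch, not GenAI-Perf itself), the Python snippet below streams a single chat completion and reports time-to-first-token and total request time; the base URL, model name, and prompt are placeholder assumptions for whatever server is being benchmarked.

    # Illustrative latency probe against an OpenAI-compatible endpoint (not GenAI-Perf itself).
    # Assumptions: a server is running at BASE_URL and serves MODEL; both are placeholders.
    import time
    from openai import OpenAI

    BASE_URL = "http://localhost:8000/v1"   # assumed local OpenAI-compatible server
    MODEL = "my-model"                       # placeholder model name

    client = OpenAI(base_url=BASE_URL, api_key="not-needed-for-local-servers")

    prompt = "Explain the difference between throughput and latency in one paragraph."
    start = time.perf_counter()
    first_token_time = None
    chunks = 0

    # Stream the response so time-to-first-token and generation rate can be observed separately.
    stream = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_time is None:
                first_token_time = time.perf_counter()
            chunks += 1
    total = time.perf_counter() - start

    if first_token_time is not None:
        print(f"time to first token: {first_token_time - start:.3f}s")
    print(f"total request time:  {total:.3f}s")
    print(f"streamed chunks:     {chunks} (~chunks/s: {chunks / total:.1f})")

A tool like GenAI-Perf automates this style of measurement across many concurrent requests and configurable token-length distributions, which is what makes the latency/throughput trade-off described in the post visible.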