State-of-the-art image diffusion models take tens of seconds to process a single image. This makes video diffusion even more challenging, requiring significant computational resources and high costs. By leveraging the latest FP8 quantization features on NVIDIA Hopper GPUs with NVIDIA TensorRT, it’s possible to significantly reduce inference costs and serve more users with fewer GPUs.
]]>Real-time cloud-scale applications that involve AI-based computer vision are growing rapidly. The use cases include image understanding, content creation, content moderation, mapping, recommender systems, and video conferencing. However, the compute cost of these workloads is growing too, driven by demand for increased sophistication in the processing. The shift from still images to video is…
]]>