As large language models increasingly take on reasoning-intensive tasks in areas like math and science, their output lengths are getting significantly longer—sometimes spanning tens of thousands of tokens. This shift makes efficient throughput a critical bottleneck, especially when deploying models in real-world, latency-sensitive environments. To address these challenges and enable the…
]]>Multilingual automatic speech recognition (ASR) models have gained significant interest because of their ability to transcribe speech in more than one language. This is fueled by the growing multilingual communities as well as by the need to reduce complexity. You only need one model to handle multiple languages. This post explains how to use pretrained multilingual NeMo ASR models from the…
]]>