Somshubra Majumdar – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins

Turbocharge ASR Accuracy and Speed with NVIDIA NeMo Parakeet-TDT
By Somshubra Majumdar, published 2024-04-18, updated 2024-08-12. http://www.open-lab.net/blog/?p=80732


NVIDIA NeMo, an end-to-end platform for developing multimodal generative AI models at scale anywhere (on any cloud and on-premises), recently released Parakeet-TDT. This new addition to the NeMo ASR Parakeet model family offers better accuracy and 64% greater speed than the previous best model, Parakeet-RNNT-1.1B. This post explains Parakeet-TDT and how to use it to generate highly accurate…

Pushing the Boundaries of Speech Recognition with NVIDIA NeMo Parakeet ASR Models
By Somshubra Majumdar, published 2024-04-18, updated 2024-08-12. http://www.open-lab.net/blog/?p=80564


NVIDIA NeMo, an end-to-end platform for developing multimodal generative AI models at scale anywhere (on any cloud and on-premises), released the Parakeet family of automatic speech recognition (ASR) models. These state-of-the-art ASR models, developed in collaboration with Suno.ai, transcribe spoken English with exceptional accuracy. This post details Parakeet ASR models that are…

NVIDIA Speech and Translation AI Models Set Records for Speed and Accuracy
By Somshubra Majumdar, published 2024-03-19, updated 2024-08-12. http://www.open-lab.net/blog/?p=79365


Speech and translation AI models developed at NVIDIA are pushing the boundaries of performance and innovation. The NVIDIA Parakeet automatic speech recognition (ASR) family of models and the NVIDIA Canary multilingual, multitask ASR and translation model currently top the Hugging Face Open ASR Leaderboard. In addition, a multilingual P-Flow-based text-to-speech (TTS) model won the LIMMITS ’24…

Controlled Adaptation of Speech Recognition Models to New Domains
By Somshubra Majumdar, published 2023-02-03, updated 2023-06-12. http://www.open-lab.net/blog/?p=60523


Have you ever fine-tuned a speech recognition system on your accent, only to find that it recognizes your voice well but fails to detect words spoken by others? This is common in speech recognition systems that have been trained on hundreds of thousands of hours of speech. In large-scale automatic speech recognition (ASR), a system may perform well in many scenarios, but not in all of them.

Improving Japanese Language ASR by Combining Convolutions with Attention Mechanisms
By Somshubra Majumdar, published 2022-09-12, updated 2023-06-12. http://www.open-lab.net/blog/?p=54745


Automatic speech recognition (ASR) research generally focuses on high-resource languages such as English, which is supported by hundreds of thousands of hours of speech. Recent literature has renewed the focus on more complex languages, such as Japanese. Like other Asian languages, Japanese has a vast base character set (upwards of 3,000 unique characters are used in common vernacular)…
