The Automated Audio Captioning (AAC) task centers on generating natural language descriptions from audio inputs. Because the input (audio) and the output (text) are distinct modalities, AAC systems typically rely on an audio encoder to extract relevant information from the sound, represented as feature vectors, which a decoder then uses to generate text descriptions.
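To make the encoder-decoder framing concrete, here is a minimal, illustrative PyTorch sketch (not any specific published AAC system): a small convolutional encoder turns mel-spectrogram frames into feature vectors, and a Transformer decoder attends over them to predict caption tokens. All layer sizes, class names, and shapes below are placeholders chosen for readability.

```python
# Minimal AAC encoder-decoder sketch (illustrative only; real systems typically
# use pretrained audio encoders and much larger text decoders).
import torch
import torch.nn as nn

class TinyAudioCaptioner(nn.Module):
    def __init__(self, n_mels=64, d_model=256, vocab_size=5000):
        super().__init__()
        # Audio encoder: turns mel-spectrogram frames into feature vectors.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mels, d_model, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, stride=2, padding=2),
        )
        # Text decoder: attends over the audio features and predicts the next token.
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, mels, token_ids):
        # mels: (batch, n_mels, time); token_ids: (batch, seq_len)
        memory = self.encoder(mels).transpose(1, 2)       # (batch, time', d_model)
        tgt = self.embed(token_ids)                       # (batch, seq_len, d_model)
        seq_len = token_ids.size(1)
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(out)                          # next-token logits

model = TinyAudioCaptioner()
logits = model(torch.randn(2, 64, 400), torch.randint(0, 5000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 5000])
```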
Building an effective automatic speech recognition (ASR) model for underrepresented languages presents unique challenges due to limited data resources. In this post, I discuss best practices for preparing the dataset, configuring the model, and training it effectively. I also cover the evaluation metrics and the challenges encountered along the way. By following these practices…
NVIDIA NeMo is an end-to-end platform for the development of multimodal generative AI models at scale anywhere, on any cloud and on-premises. The NeMo team just released Canary, a multilingual model that transcribes speech in English, Spanish, German, and French with punctuation and capitalization. Canary also provides bi-directional translation between English and the three other supported…
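As a quick illustration of how a model like this might be loaded through NeMo, here is a hedged sketch. The checkpoint id "nvidia/canary-1b", the EncDecMultiTaskModel class, and the sample file name are assumptions to verify against the Canary model card rather than a definitive recipe.

```python
# Hedged sketch: loading Canary via NeMo and transcribing one file.
from nemo.collections.asr.models import EncDecMultiTaskModel

canary = EncDecMultiTaskModel.from_pretrained("nvidia/canary-1b")  # assumed checkpoint id
# Transcription in the source language; translation is selected through the
# task/language prompts documented on the model card.
print(canary.transcribe(["sample_en.wav"]))  # hypothetical audio file
```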
NVIDIA NeMo, an end-to-end platform for developing multimodal generative AI models at scale anywhere (on any cloud and on-premises), recently released Parakeet-TDT. This new addition to the NeMo ASR Parakeet model family boasts better accuracy and 64% greater speed than the previously best model, Parakeet-RNNT-1.1B. This post explains Parakeet-TDT and how to use it to generate highly accurate…
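A minimal sketch of what inference could look like, assuming the checkpoint is published as "nvidia/parakeet-tdt-1.1b" and that NeMo's generic ASRModel loader and transcribe() API apply; the audio file name is hypothetical.

```python
# Hedged sketch: transcribing with Parakeet-TDT through NeMo.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-1.1b")  # assumed id
transcripts = asr_model.transcribe(["meeting_recording.wav"])  # hypothetical file
print(transcripts[0])
```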
NVIDIA NeMo, an end-to-end platform for the development of multimodal generative AI models at scale anywhere (on any cloud and on-premises), released the Parakeet family of automatic speech recognition (ASR) models. These state-of-the-art ASR models, developed in collaboration with Suno.ai, transcribe spoken English with exceptional accuracy. This post details Parakeet ASR models that are…
Speech and translation AI models developed at NVIDIA are pushing the boundaries of performance and innovation. The NVIDIA Parakeet automatic speech recognition (ASR) family of models and the NVIDIA Canary multilingual, multitask ASR and translation model currently top the Hugging Face Open ASR Leaderboard. In addition, a multilingual P-Flow-based text-to-speech (TTS) model won the LIMMITS '24…
Breaking barriers in speech recognition, NVIDIA NeMo proudly presents pretrained models tailored for Dutch and Persian, languages often overlooked in the AI landscape. These models leverage the recently introduced FastConformer architecture and were trained simultaneously with CTC and transducer objectives to maximize each model's accuracy. Automatic speech recognition (ASR) is a…
The integration of speech and translation AI into our daily lives is rapidly reshaping our interactions, from virtual assistants to call centers and augmented reality experiences. Speech AI Day provided valuable insights into the latest advancements in speech AI, showcasing how this technology addresses real-world challenges. In this first of three Speech AI Day sessions…
Learn how to build and deploy production-quality conversational AI apps with real-time transcription and NLP.
Every year, as part of their coursework, students from the University of Warsaw, Poland, get to work under the supervision of engineers from the NVIDIA Warsaw office on challenging problems in deep learning and accelerated computing. We present the work of three M.Sc. students (Alicja Ziarko, Paweł Pawlik, and Michał Siennicki) who managed to significantly reduce the latency in TorToiSe…
Large language models (LLMs) are becoming an integral tool for businesses to improve their operations, customer interactions, and decision-making processes. However, off-the-shelf LLMs often fall short in meeting the specific needs of enterprises due to industry-specific terminology, domain expertise, or unique requirements. This is where custom LLMs come into play.
Large language models (LLMs), such as GPT, have emerged as revolutionary tools in natural language processing (NLP) due to their ability to understand and generate human-like text. These models are trained on vast amounts of diverse data, enabling them to learn patterns, language structures, and contextual relationships. They serve as foundational models that can be customized to a wide range of…
One of the main challenges for businesses leveraging AI in their workflows is managing the infrastructure needed to support large-scale training and deployment of machine learning (ML) models. The NVIDIA FLARE platform provides a solution: a powerful, scalable infrastructure for federated learning that makes it easier to manage complex AI workflows across enterprises. NVIDIA FLARE 2.3.0…
Large language models (LLMs) have generated excitement worldwide due to their ability to understand and process human language at a scale that is unprecedented. They have transformed the way we interact with technology. Having been trained on a vast corpus of text, LLMs can manipulate and generate text for a wide variety of applications without much instruction or training. However…
Voice-enabled technology is becoming ubiquitous. But many are being left behind by an anglocentric and demographically biased algorithmic world. Mozilla Common Voice (MCV) and NVIDIA are collaborating to change that by partnering on a public crowdsourced multilingual speech corpus (now the largest of its kind in the world) and open-source pretrained models. It is now easier than ever before to…
The telecommunication industry has seen a proliferation of AI-powered technologies in recent years, with speech recognition and translation leading the charge. Multilingual AI virtual assistants, digital humans, chatbots, agent assists, and audio transcription are technologies that are revolutionizing the telco industry. Businesses are implementing AI in call centers to address incoming requests…
Develop safe and trustworthy LLM conversational applications with NVIDIA NeMo Guardrails, an open-source toolkit that enables programmable guardrails for defining desired user interactions within an application.
On May 23 at 9 am CEST, learn to build and deploy production-quality conversational AI applications with real-time transcription and natural language processing capabilities.
ChatGPT has made quite an impression. Users are excited to use the AI chatbot to ask questions, write poems, have it adopt a persona for interaction, act as a personal assistant, and more. Large language models (LLMs) power ChatGPT, and these models are the topic of this post. Before considering LLMs more carefully, we would first like to establish what a language model does. A language model gives…
Large language models (LLMs) are incredibly powerful and capable of answering complex questions, performing feats of creative writing, developing and debugging source code, and so much more. You can build incredibly sophisticated LLM applications by connecting them to external tools, for example reading data from a real-time source, or enabling an LLM to decide what action to take given a user's…
The Dataiku platform for everyday AI simplifies deep learning. Use cases are far-reaching, from image classification to object detection and natural language processing (NLP). Dataiku helps you with labeling, model training, explainability, model deployment, and centralized management of code and code environments. This post dives into high-level Dataiku and NVIDIA integrations for image…
Project Mellon is a lightweight Python package capable of harnessing the heavyweight power of speech AI (NVIDIA Riva) and large language models (LLMs) (NVIDIA NeMo service) to simplify user interactions in immersive environments. NVIDIA announced at NVIDIA GTC 2023 that developers can start testing Project Mellon to explore creating hands-free extended reality (XR) experiences controlled by…
Over 55% of the global population uses social media, easily sharing online content with just one click. While connecting with others and consuming entertaining content, you can also spot harmful narratives posing real-life threats. That's why VP of Engineering at Pendulum, Ammar Haris, wants his company's AI to help clients gain deeper insight into the harmful content being generated…
Multilingual automatic speech recognition (ASR) models have gained significant interest because of their ability to transcribe speech in more than one language. This is fueled by growing multilingual communities as well as by the need to reduce complexity: you only need one model to handle multiple languages. This post explains how to use pretrained multilingual NeMo ASR models from the…
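A hedged sketch of the typical workflow: list the pretrained checkpoints NeMo knows about, then load one and transcribe. The multilingual model name below is only illustrative; pick an actual name from the printed list or the post.

```python
# Discover and load pretrained NeMo ASR checkpoints (illustrative sketch).
import nemo.collections.asr as nemo_asr

# Print the names of checkpoints the ASRModel class can download.
for info in nemo_asr.models.ASRModel.list_available_models():
    print(info.pretrained_model_name)

# Load one multilingual model (example name; substitute one from the list above)
# and transcribe an audio file (hypothetical path).
model = nemo_asr.models.ASRModel.from_pretrained("stt_enes_conformer_ctc_large")
print(model.transcribe(["es_or_en_audio.wav"]))
```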
Real-time natural language understanding will transform how we interact with intelligent machines and applications.
Join experts from Google, Meta, NVIDIA, and more at the first annual NVIDIA Speech AI Summit. Register now!
Learn how to build, train, customize, and deploy a GPU-accelerated automatic speech recognition service with NVIDIA Riva in this self-paced course.
Speech recognition technology is growing in popularity for voice assistants and robotics, for solving real-world problems through assisted healthcare or education, and more. This is helping democratize access to speech AI worldwide. As labeled datasets for unique, emerging languages become more widely available, developers can build AI applications readily, accurately, and affordably to enhance…
When examining an intricate speech AI robotic system, it's easy for developers to feel intimidated by its complexity. Arthur C. Clarke claimed, "Any sufficiently advanced technology is indistinguishable from magic." From accepting natural-language commands to safely interacting in real time with its environment and the humans around it, today's speech AI robotics systems can perform tasks to…
Text normalization (TN) converts text from written form into its verbalized form, and it is an essential preprocessing step before text-to-speech (TTS). TN ensures that TTS can handle all input texts without skipping unknown symbols. For example, "$123" is converted to "one hundred and twenty-three dollars." Inverse text normalization (ITN) is a part of the automatic speech recognition (ASR)…
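For orientation, a hedged sketch of TN and ITN with the nemo_text_processing package; the class paths, constructor arguments, and exact verbalized output should be checked against its documentation, and the sample strings are illustrative.

```python
# Hedged sketch: WFST-based text normalization and inverse text normalization.
from nemo_text_processing.text_normalization.normalize import Normalizer
from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

tn = Normalizer(input_case="cased", lang="en")
print(tn.normalize("$123"))  # expected to verbalize, e.g. "one hundred and twenty-three dollars"

itn = InverseNormalizer(lang="en")
print(itn.inverse_normalize("one hundred twenty three dollars", verbose=False))  # e.g. "$123"
```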
Speaker diarization is the process of segmenting audio recordings by speaker labels; it aims to answer the question "Who spoke when?" This makes it distinct from speech recognition: before you perform speaker diarization, you know what is spoken but you don't know who spoke it. Therefore, speaker diarization is an essential feature for a speech recognition…
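To illustrate the "who spoke when" idea (this is not any library's API, just a toy combination of hypothetical diarization segments and ASR word timestamps):

```python
# Toy illustration: attach speaker labels to ASR words by their timestamps.
diarization = [("speaker_0", 0.0, 3.2), ("speaker_1", 3.2, 6.0)]  # (who, start_s, end_s)
asr_words = [("hello", 0.4), ("there", 0.9), ("hi", 3.5), ("back", 4.1)]  # (what, when_s)

def speaker_at(t):
    # Return the speaker whose segment covers time t, or "unknown".
    return next((spk for spk, start, end in diarization if start <= t < end), "unknown")

for word, t in asr_words:
    print(f"{speaker_at(t)}: {word}")
```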
Loss functions for training automatic speech recognition (ASR) models are not set in stone. The older rules of loss functions are not necessarily optimal. Consider connectionist temporal classification (CTC) and see how changing some of its rules enables you to reduce GPU memory, which is required for training and inference of CTC-based models and more. For more information about the…
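For reference, here is the vanilla CTC loss in PyTorch (the post discusses modifying CTC's rules; this sketch only shows the standard formulation on random tensors):

```python
# Standard CTC loss on dummy data, for reference.
import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
T, B, C = 50, 4, 32                       # time steps, batch size, vocab size (incl. blank)
log_probs = torch.randn(T, B, C).log_softmax(dim=-1)
targets = torch.randint(1, C, (B, 12))    # label ids, excluding the blank index 0
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 12, dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```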
Learn about the latest tools, trends, and technologies for building and deploying conversational AI.
Artificial intelligence (AI) has transformed synthesized speech from monotone robocalls and decades-old GPS navigation systems to the polished tone of virtual assistants in smartphones and smart speakers. It has never been so easy for organizations to use customized state-of-the-art speech AI technology for their specific industries and domains. Speech AI is being used to power virtual…
Video conferencing, audio and video streaming, and telecommunications recently exploded due to pandemic-related closures and work-from-home policies. Businesses, educational institutions, and public-sector agencies are experiencing a skyrocketing demand for virtual collaboration and content creation applications. The crucial part of online communication is the video stream, whether it's a simple…
With audio and video streaming, conferencing, and telecommunication on the rise, it has become essential for developers to build applications with outstanding audio quality and enable end users to communicate and collaborate effectively. Various background noises can disrupt communication, ranging from traffic and construction to dogs barking and babies crying. Moreover, a user could talk in a…
Deep learning is proving to be a powerful tool when it comes to high-quality synthetic speech development and customization. A Toronto-based startup, and NVIDIA Inception member, Resemble AI is upping the stakes with a new generative voice tool able to create high-quality synthetic AI voices. The technology can generate cross-lingual and naturally speaking voices in over 50 of the most…
You can save time and produce a more accurate result when processing audio data with automated speech recognition (ASR) models from NVIDIA NeMo and Label Studio. NVIDIA NeMo provides reusable neural modules that make it easy to create new neural network architectures, including prebuilt modules and ready-to-use models for ASR. With the power of NVIDIA NeMo, you can get audio transcriptions…
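A hedged sketch of one way this workflow could look: batch-transcribe audio with a pretrained NeMo model, then write the results as JSON tasks for import into Label Studio as pre-annotations. The model name, file paths, and the exact Label Studio JSON layout are assumptions to verify against the two projects' current documentation.

```python
# Hedged sketch: NeMo transcriptions saved as Label Studio pre-annotation tasks.
import json
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_large")  # assumed name
files = ["call_001.wav", "call_002.wav"]  # hypothetical audio files
texts = model.transcribe(files)  # assumes plain strings are returned (typical for CTC models)

tasks = [
    {
        "data": {"audio": path},
        "predictions": [{"result": [{"from_name": "transcription", "to_name": "audio",
                                     "type": "textarea", "value": {"text": [text]}}]}],
    }
    for path, text in zip(files, texts)
]
with open("pre_annotations.json", "w") as out:
    json.dump(tasks, out, indent=2)
```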
Deepgram, an NVIDIA Inception startup developing automatic speech recognition (ASR) deep learning models, recently published a new demo that highlights the speed and scalability of its platform on NVIDIA GPUs. "We've reinvented Automatic Speech Recognition (ASR) with a complete, deep learning model that allows companies to get faster, more accurate transcription, resulting in more reliable…
Facebook AI this week announced they are open sourcing a deep learning model called M2M-100 that can translate any language pair, among 100 languages, without relying on English data. For example, when translating from Chinese to French, previous models would train on Chinese to English to French. M2M-100 directly trains on Chinese to French to better preserve meaning. "Deploying M2M-100 will…
COVID-19 is fundamentally changing the doctor-patient dynamic worldwide. Telemedicine is now becoming an essential technology that healthcare providers can offer patients as an adjunct or alternative for in-person visits that is both effective and convenient. We've all been there, at one time or another in the last six months: speaking with a nurse or family doctor using video calling on our…
A team of Emory University students won Amazon's 2020 Alexa Socialbot Grand Challenge, a worldwide competition to create the most engaging AI chatbot. The team earned $500,000 for their chatbot named Emora. The researchers developed Emora as a social companion that can provide comfort and warmth to people interacting with Alexa-enabled devices. Emora can chat about movies, sports…
To help accelerate natural language processing in biomedicine, Microsoft Research developed a BERT-based AI model that outperforms previous biomedicine natural language processing (NLP) methods. The work promises to help researchers rapidly advance research in this field. The model, built on top of Google's BERT, can classify documents, extract medical information…
To help localize subtitles from English to other languages, such as Russian, Spanish, or Portuguese, Netflix developed a proof-of-concept AI model that can automatically simplify and translate subtitles to multiple languages. The work is presented in a paper, Simplify-then-Translate: Automatic Preprocessing for Black-Box Machine Translation, published this month on the preprint platform…
At GTC 2020, NVIDIA announced and shipped a range of new AI SDKs, enabling developers to support the new Ampere architecture. For the first time, developers have the tools to build end-to-end deep learning-based pipelines for conversational AI and recommendation systems. Today, NVIDIA announced Riva, a fully accelerated application framework for building multimodal conversational AI services.
Many of today's speech synthesis models lack emotion and human-like expression. To help tackle this problem, a team of researchers from the NVIDIA Applied Deep Learning Research group developed a state-of-the-art model that generates more realistic expressions and provides better user control than previously published models. Named "Flowtron," the model debuted publicly for the first time as…
This week, OpenAI released Jukebox, a neural network that generates music with rudimentary singing, in a variety of genres and artist styles. "Provided with genre, artist, and lyrics as input, Jukebox outputs a new music sample produced from scratch," the company stated in their post, Jukebox. Generating CD-quality music is a challenging problem to solve, as a typical song has over 10…
To improve how natural language processing (NLP) systems such as Alexa handle complex requests, Amazon researchers, in collaboration with the University of Massachusetts Amherst, developed a deep learning-based, sequence-to-sequence model that can better handle simple and complex queries. "Virtual assistants such as Amazon Alexa, Apple Siri, and Google Assistant often rely on a semantic…
The article below is a guest post by Nuance, a company focused on conversational AI. In this post, Nuance engineers describe their use of NVIDIA's automatic mixed precision to speed up their AI models in the healthcare industry. By Wenxuan Teng, Ralf Leibold, and Gagandeep Singh. Nuance's ambient clinical intelligence (ACI) technology is an example of how it is accelerating development of…
Today, NVIDIA released TensorRT 6, which includes new capabilities that dramatically accelerate conversational AI applications, speech recognition, and 3D image segmentation for medical applications, as well as image-based applications in industrial automation. TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low-latency, high-throughput inference for AI…
To help people with speech impairments better interact with everyday smart devices, Google researchers have developed a deep learning-based automatic speech recognition (ASR) system that aims to improve communications for people with amyotrophic lateral sclerosis (ALS), a disease that can affect a person's speech. The research, part of Project Euphonia, is an ASR platform that performs speech…
NVIDIA announced breakthroughs today in language understanding that give developers the opportunity to more naturally develop conversational AI applications using BERT and real-time inference tools, such as TensorRT, to dramatically speed up their AI speech applications. In today's announcement, researchers and developers from NVIDIA set records in both training and inference of BERT…
AI-enabled services such as speech recognition and natural language processing are increasing in demand. To help developers manage growing datasets, latency requirements, customer requirements, and more complex neural networks, we are highlighting a few AI speech applications that rely on NVIDIA's inference platform to solve common AI speech challenges. From Amazon's Alexa Research group…
Current audio speech recognition models normally do not perform well in noisy environments. To help solve the problem, researchers from Samsung and Imperial College London developed a deep learning solution that uses computer vision for visual speech recognition. The model is capable of lipreading, as well as synthesizing the audio of the speech it sees in a video. Lipreading is primarily used by…
To potentially improve natural language queries, including the retrieval of images from speech, researchers from IBM and the University of Virginia developed a deep learning model that can generate objects and their attributes from natural language descriptions. Unlike other recent methods, this approach does not use GANs. "We show that under minor modifications, the proposed framework can…
Developers from Amazon's Alexa Research group have just published a developer blog post and a paper describing how they are using adversarial training to recognize and improve emotion detection. "A person's tone of voice can tell you a lot about how they're feeling. Not surprisingly, emotion recognition is an increasingly popular conversational-AI research topic," said Viktor Rozgic…
Almost 1,000,000 books are published every year in the United States; however, only around 40,000 of them are converted into audiobooks, primarily due to costs and production time. To help with the process, DeepZen, a London-based company and a member of the Inception program, NVIDIA's startup incubator, developed a deep learning-based system that can generate complete audio recordings of…
To enhance the capability of text-to-speech and automatic speech recognition algorithms, Microsoft researchers developed a deep learning model that uses unsupervised learning, an approach not commonly used in this field, to improve the accuracy of the two speech tasks. By using the Transformer model, which is based on a sequence-to-sequence architecture, the team achieved a 99.84%…
Microsoft AI Research just announced a new breakthrough in the field of conversational AI that achieves new records in seven of nine natural language processing tasks from the General Language Understanding Evaluation (GLUE) benchmark. Microsoft's natural language processing algorithm called Multi-Task DNN, first released in January and updated this month, incorporates Google's BERT NLP model…
To help people who suffer from hearing loss, researchers from Columbia University just developed a deep learning-based system that can help amplify specific speakers in a group, a breakthrough that could lead to better hearing aids. "The brain area that processes sound is extraordinarily sensitive and powerful; it can amplify one voice over others, seemingly effortlessly…
Every week we highlight NVIDIA's Top 5 AI stories of the week. In this week's edition we cover a new deep learning-based algorithm from OpenAI that can automatically generate new music. Plus, an automatic speech recognition model that could improve Alexa's algorithm by 15%. Watch below: Planning a workout that is specific to a user's needs can be challenging.
Trying to generate music like Mozart, Beethoven, or perhaps Lady Gaga? AI research organization OpenAI just released a demo of a new deep learning algorithm that can automatically generate original music using many different instruments and styles. "We've created MuseNet, a deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles…
Researchers from Johns Hopkins University and Amazon published a new paper describing how they trained a deep learning system that can help Alexa ignore speech not intended for her, improving the speech recognition model by 15%. "Voice-controlled household devices, like Amazon Echo or Google Home, face the problem of performing speech recognition of device-directed speech in the presence of…
Gridspace, a southern California-based company, recently presented an end-to-end deep learning-based solution that can allow businesses to automate the call center process by using NVIDIA GPUs on the cloud. In an example shown at GTC Silicon Valley, the company presented a video of an AI-generated voice interacting with a customer as if it were a real person. The tool has the potential to allow…
When you ask your phone a question, you don't just want the right answer. You want the right answer, right now. The answer to this seemingly simple question requires an AI-powered service that involves multiple neural networks that have to perform a variety of predictions and get an answer back to you in under one second so it feels instantaneous. These include: All of these AI…
From an AI algorithm that can predict earthquakes to a system that can decode rodent chatter, here are the top 5 AI stories of the week. Most people can't detect an earthquake until the ground under their feet is already shaking or sliding, leaving little time to prepare or take shelter. Scientists are trying to short-circuit that surprise using the critical time window during the…
According to the U.N., up to 100 elephants are slaughtered every day in Africa by poachers taking part in the illegal ivory trade. This amounts to around 35,000 elephants killed each year due to poaching. To help fight the problem, Conservation Metrics, a Santa Cruz, California-based startup, is using deep learning to help detect the sounds of elephants, as well as gunfire, and get a more detailed…
To help up-and-coming musicians create the best beats for their songs, developers from a Japan-based AI startup developed a deep learning system called Neural Beatboxer that can convert everyday sounds into hours of automatically compiled rhythms. Users can visit their website, feed it some sounds, and the neural network automatically produces a custom drum kit that can go on for hours.
Researchers from Facebook developed a deep learning system that can replicate the music it hears and play it back as if it were Mozart, Beethoven, or Bach. This is the first time researchers have produced high-fidelity musical translation between instruments, styles, and genres. "Humans have always created music and replicated it, whether it is by singing, whistling, clapping, or…
Voicea, a San Francisco Bay Area startup, recently announced $20 million in funding for their GPU-based deep learning system that can now fully transcribe meetings and put together highlights. The system was designed to help teams better collaborate in an enterprise environment. Eva, the startup's AI assistant, joins meetings and conference calls through a combination of machine learning…
Researchers at Microsoft announced they reached a 5.1% error rate, a new milestone toward human parity: recognizing words in a conversation as well as professional human transcribers do. They improved the accuracy of their system from last year on the Switchboard conversational speech recognition task. The benchmarking task is a corpus of recorded telephone conversations that the…
Take part in the world's top GPU developer event May 8–11, 2017 in Silicon Valley where artificial intelligence, virtual reality and autonomous vehicles will take center stage. GTC 2017 provides developers and thought leaders with the opportunity to share their work with thousands of the world's brightest minds. The 2016 event had more than 5,500 attendees, and 600+ sessions on GPU…
Alexey Kamenev, Software Engineer at Microsoft Research, talks about their open-source Computational Network Toolkit (CNTK) for deep learning, which describes neural networks as a series of computational steps via a directed graph. Kamenev also shares a bit about how they're using GPUs, the CUDA Toolkit and GPU-accelerated libraries for the variety of Microsoft products that benefit from deep…
Yann LeCun, Director of Facebook AI Research, invited NVIDIA CEO Jen-Hsun Huang to speak at "The Future of AI" symposium at NYU, where industry leaders discussed the state of AI and its continued advancement. Jen-Hsun published a blog on his talk that covers topics such as how deep learning is a new software model that needs a new computing model; why AI researchers have adopted GPU-accelerated…
Researchers from Karlsruhe Institute of Technology, MIT, and the University of Toronto published MovieQA, a dataset that contains 7,702 reasoning questions and answers from 294 movies. Their innovative dataset and accuracy metrics provide a well-defined challenge for question-answering machine learning algorithms. The questions range from simpler "Who" did "What" to "Whom" questions that can be solved by computer vision…
NVIDIA announced that Facebook will accelerate its next-generation computing system with the NVIDIA Tesla Accelerated Computing Platform, which will enable them to drive a broad range of machine learning applications. Facebook is the first company to train deep neural networks on the new Tesla M40 GPUs, introduced last month, and this will play a large role in their new open source "Big Sur"…
Wired discusses Google's announcement that it is open sourcing its TensorFlow machine learning system, noting the system uses GPUs to both train and run artificial intelligence services at the company. Inside Google, when tackling tasks like image recognition and speech recognition and language translation, TensorFlow depends on machines equipped with GPUs that were originally designed to render…
Instagram could offer a novel way of monitoring the drinking habits of teenagers. Using photos and text from Instagram, a team of researchers from the University of Rochester has shown that this data can not only expose patterns of underage drinking more cheaply and faster than conventional surveys, but also find new patterns, such as what alcohol brands or types are favored by different…
In a recent interview with TIME, NVIDIA's senior director of automotive Danny Shapiro shares how the company's innovations in gaming graphics are well-suited to the needs of autonomous vehicles. Driverless cars, which take passengers from A to B with minimal human input, are already hitting American roads. A variety of automakers and technology firms are experimenting with driverless technology…
The HPC and GPU Supercomputing Group of Silicon Valley will be hosting two researchers from Baidu on Tuesday, October 6, 2015 from 6:30 PM to 9:30 PM at the NVIDIA Headquarters in Santa Clara, CA. We are very excited to have Awni Hannun and Erich Elsen from Baidu Research join us. Awni, a former Stanford researcher, has extensive experience in solving hard speech recognition tasks using deep…
Two winners in the Visionary category are harnessing the computing power of NVIDIA GPUs to drive their artificial intelligence applications. MIT Technology Review recently revealed its annual "35 Innovators Under 35," which lists young technologists using today's emerging technologies to transform tomorrow's world. Ilya Sutskever, 29, is a key member of the Google Brain research team…
Chinese search giant Baidu recently presented a new GPU-based Deep Speech deep learning system that has 94% accuracy when handling voice queries in Mandarin. Originally unveiled in December 2014, the speech recognition system was only able to recognize the English language. Baidu senior research engineer Awni Hannun was interviewed by Medium to share why Mandarin is such a tough…
]]>