Google overhauled its Cloud text-to-speech engine for developers last month, and now the company has announced an update to its speech-to-text service to improve voice recognition performance.

The updated Cloud Speech-to-Text API promises a significant reduction in word errors, down by around 54 percent across Google's tests, and in some cases the results were even better than that figure.

The API is built on the same core speech recognition technology that both Google Search and the Assistant depend on, and it is now better optimized. More natural voices are available to the service through Google DeepMind's WaveNet models, and developers can choose between different machine learning models based on their use case.

The API currently offers four models: the first is for short queries and voice commands, the second for understanding audio from phone calls, the third for handling audio from videos, and the fourth, the default, is recommended for all other scenarios. A sketch of how a developer might pick one of these models follows below.
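
The snippet below is a minimal sketch, assuming the google-cloud-speech Python client; the model identifiers shown ("command_and_search", "phone_call", "video", "default") correspond to the four use cases described above, and the bucket path is a placeholder. Exact class and field names may differ between client library versions.

```python
# Minimal sketch: selecting a Speech-to-Text model per use case.
# Assumes `pip install google-cloud-speech` and application default credentials.
from google.cloud import speech

client = speech.SpeechClient()

# Placeholder Cloud Storage URI pointing at the audio to transcribe.
audio = speech.RecognitionAudio(uri="gs://my-bucket/call-recording.flac")

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,
    language_code="en-US",
    model="phone_call",  # or "command_and_search", "video", "default"
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)
```
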

This replaces the previous automatic model selection, and the new tailoring came about after customers asked Google to use real data to train the models.

Google is also bringing a beta feature that automatically punctuates long-form speech transcriptions, suggesting commas, question marks, and periods. In addition, the company will allow developers to tag transcribed audio or video to tell Google which Speech-to-Text models to prioritize, as sketched below.
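
As a rough illustration of those two features, the sketch below again assumes the google-cloud-speech Python client: automatic punctuation is switched on via a config flag, and the recording is tagged with recognition metadata. The URI is a placeholder, and enum and field names may vary across API versions, since these features were in beta.

```python
# Minimal sketch: enabling automatic punctuation and tagging the audio source.
from google.cloud import speech

client = speech.SpeechClient()

audio = speech.RecognitionAudio(uri="gs://my-bucket/interview.wav")  # placeholder URI

# Tag the recording so Google knows what kind of audio it is handling.
metadata = speech.RecognitionMetadata(
    interaction_type=speech.RecognitionMetadata.InteractionType.PHONE_CALL,
    original_media_type=speech.RecognitionMetadata.OriginalMediaType.AUDIO,
    recording_device_type=speech.RecognitionMetadata.RecordingDeviceType.PHONE_LINE,
)

config = speech.RecognitionConfig(
    language_code="en-US",
    enable_automatic_punctuation=True,  # beta: inserts commas, question marks, periods
    metadata=metadata,
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)
```
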

The company says it will use aggregate data from all of its users to decide which new features to build next.
