Phonetic Data

Phonetic data links words to their pronunciations, a key function in any speech application. Most speech applications draw phonetic data from several sources. The primary source is the manually crafted pronunciation dictionary that ships with the speech engine. This is the case for both automatic speech recognition (ASR) and text-to-speech (TTS).

Since those dictionaries are limited in coverage, speech engines also come with a mechanism for automatic generation of phonetic data. This is known as Grapheme-to-Phoneme (G2P) or Letter-to-Sound (LTS) functionality. These tools can generate pronunciations for words that are not covered by the dictionary. They are either rule-based, i.e. relying on phonetic expertise, or data-driven, i.e. trained on a large amount of existing phonetic data. In addition to the engine dictionary and the G2P engine, most speech engines also support user-defined pronunciations.
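The interplay of these three sources can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names (USER_LEXICON, ENGINE_DICTIONARY, naive_g2p, lookup_pronunciation), the ARPAbet-like phoneme symbols, the sample entries, and the lookup order (user-defined entries first, then the engine dictionary, then G2P). Real engines may prioritize differently, and their G2P components are far more sophisticated than the one-letter-to-one-sound table used here.

```python
from typing import Callable, Dict, List, Tuple

Pronunciation = List[str]

# Hypothetical sample data; the phoneme symbols are illustrative only.
ENGINE_DICTIONARY: Dict[str, Pronunciation] = {
    "speech": ["S", "P", "IY", "CH"],
}
USER_LEXICON: Dict[str, Pronunciation] = {
    "gif": ["JH", "IH", "F"],
}

# Deliberately naive rule table: one letter maps to exactly one sound.
LETTER_TO_SOUND: Dict[str, str] = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F",
    "g": "G", "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L",
    "m": "M", "n": "N", "o": "AA", "p": "P", "q": "K", "r": "R",
    "s": "S", "t": "T", "u": "AH", "v": "V", "w": "W", "x": "K",
    "y": "Y", "z": "Z",
}


def naive_g2p(word: str) -> Pronunciation:
    """Toy rule-based G2P: map each known letter to a single phoneme."""
    return [LETTER_TO_SOUND[ch] for ch in word.lower() if ch in LETTER_TO_SOUND]


def lookup_pronunciation(
    word: str,
    user_lexicon: Dict[str, Pronunciation],
    engine_dictionary: Dict[str, Pronunciation],
    g2p: Callable[[str], Pronunciation],
) -> Tuple[Pronunciation, str]:
    """Return a pronunciation and the source that supplied it.

    Assumed priority: user-defined entry, engine dictionary, G2P fallback.
    """
    key = word.lower()
    if key in user_lexicon:
        return user_lexicon[key], "user"
    if key in engine_dictionary:
        return engine_dictionary[key], "dictionary"
    return g2p(key), "g2p"


if __name__ == "__main__":
    for w in ["gif", "speech", "phone"]:
        print(w, lookup_pronunciation(w, USER_LEXICON, ENGINE_DICTIONARY, naive_g2p))
```

In this sketch the word "phone" is in neither lexicon, so it falls through to the toy G2P, which returns P HH AA N EH. That output is clearly wrong, and it is the kind of error a user-defined entry would override.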

Support for user-defined pronunciations is crucial, since experience shows that G2P generates incorrect pronunciations for a considerable number of common words. Speech applications that handle dynamically changing data, which cannot be accounted for during development, are therefore particularly vulnerable to the limitations of G2P. To ensure a good user experience, in terms of both speech recognition accuracy and intelligible speech synthesis, it is important to have a solution in place that addresses this challenge with high-quality, easily accessible phonetic data.

Phonetic Labs provides this solution.

Phonetic Data in the cloud - always available, always up-to-date