Speech Synthesis

Text-to-speech (TTS) systems, also known as speech synthesis systems, allow devices to read text aloud in natural-sounding synthesized speech. A TTS system has several key components that work together to create human-like, intelligible speech. Before it reaches waveform creation, which is the actual synthesis, the system runs a number of text-based processing steps, including grammatical parsing, text normalization, and identification of the most appropriate pronunciation of the target string.

A TTS application that supports dynamically changing content needs a mechanism for handling words it has not seen before. This typically means handing responsibility for the pronunciation to the grapheme-to-phoneme (G2P) engine, which predicts a phoneme sequence from the spelling alone. Incorrect G2P output surfaces later in the TTS process as incorrect synthesized speech, and it happens more often than it needs to. The impact on the user experience is immediate: at best, the TTS application sounds unintentionally funny. Often, however, specific words become unintelligible, which essentially removes the value of TTS as a hands-free, eyes-free way of communicating information to the user.
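The pronunciation step described above can be sketched as a lexicon lookup with a G2P fallback. This is only a minimal illustration under stated assumptions: the lexicon entries, the ARPAbet-style symbols, the letter-to-sound table, and all function names are hypothetical, and the letter-by-letter fallback is a deliberately naive stand-in for a real G2P engine.

```python
import re

# Hypothetical pronunciation lexicon (ARPAbet-style symbols, illustrative only).
LEXICON = {
    "read": ["R", "IY", "D"],
    "the": ["DH", "AH"],
    "text": ["T", "EH", "K", "S", "T"],
}

# Deliberately naive one-letter-to-one-phoneme table standing in for a G2P engine.
LETTER_TO_PHONE = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K", "y": "Y", "z": "Z",
}


def normalize(text):
    """Lowercase the input and split it into word tokens.

    A stand-in for real text normalization, which would also expand
    numbers, abbreviations, dates, and so on.
    """
    return re.findall(r"[a-z']+", text.lower())


def naive_g2p(word):
    """Map each letter to a single phoneme, skipping unknown characters."""
    return [LETTER_TO_PHONE[ch] for ch in word if ch in LETTER_TO_PHONE]


def pronounce(word):
    """Return (phonemes, source): use the lexicon first, fall back to G2P."""
    if word in LEXICON:
        return LEXICON[word], "lexicon"
    return naive_g2p(word), "g2p"


def phonemize(text):
    """Return a (word, phonemes, source) tuple for every token in the text."""
    return [(word, *pronounce(word)) for word in normalize(text)]


if __name__ == "__main__":
    for word, phones, source in phonemize("Read the text on the phone"):
        print(f"{word:>6}  {source:<8} {' '.join(phones)}")
```

In this sketch, "phone" is not in the lexicon, so it falls through to the naive fallback, which treats "p" and "h" as separate sounds and pronounces the silent "e". That is the kind of G2P error that later shows up as incorrect synthesized speech.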

Phonetic Labs provides the solution.

Phonetic Data in the cloud - always available, always up-to-date