Google’s WaveNet Can Mimic Human Speech

Google’s DeepMind has developed an artificial intelligence called WaveNet that generates human speech by learning to produce the audio waveform itself. Most text-to-speech (TTS) systems today are concatenative: they record a large database of short speech fragments from a single speaker and then recombine those fragments into new sentences. Because the output can only be assembled from what was recorded, the voice is difficult to modify (for example, to switch speakers or change the emphasis) without recording a whole new database.
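To make the limitation concrete, here is a toy sketch of the concatenative idea. It assumes a hypothetical "fragment bank" keyed by word, with made-up sample values standing in for real recordings. Real systems use much smaller units and smooth the joins, so this is an illustration, not how any production system is built.

```python
from __future__ import annotations

# Hypothetical fragment bank: each entry stands in for a short recording
# from a single speaker. The numbers are placeholders, not real audio.
FRAGMENTS = {
    "hello": [0.1, 0.3, -0.2],
    "world": [0.05, -0.4, 0.2, 0.0],
}


def concatenate(sentence: str) -> list[float]:
    """Build an utterance by splicing stored fragments end to end."""
    samples: list[float] = []
    for word in sentence.lower().split():
        if word not in FRAGMENTS:
            # The system can only say what was recorded. A new word, voice,
            # or emotion would require recording new fragments.
            raise KeyError(f"no recorded fragment for {word!r}")
        samples.extend(FRAGMENTS[word])
    return samples


utterance = concatenate("Hello world")
```

Calling `concatenate("goodbye")` fails, because nothing in the bank covers that word. That is the rigidity the paragraph above describes.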

This isn’t an issue for WaveNet. According to DeepMind, it takes a parametric approach, in which all the information needed to generate the speech is stored in the parameters of the model. WaveNet is a neural network, a type of AI loosely inspired by the way the human brain works, and it models the raw waveform of the audio signal one sample at a time. Bloomberg reported that in blind tests in English and Mandarin Chinese, listeners found WaveNet’s speech more natural-sounding than Google’s existing TTS systems.
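"One sample at a time" means each new point on the waveform is predicted from the points that came before it, then fed back in to predict the next. The sketch below shows only that loop. The predictor is a hypothetical stand-in: WaveNet uses a deep neural network that outputs a probability distribution over the next sample, whereas this toy uses a fixed, deterministic rule that happens to trace out a sine wave. `FREQ` and `predict_next` are illustrative names, not part of any DeepMind API.

```python
from __future__ import annotations

import math

FREQ = 0.05  # hypothetical toy parameter: cycles per sample


def predict_next(history: list[float]) -> float:
    """Stand-in for a trained model: predict the next sample from the last two.

    Uses the recurrence x[n] = 2*cos(w)*x[n-1] - x[n-2], which generates a
    sine wave. WaveNet would use a neural network here instead.
    """
    c = 2.0 * math.cos(2.0 * math.pi * FREQ)
    return c * history[-1] - history[-2]


def generate(num_samples: int, seed: list[float]) -> list[float]:
    """Generate audio one sample at a time, feeding each output back in."""
    samples = list(seed)
    while len(samples) < num_samples:
        samples.append(predict_next(samples))
    return samples


# Seed with the first two points of the sine wave and let the loop continue it.
wave = generate(40, [0.0, math.sin(2.0 * math.pi * FREQ)])
```

The key design point is the feedback: nothing is spliced from stored recordings, so in a learned model the character of the voice depends on the model's parameters and inputs rather than on a fixed database.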

