Using Google's Cloud Text-to-Speech API
- What is this API?
Google Cloud Text-to-Speech enables developers to synthesize natural-sounding speech with 100+ voices, available in multiple languages and variants. It applies DeepMind’s groundbreaking research in WaveNet and Google’s powerful neural networks to deliver the highest fidelity possible. As an easy-to-use API, you can create lifelike interactions with your users, across many applications and devices.
- Does the API cost money to use?
- Please reference this link
- What voices can I choose from?
- Please reference this link
- Can I use Speech Synthesis Markup Language (SSML)?
- Yes, please reference this link and see the example code below
<speak>
Here are <say-as interpret-as="characters">SSML</say-as> samples.
I can pause <break time="3s"/>.
I can play a sound
<audio src="https://www.example.com/MY_MP3_FILE.mp3">didn't get your MP3 audio file</audio>.
I can speak in cardinals. Your number is <say-as interpret-as="cardinal">10</say-as>.
Or I can speak in ordinals. You are <say-as interpret-as="ordinal">10</say-as> in line.
Or I can even speak in digits. The digits for ten are <say-as interpret-as="characters">10</say-as>.
I can also substitute phrases, like the <sub alias="World Wide Web Consortium">W3C</sub>.
Finally, I can speak a paragraph with two sentences.
<p><s>This is sentence one.</s><s>This is sentence two.</s></p>
</speak>- Should I use a standard voice or a WaveNet voice?
- I prefer to use a WaveNet voice. You can read about the differences here.
- Can I get a custom voice?
- Yes, there is a Custom Voice program (beta)