Getting started with the Speech API

What is the Speech API?

With the Replica Studios Speech API, you can build apps with realistic-sounding text-to-speech voices. It's also built for scale, so whether you're building a small indie game or a large video editing product, we can help give your project a quality voice.

We currently help studios produce dialog for games, help SaaS products offer quick and easy text-to-speech features, and power deep integrations in art projects.

We recommend reading the API Documentation for a guide to getting set up.

Speech API Overview

Insanely fast synthesized text-to-speech

From the time we receive an API request, synthesized audio is ready in less than 0.5 seconds. Including the time it takes to download the MP3 or WAV file, you can expect to receive the generated audio in about 2 seconds.

API that scales with your needs

The Speech API scales with your app. Whether you're sending a few requests or steady traffic around the clock, the most your app has to wait is a few seconds for the magic to happen.

This means you can build apps that deliver near real-time latency, at scale, while accessing the entire library of Replica's AI Voice Actors and your own custom voices.

SSML for fine control of the generated speech

With SSML (Speech Synthesis Markup Language), you can add pauses between words, change the speaking rate or volume, and even modify the pitch for all AI Voice Actors.
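The snippet below is a minimal sketch of what such markup might look like, assuming the Speech API accepts the standard W3C SSML 1.1 elements `<speak>`, `<break>` (for pauses) and `<prosody>` (with `rate`, `volume` and `pitch` attributes). The helper name `build_ssml` and its default values are hypothetical, and the sketch does not show how the string is sent to the API; check Replica's SSML documentation for the tags and values it actually supports.

```python
import xml.etree.ElementTree as ET


def build_ssml(first: str, second: str, pause_ms: int = 500,
               rate: str = "slow", volume: str = "loud",
               pitch: str = "+10%") -> str:
    """Hypothetical helper: speak `first`, pause, then speak `second` with adjusted prosody.

    Assumes standard W3C SSML 1.1 elements; Replica's supported subset may differ.
    """
    # Root element of every SSML document; text before the pause goes here.
    speak = ET.Element("speak")
    speak.text = first
    # <break> inserts a silence of the given length between words.
    ET.SubElement(speak, "break", time=f"{pause_ms}ms")
    # <prosody> changes speaking rate, volume and pitch for the text it wraps.
    prosody = ET.SubElement(speak, "prosody", rate=rate, volume=volume, pitch=pitch)
    prosody.text = second
    # ElementTree escapes characters such as & and < in the text automatically.
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Hello there.", "Welcome to the game."))
```

Building the markup with an XML library rather than string formatting means user-supplied text like "Fish & chips" cannot produce malformed SSML.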

Find out how to get started generating speech with SSML.

Have more questions?

Join the Replica Studios Discord: https://discord.gg/VphcThzA5z