Generate speech from text
July 4, 2025
This endpoint generates speech from text using text-to-speech technology.
Notes
For very long prompts, some models tend to produce speech that gets gradually quieter. To avoid this, we suggest splitting the prompt into chunks of about 1K characters.
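The chunking suggested above can be sketched in Python as follows. This is a minimal sketch, not part of the API: the helper name split_prompt is hypothetical, the 1,000-character limit assumes "1K" refers to characters, and splitting on sentence boundaries is one possible choice so that chunks do not cut words in half.

```python
import re


def split_prompt(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: prefers sentence boundaries and hard-splits
    any single sentence that is longer than max_chars.
    """
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # An oversized sentence cannot fit in any chunk, so cut it by length.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as the prompt of a separate request, and the resulting audio files concatenated afterwards.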
POST https://api.useapi.net/v1/heygen/tts/create
Request Headers
Authorization: Bearer {API token}
Content-Type: application/json
# Alternatively you can use multipart/form-data
# Content-Type: multipart/form-data
API token is required, see Setup useapi.net for details.
Request Body
{
  "email": "[email protected]",
  "voice_id": "en-US-AriaNeural",
  "prompt": "Text to be converted to speech",
  "speed": 100,
  "pitch": 0,
  "volume": 100,
  "language_code": "en-US",
  "emotion": "happy"
}
- email is optional when only one account is configured. If you have multiple accounts configured, this parameter is required.
- voice_id is required, a valid voice_id from GET /tts/voices.
- prompt is required, the text to be converted to speech (maximum 5000 characters).
- speed is optional, range from 50 to 150. Default is 100 (normal speed).
- pitch is optional, range from -100 to 100. Default is 0 (normal pitch).
- volume is optional, range from 0 to 100. Default is 100 (full volume).
- language_code is optional, must be one of the supported language codes from GET /tts/languages.
- emotion is optional, must be one of the supported emotion names for the selected voice. Use a valid voice.settings.clone_emotions.name from GET /tts/voices/?voice_id=voice_id.
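The parameter limits listed above can be checked on the client before a request is sent. The sketch below is illustrative only: build_tts_payload is a hypothetical helper, not part of the API, and it checks only the numeric ranges and prompt length stated above. It does not verify voice_id, language_code, or emotion, since those depend on the lists returned by the other endpoints.

```python
from typing import Optional


def build_tts_payload(
    voice_id: str,
    prompt: str,
    *,
    email: Optional[str] = None,
    speed: int = 100,
    pitch: int = 0,
    volume: int = 100,
    language_code: Optional[str] = None,
    emotion: Optional[str] = None,
) -> dict:
    """Build a request body, raising ValueError on out-of-range values.

    Hypothetical helper; the limits mirror the parameter list above.
    """
    if not voice_id:
        raise ValueError("voice_id is required")
    if not prompt:
        raise ValueError("prompt is required")
    if len(prompt) > 5000:
        raise ValueError("prompt exceeds 5000 characters")

    # (name, value, minimum, maximum) for each numeric parameter.
    ranges = (
        ("speed", speed, 50, 150),
        ("pitch", pitch, -100, 100),
        ("volume", volume, 0, 100),
    )
    for name, value, low, high in ranges:
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    payload = {
        "voice_id": voice_id,
        "prompt": prompt,
        "speed": speed,
        "pitch": pitch,
        "volume": volume,
    }
    # Optional fields are sent only when the caller supplied them.
    optional = {"email": email, "language_code": language_code, "emotion": emotion}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

The returned dictionary can be passed as the JSON body of the request.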
Responses
-
{
  "audio_url": "https://heygen-media.s3.amazonaws.com/audio/abc123.mp3",
  "duration": 5.2,
  "is_pass": true,
  "job_id": null,
  "word_timestamps": [
    { "word": "Hello", "start": 0.0, "end": 0.5 },
    { "word": "world", "start": 0.6, "end": 1.1 }
  ]
}
-
{
  "error": "Invalid emotion (angry), supported values: happy, sad, neutral"
}
-
{
  "error": "Unauthorized",
  "code": 401
}
The audio_url field contains the URL of the generated MP3 audio file.
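A caller usually needs to tell a successful body from an error body before reading audio_url. The sketch below assumes only the shapes shown in the sample responses (an "error" string, an optional "code", and an "audio_url" on success). The names TtsError and extract_audio_url are hypothetical and not part of the API.

```python
class TtsError(Exception):
    """Raised when a response body reports an error or lacks audio_url."""


def extract_audio_url(body: dict) -> str:
    """Return audio_url from a parsed response body, or raise TtsError."""
    if "error" in body:
        code = body.get("code")
        suffix = f" (code {code})" if code is not None else ""
        raise TtsError(f"{body['error']}{suffix}")
    audio_url = body.get("audio_url")
    if not audio_url:
        # All response fields are optional, so a missing URL is possible.
        raise TtsError("response has no audio_url")
    return audio_url
```

With the requests library, the argument would typically be the result of response.json().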
Model
{ // TypeScript, all fields are optional
  audio_url: string          // URL to the generated MP3 audio file
  duration: number           // Duration of the audio in seconds
  is_pass: boolean           // Whether the generation was successful
  job_id: string | null      // Job ID (usually null for synchronous requests)
  word_timestamps: {         // Timing information for each word
    word: string             // The spoken word
    start: number            // Start time in seconds
    end: number              // End time in seconds
  }[]
}
Examples
-
curl -X POST "https://api.useapi.net/v1/heygen/tts/create" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer …" \
  -d '{"email":"[email protected]","voice_id":"en-US-AriaNeural","prompt":"Hello, world!"}'
-
const token = "API token";
const email = "Previously configured account email";
const apiUrl = "https://api.useapi.net/v1/heygen/tts/create";
const response = await fetch(apiUrl, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${token}`,
  },
  body: JSON.stringify({
    email: email,
    voice_id: "en-US-AriaNeural",
    prompt: "Hello, world!"
  })
});
const result = await response.json();
console.log("response", {response, result});
-
import requests

token = "API token"
email = "Previously configured account email"
apiUrl = "https://api.useapi.net/v1/heygen/tts/create"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}"
}
data = {
    "email": email,
    "voice_id": "en-US-AriaNeural",
    "prompt": "Hello, world!"
}
response = requests.post(apiUrl, headers=headers, json=data)
print(response, response.json())