Create text-to-speech audio stream token and payload
December 23, 2024 (August 21, 2025)
This version of MiniMax audio has been decommissioned. Consider switching to the Mureka API.
Please configure at least one www.minimax.io/audio account for this endpoint; see Setup MiniMax for details.
This endpoint creates a near real-time audio stream from the provided text.
- Average response time is 3 seconds.
- Up to 20 parallel jobs per account are supported.
Over 300 pre-built voices are available via GET audio/voices, supporting the following:
- Languages: English (US, UK, Australia, India), Chinese (Mandarin and Cantonese), Japanese, Korean, French, German, Spanish, Portuguese (including Brazilian), Italian, Arabic, Russian, Turkish, Dutch, Ukrainian, Vietnamese, and Indonesian.
 The list is constantly updated to include more languages!
- Emotions: happy, sad, angry, fearful, disgusted, surprised, neutral
- Accents: US (General), English, Indian
- Ages: Young Adult, Adult, Middle-Aged, Senior
- Genders: Male, Female
The token and payload returned by this endpoint are used by the WebSocket endpoint WSS audio/wss.
https://api.useapi.net/v1/minimax/audio/create-stream
Request Headers
Authorization: Bearer {API token}
Content-Type: application/json
# Alternatively you can use multipart/form-data
# Content-Type: multipart/form-data
- API token is required; see Setup useapi.net for details.
Request Body
{
    "account": "Optional MiniMax www.minimax.io/audio API account",
    "text": "Required text",
    "voice_id": "Required voice id"
}
- account is optional when only one account is configured. However, if you have multiple accounts configured, this parameter becomes required.
- text is required. Insert <#0.5#> to add a 0.5s pause between sentences; adjust the duration as needed.
 Maximum length: 5000 characters.
- voice_id is required. Use GET audio/voices to get the list of all available voices.
- model is optional.
 Supported values: speech-02-hd (default), speech-01-hd, speech-02-turbo, speech-01-turbo.
- language_boost is optional. Use tag_name from the voice_tag_language array of GET audio/config.
 Default value: Auto.
- emotion is optional. Use a value from the t2a_emotion array of GET audio/config.
 Default value: Auto.
- vol is optional.
 Default: 1.
- speed is optional.
 Valid range: 0.5…2, default 1.
- pitch is optional.
 Valid range: -12…12, default 0.
- deepen_lighten is optional.
 Valid range: -100…100, default 0.
- stronger_softer is optional.
 Valid range: -100…100, default 0.
- nasal_crisp is optional.
 Valid range: -100…100, default 0.
- spacious_echo is optional.
 Supported values: true, false (default).
- lofi_telephone is optional.
 Supported values: true, false (default).
- robotic is optional.
 Supported values: true, false (default).
- auditorium_echo is optional.
 Supported values: true, false (default).
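For reference, below is a minimal TypeScript sketch of a request to this endpoint. It assumes the endpoint accepts a POST with a JSON body (matching the request headers documented above); the token, account, and voice_id values are placeholders you would replace with your own.

async function createStream(text: string) {
  // Placeholders — replace with your own useapi.net API token, MiniMax account and voice id.
  const apiToken = "YOUR_USEAPI_TOKEN";
  const account = "YOUR_MINIMAX_ACCOUNT";  // optional when only one account is configured
  const voiceId = "YOUR_VOICE_ID";         // pick one from GET audio/voices

  // ASSUMPTION: POST with a JSON body, per the request headers documented above.
  const response = await fetch("https://api.useapi.net/v1/minimax/audio/create-stream", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      account,
      text,                   // up to 5000 characters; <#0.5#> inserts a 0.5s pause
      voice_id: voiceId,
      model: "speech-02-hd",  // optional, default
      speed: 1,               // optional, 0.5…2
      vol: 1,                 // optional
      pitch: 0,               // optional, -12…12
    }),
  });

  if (!response.ok) {
    throw new Error(`create-stream failed: ${response.status} ${await response.text()}`);
  }
  return response.json();     // { token, payload } — see Responses below
}

The returned token and payload are then used with WSS audio/wss, as described under Responses.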
Responses
- Field token and payload values are used by WebSocket WSS audio/wss.
- The token contains WebSocket authorization information and will expire in 24 hours.
- The payload contains a properly formed and validated payload built from user-provided input. You should send this payload over the WebSocket to generate a real-time sound stream and optionally retrieve the generated MP3 file.
 { "token": "token for WSS WebSocket endpoint", "payload": { "msg_id": "1e53d-a593-e40b9-54e20-f29c84-40ae3", "payload": { "model": "", "text": "Text for TTS generation", "voice_setting": { "speed": 1, "vol": 1, "pitch": 0, "voice_id": "123456789", "emotion": "happy" }, "audio_setting": {}, "effects": { "deepen_lighten": 0, "stronger_softer": 0, "nasal_crisp": 0, "spacious_echo": false, "lofi_telephone": false, "robotic": false, "auditorium_echo": false }, "er_weights": [], "language_boost": "German", "stream": true } } }
-   { "error": "<Error message>" }
-   { "error": "Unauthorized" }
Examples
The code provided at WSS audio/wss is used on this page when you use the Try It feature.
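For a rough sense of how the two returned values fit together, the sketch below opens a WebSocket and sends the payload as JSON. The WSS URL, the way the token is supplied, and the message schema are all defined by WSS audio/wss; the specific URL and query-parameter authorization shown here are assumptions for illustration only, and createStream refers to the request sketch earlier on this page.

async function streamExample() {
  const { token, payload } = await createStream("Hello <#0.5#> world");

  // ASSUMPTION: WSS URL and token-as-query-parameter are illustrative placeholders,
  // not the documented contract — see WSS audio/wss for the actual details.
  const ws = new WebSocket(`wss://api.useapi.net/v1/minimax/audio/wss?token=${token}`);

  ws.onopen = () => {
    // Send the validated payload exactly as returned by create-stream.
    ws.send(JSON.stringify(payload));
  };

  ws.onmessage = (event) => {
    // Audio stream messages arrive here; handle them per the WSS audio/wss documentation.
    console.log("received", event.data);
  };

  ws.onerror = (event) => console.error("WebSocket error", event);
}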
Model
Field token and payload values are used by WebSocket WSS audio/wss.
- The token contains WebSocket authorization information and will expire in 24 hours.
- The payload contains a properly formed and validated payload built from user-provided input. You should send this payload over the WebSocket to generate a real-time sound stream and optionally retrieve the generated MP3 file.
{ // TypeScript, all fields are optional
  token: string
  payload: {}
}
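In TypeScript terms, the model above could be written out as the interface below, with a small type guard for the two fields the WebSocket step needs. This rendering is illustrative only; the field names come from the Responses section, everything else is an assumption.

// Illustrative TypeScript rendering of the model above; all fields are optional.
interface CreateStreamResponse {
  token?: string;                     // WebSocket authorization, expires in 24 hours
  payload?: Record<string, unknown>;  // send as-is over WSS audio/wss
}

// Narrow a parsed response to the two fields required by the WebSocket step.
function hasTokenAndPayload(r: CreateStreamResponse): r is Required<CreateStreamResponse> {
  return typeof r.token === "string" && typeof r.payload === "object" && r.payload !== null;
}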