Converts text to speech audio. By default, the audio data is returned in real time using chunked streaming.
Authenticate using the X-Api-Key header.
Text to convert (required)
"Hello, this is a test message."
Unique voice identifier (required)
"vl-xiaoxiao"
Language or locale code (optional). Supports both ISO 639-1 language codes (e.g. "en", "zh") and BCP 47 locale codes (e.g. "en-US", "zh-CN"). When a language code is provided, the system will automatically resolve it to the most common locale (e.g. "en" -> "en-US").
"zh-CN"
Audio format (optional, default: wav)
wav, pcm
Sample rate (optional, default: 16000)
16000, 24000, 48000
Bit rate (kbps), only valid for compressed formats (NOT SUPPORTED YET)
128
Speech rate multiplier (optional), range: 0.5 to 2.0, default: 1.0
0.5 <= x <= 2
Pitch offset in semitones (optional), range: -12 to 12, default: 0
-12 <= x <= 12
Volume multiplier (optional), range: 0.0 to 1.0, default: 1.0
0 <= x <= 1
Voice-specific configuration (optional, use according to configuration items supported by voice_id)
{
"style": "cheerful",
"role": "Girl",
"degree": 0.5
}
Successfully returns audio stream
The response is of type file.
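
The parameters above could be assembled into a request along the following lines. This is a minimal sketch, not the service's official client: the endpoint URL, the POST method, the JSON body encoding, and every field name except voice_id (for example "text", "format", "sample_rate", "speed", "pitch", "volume", "voice_config") are assumptions, because this reference does not show them. Only the X-Api-Key header, the allowed values, and the numeric ranges are taken from the description above.

```python
import json
import urllib.request

# Hypothetical endpoint: the real URL is not given in this reference.
TTS_URL = "https://api.example.com/v1/tts"

ALLOWED_FORMATS = {"wav", "pcm"}
ALLOWED_SAMPLE_RATES = {16000, 24000, 48000}


def build_tts_request(api_key, text, voice_id, language=None,
                      audio_format="wav", sample_rate=16000,
                      speed=1.0, pitch=0, volume=1.0, voice_config=None):
    """Check the documented ranges and return (headers, payload).

    All payload keys other than "voice_id" are assumed names.
    """
    if not text:
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError("format must be one of: wav, pcm")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError("sample rate must be 16000, 24000 or 48000")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speech rate must be within 0.5 to 2.0")
    if not -12 <= pitch <= 12:
        raise ValueError("pitch offset must be within -12 to 12 semitones")
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be within 0.0 to 1.0")

    headers = {"X-Api-Key": api_key, "Content-Type": "application/json"}
    payload = {
        "text": text,
        "voice_id": voice_id,
        "format": audio_format,
        "sample_rate": sample_rate,
        "speed": speed,
        "pitch": pitch,
        "volume": volume,
    }
    # Optional fields are sent only when the caller provides them.
    if language is not None:
        payload["language"] = language
    if voice_config:
        payload["voice_config"] = voice_config
    return headers, payload


def stream_tts(headers, payload, out_path, chunk_size=4096):
    """Send the request (POST is assumed) and write audio to out_path as it arrives."""
    request = urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
```

A call such as build_tts_request("YOUR_KEY", "Hello, this is a test message.", "vl-xiaoxiao", language="zh-CN") returns the headers and body, which stream_tts can then send. Validating on the client is a design choice that surfaces out-of-range values before a request is made; the service may apply its own checks as well.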