Converts text to speech audio. By default, the audio data is returned in real time using chunked streaming.
Authenticate with a Bearer token, where the token is your API key.
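As a minimal sketch of the authentication step, the helper below builds the request header from an API key. It assumes the standard HTTP `Authorization: Bearer <token>` scheme; the function name `build_auth_headers` is hypothetical and not part of the API.

```python
from typing import Dict


def build_auth_headers(api_key: str) -> Dict[str, str]:
    """Return HTTP headers carrying the API key as a Bearer token.

    Assumption: the service reads the standard "Authorization" header.
    """
    token = api_key.strip() if api_key else ""
    if not token:
        raise ValueError("api_key must be a non-empty string")
    return {"Authorization": f"Bearer {token}"}
```

Keeping the key out of source code (for example, reading it from an environment variable) avoids leaking it in version control.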
Text to convert (required)
"Hello, this is a test message."
Voice unique identifier (required)
"vl-xiaoxiao"
Language code: an ISO 639-1 language code, shown in the example with a region subtag (optional)
"zh-CN"
Audio format (optional, default: mp3)
mp3, wav, ogg, pcm, webm
Sample rate (optional, default: 16000)
8000, 16000, 22050, 24000, 44100, 48000
Bit rate (kbps), only valid for compressed formats (optional, default: 128)
128
Speech rate multiplier, range: 0.5 - 2.0, default: 1.0 (optional)
0.5 <= x <= 2
Pitch offset (semitones), range: -12 - 12, default: 0 (optional)
-12 <= x <= 12
Volume multiplier, range: 0.0 - 1.0, default: 1.0 (optional)
0 <= x <= 1
Whether to use SSML format for text (optional, default: false)
Voice-specific configuration (optional; use only the configuration items supported by the chosen voice_id)
{
  "style": "cheerful",
  "role": "Girl",
  "degree": 0.5
}
Successfully returns audio stream
The response is of type file.
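The parameters and streaming response described above can be sketched as a small client. This is an illustration under stated assumptions, not the service's actual interface: the endpoint URL, the POST-with-JSON request shape, and every field name except `voice_id` (`text`, `format`, `sample_rate`, `speed`, `pitch`, `volume`, `ssml`, `voice_config`) are hypothetical, since the reference does not name them. Only the allowed values, ranges, and defaults are taken from the parameter descriptions.

```python
import json
import urllib.request
from typing import Any, Dict, Iterator, Optional

# Hypothetical endpoint: the reference does not state the real URL.
TTS_URL = "https://api.example.com/v1/tts"

# Allowed values as listed in the parameter descriptions.
ALLOWED_FORMATS = {"mp3", "wav", "ogg", "pcm", "webm"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}


def build_tts_payload(
    text: str,
    voice_id: str,
    *,
    audio_format: str = "mp3",
    sample_rate: int = 16000,
    speed: float = 1.0,
    pitch: float = 0.0,
    volume: float = 1.0,
    ssml: bool = False,
    voice_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Validate arguments against the documented ranges and build a request body.

    All JSON keys other than "voice_id" are assumed names.
    """
    if not text:
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if not -12 <= pitch <= 12:
        raise ValueError("pitch must be between -12 and 12 semitones")
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be between 0.0 and 1.0")

    payload: Dict[str, Any] = {
        "text": text,
        "voice_id": voice_id,
        "format": audio_format,
        "sample_rate": sample_rate,
        "speed": speed,
        "pitch": pitch,
        "volume": volume,
        "ssml": ssml,
    }
    if voice_config:
        payload["voice_config"] = voice_config
    return payload


def stream_tts(api_key: str, payload: Dict[str, Any], chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio bytes as they arrive; assumes a POST request with a JSON body."""
    request = urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urllib decodes chunked transfer encoding transparently, so each read
    # returns plain audio bytes until the stream is exhausted.
    with urllib.request.urlopen(request) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

A caller would typically open an output file in binary mode and write each chunk from `stream_tts(...)` as it arrives, which keeps memory use flat for long texts and lets playback begin before synthesis finishes. Validating ranges on the client side is a design choice that surfaces mistakes before a network round trip; the server remains the authority on what it accepts.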