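Text to Speech

POST https://api.voxnexus.ai/v1/tts

Converts text into speech and returns the synthesized audio as a file (WAV or raw PCM).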

curl --request POST \
  --url https://api.voxnexus.ai/v1/tts \
  --header 'Content-Type: application/json' \
  --header 'X-Api-Key: <api-key>' \
  --output speech.wav \
  --data '
{
  "text": "Hello, this is a test message.",
  "voice_id": "vl-xiaoxiao",
  "language": "zh-CN",
  "format": "wav",
  "sample_rate": 16000,
  "bit_rate": 128,
  "speed": 1,
  "pitch": 0,
  "volume": 1,
  "voice_config": {
    "style": "cheerful",
    "role": "Girl",
    "degree": 0.5
  }
}
'

Authorizations

X-Api-Key (string, header, required)

Authenticate by sending your API key in the X-Api-Key request header.
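You can keep the key out of scripts and shell history by reading it from an environment variable. The sketch below assumes a variable named VOXNEXUS_API_KEY. That name and the api_key_header helper are hypothetical and are not defined by the API.

```shell
#!/usr/bin/env bash
# Sketch: build the X-Api-Key header from an environment variable.
# VOXNEXUS_API_KEY is a hypothetical variable name chosen for this example.
api_key_header() {
  if [ -z "${VOXNEXUS_API_KEY:-}" ]; then
    echo "error: VOXNEXUS_API_KEY is not set" >&2
    return 1
  fi
  printf 'X-Api-Key: %s\n' "$VOXNEXUS_API_KEY"
}
```

A script would typically run header=$(api_key_header) || exit 1 and then pass --header "$header" to curl. That way a missing key stops the script before any request is sent.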

Body (application/json)

text (string, required)

The text to convert to speech.

Example: "Hello, this is a test message."

voice_id (string, required)

Unique identifier of the voice to use.

Example: "vl-xiaoxiao"

language (string, optional)

Language or locale code. Accepts both ISO 639-1 language codes (e.g. "en", "zh") and BCP 47 locale codes (e.g. "en-US", "zh-CN"). When only a language code is provided, the service resolves it to that language's most common locale (e.g. "en" -> "en-US").

Example: "zh-CN"

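The two accepted forms of language can be told apart by their shape. The bash function below is a hypothetical client-side helper (language_kind is not part of the API). It recognizes only the plain "xx" and "xx-YY" patterns. The resolution of a bare language code to a locale (for example "en" -> "en-US") still happens on the server.

```shell
#!/usr/bin/env bash
# Sketch: tell apart the two "language" forms the API documents.
# Only the simple shapes "xx" and "xx-YY" are recognized here; longer
# BCP 47 tags such as "zh-Hans-CN" are reported as "other", and whether
# they are accepted is up to the server.
language_kind() {
  case $1 in
    [[:lower:]][[:lower:]])
      echo "language" ;;   # ISO 639-1 code; the server picks a default locale
    [[:lower:]][[:lower:]]-[[:upper:]][[:upper:]])
      echo "locale" ;;     # BCP 47 language-REGION tag
    *)
      echo "other" ;;
  esac
}
```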
format (enum<string>, optional, default: wav)

Output audio format.

Available options: wav, pcm

sample_rate (enum<integer>, optional, default: 16000)

Output sample rate in Hz.

Available options: 16000, 24000, 48000
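With format set to pcm, the response is raw samples with no header, so its duration must be computed from the byte count. For example, 16-bit mono audio at 16000 Hz uses 16000 * 2 = 32000 bytes per second. The helper below is a sketch that defaults to 16-bit mono samples. This reference does not state the sample width or channel count for pcm output, so treat those defaults as assumptions. pcm_duration is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: estimate the playback length of a raw "pcm" response.
# ASSUMPTION: 16-bit (2-byte) mono samples by default; the API reference
# does not state the sample width or channel count, so override if needed.
# pcm_duration BYTES SAMPLE_RATE [BYTES_PER_SAMPLE] [CHANNELS]
pcm_duration() {
  local bytes=$1 rate=$2 width=${3:-2} channels=${4:-1}
  # bytes per second = sample_rate * bytes_per_sample * channels
  awk -v b="$bytes" -v r="$rate" -v w="$width" -v c="$channels" \
    'BEGIN { printf "%.3f\n", b / (r * w * c) }'
}
```

For example, pcm_duration "$(wc -c < speech.pcm)" 16000 prints the length in seconds of a file saved from a 16000 Hz pcm request.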

bit_rate (integer, deprecated, default: 128)

Bit rate in kbps. Applies only to compressed formats, which are not supported yet, so it does not apply to the currently available formats (wav and pcm).

Example: 128

speed (number<float>, optional, default: 1.0)

Speech rate multiplier.

Required range: 0.5 <= x <= 2.0

pitch (integer, optional, default: 0)

Pitch offset in semitones.

Required range: -12 <= x <= 12

volume (number<float>, optional, default: 1.0)

Volume multiplier.

Required range: 0.0 <= x <= 1.0
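The ranges for speed, pitch, and volume can be checked on the client before a request is sent. The sketch below mirrors the documented limits. in_range and validate_prosody are hypothetical helpers, and the server remains the authority on what it accepts.

```shell
#!/usr/bin/env bash
# Sketch: client-side checks that mirror the documented ranges.
# The server is still the authority; these only catch mistakes early.

# in_range VALUE MIN MAX: succeeds if VALUE is a plain decimal number
# with MIN <= VALUE <= MAX (awk does the floating-point comparison).
in_range() {
  awk -v x="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !(x ~ /^-?[0-9]+(\.[0-9]*)?$/ && x + 0 >= lo + 0 && x + 0 <= hi + 0) }'
}

# validate_prosody SPEED PITCH VOLUME
validate_prosody() {
  local speed=$1 pitch=$2 volume=$3
  in_range "$speed" 0.5 2 || { echo "speed must be between 0.5 and 2.0" >&2; return 1; }
  # pitch is an integer number of semitones
  [[ $pitch =~ ^-?[0-9]+$ ]] || { echo "pitch must be an integer" >&2; return 1; }
  in_range "$pitch" -12 12 || { echo "pitch must be between -12 and 12" >&2; return 1; }
  in_range "$volume" 0 1 || { echo "volume must be between 0.0 and 1.0" >&2; return 1; }
}
```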

voice_config (object, optional)

Voice-specific settings. Use only the configuration items that the selected voice_id supports.

Example:

{
  "style": "cheerful",
  "role": "Girl",
  "degree": 0.5
}
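Pasting arbitrary text into the --data argument produces invalid JSON as soon as the text contains a double quote, a backslash, or a newline. The bash sketch below escapes those characters and then assembles a minimal body with text, voice_id, and an optional voice_config. json_escape and build_tts_body are hypothetical helpers. Where jq is available, it is the more robust way to build the body.

```shell
#!/usr/bin/env bash
# Sketch: assemble a request body in pure bash without jq.
# json_escape handles backslash, double quote, newline, carriage return
# and tab; other control characters are NOT escaped.
json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}     # backslash first, so later escapes are not doubled
  s=${s//"$q"/"$bs$q"}
  s=${s//$'\n'/"${bs}n"}
  s=${s//$'\r'/"${bs}r"}
  s=${s//$'\t'/"${bs}t"}
  printf '%s' "$s"
}

# build_tts_body TEXT VOICE_ID [VOICE_CONFIG_JSON]
# VOICE_CONFIG_JSON is inserted verbatim and must already be valid JSON.
build_tts_body() {
  local text voice
  text=$(json_escape "$1")
  voice=$(json_escape "$2")
  if [ -n "${3:-}" ]; then
    printf '{"text":"%s","voice_id":"%s","voice_config":%s}\n' "$text" "$voice" "$3"
  else
    printf '{"text":"%s","voice_id":"%s"}\n' "$text" "$voice"
  fi
}
```

A call might look like --data "$(build_tts_body "$TEXT" vl-xiaoxiao '{"style":"cheerful"}')". Optional fields such as format or speed can be appended to the printf template in the same way.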

Response

On success, the API returns the synthesized audio as a stream. The response body is a file (binary audio data in the requested format).
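A successful response is a binary file, so check what was actually saved before passing it to a player. For wav output, a quick test is the header that every WAV file starts with: "RIFF" at byte 0 and "WAVE" at byte 8. The sketch below does only that, since this reference does not document the error response format. is_wav_file and read_bytes are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: check that a saved response looks like a WAV file.
# A WAV (RIFF) file starts with the bytes "RIFF" and has "WAVE" at offset 8.
# NUL bytes are stripped so bash does not warn when reading binary data;
# a 4-byte field containing a NUL can never equal "RIFF" or "WAVE" anyway.
read_bytes() {
  # read_bytes FILE OFFSET COUNT: print COUNT bytes starting at OFFSET
  dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | tr -d '\000'
}

is_wav_file() {
  [ "$(read_bytes "$1" 0 4)" = "RIFF" ] && [ "$(read_bytes "$1" 8 4)" = "WAVE" ]
}
```

For example, run is_wav_file speech.wav || echo "unexpected response" >&2 after the curl request. curl's --fail option, which makes curl exit with an error on HTTP status 400 and above, is a complementary check.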
