
Overview

The Text-to-Speech (TTS) API converts text into natural-sounding speech audio. VoxNexus offers both a REST API and a WebSocket API for TTS.

REST API

The REST API endpoint /v1/tts supports synchronous and streaming audio generation.

Basic Usage

curl -X POST https://api.voxnexus.ai/v1/tts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test message",
    "voice_id": "vl-xiaoxiao",
    "format": "mp3",
    "sample_rate": 16000
  }' \
  --output speech.mp3

Request Parameters

  • text (string, required): The text to convert to speech. Plain text by default; interpreted as SSML when ssml is true.
  • voice_id (string, required): Unique identifier of the voice to use. Use the /v1/voices endpoint to browse available voices.
  • language (string, optional): BCP 47 language tag (e.g., zh-CN, en-US). Recommended for better accuracy.
  • format (string): Audio format. Supported values: mp3, wav, ogg, pcm, webm. Default: mp3.
  • sample_rate (integer): Sample rate in Hz. Supported values: 8000, 16000, 22050, 24000, 44100, 48000. Default: 16000.
  • bit_rate (integer): Bit rate in kbps. Applies only to compressed formats (mp3, ogg). Default: 128.
  • speed (number): Speech rate multiplier. Range: 0.5 to 2.0. Default: 1.0.
  • pitch (integer): Pitch offset in semitones. Range: -12 to 12. Default: 0.
  • volume (number): Volume multiplier. Range: 0.0 to 1.0. Default: 1.0.
  • ssml (boolean): Whether to interpret text as SSML. Default: false.
  • voice_config (object): Voice-specific settings. Available properties depend on the selected voice; look them up with the /v1/voices/{voice_id} endpoint.
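The parameter rules above can be checked on the client before a request is sent. The sketch below is a hypothetical helper (buildTtsRequest is not part of any VoxNexus SDK): it applies the documented defaults and rejects values outside the documented ranges. The server remains the authority on validation.

```javascript
// Hypothetical client-side check for /v1/tts request bodies. Ranges and
// defaults mirror the parameter reference; the API does its own validation.
const FORMATS = ['mp3', 'wav', 'ogg', 'pcm', 'webm'];
const SAMPLE_RATES = [8000, 16000, 22050, 24000, 44100, 48000];
const COMPRESSED = ['mp3', 'ogg'];

function checkRange(name, value, min, max) {
  if (typeof value !== 'number' || Number.isNaN(value) || value < min || value > max) {
    throw new RangeError(`${name} must be between ${min} and ${max}`);
  }
}

function buildTtsRequest(params) {
  // Documented defaults first; caller-supplied values override them
  const body = { format: 'mp3', sample_rate: 16000, speed: 1.0, pitch: 0, volume: 1.0, ssml: false, ...params };

  if (typeof body.text !== 'string' || body.text === '') throw new TypeError('text is required');
  if (typeof body.voice_id !== 'string' || body.voice_id === '') throw new TypeError('voice_id is required');
  if (!FORMATS.includes(body.format)) throw new RangeError(`unsupported format: ${body.format}`);
  if (!SAMPLE_RATES.includes(body.sample_rate)) throw new RangeError(`unsupported sample_rate: ${body.sample_rate}`);

  checkRange('speed', body.speed, 0.5, 2.0);
  checkRange('volume', body.volume, 0.0, 1.0);
  checkRange('pitch', body.pitch, -12, 12);
  if (!Number.isInteger(body.pitch)) throw new RangeError('pitch must be a whole number of semitones');

  // bit_rate applies only to compressed formats (default 128 kbps there)
  if (COMPRESSED.includes(body.format)) {
    body.bit_rate = body.bit_rate ?? 128;
  } else if (body.bit_rate !== undefined) {
    throw new RangeError(`bit_rate is not valid for format ${body.format}`);
  }
  return body;
}
```

Rejecting bit_rate for uncompressed formats is a conservative choice; the API itself might simply ignore it.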

Response

The API returns audio data in the requested format. Response headers include metadata:
  • X-Request-ID: Unique request identifier
  • X-Voice-ID: Voice ID used for synthesis
  • X-Language: Language code
  • X-Audio-Format: Audio format
  • X-Sample-Rate: Sample rate
  • X-Duration-Ms: Audio duration in milliseconds
  • X-Created-At: Creation timestamp
  • X-RateLimit-Remaining: Number of requests remaining under the current rate limit
  • X-Quota-Used: Credits consumed

Example response:

HTTP/1.1 200 OK
X-Request-ID: req_1234567890
X-Voice-ID: vl-xiaoxiao
X-Language: zh-CN
X-Audio-Format: mp3
X-Sample-Rate: 16000
X-Duration-Ms: 2500
X-Created-At: 2024-01-01T12:00:00Z
X-RateLimit-Remaining: 99
X-Quota-Used: 1
Transfer-Encoding: chunked
Content-Type: audio/mpeg

[Audio binary data]
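Reading the metadata headers listed above is straightforward with the Fetch API. The hypothetical parseTtsMetadata helper below converts them into typed values. It assumes a standard Headers object (browsers, or Node.js 18+ where Headers is global) and returns null for any header the server did not send.

```javascript
// Hypothetical helper: convert the documented X-* headers into typed metadata
function parseTtsMetadata(headers) {
  // Numeric headers become numbers; missing headers become null
  const num = (name) => {
    const value = headers.get(name);
    return value === null ? null : Number(value);
  };
  const createdAt = headers.get('X-Created-At');
  return {
    requestId: headers.get('X-Request-ID'),
    voiceId: headers.get('X-Voice-ID'),
    language: headers.get('X-Language'),
    audioFormat: headers.get('X-Audio-Format'),
    sampleRate: num('X-Sample-Rate'),
    durationMs: num('X-Duration-Ms'),
    createdAt: createdAt === null ? null : new Date(createdAt),
    rateLimitRemaining: num('X-RateLimit-Remaining'),
    quotaUsed: num('X-Quota-Used'),
    contentType: headers.get('Content-Type'),
  };
}
```

After a successful request, call parseTtsMetadata(response.headers) and read fields such as durationMs or rateLimitRemaining.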

Streaming Response

By default, the API uses chunked transfer encoding for streaming audio data. This allows you to start playing audio while it’s still being generated, reducing latency.
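To benefit from chunked streaming, read the body incrementally instead of buffering it. The following is a minimal sketch, assuming a runtime with the Fetch API and web streams (modern browsers, Node.js 18+). consumeAudioStream and its onChunk callback are hypothetical names, and what you do with each chunk (play it, append it to a file) is up to you.

```javascript
// Read a streaming response body chunk by chunk; returns the total byte count
async function consumeAudioStream(stream, onChunk) {
  const reader = stream.getReader();
  let totalBytes = 0;
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      totalBytes += value.byteLength;
      onChunk(value); // value is a Uint8Array holding the next slice of audio
    }
  } finally {
    reader.releaseLock();
  }
  return totalBytes;
}

// Usage against the real endpoint (requires a valid API key):
// const response = await fetch('https://api.voxnexus.ai/v1/tts', options);
// if (!response.ok) throw new Error(`HTTP ${response.status}`);
// await consumeAudioStream(response.body, (chunk) => { /* play or save */ });
```

Chunk boundaries are set by the network, not by audio frames, so treat the chunks as one continuous byte stream rather than as independently playable files.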

WebSocket API

The WebSocket API provides real-time bidirectional communication for TTS operations, ideal for interactive applications.

Connection

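Connect to wss://api.voxnexus.ai/v1/tts/realtime and send your API key in the Authorization header. The browser's built-in WebSocket constructor cannot set custom headers, so this example uses the ws package for Node.js (npm install ws):
const WebSocket = require('ws');

const ws = new WebSocket('wss://api.voxnexus.ai/v1/tts/realtime', {
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  }
});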

Message Flow

  1. Initialize: Send an init message to configure voice parameters, then wait for the server's ready message
  2. Send Text: Send text messages with content to synthesize
  3. Receive Audio: Receive audio messages with Base64-encoded audio data
  4. Handle Errors: Monitor for error messages
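For longer input, the Send Text step can be spread across several text messages. The sketch below assumes that is_final: true marks the last piece of input (the reference shows the field but does not define it). maxChars is a hypothetical client-side segment size, not a documented limit, and buildTextMessages is an illustrative name. Splitting happens on ASCII and CJK sentence-ending punctuation so each segment ends at a natural pause.

```javascript
// Split text into sentence-sized `text` messages; only the last is final (assumption)
function buildTextMessages(text, maxChars = 200) {
  // A sentence is a run of non-terminal characters, its terminal punctuation,
  // and any trailing whitespace
  const sentences = text.match(/[^.!?。！？]+[.!?。！？]*\s*/g) ?? [];
  const segments = [];
  let current = '';
  for (const sentence of sentences) {
    // Start a new segment when adding this sentence would exceed maxChars
    if (current && (current + sentence).length > maxChars) {
      segments.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) segments.push(current.trim());

  return segments.map((segment, i) => ({
    type: 'text',
    text: segment,
    is_final: i === segments.length - 1,
  }));
}
```

A single sentence longer than maxChars is sent unsplit; send the resulting messages in order with ws.send(JSON.stringify(message)).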

Initialization Message

{
  "type": "init",
  "voice_id": "vl-xiaoxiao",
  "language": "zh-CN",
  "format": "mp3",
  "sample_rate": 16000,
  "speed": 1.0,
  "pitch": 0,
  "volume": 1.0
}

Text Message

{
  "type": "text",
  "text": "Hello, this is a test",
  "is_final": false
}

Audio Response

{
  "type": "audio",
  "data": "base64-encoded-audio-data",
  "is_final": false
}
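On the receiving side, each audio message carries a Base64-encoded slice of the output. The following Node.js sketch decodes the slices into bytes with Buffer.from(data, 'base64') and concatenates them. It assumes chunks arrive in playback order and that is_final: true marks the last one; AudioAssembler is a hypothetical name.

```javascript
// Collect Base64 audio messages into one Buffer (Node.js; Buffer is global)
class AudioAssembler {
  constructor() {
    this.chunks = [];
    this.done = false;
  }

  // Returns the complete audio once the final chunk arrives, otherwise null
  push(message) {
    if (message.type !== 'audio') return null;
    if (this.done) throw new Error('received audio after the final chunk');
    this.chunks.push(Buffer.from(message.data, 'base64'));
    if (message.is_final) {
      this.done = true;
      return Buffer.concat(this.chunks);
    }
    return null;
  }
}
```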

Complete Example

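const WebSocket = require('ws'); // Node.js client; browsers cannot set the Authorization header

const ws = new WebSocket('wss://api.voxnexus.ai/v1/tts/realtime', {
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  }
});

const audioChunks = [];

ws.onopen = () => {
  // Step 1: configure the synthesis session
  ws.send(JSON.stringify({
    type: 'init',
    voice_id: 'vl-xiaoxiao',
    format: 'mp3',
    sample_rate: 16000
  }));
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);

  switch (message.type) {
    case 'ready':
      console.log('Ready:', message.request_id);
      // Step 2: send the text to synthesize
      ws.send(JSON.stringify({
        type: 'text',
        text: 'Hello, this is a test',
        is_final: true
      }));
      break;

    case 'audio': {
      // Step 3: decode Base64 into raw bytes (atob would return a binary string, not bytes)
      audioChunks.push(Buffer.from(message.data, 'base64'));
      if (message.is_final) {
        const audio = Buffer.concat(audioChunks);
        console.log(`Received ${audio.length} bytes of audio`); // play or save `audio` here
        ws.close();
      }
      break;
    }

    case 'error':
      // Step 4: surface server-side errors
      console.error('Error:', message.error);
      break;
  }
};

ws.onerror = (event) => {
  console.error('WebSocket error:', event.message);
};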

Best Practices

Voice Selection

  • Use the /v1/voices endpoint to browse available voices
  • Filter voices by language, gender, age, or style
  • Test voices using sample audio URLs before production use

Performance Optimization

  • Use streaming for long texts to reduce perceived latency
  • Choose appropriate sample rates (16 kHz is sufficient for most speech use cases)
  • Use compressed formats (mp3) for network efficiency
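The sample-rate and format advice above comes down to payload size. The arithmetic below is a rough estimate under stated assumptions: pcm is taken to be 16-bit mono (this reference does not specify sample width or channel count), compressed formats are treated as constant bit rate at bit_rate kbps, and container headers are ignored. estimateAudioBytes is a hypothetical helper.

```javascript
// Rough payload size in bytes for `seconds` of audio (see assumptions above)
function estimateAudioBytes({ format, sampleRate = 16000, bitRateKbps = 128, seconds }) {
  if (format === 'pcm') {
    const bytesPerSample = 2; // assumption: 16-bit samples
    const channels = 1; // assumption: mono
    return sampleRate * bytesPerSample * channels * seconds;
  }
  if (format === 'mp3' || format === 'ogg') {
    // kbps -> bytes per second, assuming a constant bit rate
    return (bitRateKbps * 1000 / 8) * seconds;
  }
  throw new Error(`no estimate for format ${format}`);
}
```

Under these assumptions, 10 seconds of speech is about 320,000 bytes as 16 kHz pcm, 960,000 bytes as 48 kHz pcm, and 160,000 bytes as 128 kbps mp3.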

Error Handling

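Check the HTTP status before reading the response body, and handle both network failures and API error responses:
async function synthesize(text) {
  try {
    const response = await fetch('https://api.voxnexus.ai/v1/tts', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        text: text,
        voice_id: 'vl-xiaoxiao'
      })
    });

    if (!response.ok) {
      // Fall back to the status code if the error body is not JSON
      const error = await response.json().catch(() => ({}));
      throw new Error(error.error || `Request failed with HTTP ${response.status}`);
    }

    // Handle audio data: the body is binary, so read it as bytes
    return new Uint8Array(await response.arrayBuffer());
  } catch (error) {
    console.error('TTS Error:', error);
    return null;
  }
}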

SSML Support

To send SSML, set ssml to true and wrap the markup in a <speak> root element:
{
  "text": "<speak>Hello <break time='500ms'/> world</speak>",
  "voice_id": "vl-xiaoxiao",
  "ssml": true
}
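When SSML is built from user-supplied text, characters such as & and < must be escaped, or the markup becomes malformed. A minimal sketch with hypothetical helpers escapeSsml and buildSsmlRequest; it uses only the <speak> and <break> elements shown in the example above.

```javascript
// Escape the five XML special characters so user text cannot break the markup
function escapeSsml(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical builder: parts are spoken strings or { breakMs } pauses
function buildSsmlRequest(parts, voiceId) {
  const inner = parts
    .map((part) => {
      if (typeof part === 'string') return escapeSsml(part);
      if (!Number.isFinite(part.breakMs) || part.breakMs < 0) {
        throw new RangeError('breakMs must be a non-negative number');
      }
      return `<break time='${part.breakMs}ms'/>`;
    })
    .join(' ');
  return { text: `<speak>${inner}</speak>`, voice_id: voiceId, ssml: true };
}
```

For example, buildSsmlRequest(['Hello', { breakMs: 500 }, 'world'], 'vl-xiaoxiao') yields the same text value as the request body above.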

Rate Limits and Quotas

  • Monitor the X-RateLimit-Remaining header to track remaining requests
  • Check the X-Quota-Used header to understand credit consumption
  • Implement exponential backoff for 429 (Too Many Requests) responses
  • Consider using the WebSocket API for high-frequency use cases
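The backoff advice can be sketched as a small wrapper. It retries only on HTTP 429, waits a random delay up to an exponentially growing cap (full jitter), and gives up after maxRetries retries. Honoring a Retry-After header is an assumption: it is a standard HTTP header, but this reference does not say VoxNexus sends it. fetchWithBackoff and its options are hypothetical; the request function is passed in so a fresh request is issued on every attempt.

```javascript
// Retry on HTTP 429 with exponential backoff and full jitter
async function fetchWithBackoff(doFetch, { maxRetries = 5, baseDelayMs = 500, maxDelayMs = 16000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await doFetch();
    if (response.status !== 429 || attempt >= maxRetries) {
      return response;
    }
    // Prefer a numeric Retry-After (seconds) if present; otherwise use jittered backoff
    const retryAfter = Number(response.headers.get('Retry-After'));
    const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
    const delayMs = Number.isFinite(retryAfter) && retryAfter > 0
      ? retryAfter * 1000
      : Math.random() * cap;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

Typical usage: const response = await fetchWithBackoff(() => fetch('https://api.voxnexus.ai/v1/tts', options));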

Common Use Cases

Interactive Voice Response (IVR)

Use the WebSocket API for real-time synthesis in IVR systems:
// Synthesize a prompt on an open, initialized connection (see Complete Example)
ws.send(JSON.stringify({
  type: 'text',
  text: 'Please press 1 for sales',
  is_final: false
}));

Content Narration

Use the REST API for batch processing of long-form content, with the request body stored in a JSON file:
# Process entire articles or books
curl -X POST https://api.voxnexus.ai/v1/tts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d @article.json \
  --output narration.mp3
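The article.json file must be valid JSON, and long-form text is full of quotes and line breaks that are easy to escape incorrectly by hand. A Node.js sketch that generates the file with JSON.stringify; writeTtsPayload, article.txt, and the default voice are illustrative choices, not part of the API.

```javascript
const fs = require('node:fs');

// Hypothetical helper: wrap a plain-text file in a /v1/tts request body
function writeTtsPayload(inputPath, outputPath, voiceId = 'vl-xiaoxiao') {
  const text = fs.readFileSync(inputPath, 'utf8');
  const payload = { text, voice_id: voiceId, format: 'mp3' };
  // JSON.stringify escapes quotes, backslashes, and newlines correctly
  fs.writeFileSync(outputPath, JSON.stringify(payload));
  return payload;
}

// writeTtsPayload('article.txt', 'article.json');
```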

Accessibility Features

Generate audio versions of text content for accessibility:
async function generateAudioAccessibility(text) {
  const response = await fetch('https://api.voxnexus.ai/v1/tts', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      text: text,
      voice_id: 'vl-xiaoxiao',
      format: 'mp3',
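      sample_rate: 22050 // Above the 16000 Hz default, for clearer audio
    })
  });

  if (!response.ok) {
    throw new Error(`TTS request failed with HTTP ${response.status}`);
  }

  // Audio as a Blob, e.g. for URL.createObjectURL() and an <audio> element
  return response.blob();
}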