
Overview

The Text-to-Speech (TTS) API converts text into natural-sounding speech audio. VoxNexus exposes it through both a REST API and a WebSocket API.

REST API

The REST API endpoint /v1/tts supports synchronous and streaming audio generation.

Basic Usage

curl -X POST https://api.voxnexus.ai/v1/tts \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test message",
    "voice_id": "vl-xiaoxiao",
    "format": "wav",
    "sample_rate": 16000
  }' \
  --output speech.wav

Request Parameters

text
string
required
The text content to convert to speech.
voice_id
string
required
Unique identifier of the voice to use. Use the /v1/voices endpoint to browse available voices.
language
string
Language or locale code. Supports both ISO 639-1 language codes (e.g., en, zh) and BCP 47 locale codes (e.g., en-US, zh-CN). When only a language code is provided, the system automatically resolves it to the most common locale (e.g., en → en-US). Optional, but recommended for better accuracy.
format
string
Audio format. Supported values: wav, pcm. Default: wav.
sample_rate
integer
Sample rate in Hz. Supported values: 16000, 24000, 48000. Default: 16000.
bit_rate
integer
deprecated
Bit rate in kbps. Not yet supported; reserved for future compressed-format support. Default: 128.
speed
number
Speech rate multiplier. Range: 0.5 to 2.0. Default: 1.0.
pitch
integer
Pitch offset in semitones. Range: -12 to 12. Default: 0.
volume
number
Volume multiplier. Range: 0.0 to 1.0. Default: 1.0.
voice_config
object
Voice-specific configuration object. Properties depend on the selected voice. Check voice details using /v1/voices/{voice_id} endpoint.
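
The documented values and ranges can be checked on the client before a request is sent. The following is a minimal sketch: validateTtsRequest is a hypothetical helper, not part of the VoxNexus API, and the server remains the authority on what it accepts.

```javascript
// Hypothetical client-side check based on the parameter list above.
// The server may apply additional rules that are not covered here.
const SUPPORTED_FORMATS = ['wav', 'pcm'];
const SUPPORTED_SAMPLE_RATES = [16000, 24000, 48000];

function validateTtsRequest(params) {
  const errors = [];
  if (typeof params.text !== 'string' || params.text.length === 0) {
    errors.push('text is required');
  }
  if (typeof params.voice_id !== 'string' || params.voice_id.length === 0) {
    errors.push('voice_id is required');
  }
  if (params.format !== undefined && !SUPPORTED_FORMATS.includes(params.format)) {
    errors.push('format must be wav or pcm');
  }
  if (params.sample_rate !== undefined &&
      !SUPPORTED_SAMPLE_RATES.includes(params.sample_rate)) {
    errors.push('sample_rate must be 16000, 24000, or 48000');
  }
  if (params.speed !== undefined && !(params.speed >= 0.5 && params.speed <= 2.0)) {
    errors.push('speed must be between 0.5 and 2.0');
  }
  if (params.pitch !== undefined &&
      !(Number.isInteger(params.pitch) && params.pitch >= -12 && params.pitch <= 12)) {
    errors.push('pitch must be an integer between -12 and 12');
  }
  if (params.volume !== undefined && !(params.volume >= 0.0 && params.volume <= 1.0)) {
    errors.push('volume must be between 0.0 and 1.0');
  }
  return errors; // an empty array means no problems were found
}
```

An empty result means the request passed these local checks only; voice_config is not inspected because its properties depend on the selected voice.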

Response

The API returns audio data in the requested format. Response headers include metadata:
  • X-Request-ID: Unique request identifier
  • X-Voice-ID: Voice ID used for synthesis
  • X-Language: Language code
  • X-Audio-Format: Audio format
  • X-Sample-Rate: Sample rate
  • X-Duration-Ms: Audio duration in milliseconds
  • X-Created-At: Creation timestamp
  • Transfer-Encoding: chunked by default, so audio is streamed as it is generated

Example response:
HTTP/1.1 200 OK
X-Request-ID: req_1234567890
X-Voice-ID: vl-xiaoxiao
X-Language: zh-CN
X-Audio-Format: wav
X-Sample-Rate: 16000
X-Duration-Ms: 2500
X-Created-At: 2024-01-01T12:00:00Z
Transfer-Encoding: chunked
Content-Type: audio/wav

[Audio binary data]
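
The metadata headers listed above can be collected into a plain object. This is a sketch only: readTtsMetadata and its property names are hypothetical, and it assumes a fetch-style response whose headers object has a get() method.

```javascript
// Collect the documented metadata headers from a fetch-style response.
// Headers.get() is case-insensitive and returns null for a missing header.
function readTtsMetadata(response) {
  const h = response.headers;
  const toInt = (value) => (value === null ? null : Number.parseInt(value, 10));
  return {
    requestId: h.get('X-Request-ID'),
    voiceId: h.get('X-Voice-ID'),
    language: h.get('X-Language'),
    audioFormat: h.get('X-Audio-Format'),
    sampleRate: toInt(h.get('X-Sample-Rate')),
    durationMs: toInt(h.get('X-Duration-Ms')),
    createdAt: h.get('X-Created-At'),
  };
}
```

In a browser, custom headers on a cross-origin response are readable only if the server lists them in Access-Control-Expose-Headers; this page does not say whether the API does so.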

Streaming Response

By default, the API uses chunked transfer encoding for streaming audio data. This allows you to start playing audio while it’s still being generated, reducing latency.
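
With fetch, a chunked response can be consumed as it arrives by reading response.body as a stream. The following sketch assumes an environment where response.body is a ReadableStream (modern browsers, Node.js 18 or later); readAudioStream and onChunk are hypothetical names.

```javascript
// Read a fetch Response body chunk by chunk as data arrives.
// onChunk is a caller-supplied callback that receives each Uint8Array.
async function readAudioStream(response, onChunk) {
  const reader = response.body.getReader();
  let totalBytes = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break; // the server has finished sending audio
    totalBytes += value.byteLength;
    onChunk(value);
  }
  return totalBytes;
}
```

Each chunk can be passed to a playback queue or appended to a buffer; chunk boundaries are set by the transport and need not align with audio frames.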

WebSocket API

The WebSocket API provides real-time bidirectional communication for TTS operations, ideal for interactive applications.

Connection

Connect to wss://api.voxnexus.ai/v1/tts/realtime with authentication:
// Connect with token as query parameter (recommended)
const ws = new WebSocket('wss://api.voxnexus.ai/v1/tts/realtime?token=YOUR_API_KEY');
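
If the key may contain characters that are reserved in URLs, it is safer to let the URL API encode it than to concatenate strings. A small sketch, with buildRealtimeUrl as a hypothetical helper name:

```javascript
// Build the realtime endpoint URL with the token as a query parameter.
function buildRealtimeUrl(token) {
  const url = new URL('wss://api.voxnexus.ai/v1/tts/realtime');
  url.searchParams.set('token', token); // percent-encodes reserved characters
  return url.toString();
}

// Usage: const ws = new WebSocket(buildRealtimeUrl('YOUR_API_KEY'));
```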

Message Flow

  1. Initialize: Send an init message to configure voice parameters, then wait for the ready message before sending text
  2. Send Text: Send text messages with content to synthesize
  3. Receive Audio: Receive audio messages with Base64-encoded audio data
  4. Handle Errors: Monitor for error messages

Initialization Message

{
  "type": "init",
  "voice_id": "vl-xiaoxiao",
  "language": "zh-CN",
  "format": "wav",
  "sample_rate": 16000,
  "speed": 1.0,
  "pitch": 0,
  "volume": 1.0,
  "voice_config": {
    "style": "cheerful",
    "role": "Girl",
    "degree": 0.5
  }
}

Text Message

{
  "type": "text",
  "text": "Hello, this is a test",
  "is_final": false
}
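
When input is sent in several pieces, the text messages can be generated from a list of segments. This sketch assumes that is_final: true marks the last segment of the input, which the examples on this page suggest but do not state explicitly; buildTextMessages is a hypothetical helper.

```javascript
// Turn an array of text segments into text messages.
// Assumption: is_final is true only on the last segment.
function buildTextMessages(segments) {
  return segments.map((text, index) => ({
    type: 'text',
    text,
    is_final: index === segments.length - 1,
  }));
}

// Usage: buildTextMessages(parts).forEach((m) => ws.send(JSON.stringify(m)));
```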

Audio Response

{
  "type": "audio",
  "data": "base64-encoded-audio-data",
  "is_final": false
}
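
The data field has to be decoded from Base64 into bytes before it can be played or stored. The helpers below are a sketch with hypothetical names; whether concatenated chunks form a single valid WAV file depends on how the server frames the audio, which is not specified here.

```javascript
// Decode one audio message's Base64 payload into raw bytes.
function decodeAudioMessage(message) {
  const binary = atob(message.data); // one character per byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Join decoded chunks into one contiguous buffer.
function concatChunks(chunks) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}
```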

Complete Example

// Connect with token as query parameter
const ws = new WebSocket('wss://api.voxnexus.ai/v1/tts/realtime?token=YOUR_API_KEY');

ws.onopen = () => {
  // Initialize
  ws.send(JSON.stringify({
    type: 'init',
    voice_id: 'vl-xiaoxiao',
    format: 'wav',
    sample_rate: 16000
  }));
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  
  switch (message.type) {
    case 'ready':
      console.log('Ready:', message.request_id);
      // Send text to synthesize
      ws.send(JSON.stringify({
        type: 'text',
        text: 'Hello, this is a test',
        is_final: true
      }));
      break;
      
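    case 'audio': {
      // Decode the Base64 payload into raw bytes
      const binary = atob(message.data);
      const audioBytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
      // Handle audio playback using audioBytes
      break;
    }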
      
    case 'error':
      console.error('Error:', message.error);
      break;
  }
};

Best Practices

Voice Selection

  • Use the /v1/voices endpoint to browse available voices
  • Filter voices by language, gender, age, or style
  • Test voices using sample audio URLs before production use

Performance Optimization

  • Use streaming for long texts to reduce perceived latency
  • Choose appropriate sample rates (16kHz is sufficient for most use cases)
  • Use PCM format for real-time WebSocket streaming, WAV for REST API
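
Raw PCM carries no header, so most players need it wrapped in a container before playback. The sketch below prepends a standard 44-byte WAV header. It assumes 16-bit mono samples by default, which this page does not confirm for the pcm format; pcmToWav is a hypothetical helper.

```javascript
// Wrap raw PCM bytes in a 44-byte WAV (RIFF) header.
// Assumption (not confirmed by this page): 16-bit samples, mono.
function pcmToWav(pcmBytes, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const buffer = new ArrayBuffer(44 + pcmBytes.length);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + pcmBytes.length, true); // file size minus 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                  // fmt chunk size
  view.setUint16(20, 1, true);                   // audio format 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, 'data');
  view.setUint32(40, pcmBytes.length, true);     // size of the sample data

  const wav = new Uint8Array(buffer);
  wav.set(pcmBytes, 44);
  return wav;
}
```

Under the same assumption, 16 kHz audio occupies 32,000 bytes per second, compared with 96,000 bytes per second at 48 kHz.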

Error Handling

Always implement proper error handling:
try {
  const response = await fetch('https://api.voxnexus.ai/v1/tts', {
    method: 'POST',
    headers: {
      'X-Api-Key': 'YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      text: 'Hello',
      voice_id: 'vl-xiaoxiao'
    })
  });
  
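  if (!response.ok) {
    // The error body may not be valid JSON, so fall back to an empty object
    const body = await response.json().catch(() => ({}));
    throw new Error(body.error || `Request failed with status ${response.status}`);
  }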
  
  // Handle audio data
} catch (error) {
  console.error('TTS Error:', error);
}

Rate Limits and Quotas

  • Implement exponential backoff for 429 responses
  • Consider using the WebSocket API for high-frequency use cases
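
Exponential backoff for 429 responses can be sketched as follows. The base delay, cap, and retry count are illustrative values rather than documented limits, and withBackoff, backoffDelayMs, and doRequest are hypothetical names.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Call doRequest again while it returns HTTP 429, up to maxRetries retries.
// doRequest is a caller-supplied function returning a fetch-style response.
async function withBackoff(doRequest, maxRetries = 4, sleep = defaultSleep) {
  for (let attempt = 0; ; attempt++) {
    const response = await doRequest();
    if (response.status !== 429 || attempt >= maxRetries) {
      return response; // success, a non-429 error, or retries exhausted
    }
    await sleep(backoffDelayMs(attempt));
  }
}
```

With the defaults, the waits are 500 ms, 1 s, 2 s, and 4 s. Adding a small random jitter to each delay helps keep many clients from retrying at the same moment.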

Common Use Cases

Interactive Voice Response (IVR)

Use the WebSocket API for real-time synthesis in IVR systems:
// Synthesize prompts in real-time
ws.send(JSON.stringify({
  type: 'text',
  text: 'Please press 1 for sales',
  is_final: false
}));

Content Narration

Use the REST API for batch processing of long-form content:
# Process entire articles or books
curl -X POST https://api.voxnexus.ai/v1/tts \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d @article.json \
  --output narration.wav
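
The curl command above reads its request body from article.json, which must contain the same fields as any other /v1/tts request. One way to produce that file is sketched below; buildNarrationPayload is a hypothetical helper, and the voice ID is the example value used throughout this page.

```javascript
// Build the JSON request body that curl reads from article.json.
function buildNarrationPayload(text, voiceId) {
  return JSON.stringify({ text, voice_id: voiceId, format: 'wav' }, null, 2);
}

// Usage in Node.js:
//   const fs = require('node:fs');
//   fs.writeFileSync('article.json', buildNarrationPayload(articleText, 'vl-xiaoxiao'));
```

Serializing with JSON.stringify escapes quotes and newlines in the article text, which is easy to get wrong when writing the file by hand.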

Accessibility Features

Generate audio versions of text content for accessibility:
async function generateAudioAccessibility(text) {
  const response = await fetch('https://api.voxnexus.ai/v1/tts', {
    method: 'POST',
    headers: {
      'X-Api-Key': 'YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      text: text,
      voice_id: 'vl-xiaoxiao',
      format: 'wav',
      sample_rate: 24000 // Higher quality for better clarity
    })
  });
  
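  if (!response.ok) {
    // Avoid returning an error body as if it were audio
    throw new Error(`TTS request failed with status ${response.status}`);
  }

  return response.blob();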
}