General Questions

What is VoxNexus?

VoxNexus is a comprehensive voice services platform that provides Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities through easy-to-use APIs. Our platform enables developers to add natural voice synthesis and accurate speech recognition to their applications.

What languages are supported?

VoxNexus supports 50+ languages including English, Chinese, Spanish, French, German, Japanese, Korean, and many more. We continuously add new languages and regional variants to our voice library.

How accurate is the speech recognition?

Our Speech-to-Text service achieves high accuracy rates, typically above 95% for clear audio in supported languages. Accuracy can vary based on audio quality, background noise, speaker accent, and language complexity.

How natural do the voices sound?

Our Text-to-Speech voices are generated using advanced AI models and sound very natural. Voice quality is comparable to human narration, with natural intonation, rhythm, and pronunciation.

Getting Started

How do I get an API key?

  1. Sign up for a free account at voxnexus.ai/dashboard
  2. Navigate to the API Keys section
  3. Create a new API key
  4. Copy and securely store your API key

Is there a free tier?

Yes, we offer a free tier with limited usage. Check our pricing page for details on free tier limits and paid plans.

How quickly can I get started?

You can get started in minutes! Simply:
  1. Sign up for an account
  2. Get your API key
  3. Make your first API call
See our Quick Start Guide for detailed steps.

API Usage

What’s the difference between REST API and WebSocket API?

  • REST API: Best for standard request-response scenarios. Supports both synchronous and streaming responses. Use for batch processing, file uploads, and standard integrations.
  • WebSocket API: Ideal for real-time bidirectional communication. Lower latency, persistent connection. Use for live transcription, real-time voice synthesis, and interactive applications.

How do I handle audio files?

For Speech-to-Text, you can send audio files directly in the request body. Supported formats include WAV, MP3, PCM, and OGG. For Text-to-Speech, audio is returned in the response body in your requested format.

What audio formats are supported?

Text-to-Speech output formats:
  • MP3 (compressed, web-friendly)
  • WAV (uncompressed, high quality)
  • OGG (open-source compressed)
  • PCM (raw audio data)
  • WebM (web-optimized)
Speech-to-Text input formats:
  • WAV
  • MP3
  • PCM
  • OGG
  • application/octet-stream (generic binary Content-Type)
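
When uploading audio for Speech-to-Text, the Content-Type header should match the file you send. The helper below is a minimal sketch of that mapping. The function name contentTypeFor is hypothetical, the MIME types for WAV, MP3, and OGG are the standard ones, and sending raw PCM as application/octet-stream is an assumption rather than documented behavior.

```javascript
// Map a file extension to a Content-Type header for a Speech-to-Text upload.
// Assumption: raw PCM is sent as application/octet-stream.
const CONTENT_TYPES = new Map([
  ['wav', 'audio/wav'],
  ['mp3', 'audio/mpeg'],
  ['ogg', 'audio/ogg'],
  ['pcm', 'application/octet-stream'],
]);

function contentTypeFor(filename) {
  const dot = filename.lastIndexOf('.');
  const ext = dot === -1 ? '' : filename.slice(dot + 1).toLowerCase();
  const type = CONTENT_TYPES.get(ext);
  if (!type) {
    throw new Error(`Unsupported audio format: ${filename}`);
  }
  return type;
}
```

Failing early on an unknown extension avoids spending a request on a file the service would likely reject.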

How do I specify the language?

For Text-to-Speech, specify the language using the language parameter, given as a locale code such as en-US (an ISO 639-1 language code followed by an ISO 3166-1 region code):
{
  "text": "Hello",
  "voice_id": "vl-jenny",
  "language": "en-US"
}
For Speech-to-Text, specify via query parameter:
curl -X POST "https://api.voxnexus.ai/v1/stt?language=en-US&sample_rate=16000" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio.wav
If not specified, the service will attempt to auto-detect the language.

Voice Selection

How do I choose the right voice?

Consider these factors:
  1. Language: Choose a voice that matches your content language
  2. Gender: Select based on your application’s needs
  3. Age: Match voice age to your target audience
  4. Style: Choose professional, casual, cheerful, etc.
  5. Test: Always test voices with sample content
Use our Voice Library to browse and filter voices.

Can I customize voices?

Yes! You can customize:
  • Speed: Adjust speech rate (0.5x to 2.0x)
  • Pitch: Modify pitch in semitones (-12 to +12)
  • Volume: Set volume multiplier (0.0 to 1.0)
  • Voice Config: Some voices support style, role, and other parameters
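
The ranges above can be enforced on the client before a request is sent. The following is a rough sketch, not the platform's own validation: the field names speed, pitch, and volume and the fallback values are assumptions, so check the API reference for the real parameter names and defaults.

```javascript
// Ranges come from the list above. Field names and fallback values are
// hypothetical and only serve this illustration.
const RANGES = {
  speed: { min: 0.5, max: 2.0, fallback: 1.0 },
  pitch: { min: -12, max: 12, fallback: 0 },
  volume: { min: 0.0, max: 1.0, fallback: 1.0 },
};

// Return a copy of the settings with every value forced into its range.
// Missing or non-numeric values are replaced by the fallback.
function clampVoiceSettings(settings = {}) {
  const result = {};
  for (const [name, { min, max, fallback }] of Object.entries(RANGES)) {
    const value = Number.isFinite(settings[name]) ? settings[name] : fallback;
    result[name] = Math.min(max, Math.max(min, value));
  }
  return result;
}
```

For example, clampVoiceSettings({ speed: 3, pitch: -20 }) brings speed down to 2 and pitch up to -12, and fills in volume with the fallback.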

Do voices support SSML?

Yes! Enable SSML support by setting ssml: true:
{
  "text": "<speak>Hello <break time='500ms'/> world</speak>",
  "voice_id": "vl-xiaoxiao",
  "ssml": true
}
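
Because SSML is XML, characters such as & and < in user-supplied text need escaping before they are placed inside <speak>. The sketch below shows one way to build the request body shown above. The helper names escapeXml and buildSsmlRequest are hypothetical, and only the text, voice_id, and ssml fields come from this FAQ.

```javascript
// Escape the five characters that have special meaning in XML.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Join sentences with a pause between them and wrap the result in <speak>.
function buildSsmlRequest(sentences, voiceId, pauseMs = 500) {
  const pause = ` <break time='${pauseMs}ms'/> `;
  const body = sentences.map(escapeXml).join(pause);
  return { text: `<speak>${body}</speak>`, voice_id: voiceId, ssml: true };
}
```

Calling buildSsmlRequest(['Hello', 'world'], 'vl-xiaoxiao') reproduces the text field of the JSON example above.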

Speech Recognition

What is speaker diarization?

Speaker diarization identifies different speakers in multi-speaker audio. Enable it with:
curl -X POST "https://api.voxnexus.ai/v1/stt?enable_speaker_diarization=true" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @meeting.wav

How do I get word-level timestamps?

Enable timestamps in your request:
curl -X POST "https://api.voxnexus.ai/v1/stt?enable_timestamps=true" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio.wav
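
In JavaScript, the query string from the curl examples can be assembled with the standard URL API. This is a minimal sketch: buildSttUrl is a hypothetical helper, and combining several options in one request is an assumption, since this FAQ only shows them individually.

```javascript
const STT_ENDPOINT = 'https://api.voxnexus.ai/v1/stt';

// Build the request URL from an options object, skipping unset options.
// Option names are passed through as query parameter names unchanged.
function buildSttUrl(options = {}) {
  const url = new URL(STT_ENDPOINT);
  for (const [name, value] of Object.entries(options)) {
    if (value !== undefined && value !== null) {
      url.searchParams.set(name, String(value));
    }
  }
  return url.toString();
}
```

The resulting string can be passed to fetch together with the Authorization and Content-Type headers used in the curl examples.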

Can I improve recognition accuracy?

Yes, several ways:
  1. Specify language: Always specify the language when known
  2. Use high-quality audio: Better audio quality = better accuracy
  3. Add keywords: Use the keywords parameter for important terms
  4. Custom vocabulary: Add domain-specific terms
  5. Enable confidence scores: Identify uncertain segments

What sample rates are supported?

Supported sample rates: 8000, 16000, 22050, 24000, 44100, 48000 Hz.
  • 8kHz: Telephony quality
  • 16kHz: Standard quality (recommended for most use cases)
  • 44.1kHz/48kHz: High-quality audio
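
Before uploading a WAV file, its sample rate can be checked against the supported list. The sketch below reads the rate from a canonical RIFF/WAVE header, where it is stored as a little-endian 32-bit integer at byte offset 24. It assumes the "fmt " chunk comes first, which is typical but not guaranteed, and both function names are hypothetical.

```javascript
const SUPPORTED_SAMPLE_RATES = [8000, 16000, 22050, 24000, 44100, 48000];

// Read the sample rate from the first bytes of a WAV file.
// `bytes` is a Uint8Array (a Node.js Buffer also works).
function readWavSampleRate(bytes) {
  if (bytes.byteLength < 28) {
    throw new Error('Buffer too small to contain a WAV header');
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return view.getUint32(24, true); // true = little-endian
}

function isSupportedSampleRate(rate) {
  return SUPPORTED_SAMPLE_RATES.includes(rate);
}
```

The value read from the header can then be sent as the sample_rate query parameter.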

Pricing & Limits

How is pricing calculated?

Pricing is based on usage:
  • Text-to-Speech: Charged per character or audio duration
  • Speech-to-Text: Charged per audio minute processed
  • Rate Limits: Based on your plan tier
Check response headers for usage information:
  • X-Quota-Used: Credits consumed
  • X-RateLimit-Remaining: Remaining requests

What happens if I exceed my quota?

When you exceed your quota, API requests will return a 429 status code. Upgrade your plan or wait for quota reset to continue.

Can I monitor my usage?

Yes! Usage information is included in response headers:
  • X-RateLimit-Remaining: Remaining requests
  • X-Quota-Used: Credits consumed
You can also check usage in your Dashboard.
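
A small helper can pull these headers out of a fetch response. This is a sketch only: readUsage is a hypothetical name, and treating the header values as numbers is an assumption, since this FAQ does not specify their format.

```javascript
// Extract usage figures from the headers of a fetch Response.
// Returns null for a header that is missing or not numeric.
function readUsage(headers) {
  const toNumber = (name) => {
    const raw = headers.get(name);
    if (raw === null || raw === undefined) return null;
    const parsed = Number(raw);
    return Number.isNaN(parsed) ? null : parsed;
  };
  return {
    quotaUsed: toNumber('X-Quota-Used'),
    rateLimitRemaining: toNumber('X-RateLimit-Remaining'),
  };
}
```

After a request, readUsage(response.headers) gives the values to log or to use for throttling further calls.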

Technical Questions

How do I handle errors?

All errors follow a consistent format:
{
  "error": "Error description",
  "code": "ERROR_CODE",
  "details": "Additional details",
  "request_id": "req_1234567890"
}
Common HTTP status codes:
  • 400: Bad Request (invalid parameters)
  • 401: Unauthorized (invalid API key)
  • 429: Too Many Requests (rate limit or quota exceeded)
  • 500: Server Error
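
One way to work with this format is to wrap the error body in a dedicated error type, so calling code can branch on the status and log the request_id. The class ApiError and the helper isRetryable are hypothetical names, and treating 429 and 5xx as retryable is a common convention assumed here, not something this FAQ states.

```javascript
// Error type carrying the fields of the error format shown above.
class ApiError extends Error {
  constructor(status, body = {}) {
    super(body.error || `HTTP ${status}`);
    this.name = 'ApiError';
    this.status = status;
    this.code = body.code;
    this.details = body.details;
    this.requestId = body.request_id;
  }
}

// Assumption: rate limiting (429) and server errors (5xx) are worth
// retrying, while 400 and 401 need a fix on the caller's side.
function isRetryable(error) {
  return error.status === 429 || error.status >= 500;
}
```

Including requestId in logs and bug reports makes it easier to trace a specific failed request.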

How do I implement retry logic?

Implement exponential backoff for retries:
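async function makeRequestWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const isLastAttempt = attempt === maxRetries - 1;
    const delay = Math.pow(2, attempt) * 1000; // Exponential backoff: 1s, 2s, 4s, ...
    let response;
    try {
      response = await fetch(url, options);
    } catch (error) {
      // Network failure: retry unless this was the last attempt
      if (isLastAttempt) throw error;
      await new Promise(resolve => setTimeout(resolve, delay));
      continue;
    }

    if (response.ok) return response;

    // Retry only rate limiting (429) and server errors (5xx);
    // other client errors such as 400 or 401 will not succeed on retry
    const retryable = response.status === 429 || response.status >= 500;
    if (!retryable || isLastAttempt) {
      throw new Error(`HTTP ${response.status}`);
    }
    await new Promise(resolve => setTimeout(resolve, delay));
  }
}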

How do I stream audio responses?

Text-to-Speech responses use chunked transfer encoding by default. Read the response body chunk by chunk as it arrives:
const response = await fetch('https://api.voxnexus.ai/v1/tts', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Long text here...',
    voice_id: 'vl-xiaoxiao'
  })
});

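// Fail early on error responses instead of treating an error body as audio
if (!response.ok) throw new Error(`HTTP ${response.status}`);

const reader = response.body.getReader();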
const chunks = [];

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}

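// Combine chunks into a single Blob and play it (browser only).
// 'audio/mpeg' assumes MP3 output; match the type to your requested format.
const audioBlob = new Blob(chunks, { type: 'audio/mpeg' });
const audio = new Audio(URL.createObjectURL(audioBlob));
await audio.play();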

How do I handle WebSocket reconnections?

Implement reconnection logic:
class ReconnectingWebSocket {
  constructor(url, options) {
    this.url = url;
    this.options = options;
    this.reconnectAttempts = 0;
    this.maxAttempts = 5;
    this.connect();
  }
  
  connect() {
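    // Note: in browsers the second argument is a subprotocol string or array;
    // an options object is accepted by the Node.js "ws" package.
    this.ws = new WebSocket(this.url, this.options);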
    
    this.ws.onclose = () => {
      if (this.reconnectAttempts < this.maxAttempts) {
        setTimeout(() => {
          this.reconnectAttempts++;
          this.connect();
        }, Math.pow(2, this.reconnectAttempts) * 1000);
      }
    };
    
    this.ws.onopen = () => {
      this.reconnectAttempts = 0;
    };
  }
}

Security & Privacy

Is my data secure?

Yes! We take security seriously:
  • Encryption: All data encrypted in transit (TLS/SSL)
  • No Storage: Audio data is not stored after processing
  • API Keys: Secure key management and rotation
  • GDPR Compliant: Meets data protection regulations

Do you store my audio files?

No. Audio files are processed and immediately discarded. We do not store your audio data.

How do I secure my API key?

Best practices:
  1. Never commit keys to version control
  2. Use environment variables
  3. Rotate keys regularly
  4. Use different keys for different environments
  5. Monitor key usage
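
As an illustration of the environment-variable practice, the sketch below reads the key at runtime instead of hard-coding it. The variable name VOXNEXUS_API_KEY and both function names are hypothetical and chosen only for this example. The Bearer scheme matches the curl examples in this FAQ.

```javascript
// Read the API key from the environment (Node.js) instead of hard-coding it.
// VOXNEXUS_API_KEY is a hypothetical variable name.
function getApiKey(env = process.env) {
  const key = env.VOXNEXUS_API_KEY;
  if (!key || key.trim() === '') {
    throw new Error('VOXNEXUS_API_KEY is not set');
  }
  return key.trim();
}

// Build the Authorization header used by the API requests.
function authHeaders(env = process.env) {
  return { Authorization: `Bearer ${getApiKey(env)}` };
}
```

Using a separate value of the variable per environment (development, staging, production) also covers the practice of using different keys for different environments.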

Support

Where can I get help?

  • Documentation: Browse our comprehensive docs
  • Email Support: [email protected]
  • Dashboard: Manage your account and view usage
  • Status Page: Check service status

How do I report bugs?

Report bugs via email to [email protected]. Include:
  • API endpoint used
  • Request details (without sensitive data)
  • Error response
  • Steps to reproduce

Next Steps