FAQ - VoxNexus

General Questions

What is VoxNexus?

VoxNexus is a comprehensive voice services platform that provides Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities through easy-to-use APIs. Our platform enables developers to add natural voice synthesis and accurate speech recognition to their applications.

What languages are supported?

VoxNexus supports 50+ languages including English, Chinese, Spanish, French, German, Japanese, Korean, and many more. We continuously add new languages and regional variants to our voice library.

How accurate is the speech recognition?

Our Speech-to-Text service achieves high accuracy rates, typically above 95% for clear audio in supported languages. Accuracy can vary based on audio quality, background noise, speaker accent, and language complexity.

How natural do the voices sound?

Our Text-to-Speech voices are generated using advanced AI models and sound very natural. Voice quality is comparable to human narration, with natural intonation, rhythm, and pronunciation.

Getting Started

How do I get an API key?

1. Sign up for a free account at voxnexus.ai/dashboard.
2. Navigate to the API Keys section.
3. Create a new API key.
4. Copy your API key and store it securely.

Is there a free tier?

Yes, we offer a free tier with limited usage. Check our pricing page for details on free tier limits and paid plans.

How quickly can I get started?

You can get started in minutes:

1. Sign up for an account.
2. Get your API key.
3. Make your first API call.

See our Quick Start Guide for detailed steps.

API Usage

What’s the difference between REST API and WebSocket API?

REST API: Best for standard request-response scenarios. Supports both synchronous and streaming responses. Use for batch processing, file uploads, and standard integrations.
WebSocket API: Ideal for real-time bidirectional communication. Lower latency, persistent connection. Use for live transcription, real-time voice synthesis, and interactive applications.

How do I handle audio files?

For Speech-to-Text, you can send audio files directly in the request body. Supported formats include WAV, MP3, PCM, and OGG. For Text-to-Speech, audio is returned in the response body in your requested format.

What audio formats are supported?

Text-to-Speech output formats:

MP3 (compressed, web-friendly)
WAV (uncompressed, high quality)
OGG (open-source compressed)
PCM (raw audio data)
WebM (web-optimized)

Speech-to-Text input formats:

WAV
MP3
PCM
OGG
Raw binary data (sent with Content-Type: application/octet-stream)
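
When uploading audio for Speech-to-Text, the Content-Type header should match the file format. The helper below is a minimal sketch of one way to pick it from a file name. The function name contentTypeFor is hypothetical, and mapping .pcm files to application/octet-stream is an assumption, not documented VoxNexus behavior.

```javascript
// Hypothetical helper: choose a Content-Type header from a file name.
const CONTENT_TYPES = new Map([
  ['wav', 'audio/wav'],
  ['mp3', 'audio/mpeg'],
  ['ogg', 'audio/ogg'],
  // Assumption: headerless PCM is sent as a generic binary stream.
  ['pcm', 'application/octet-stream'],
]);

function contentTypeFor(fileName) {
  const ext = fileName.split('.').pop().toLowerCase();
  // Fall back to a generic binary type for unknown extensions.
  return CONTENT_TYPES.get(ext) ?? 'application/octet-stream';
}
```

For example, contentTypeFor('meeting.wav') gives the audio/wav value used in the curl examples on this page.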

How do I specify the language?

For Text-to-Speech, specify the language with the language parameter, using a BCP 47 tag such as en-US (an ISO 639-1 language code plus an ISO 3166-1 region code):

{
  "text": "Hello",
  "voice_id": "vl-jenny",
  "language": "en-US"
}

For Speech-to-Text, specify via query parameter:

curl -X POST "https://api.voxnexus.ai/v1/stt?language=en-US&sample_rate=16000" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio.wav

If not specified, the service will attempt to auto-detect the language.
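
In JavaScript, the same query string can be assembled with the standard URL API, which also handles escaping. This is a small sketch: the endpoint and parameter names come from the curl example above, while the function name buildSttUrl is hypothetical.

```javascript
// Sketch: build the Speech-to-Text URL, adding only the options that are set.
// Omitting `language` leaves language detection to the service.
function buildSttUrl({ language, sampleRate } = {}) {
  const url = new URL('https://api.voxnexus.ai/v1/stt');
  if (language) url.searchParams.set('language', language);
  if (sampleRate) url.searchParams.set('sample_rate', String(sampleRate));
  return url.toString();
}
```

Calling buildSttUrl({ language: 'en-US', sampleRate: 16000 }) reproduces the URL from the curl example.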

Voice Selection

How do I choose the right voice?

Consider these factors:

Language: Choose a voice that matches your content language
Gender: Select based on your application’s needs
Age: Match voice age to your target audience
Style: Choose professional, casual, cheerful, etc.
Test: Always test voices with sample content

Use our Voice Library to browse and filter voices.

Can I customize voices?

Yes! You can customize:

Speed: Adjust speech rate (0.5x to 2.0x)
Pitch: Modify pitch in semitones (-12 to +12)
Volume: Set volume multiplier (0.0 to 1.0)
Voice Config: Some voices support style, role, and other parameters
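
To keep requests inside these ranges, values can be clamped on the client before sending. The sketch below uses the ranges listed above; the field names speed, pitch, and volume are assumptions about the request body, so check the API Reference for the actual names.

```javascript
// Ranges taken from the list above; the field names are assumptions.
const RANGES = {
  speed: [0.5, 2.0],  // speech-rate multiplier
  pitch: [-12, 12],   // semitones
  volume: [0.0, 1.0], // volume multiplier
};

// Return a copy of the known settings with each value clamped to its range.
function clampVoiceSettings(settings) {
  const result = {};
  for (const [name, [min, max]] of Object.entries(RANGES)) {
    if (settings[name] === undefined) continue; // leave unset fields unset
    result[name] = Math.min(max, Math.max(min, settings[name]));
  }
  return result;
}
```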

Do voices support SSML?

Yes. Set ssml to true in the request body to enable SSML:

{
  "text": "<speak>Hello <break time='500ms'/> world</speak>",
  "voice_id": "vl-xiaoxiao",
  "ssml": true
}
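
Because SSML is XML, characters such as & and < in user-supplied text must be escaped before they are wrapped in a speak element. The following is a minimal sketch; the helper names escapeXml and toSsml are hypothetical.

```javascript
// Escape characters that have special meaning in XML.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Join plain-text sentences into one SSML string with a pause between them.
function toSsml(sentences, pauseMs = 500) {
  const separator = ` <break time='${pauseMs}ms'/> `;
  return `<speak>${sentences.map(escapeXml).join(separator)}</speak>`;
}
```

With these helpers, toSsml(['Hello', 'world']) produces the same text value as the request above.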

Speech Recognition

What is speaker diarization?

Speaker diarization identifies different speakers in multi-speaker audio. Enable it with:

curl -X POST "https://api.voxnexus.ai/v1/stt?enable_speaker_diarization=true" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @meeting.wav
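
Once diarization is enabled, a common next step is to turn the labeled segments into a readable transcript. The sketch below assumes each segment carries speaker and text fields. That response shape is hypothetical and is not taken from the VoxNexus documentation, so adapt the field names to the actual response.

```javascript
// Assumed (hypothetical) segment shape: { speaker: 'speaker_1', text: 'Hello' }
function toTranscript(segments) {
  const turns = [];
  for (const { speaker, text } of segments) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === speaker) {
      last.text += ` ${text}`; // same speaker keeps talking: extend the turn
    } else {
      turns.push({ speaker, text }); // speaker changed: start a new turn
    }
  }
  return turns.map((turn) => `${turn.speaker}: ${turn.text}`).join('\n');
}
```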

How do I get word-level timestamps?

Enable timestamps in your request:

curl -X POST "https://api.voxnexus.ai/v1/stt?enable_timestamps=true" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio.wav
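
Word-level timestamps are often used to generate captions. The sketch below groups words into SubRip (SRT) cues. It assumes each word is an object with word, start, and end fields, with times in seconds. That shape is a hypothetical stand-in for the real response format.

```javascript
// Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm).
function formatSrtTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Assumed (hypothetical) word shape: { word: 'Hello', start: 0.0, end: 0.4 }
function wordsToSrt(words, wordsPerCue = 5) {
  const cues = [];
  for (let i = 0; i < words.length; i += wordsPerCue) {
    const group = words.slice(i, i + wordsPerCue);
    const start = formatSrtTime(group[0].start);
    const end = formatSrtTime(group[group.length - 1].end);
    const text = group.map((w) => w.word).join(' ');
    cues.push(`${cues.length + 1}\n${start} --> ${end}\n${text}`);
  }
  return cues.join('\n\n');
}
```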

Can I improve recognition accuracy?

Yes, several ways:

Specify language: Always specify the language when known
Use high-quality audio: Cleaner audio with less background noise yields better accuracy
Add keywords: Use the keywords parameter for important terms
Custom vocabulary: Add domain-specific terms
Enable confidence scores: Identify uncertain segments
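
As an illustration of the last point, confidence scores can be used to mark words that may need human review. The sketch assumes each recognized word has word and confidence fields, with confidence between 0 and 1. Both the response shape and the 0.8 threshold are assumptions.

```javascript
// Assumed (hypothetical) word shape: { word: 'hello', confidence: 0.93 }
// Words below the threshold are wrapped in brackets for review.
function flagUncertainWords(words, threshold = 0.8) {
  return words
    .map(({ word, confidence }) => (confidence < threshold ? `[${word}?]` : word))
    .join(' ');
}
```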

What sample rates are supported?

Supported sample rates: 8000, 16000, 22050, 24000, 44100, 48000 Hz.

8kHz: Telephony quality
16kHz: Standard quality (recommended for most use cases)
44.1kHz/48kHz: High-quality audio
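
Before uploading, it can help to confirm that a file's sample rate is in the supported list. The sketch below reads the rate from a canonical WAV header, where the "fmt " chunk directly follows the 12-byte RIFF header. WAV files with extra chunks before "fmt " need a fuller parser. The function names are hypothetical.

```javascript
const SUPPORTED_RATES = [8000, 16000, 22050, 24000, 44100, 48000];

// Read the sample rate (little-endian uint32 at byte offset 24) from a
// canonical WAV header supplied as a Uint8Array.
function wavSampleRate(bytes) {
  const tag = (offset) =>
    String.fromCharCode(...bytes.subarray(offset, offset + 4));
  const isCanonical =
    bytes.byteLength >= 28 &&
    tag(0) === 'RIFF' &&
    tag(8) === 'WAVE' &&
    tag(12) === 'fmt ';
  if (!isCanonical) throw new Error('Not a canonical WAV header');
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return view.getUint32(24, true);
}

function isSupportedRate(rate) {
  return SUPPORTED_RATES.includes(rate);
}
```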

Pricing & Limits

How is pricing calculated?

Pricing is based on usage:

Text-to-Speech: Charged per character or audio duration
Speech-to-Text: Charged per audio minute processed
Rate Limits: Based on your plan tier

Check response headers for usage information:

X-Quota-Used: Credits consumed
X-RateLimit-Remaining: Remaining requests

What happens if I exceed my quota?

When you exceed your quota, API requests return a 429 status code. To continue, upgrade your plan or wait for your quota to reset.

Can I monitor my usage?

Yes! Usage information is included in response headers:

X-RateLimit-Remaining: Remaining requests
X-Quota-Used: Credits consumed

You can also check usage in your Dashboard.
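
A small helper can pull these two headers out of any fetch response. This is a sketch that assumes the header values are plain numbers; the function name readUsage is hypothetical.

```javascript
// Read the usage headers named above from a fetch() Response.
// Missing headers are reported as null.
function readUsage(response) {
  const toNumber = (value) => (value == null ? null : Number(value));
  return {
    quotaUsed: toNumber(response.headers.get('X-Quota-Used')),
    remainingRequests: toNumber(response.headers.get('X-RateLimit-Remaining')),
  };
}
```

Logging readUsage(response) after each call gives a running view of consumption without a separate request.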

Technical Questions

How do I handle errors?

All errors follow a consistent format:

{
  "error": "Error description",
  "code": "ERROR_CODE",
  "details": "Additional details",
  "request_id": "req_1234567890"
}

Common HTTP status codes:

400: Bad Request (invalid parameters)
401: Unauthorized (invalid API key)
429: Too Many Requests (rate limit or quota exceeded)
500: Server Error
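
One way to work with this format is to convert error responses into a typed exception that keeps the code and request_id for logging and support requests. The sketch below uses the error fields shown above. The class name VoxNexusError, the helper throwIfError, and the retryable rule are illustrative choices, not part of an official SDK.

```javascript
class VoxNexusError extends Error {
  constructor(status, body) {
    super(body.error ?? `HTTP ${status}`);
    this.name = 'VoxNexusError';
    this.status = status;
    this.code = body.code;
    this.details = body.details;
    this.requestId = body.request_id;
    // Assumption: 429 and 5xx are transient; 400 and 401 need a client-side fix.
    this.retryable = status === 429 || status >= 500;
  }
}

// Pass successful responses through; turn error responses into exceptions.
async function throwIfError(response) {
  if (response.ok) return response;
  // Fall back to an empty object if the body is not valid JSON.
  const body = await response.json().catch(() => ({}));
  throw new VoxNexusError(response.status, body);
}
```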

How do I implement retry logic?

Implement exponential backoff for retries:

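async function makeRequestWithRetry(url, options, maxRetries = 3) {
  const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const isLastAttempt = attempt === maxRetries - 1;
    let response;
    try {
      response = await fetch(url, options);
    } catch (error) {
      // Network failure: retry unless this was the last attempt
      if (isLastAttempt) throw error;
      await sleep(Math.pow(2, attempt) * 1000);
      continue;
    }

    if (response.ok) return response;

    // Retry only rate-limit (429) and server (5xx) errors;
    // other 4xx responses will not succeed on a retry
    const retryable = response.status === 429 || response.status >= 500;
    if (!retryable || isLastAttempt) {
      throw new Error(`HTTP ${response.status}`);
    }
    await sleep(Math.pow(2, attempt) * 1000); // Exponential backoff: 1s, 2s, 4s, ...
  }
  throw new Error('maxRetries must be at least 1');
}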

How do I stream audio responses?

Text-to-Speech responses use chunked transfer encoding by default, so you can read the audio as it arrives:

const response = await fetch('https://api.voxnexus.ai/v1/tts', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Long text here...',
    voice_id: 'vl-xiaoxiao'
  })
});

// Fail early on HTTP errors instead of treating an error body as audio
if (!response.ok) throw new Error(`HTTP ${response.status}`);

// Read the body chunk by chunk as it streams in
const reader = response.body.getReader();
const chunks = [];

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}

// Combine the chunks into a single Blob for playback
// (the MIME type here assumes MP3 output)
const audioBlob = new Blob(chunks, { type: 'audio/mpeg' });
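
Outside the browser, or when the raw bytes are needed (for example to write them to disk), the collected chunks can be merged into a single Uint8Array instead of a Blob. A minimal sketch, with the hypothetical helper name concatChunks:

```javascript
// Concatenate the Uint8Array chunks produced by reader.read() into one buffer.
function concatChunks(chunks) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const merged = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    merged.set(chunk, offset); // copy this chunk at the current write position
    offset += chunk.byteLength;
  }
  return merged;
}
```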

How do I handle WebSocket reconnections?

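Reconnect with exponential backoff, and stop after a fixed number of attempts:

class ReconnectingWebSocket {
  constructor(url, options) {
    this.url = url;
    // Passed through as the second WebSocket argument
    // (in browsers, that argument is a list of subprotocols)
    this.options = options;
    this.reconnectAttempts = 0;
    this.maxAttempts = 5;
    this.connect();
  }
  
  connect() {
    this.ws = new WebSocket(this.url, this.options);
    
    this.ws.onclose = () => {
      // Wait 1s, 2s, 4s, ... before reconnecting, up to maxAttempts tries
      if (this.reconnectAttempts < this.maxAttempts) {
        setTimeout(() => {
          this.reconnectAttempts++;
          this.connect();
        }, Math.pow(2, this.reconnectAttempts) * 1000);
      }
    };
    
    this.ws.onopen = () => {
      // A successful connection resets the backoff
      this.reconnectAttempts = 0;
    };
  }
}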

Security & Privacy

Is my data secure?

Yes! We take security seriously:

Encryption: All data encrypted in transit (TLS/SSL)
No Storage: Audio data is not stored after processing
API Keys: Secure key management and rotation
GDPR Compliant: Meets data protection regulations

Do you store my audio files?

No. Audio files are processed and immediately discarded. We do not store your audio data.

How do I secure my API key?

Best practices:

Never commit keys to version control
Use environment variables
Rotate keys regularly
Use different keys for different environments
Monitor key usage
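
As an example of the environment-variable approach in Node.js, the key can be read at startup and turned into the Authorization header used throughout this page. The variable name VOXNEXUS_API_KEY and the function name authHeaders are assumptions chosen for illustration.

```javascript
// Assumption: the key is stored in an environment variable named VOXNEXUS_API_KEY.
function authHeaders(env = process.env) {
  const key = env.VOXNEXUS_API_KEY;
  if (!key) {
    // Fail loudly at startup rather than sending unauthenticated requests.
    throw new Error('VOXNEXUS_API_KEY is not set');
  }
  return { Authorization: `Bearer ${key}` };
}
```

Keeping the key out of source files this way also makes it straightforward to use a different key per environment.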

Support

Where can I get help?

Documentation: Browse our comprehensive docs
Email Support: support@voxnexus.ai
Dashboard: Manage your account and view usage
Status Page: Check service status

How do I report bugs?

Report bugs via email to support@voxnexus.ai. Include:

API endpoint used
Request details (without sensitive data)
Error response
Steps to reproduce

Next Steps

Quick Start: Get started with VoxNexus
API Reference: Explore API documentation
Features: Learn about platform features
Use Cases: See real-world applications
