General Questions
What is VoxNexus?
VoxNexus is a comprehensive voice services platform that provides Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities through easy-to-use APIs. Our platform enables developers to add natural voice synthesis and accurate speech recognition to their applications.
What languages are supported?
VoxNexus supports 50+ languages including English, Chinese, Spanish, French, German, Japanese, Korean, and many more. We continuously add new languages and regional variants to our voice library.
How accurate is the speech recognition?
Our Speech-to-Text service achieves high accuracy rates, typically above 95% for clear audio in supported languages. Accuracy can vary based on audio quality, background noise, speaker accent, and language complexity.
How natural do the voices sound?
Our Text-to-Speech voices are generated using advanced AI models and sound very natural. Voice quality is comparable to human narration, with natural intonation, rhythm, and pronunciation.
Getting Started
How do I get an API key?
- Sign up for a free account at voxnexus.ai/dashboard
- Navigate to the API Keys section
- Create a new API key
- Copy and securely store your API key
Is there a free tier?
Yes, we offer a free tier with limited usage. Check our pricing page for details on free tier limits and paid plans.
How quickly can I get started?
You can get started in minutes! Simply:
- Sign up for an account
- Get your API key
- Make your first API call
- See our Quick Start Guide for detailed steps
API Usage
What’s the difference between REST API and WebSocket API?
- REST API: Best for standard request-response scenarios. Supports both synchronous and streaming responses. Use for batch processing, file uploads, and standard integrations.
- WebSocket API: Ideal for real-time bidirectional communication. Lower latency, persistent connection. Use for live transcription, real-time voice synthesis, and interactive applications.
How do I handle audio files?
For Speech-to-Text, you can send audio files directly in the request body. Supported formats include WAV, MP3, PCM, and OGG. For Text-to-Speech, audio is returned in the response body in your requested format.
What audio formats are supported?
Text-to-Speech output formats:
- MP3 (compressed, web-friendly)
- WAV (uncompressed, high quality)
- OGG (compressed, open format)
- PCM (raw audio data)
- WebM (web-optimized)
Speech-to-Text input formats:
- WAV
- MP3
- PCM
- OGG
- application/octet-stream
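For Speech-to-Text uploads, the request needs a Content-Type that matches the audio being sent. The sketch below picks one from the file extension and falls back to application/octet-stream. The MIME types are standard, but the extension mapping and the helper names content_type_for and build_upload are assumptions for illustration, not part of the VoxNexus API.

```python
from pathlib import Path
from typing import Dict, Tuple

# Assumed mapping from file extension to Content-Type. The MIME types are
# standard; whether the service expects exactly these values is an assumption.
CONTENT_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".ogg": "audio/ogg",
}
# Generic binary type from the list above, used here for raw PCM and unknowns.
FALLBACK_CONTENT_TYPE = "application/octet-stream"


def content_type_for(filename: str) -> str:
    """Pick a Content-Type header value from the file extension."""
    suffix = Path(filename).suffix.lower()
    return CONTENT_TYPES.get(suffix, FALLBACK_CONTENT_TYPE)


def build_upload(filename: str, audio: bytes) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for sending raw audio in the request body."""
    if not audio:
        raise ValueError("audio payload is empty")
    return {"Content-Type": content_type_for(filename)}, audio


headers, body = build_upload("meeting.MP3", b"\xff\xfb\x90\x00")
print(headers["Content-Type"])  # audio/mpeg
```

The headers and body can then be passed to whichever HTTP client you use.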
How do I specify the language?
For Text-to-Speech, specify the language using the language parameter (ISO 639-1 format, for example en or fr).
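As a minimal sketch, a request body carrying the language parameter might be built like this. Only the language field is named in this FAQ; the text field and the tts_payload helper are hypothetical names chosen for illustration.

```python
import json
import re

# Shape check only: ISO 639-1 codes are two lowercase letters.
ISO_639_1 = re.compile(r"[a-z]{2}")


def tts_payload(text: str, language: str) -> dict:
    """Build a TTS request body.

    `language` comes from this FAQ; the `text` field name is an assumption.
    """
    if not ISO_639_1.fullmatch(language):
        raise ValueError(f"expected a two-letter ISO 639-1 code, got {language!r}")
    return {"text": text, "language": language}


print(json.dumps(tts_payload("Bonjour tout le monde", "fr")))
```

The check only validates the shape of the code, not whether that language is in the supported list.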
Voice Selection
How do I choose the right voice?
Consider these factors:
- Language: Choose a voice that matches your content language
- Gender: Select based on your application’s needs
- Age: Match voice age to your target audience
- Style: Choose professional, casual, cheerful, etc.
- Test: Always test voices with sample content
Can I customize voices?
Yes! You can customize:
- Speed: Adjust speech rate (0.5x to 2.0x)
- Pitch: Modify pitch in semitones (-12 to +12)
- Volume: Set volume multiplier (0.0 to 1.0)
- Voice Config: Some voices support style, role, and other parameters
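Checking these ranges on the client avoids a round trip that would end in a rejected request. The sketch below uses the ranges listed above; the field names speed, pitch, and volume, and the default values, are assumptions rather than confirmed API parameters.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VoiceSettings:
    """Voice customization values (field names and defaults are hypothetical)."""

    speed: float = 1.0   # speech rate multiplier, 0.5 to 2.0
    pitch: float = 0.0   # semitones, -12 to +12
    volume: float = 1.0  # volume multiplier, 0.0 to 1.0

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges before sending anything.
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError(f"speed must be between 0.5 and 2.0, got {self.speed}")
        if not -12 <= self.pitch <= 12:
            raise ValueError(f"pitch must be between -12 and +12, got {self.pitch}")
        if not 0.0 <= self.volume <= 1.0:
            raise ValueError(f"volume must be between 0.0 and 1.0, got {self.volume}")


settings = VoiceSettings(speed=1.25, pitch=-2.0)
print(asdict(settings))  # {'speed': 1.25, 'pitch': -2.0, 'volume': 1.0}
```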
Do voices support SSML?
Yes! Enable SSML support by setting ssml: true.
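A hedged sketch of an SSML request body follows. The ssml flag comes from this FAQ; the text field name and the ssml_payload helper are assumptions. Since SSML is XML with a speak root element, the helper checks well-formedness locally before the request is built.

```python
import xml.etree.ElementTree as ET


def ssml_payload(ssml_text: str) -> dict:
    """Build a request body with SSML enabled.

    `ssml` is the flag named in this FAQ; the `text` field name is an assumption.
    """
    try:
        root = ET.fromstring(ssml_text)
    except ET.ParseError as exc:
        raise ValueError(f"SSML is not well-formed XML: {exc}") from exc
    # Strip an XML namespace prefix such as "{http://...}speak" if present.
    if root.tag.rsplit("}", 1)[-1] != "speak":
        raise ValueError("SSML documents use <speak> as the root element")
    return {"text": ssml_text, "ssml": True}


payload = ssml_payload('<speak>Hello <break time="300ms"/> world</speak>')
print(payload["ssml"])  # True
```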
Speech Recognition
What is speaker diarization?
Speaker diarization identifies the different speakers in multi-speaker audio and attributes each segment to one of them. Enable it in your request.
How do I get word-level timestamps?
Enable timestamps in your request.
Can I improve recognition accuracy?
Yes, in several ways:
- Specify language: Always specify the language when known
- Use high-quality audio: Better audio quality = better accuracy
- Add keywords: Use the keywords parameter for important terms
- Custom vocabulary: Add domain-specific terms
- Enable confidence scores: Identify uncertain segments
- Enable confidence scores: Identify uncertain segments
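The first and third tips can be combined into one set of request options. This is a sketch only: keywords is the parameter named above, while reusing the language field for Speech-to-Text, passing keywords as a list of strings, and the stt_options helper itself are assumptions.

```python
from typing import List, Optional, Sequence, Set


def stt_options(language: Optional[str] = None,
                keywords: Sequence[str] = ()) -> dict:
    """Collect recognition options (field shapes are hypothetical)."""
    options: dict = {}
    if language:
        options["language"] = language

    # Drop blanks and case-insensitive duplicates, keeping first-seen order.
    seen: Set[str] = set()
    cleaned: List[str] = []
    for term in keywords:
        term = term.strip()
        if term and term.lower() not in seen:
            seen.add(term.lower())
            cleaned.append(term)
    if cleaned:
        options["keywords"] = cleaned
    return options


print(stt_options("en", ["VoxNexus", " diarization ", "voxnexus", ""]))
# {'language': 'en', 'keywords': ['VoxNexus', 'diarization']}
```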
What sample rates are supported?
Supported sample rates: 8000, 16000, 22050, 24000, 44100, 48000 Hz.
- 8kHz: Telephony quality
- 16kHz: Standard quality (recommended for most use cases)
- 44.1kHz/48kHz: High-quality audio
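Before uploading a WAV file, its sample rate can be read locally and compared against the list above. The sketch uses Python's standard wave module; the helper names are illustrative, and silent_wav exists only to produce test audio in memory.

```python
import io
import wave

# Sample rates listed in this FAQ, in Hz.
SUPPORTED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}


def wav_sample_rate(data: bytes) -> int:
    """Read the sample rate from the header of WAV-encoded bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getframerate()


def is_supported(data: bytes) -> bool:
    """True if the WAV data uses one of the supported sample rates."""
    return wav_sample_rate(data) in SUPPORTED_SAMPLE_RATES


def silent_wav(sample_rate: int, seconds: float = 0.1) -> bytes:
    """Generate a short mono 16-bit silent WAV, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buffer.getvalue()


print(is_supported(silent_wav(16000)))  # True
print(is_supported(silent_wav(11025)))  # False
```

Audio at an unsupported rate would need resampling before upload.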
Pricing & Limits
How is pricing calculated?
Pricing is based on usage:
- Text-to-Speech: Charged per character or audio duration
- Speech-to-Text: Charged per audio minute processed
- Rate Limits: Based on your plan tier
Usage is reported in response headers:
- X-Quota-Used: Credits consumed
- X-RateLimit-Remaining: Remaining requests
What happens if I exceed my quota?
When you exceed your quota, API requests return a 429 status code. Upgrade your plan or wait for the quota to reset before continuing.
Can I monitor my usage?
Yes! Usage information is included in response headers:
- X-RateLimit-Remaining: Remaining requests
- X-Quota-Used: Credits consumed
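These two headers can be read from any response to track usage as you go. The sketch below takes a plain mapping of header names to values, so it works with any HTTP client. It assumes both values are integers, which this FAQ does not confirm, and read_usage is an illustrative name.

```python
from typing import Dict, Mapping, Optional


def read_usage(headers: Mapping[str, str]) -> Dict[str, Optional[int]]:
    """Extract usage counters from response headers.

    Missing or non-numeric values are returned as None.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        raw = lowered.get(name.lower())
        if raw is None:
            return None
        try:
            return int(raw)
        except ValueError:
            return None

    return {
        "quota_used": as_int("X-Quota-Used"),
        "requests_remaining": as_int("X-RateLimit-Remaining"),
    }


usage = read_usage({"x-quota-used": "1200", "X-RateLimit-Remaining": "58"})
print(usage)  # {'quota_used': 1200, 'requests_remaining': 58}
```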
Technical Questions
How do I handle errors?
All errors follow a consistent format and use standard HTTP status codes:
- 400: Bad Request (invalid parameters)
- 401: Unauthorized (invalid API key)
- 429: Rate Limit Exceeded
- 500: Server Error
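One way to act on these status codes is to retry only the transient ones (429 and 500) with exponential backoff, and to fail immediately on 400 and 401, where retrying cannot help. The sketch below is client-agnostic: send is a hypothetical callable that performs one request and returns a status code and body. The delay values and attempt limit are illustrative defaults, not VoxNexus recommendations.

```python
import random
import time
from typing import Callable, List, Tuple

# Rate-limit and server errors may succeed later. 400 and 401 will not,
# because the request or the API key has to be fixed first.
RETRYABLE = {429, 500}


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_retry(send: Callable[[], Tuple[int, bytes]],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call `send` until it succeeds, a fatal status appears, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if 200 <= status < 300:
            return body
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with status {status}")
        # A little random jitter keeps many clients from retrying in lockstep.
        sleep(backoff_delay(attempt) + random.uniform(0, 0.1))
    raise RuntimeError("unreachable")


# Demonstration with canned responses and a recording stand-in for sleep.
responses = iter([(429, b""), (500, b""), (200, b"audio-bytes")])
delays: List[float] = []
print(call_with_retry(lambda: next(responses), sleep=delays.append))  # b'audio-bytes'
```

Passing sleep as a parameter keeps the function testable without real waiting.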
How do I implement retry logic?
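Implement exponential backoff for retries: wait longer after each failed attempt (for example 1 s, then 2 s, then 4 s) and cap the number of attempts.
How do I stream audio responses?
Text-to-Speech responses use chunked transfer encoding by default, so you can read the audio chunk by chunk as it arrives instead of waiting for the full response.
How do I handle WebSocket reconnections?
Implement reconnection logic: detect a dropped connection and reconnect after a short, increasing delay.
Security & Privacy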
Is my data secure?
Yes! We take security seriously:
- Encryption: All data encrypted in transit (TLS/SSL)
- No Storage: Audio data is not stored after processing
- API Keys: Secure key management and rotation
- GDPR Compliant: Meets data protection regulations
Do you store my audio files?
No. Audio files are processed and immediately discarded. We do not store your audio data.
How do I secure my API key?
Best practices:
- Never commit keys to version control
- Use environment variables
- Rotate keys regularly
- Use different keys for different environments
- Monitor key usage
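The first two practices can be sketched as follows: the key is read from an environment variable rather than from source code, and only a masked form is ever logged. The variable name VOXNEXUS_API_KEY and both helper names are assumptions for illustration.

```python
import os


def load_api_key(var_name: str = "VOXNEXUS_API_KEY") -> str:
    """Read the API key from the environment (variable name is hypothetical)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"set the {var_name} environment variable")
    return key


def masked(key: str) -> str:
    """Return a log-safe form of the key showing at most the last four characters."""
    if len(key) <= 4:
        return "*" * len(key)
    return "*" * (len(key) - 4) + key[-4:]


print(masked("demo-key-1234"))  # *********1234
```

Using a different variable value per environment (development, staging, production) also covers the fourth practice without code changes.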
Support
Where can I get help?
- Documentation: Browse our comprehensive docs
- Email Support: [email protected]
- Dashboard: Manage your account and view usage
- Status Page: Check service status
How do I report bugs?
Report bugs via email to [email protected]. Include:
- API endpoint used
- Request details (without sensitive data)
- Error response
- Steps to reproduce