Overview
The WebSocket endpoint provides real-time text-to-speech conversion with streaming audio output. This is ideal for applications requiring low-latency audio generation (e.g., interactive assistants). For one-shot or chunked HTTP streaming, see TTS REST or TTS SSE.
Endpoint
Authentication
All WebSocket connections require the following headers:

| Header | Required | Description | Example |
|---|---|---|---|
| Content-Type | Yes | Must be application/json | application/json |
| X-API-Key-ID | Yes | Your API key for authentication | <your-api-key-id> |
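As a minimal sketch of the header table above, the two required headers can be collected in a plain dictionary before opening the connection. The helper name `build_auth_headers` is hypothetical and not part of the API; how the dictionary is handed to the handshake depends on the WebSocket client library in use.

```python
def build_auth_headers(api_key_id: str) -> dict[str, str]:
    """Return the two required handshake headers from the table above.

    `build_auth_headers` is an illustrative helper, not an API function.
    """
    if not api_key_id:
        raise ValueError("api_key_id must be a non-empty string")
    return {
        "Content-Type": "application/json",  # must be application/json
        "X-API-Key-ID": api_key_id,          # your API key ID
    }


# Placeholder value; substitute a real key ID at runtime.
headers = build_auth_headers("<your-api-key-id>")
```

Keeping the headers in one helper makes it easy to reject an empty key before any network call is attempted.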
Request Format
Send a JSON message with the following structure:
- Number of audio channels (e.g., 1 for mono, 2 for stereo)
- Sample width in bytes (e.g., 2 for 16-bit audio)
- Audio encoding format (e.g., linear_pcm)
- Audio container format (e.g., wav)
Response
The server streams audio data in real time as binary chunks. Each chunk contains PCM audio data according to the specified audio_config.
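One way a client might handle the streamed chunks is to concatenate them and wrap the result in a WAV container using Python's standard `wave` module. This is a sketch under stated assumptions: the chunks are taken to be raw, headerless PCM, the function name `pcm_chunks_to_wav` is hypothetical, and the `sample_rate` value is an assumption because this section does not state one. The channel count and sample width should match whatever was sent in the request.

```python
import io
import wave


def pcm_chunks_to_wav(chunks, channels=1, sample_width=2, sample_rate=16000):
    """Join raw PCM chunks and return them as WAV file bytes.

    Assumes headerless PCM chunks. `sample_rate` is an assumed value;
    use the rate your request actually specifies.
    """
    pcm = b"".join(chunks)
    # Drop a trailing partial frame, which can occur if the stream is cut short.
    frame_size = channels * sample_width
    usable = len(pcm) - (len(pcm) % frame_size)

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm[:usable])
    return buf.getvalue()


# Simulated binary chunks standing in for messages received over the socket.
example_chunks = [b"\x00\x01" * 100, b"\x02\x03" * 60]
wav_bytes = pcm_chunks_to_wav(example_chunks, channels=1, sample_width=2)
```

Buffering every chunk before writing keeps the sketch short. A low-latency player would instead pass each chunk to the audio device as it arrives.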