Text-to-Speech (Realtime)

Currently in beta. You’re on the priority waitlist and among the first to get access.

Overview

Stream synthesized audio over a WebSocket connection as it is generated. This is the lowest-latency TTS interface, which makes it a good fit for interactive assistants and live applications. For simpler use cases, see TTS REST or TTS SSE.

Passing numbers, IDs, dates, or currency as raw strings causes mispronunciations. See the Input Formatting Guide for correct formatting of phone numbers, account numbers, PINs, Aadhaar, vehicle registration numbers, GSTIN, currency, and more.

Endpoint

wss://api.vachana.ai/api/v1/tts

Authentication

All Realtime connections require the following headers:

Header	Required	Description	Example
`Content-Type`	Yes	Must be `application/json`	`application/json`
`X-API-Key-ID`	Yes	Your API key for authentication	`<your-api-key-id>`
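
As a minimal sketch, the two headers from the table can be assembled in Python before opening the connection. The helper `build_headers` and the environment variable name `VACHANA_API_KEY_ID` are illustrative assumptions, not part of the API.

```python
import os


def build_headers(api_key_id: str) -> dict[str, str]:
    """Return the two headers listed in the table above."""
    if not api_key_id:
        raise ValueError("API key ID must not be empty")
    return {
        "Content-Type": "application/json",
        "X-API-Key-ID": api_key_id,
    }


if __name__ == "__main__":
    # VACHANA_API_KEY_ID is a hypothetical variable name chosen for this sketch.
    print(build_headers(os.environ.get("VACHANA_API_KEY_ID", "<your-api-key-id>")))
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.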

Request Format

Send a JSON message with the following structure:

{
  "text": "नमस्ते, आप कैसे हैं?",
  "model": "vachana-voice-v2",
  "audio_config": {
    "sample_rate": 44100,
    "encoding": "linear_pcm"
  }
}

Field	Type	Required	Description
`num_channels`	integer	Yes	Number of audio channels (e.g., 1 for mono, 2 for stereo)
`sample_width`	integer	Yes	Sample width in bytes (e.g., 2 for 16-bit audio)
`encoding`	string	Yes	Audio encoding format (e.g., `linear_pcm`)
`container`	string	Yes	Audio container format (e.g., `wav`)
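
A small Python sketch of assembling this message follows. It assumes, without confirmation from this page, that `num_channels`, `sample_width`, and `container` sit inside `audio_config` next to `sample_rate` and `encoding`. `build_request` is a hypothetical helper, and its default values are only examples.

```python
import json


def build_request(text, model="vachana-voice-v2", sample_rate=44100,
                  num_channels=1, sample_width=2,
                  encoding="linear_pcm", container="wav"):
    """Serialize a synthesis request to a JSON string."""
    if not text.strip():
        raise ValueError("text must not be empty")
    payload = {
        "text": text,
        "model": model,
        # Assumption: all audio fields are nested under audio_config.
        "audio_config": {
            "sample_rate": sample_rate,
            "num_channels": num_channels,
            "sample_width": sample_width,
            "encoding": encoding,
            "container": container,
        },
    }
    # ensure_ascii=False keeps Devanagari text readable in the message.
    return json.dumps(payload, ensure_ascii=False)


message = build_request("नमस्ते, आप कैसे हैं?")
```

The resulting string is what would be passed to the WebSocket's send call once the connection is open.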

Response

The server streams audio data in real-time as binary chunks. Each chunk contains PCM audio data according to the specified audio_config.
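
Raw PCM has no header, so most players cannot open it directly. The sketch below uses Python's standard `wave` module to wrap collected chunks in a WAV container. It assumes 16-bit mono audio at 44100 Hz and that the chunks carry no header of their own, neither of which this page confirms. `pcm_chunks_to_wav` is a hypothetical helper.

```python
import io
import wave


def pcm_chunks_to_wav(chunks, sample_rate=44100, num_channels=1, sample_width=2):
    """Join raw PCM chunks and return the bytes of a complete WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(num_channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            wav_file.writeframes(chunk)
    return buffer.getvalue()


# Stand-in data: two chunks of silence, 0.5 s each (not real API output).
silence = b"\x00\x00" * 22050
wav_bytes = pcm_chunks_to_wav([silence, silence])
```

The parameters passed to the helper should match whatever was sent in `audio_config`; a mismatch produces audio that plays at the wrong speed or as noise.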

Example Usage

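// The browser WebSocket API cannot set custom headers; this example uses the Node.js "ws" package.
const WebSocket = require("ws");

const ws = new WebSocket("wss://api.vachana.ai/api/v1/tts", {
  headers: {
    "Content-Type": "application/json",
    "X-API-Key-ID": "<your-api-key-id>",
  },
});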

ws.on("open", () => {
  const request = {
    text: "नमस्ते, आप कैसे हैं?",
    model: "vachana-voice-v2",
    audio_config: {
      sample_rate: 44100,
      encoding: "linear_pcm",
    },
  };

  ws.send(JSON.stringify(request));
});

ws.on("message", (data) => {
  // Handle audio chunks
  console.log("Received audio chunk:", data);
});

ws.on("error", (error) => {
  console.error("WebSocket error:", error);
});

ws.on("close", () => {
  console.log("WebSocket connection closed");
});

Python SDK

The SDK’s realtime client manages the WebSocket lifecycle, audio streaming, and async iteration so you can focus on your application logic.

Installation

pip install gnani-vachana

Requires Python 3.9+.

Authentication

from gnani.tts import GnaniTTSRealtimeClient

client = GnaniTTSRealtimeClient(api_key="your-api-key")

Stream Audio Chunks in Real-Time

Use the async context manager to open the connection and iterate over audio chunks as they arrive.

import asyncio
from gnani.tts import GnaniTTSRealtimeClient

async def main():
    async with GnaniTTSRealtimeClient(api_key="your-api-key") as client:
        with open("output.wav", "wb") as f:
            async for chunk in client.synthesize(
                "नमस्ते, आप कैसे हैं?",
                voice="sia",
            ):
                f.write(chunk)

asyncio.run(main())
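
Since low latency is the reason to stream, it can help to measure how long the first chunk takes to arrive. The sketch below wraps any async iterator of `bytes`, such as the one `client.synthesize(...)` returns in the example above. `timed_chunks` and the stand-in generator `fake_stream` are hypothetical and only illustrate the pattern; they are not part of the SDK.

```python
import asyncio
import time


async def timed_chunks(chunks, stats):
    """Yield chunks unchanged while recording latency and size in stats."""
    start = time.perf_counter()
    stats["first_chunk_seconds"] = None
    stats["total_bytes"] = 0
    async for chunk in chunks:
        if stats["first_chunk_seconds"] is None:
            stats["first_chunk_seconds"] = time.perf_counter() - start
        stats["total_bytes"] += len(chunk)
        yield chunk


async def fake_stream():
    """Stand-in for a real audio stream (not the SDK)."""
    for _ in range(3):
        await asyncio.sleep(0.01)
        yield b"\x00" * 1024


async def demo():
    stats = {}
    received = [chunk async for chunk in timed_chunks(fake_stream(), stats)]
    return stats, received


stats, received = asyncio.run(demo())
```

With the SDK, the same wrapper would presumably sit around the `client.synthesize(...)` call inside the `async for` loop, leaving the file-writing code unchanged.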

Collect All Audio at Once

If you don’t need to process chunks as they arrive, use synthesize_and_collect to get the full audio as a single bytes object.

import asyncio
from gnani.tts import GnaniTTSRealtimeClient

async def main():
    async with GnaniTTSRealtimeClient(api_key="your-api-key") as client:
        audio = await client.synthesize_and_collect(
            "Realtime TTS response",
            voice="neha",
        )
        with open("output.wav", "wb") as f:
            f.write(audio)

asyncio.run(main())
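
Once the full audio is in memory, its playback length follows from the byte count and the audio parameters. The sketch below assumes the bytes are raw 16-bit mono PCM at 44100 Hz with no container header; if the returned audio does include a WAV header, the result is slightly too large. `pcm_duration_seconds` is a hypothetical helper.

```python
def pcm_duration_seconds(num_bytes, sample_rate=44100, num_channels=1, sample_width=2):
    """Return playback duration for raw PCM of the given size."""
    bytes_per_second = sample_rate * num_channels * sample_width
    if bytes_per_second <= 0:
        raise ValueError("audio parameters must be positive")
    return num_bytes / bytes_per_second


# 88200 bytes of 16-bit mono audio at 44100 Hz is exactly one second.
one_second = pcm_duration_seconds(88200)
```

In the example above, this would be called as `pcm_duration_seconds(len(audio))`.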
