Overview

Stream cloned-voice audio in real time over WebSocket, the lowest-latency option. Pass the speaker_embedding obtained from Voice Clone Embeddings to use your cloned voice. For simpler use cases, see Voice Cloning REST or Voice Cloning Streaming.

Endpoint

wss://api.vachana.ai/api/v1/tts

Authentication

All Realtime connections require the following headers:
| Header | Required | Description | Example |
| --- | --- | --- | --- |
| Content-Type | Yes | Must be application/json | application/json |
| X-API-Key-ID | Yes | Your API key for authentication | <your-api-key-id> |

Request Format

Send a JSON message with the following structure:
{
  "text": "नमस्ते, आप कैसे हैं?",
  "model": "vachana-vc-v1",
  "audio_config": {
    "sample_rate": 44100,
    "encoding": "linear_pcm"
  },
  "speaker_embedding": {
    "embedding": "<your-embedding-string>",
    "shape": [1, 768],
    "dtype": "torch.bfloat16"
  }
}
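
The message above can be assembled with a small helper. The following is a minimal sketch, not an official client: buildTtsRequest is a hypothetical name, the input checks are an assumption about what is worth validating client-side, and the defaults simply mirror the example values shown above. Check shape and dtype against what Voice Clone Embeddings actually returns for your voice.

```javascript
// Hypothetical helper (not part of any official SDK): builds the JSON request
// message described above. Defaults mirror the example values on this page.
function buildTtsRequest(text, speakerEmbedding, audioConfig = {}) {
  // Client-side sanity checks; these are assumptions, not documented server rules.
  if (typeof text !== "string" || text.trim() === "") {
    throw new TypeError("text must be a non-empty string");
  }
  if (!speakerEmbedding || typeof speakerEmbedding.embedding !== "string") {
    throw new TypeError("speakerEmbedding.embedding must be a string");
  }

  const request = {
    text,
    model: "vachana-vc-v1",
    audio_config: {
      sample_rate: 44100,
      encoding: "linear_pcm",
      ...audioConfig, // caller-supplied values override the defaults
    },
    speaker_embedding: {
      embedding: speakerEmbedding.embedding,
      shape: speakerEmbedding.shape ?? [1, 768],
      dtype: speakerEmbedding.dtype ?? "torch.bfloat16",
    },
  };

  // The server expects a JSON text message, so serialize before sending.
  return JSON.stringify(request);
}
```

Once the socket is open, the result can be sent directly, for example ws.send(buildTtsRequest("नमस्ते, आप कैसे हैं?", { embedding: "<your-embedding-string>" })).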

Response

The server streams audio in real time as binary chunks. Each chunk contains PCM audio data encoded according to the audio_config sent in the request.
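
Raw PCM chunks carry no header, so most audio players cannot open them directly. As an illustration, the sketch below concatenates received chunks and prepends a standard 44-byte WAV header. It assumes 16-bit signed little-endian mono samples, which this page does not state, and pcmChunksToWav is a hypothetical helper name. Adjust channels and bitsPerSample if the actual stream differs.

```javascript
// Hypothetical helper: wraps raw PCM chunks (Node.js Buffers) in a WAV container.
// ASSUMPTION: 16-bit signed little-endian mono PCM; the sample format is not
// specified on this page, so verify it against real output.
function pcmChunksToWav(chunks, { sampleRate = 44100, channels = 1, bitsPerSample = 16 } = {}) {
  const pcm = Buffer.concat(chunks);
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const header = Buffer.alloc(44);

  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4);          // file size minus the first 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16);                      // size of the fmt chunk
  header.writeUInt16LE(1, 20);                       // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40);              // size of the audio payload

  return Buffer.concat([header, pcm]);
}
```

In a message handler, each chunk can be pushed onto an array, and the result of pcmChunksToWav(chunks) written to disk with fs.writeFileSync once the connection closes.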

Example Usage

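// Uses the "ws" package for Node.js (npm install ws); the browser WebSocket API cannot set custom headers.
const WebSocket = require("ws");

const ws = new WebSocket("wss://api.vachana.ai/api/v1/tts", {
  headers: {
    "Content-Type": "application/json",
    "X-API-Key-ID": "<your-api-key-id>",
  },
});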

ws.on("open", () => {
  const request = {
    text: "नमस्ते, आप कैसे हैं?",
    model: "vachana-vc-v1",
    audio_config: {
      sample_rate: 44100,
      encoding: "linear_pcm",
    },
    speaker_embedding: {
      embedding: "<your-embedding-string>",
      shape: [1, 768],
      dtype: "torch.bfloat16",
    },
  };

  ws.send(JSON.stringify(request));
});

ws.on("message", (data) => {
  // Each message is a binary chunk of PCM audio; buffer it or pass it to an audio player
  console.log("Received audio chunk:", data.length, "bytes");
});

ws.on("error", (error) => {
  console.error("WebSocket error:", error);
});

ws.on("close", () => {
  console.log("WebSocket connection closed");
});