POST /stt/v3
Speech to Text (REST)
curl --request POST \
  --url https://api.vachana.ai/stt/v3 \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-API-Key-ID: <api-key>' \
  --form audio_file='@example-file' \
  --form language_code=hi-IN
{
  "success": true,
  "request_id": "req_abc123",
  "timestamp": "20251226_143052.123",
  "transcript": "नमस्ते, आप कैसे हैं?"
}
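The curl request above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper name `build_stt_request` is hypothetical, and the function only builds the request object so that it can be inspected before anything is sent.

```python
import uuid
import urllib.request

STT_URL = "https://api.vachana.ai/stt/v3"  # endpoint taken from the curl example


def build_stt_request(audio_bytes, filename, language_code, api_key):
    """Build (but do not send) a multipart/form-data POST for the STT endpoint.

    Hypothetical helper for illustration; field and header names mirror
    the curl example (audio_file, language_code, X-API-Key-ID).
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    file_header = (
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"'
    ).encode()
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="language_code"',
        b"",
        language_code.encode(),
        delimiter,
        file_header,
        b"Content-Type: application/octet-stream",
        b"",
        audio_bytes,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    return urllib.request.Request(
        STT_URL,
        data=b"\r\n".join(parts),
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "X-API-Key-ID": api_key,
        },
    )
```

Passing the returned object to `urllib.request.urlopen` and decoding the body with `json.load` should yield a response shaped like the JSON shown above.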

Overview

The REST endpoint transcribes audio files of up to 60 seconds (the ideal duration is 30 seconds) and returns the transcript in a single synchronous response. It is best suited to batch processing or pre-recorded audio. For real-time transcription, see STT Realtime.

Language Codes

The Vachana API supports these 10 Indian languages, plus experimental Hinglish and auto-detect modes:

| Language | Code | Native Script | Example Text |
| --- | --- | --- | --- |
| Bengali | bn-IN | Bengali (বাংলা) | “আমি ভাত খাই” |
| English | en-IN | Latin | “I am going to the market” |
| Gujarati | gu-IN | Gujarati (ગુજરાતી) | “હું બજાર જાઉં છું” |
| Hindi | hi-IN | Devanagari (हिन्दी) | “मैं बाज़ार जा रहा हूँ” |
| Kannada | kn-IN | Kannada (ಕನ್ನಡ) | “ನಾನು ಮಾರುಕಟ್ಟೆಗೆ ಹೋಗುತ್ತೇನೆ” |
| Malayalam | ml-IN | Malayalam (മലയാളം) | “ഞാൻ ചന്തയിലേക്ക് പോകുന്നു” |
| Marathi | mr-IN | Devanagari (मराठी) | “मी बाजारात जातोय” |
| Punjabi | pa-IN | Gurmukhi (ਪੰਜਾਬੀ) | “ਮੈਂ ਬਾਜ਼ਾਰ ਜਾ ਰਿਹਾ ਹਾਂ” |
| Tamil | ta-IN | Tamil (தமிழ்) | “நான் சந்தைக்கு செல்கிறேன்” |
| Telugu | te-IN | Telugu (తెలుగు) | “నేను మార్కెట్‌కి వెళ్తున్నాను” |
| Hinglish (Latin) (experimental) | en-hi-IN-latn | Latin | “Main market ja raha hu” |
| Hinglish (experimental) | en-hi-in-cm | Latin + Devanagari (हिन्दी) | “मैं market जा रहा हूँ” |
| Auto-detect (experimental) | en-IN,hi-IN,ta-IN,te-IN,kn-IN,ml-IN,gu-IN,mr-IN,bn-IN,pa-IN | All supported | Automatically detects language |
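When the language is chosen by name in application code, a small lookup can catch unsupported values before any request is made. The sketch below copies the ten non-experimental codes from the table; the names `LANGUAGE_CODES` and `language_code_for` are illustrative and not part of the API or SDK.

```python
# Codes copied from the language table; experimental modes are left out.
LANGUAGE_CODES = {
    "bengali": "bn-IN",
    "english": "en-IN",
    "gujarati": "gu-IN",
    "hindi": "hi-IN",
    "kannada": "kn-IN",
    "malayalam": "ml-IN",
    "marathi": "mr-IN",
    "punjabi": "pa-IN",
    "tamil": "ta-IN",
    "telugu": "te-IN",
}


def language_code_for(name):
    """Map a language name (case-insensitive) to its code, or raise ValueError."""
    try:
        return LANGUAGE_CODES[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unsupported language: {name!r}") from None
```

For example, `language_code_for("Hindi")` returns `"hi-IN"`, while an unlisted name raises `ValueError`.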

Python SDK

The official Python SDK lets you transcribe audio with a few lines of code, without manually constructing multipart requests or handling HTTP headers.

Installation

pip install gnani-vachana
Requires Python 3.9+.

Authentication

The REST client requires three credentials: your organization_id, api_key, and user_id. You can pass them directly or load them from environment variables.
from gnani.stt import GnaniSTTClient

client = GnaniSTTClient(
    organization_id="your-organization-id",
    api_key="your-api-key",
    user_id="your-user-id",
)
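One way to keep credentials out of source code is to read them from the environment and pass them to the constructor shown above. This is a sketch under assumptions: the variable names (`GNANI_ORGANIZATION_ID` and so on) and both helper functions are hypothetical, since this page does not name the variables the SDK itself may read.

```python
import os

# Hypothetical variable names; this page does not specify any.
ENV_VARS = {
    "organization_id": "GNANI_ORGANIZATION_ID",
    "api_key": "GNANI_API_KEY",
    "user_id": "GNANI_USER_ID",
}


def load_credentials(env=None):
    """Return the three credentials as a dict, failing early if any is unset."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {field: env[var] for field, var in ENV_VARS.items()}


def make_client(env=None):
    """Build a client from the environment (requires the SDK to be installed)."""
    # Imported lazily so load_credentials can be used without the SDK.
    from gnani.stt import GnaniSTTClient

    return GnaniSTTClient(**load_credentials(env))
```

Failing early with a list of the missing variables tends to be easier to debug than an authentication error from the first request.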

Transcribe Audio

result = client.transcribe("recording.wav", language_code="hi-IN")
print(result["transcript"])

Custom Request ID

Useful for correlating SDK calls with your own logs or support tickets.
result = client.transcribe(
    "call.flac",
    language_code="hi-IN",
    request_id="my-trace-123",
)

Error Handling

from gnani.stt import (
    AuthenticationError,
    InvalidAudioError,
    APIError,
)

try:
    result = client.transcribe("audio.wav", language_code="hi-IN")
    print(result["transcript"])
except AuthenticationError:
    print("Invalid credentials — check your organization_id, api_key, and user_id.")
except InvalidAudioError as e:
    print(f"Bad audio file: {e}")
except APIError as e:
    print(f"API error {e.status_code}: {e}")
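For batch jobs over many pre-recorded files, the same pattern can be wrapped in a loop that records failures per file instead of stopping at the first one. The helper `transcribe_many` below is hypothetical; it assumes only that the client has the `transcribe(path, language_code=...)` method used above and that the result exposes a `"transcript"` key.

```python
def transcribe_many(client, paths, language_code):
    """Transcribe each path, collecting transcripts and per-file errors.

    Returns (transcripts, errors), both keyed by path.
    """
    transcripts, errors = {}, {}
    for path in paths:
        try:
            result = client.transcribe(path, language_code=language_code)
            transcripts[path] = result["transcript"]
        except Exception as exc:
            # Broad on purpose for this sketch; in real use, catch the SDK's
            # exception classes and stop early on authentication failures.
            errors[path] = exc
    return transcripts, errors
```

A caller might log `errors` and retry only those paths, leaving successful transcripts untouched.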

Authorizations

X-API-Key-ID
string
header
required

API key for authentication. Sign up for Vachana to get an API key.

Body

multipart/form-data
audio_file
file
required

Audio file to transcribe. Supported formats: WAV, MP3, OGG, FLAC, AAC, M4A. Maximum duration: 60 seconds (the ideal duration is 30 seconds).
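Because the duration limit is enforced per request, it can help to check a file locally before uploading it. The sketch below covers WAV files only, using Python's standard `wave` module; the other formats would need a third-party decoder. The function names are illustrative.

```python
import wave

MAX_SECONDS = 60.0  # documented maximum duration


def wav_duration_seconds(source):
    """Return the duration of a WAV file, given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_within_limit(source, max_seconds=MAX_SECONDS):
    """True if the WAV audio is no longer than max_seconds."""
    return wav_duration_seconds(source) <= max_seconds
```

Files that fail the check could be split into shorter segments before being sent.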

language_code
enum<string>
required

Language code for transcription. Use one of the supported language codes.

Supported values: bn-IN, en-IN, gu-IN, hi-IN, kn-IN, ml-IN, mr-IN, pa-IN, ta-IN, te-IN, en-hi-IN-latn

Example:

"hi-IN"

preferred_language
enum<string>

Optional preferred language for processing when multiple languages are specified. Must be one of the languages in language_code. When set, forces processing with the single-language model for the specified language, which may improve accuracy for predominantly single-language audio.

Available options: bn-IN, en-IN, gu-IN, hi-IN, kn-IN, ml-IN, mr-IN, pa-IN, ta-IN, te-IN, en-hi-IN-latn
Example:

"hi-IN"
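The rule that preferred_language must be one of the languages in language_code can be checked on the client before sending. In this sketch the helper `build_form_fields` is hypothetical, and joining several codes with commas is an assumption inferred from the auto-detect entry in the language table rather than a documented multi-language syntax.

```python
def build_form_fields(language_codes, preferred_language=None):
    """Return the text form fields for a request as a dict.

    Assumption: multiple languages are sent as one comma-separated
    language_code value, as in the auto-detect row of the language table.
    """
    if not language_codes:
        raise ValueError("at least one language code is required")
    fields = {"language_code": ",".join(language_codes)}
    if preferred_language is not None:
        if preferred_language not in language_codes:
            raise ValueError(
                f"preferred_language {preferred_language!r} is not in language_code"
            )
        fields["preferred_language"] = preferred_language
    return fields
```

Rejecting a mismatched preferred_language locally avoids spending a request on input the description above rules out.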

Response

Successful transcription

success
boolean

Indicates if the transcription was successful

timestamp
string

Request timestamp in format YYYYMMDD_HHMMSS.mmm

transcript
string

The transcribed text from the audio
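A response body with these fields can be decoded with the standard library. The sketch below parses the documented timestamp format `YYYYMMDD_HHMMSS.mmm` into a `datetime`; the function name `parse_response` is illustrative, and because this page does not state a time zone, the result is a naive datetime.

```python
import json
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%d_%H%M%S.%f"  # YYYYMMDD_HHMMSS.mmm


def parse_response(body):
    """Return (transcript, timestamp) from a successful JSON response body."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("transcription was not successful")
    # %f accepts one to six fractional digits, so ".123" parses as milliseconds.
    timestamp = datetime.strptime(payload["timestamp"], TIMESTAMP_FORMAT)
    return payload["transcript"], timestamp
```

The fields of an unsuccessful response are not described here, so the sketch simply raises when `success` is false or missing.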