POST /stt/v3
Speech to Text Decode
curl --request POST \
  --url https://api.vachana.ai/stt/v3 \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-API-Key-ID: <api-key>' \
  --header 'X-API-Request-ID: <x-api-request-id>' \
  --header 'X-API-User-ID: <x-api-user-id>' \
  --header 'X-Organization-ID: <x-organization-id>' \
  --form audio_file='@example-file' \
  --form language_code=hi-IN
{
  "success": true,
  "request_id": "req_abc123",
  "timestamp": "20251226_143052.123",
  "transcript": "नमस्ते, आप कैसे हैं?"
}

Model Selection Logic

The API uses two types of models, selected automatically based on your request:

Monolingual Model

Selected when:
  • You specify exactly one language in the language_code parameter
  • You want the highest accuracy for a single language
Characteristics:
  • Optimized for single-language audio
  • Faster processing time
  • Best for scenarios where language is known beforehand
Example: language_code=hi-IN → Uses the Hindi monolingual model

Multilingual Model

Selected when:
  • You specify multiple languages, e.g. language_code=hi-IN,en-IN, or list every supported code for auto-detect
  • You want the API to detect which language(s) are spoken
  • Audio may contain code-switching between languages
Characteristics:
  • Handles multiple languages in a single request
  • Automatic language detection within specified options
  • Can transcribe code-switched speech (e.g., Hindi-English mix)
  • Best for unpredictable multilingual scenarios
Example: language_code=hi-IN,en-IN → Uses the multilingual model and detects Hindi/English
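
The selection rules above can be sketched on the client side as a small function. This is only an illustration of the documented behaviour, not the service's actual implementation: the function name `select_model` and the returned labels are hypothetical, and the `preferred_language` branch follows the description of that field in the Body section.

```python
def select_model(language_code, preferred_language=None):
    """Return a label for the model the documented rules imply.

    Hypothetical helper: the real selection happens on the server.
    """
    # Split the comma-separated form value into individual codes.
    codes = [c.strip() for c in language_code.split(",") if c.strip()]
    if not codes:
        raise ValueError("language_code must contain at least one code")

    # A preferred language forces the monolingual model for that language.
    if preferred_language is not None:
        if preferred_language not in codes:
            raise ValueError("preferred_language must be one of the codes in language_code")
        return f"monolingual:{preferred_language}"

    # Exactly one code selects the monolingual model, several the multilingual one.
    if len(codes) == 1:
        return f"monolingual:{codes[0]}"
    return "multilingual:" + ",".join(codes)
```

Calling `select_model("hi-IN")` gives the monolingual label for Hindi, while `select_model("hi-IN,en-IN")` gives the multilingual label restricted to Hindi and English.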

Language Codes

The Vachana API supports these 10 languages:

| Language | Code | Native Script | Example Text |
| --- | --- | --- | --- |
| Bengali | bn-IN | Bengali (বাংলা) | “আমি ভাত খাই” |
| English | en-IN | Latin | “I am going to the market” |
| Gujarati | gu-IN | Gujarati (ગુજરાતી) | “હું બજાર જાઉં છું” |
| Hindi | hi-IN | Devanagari (हिन्दी) | “मैं बाज़ार जा रहा हूँ” |
| Kannada | kn-IN | Kannada (ಕನ್ನಡ) | “ನಾನು ಮಾರುಕಟ್ಟೆಗೆ ಹೋಗುತ್ತೇನೆ” |
| Malayalam | ml-IN | Malayalam (മലയാളം) | “ഞാൻ ചന്തയിലേക്ക് പോകുന്നു” |
| Marathi | mr-IN | Devanagari (मराठी) | “मी बाजारात जातोय” |
| Punjabi | pa-IN | Gurmukhi (ਪੰਜਾਬੀ) | “ਮੈਂ ਬਾਜ਼ਾਰ ਜਾ ਰਿਹਾ ਹਾਂ” |
| Tamil | ta-IN | Tamil (தமிழ்) | “நான் சந்தைக்கு செல்கிறேன்” |
| Telugu | te-IN | Telugu (తెలుగు) | “నేను మార్కెట్‌కి వెళ్తున్నాను” |
| Auto-detect | en-IN,hi-IN,ta-IN,te-IN,kn-IN,ml-IN,gu-IN,mr-IN,bn-IN,pa-IN | All supported | Automatically detects language |

Auto-detect Mode: To enable automatic language detection across all supported languages, pass all language codes as a comma-separated list.
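
As a minimal sketch of building that comma-separated value, the helper below validates codes against the supported set and joins them. The names `SUPPORTED_CODES` and `language_code_value` are hypothetical; the codes and their order are copied from the Auto-detect row of the table.

```python
# Supported codes, in the order shown in the Auto-detect row.
SUPPORTED_CODES = [
    "en-IN", "hi-IN", "ta-IN", "te-IN", "kn-IN",
    "ml-IN", "gu-IN", "mr-IN", "bn-IN", "pa-IN",
]


def language_code_value(codes=None):
    """Build the language_code form value (hypothetical helper).

    With no argument, every supported code is listed, which is the
    documented way to request auto-detect across all languages.
    """
    selected = list(SUPPORTED_CODES) if codes is None else list(codes)
    if not selected:
        raise ValueError("at least one language code is required")
    unknown = [c for c in selected if c not in SUPPORTED_CODES]
    if unknown:
        raise ValueError(f"unsupported language codes: {unknown}")
    return ",".join(selected)
```

`language_code_value()` returns the full auto-detect list, and `language_code_value(["hi-IN", "en-IN"])` returns a two-language value suitable for code-switched Hindi-English audio.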

Authorizations

X-API-Key-ID
string
header
required

API key for authentication. Contact Gnani.ai to obtain your API key.

Headers

X-API-Request-ID
string
required

Unique request ID for tracking and logging.

Example:

"req_abc123"

X-API-User-ID
string
required

User identifier for tracking and analytics.

Example:

"company-name"

X-Organization-ID
string
required

Organization identifier for multi-tenant setups.

Example:

"org_mycompany"

Body

multipart/form-data
audio_file
file
required

Audio file to transcribe. Supported formats: WAV, MP3, OGG, FLAC, AAC, M4A. Maximum duration: 30 seconds.
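
A client may want to check these limits before uploading. The sketch below is one possible pre-flight check using only the Python standard library. It assumes the file extension reflects the actual format, which the API may verify differently, and it measures duration only for WAV files because the standard library cannot decode the other formats. The name `check_audio` is hypothetical.

```python
import wave
from pathlib import Path

# Extensions assumed to correspond to the documented formats.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".aac", ".m4a"}
MAX_DURATION_SECONDS = 30.0


def check_audio(path):
    """Validate a file against the documented limits (hypothetical helper).

    Returns the duration in seconds for WAV files, or None for other
    supported formats, whose duration is not measured here.
    """
    p = Path(path)
    ext = p.suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext or '(no extension)'}")
    if ext != ".wav":
        return None  # duration check skipped for non-WAV formats

    # Duration of a WAV file is frame count divided by sample rate.
    with wave.open(str(p), "rb") as wf:
        duration = wf.getnframes() / wf.getframerate()
    if duration > MAX_DURATION_SECONDS:
        raise ValueError(f"audio is {duration:.1f}s, limit is {MAX_DURATION_SECONDS:.0f}s")
    return duration
```

Longer recordings would need to be split into chunks of at most 30 seconds before being sent.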

language_code
string
required

Language code for transcription. Use one of the supported language codes.

Supported values: bn-IN, en-IN, gu-IN, hi-IN, kn-IN, ml-IN, mr-IN, pa-IN, ta-IN, te-IN

For multilingual transcription, use comma-separated values (e.g., en-IN,hi-IN).

Example:

"hi-IN"

preferred_language
enum<string>

Optional preferred language for processing when multiple languages are specified. Must be one of the languages in language_code. When set, the monolingual model for this language will be used.

Available options: bn-IN, en-IN, gu-IN, hi-IN, kn-IN, ml-IN, mr-IN, pa-IN, ta-IN, te-IN
Example:

"hi-IN"
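
Putting the headers and body fields together, a request could be assembled roughly as follows. This is a sketch rather than an official client: `build_request` and `transcribe` are hypothetical names, the `req_` plus UUID request ID format is an assumption modelled on the example value, and `transcribe` relies on the third-party `requests` package being installed.

```python
import uuid

API_URL = "https://api.vachana.ai/stt/v3"


def build_request(api_key, user_id, organization_id, language_code,
                  preferred_language=None, request_id=None):
    """Return (headers, form_fields) for the endpoint (hypothetical helper)."""
    headers = {
        "X-API-Key-ID": api_key,
        # Assumed ID format; the docs only require a unique string.
        "X-API-Request-ID": request_id or f"req_{uuid.uuid4().hex}",
        "X-API-User-ID": user_id,
        "X-Organization-ID": organization_id,
    }
    fields = {"language_code": language_code}
    if preferred_language is not None:
        # Documented constraint: must be one of the codes in language_code.
        if preferred_language not in language_code.split(","):
            raise ValueError("preferred_language must appear in language_code")
        fields["preferred_language"] = preferred_language
    return headers, fields


def transcribe(audio_path, headers, fields, timeout=30):
    """Send the multipart request and return the decoded JSON body."""
    import requests  # third-party dependency, imported lazily

    with open(audio_path, "rb") as fh:
        # requests generates the multipart Content-Type and boundary itself.
        resp = requests.post(API_URL, headers=headers, data=fields,
                             files={"audio_file": fh}, timeout=timeout)
    resp.raise_for_status()
    return resp.json()
```

The Content-Type header is deliberately left out of `headers` so that the HTTP library can add the multipart boundary on its own.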

Response

Successful transcription

success
boolean

Indicates if the transcription was successful

request_id
string

Unique identifier for this request

timestamp
string

Request timestamp in format YYYYMMDD_HHMMSS.mmm

transcript
string

The transcribed text from the audio
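
The response fields above can be read with a few lines of standard-library code. The sketch below parses the documented timestamp format YYYYMMDD_HHMMSS.mmm into a `datetime`. The function name `parse_response` is hypothetical, and because the shape of an error response is not documented here, a falsy `success` is simply raised as an error. The time zone of the timestamp is not stated, so the parsed value is left naive.

```python
import json
from datetime import datetime


def parse_response(body):
    """Decode a response body into (request_id, timestamp, transcript).

    Hypothetical helper; only the four documented fields are used.
    """
    data = json.loads(body)
    if not data.get("success"):
        # Error payload format is an unknown, so surface the raw data.
        raise RuntimeError(f"transcription failed: {data}")
    # %f accepts the three-digit millisecond fraction.
    timestamp = datetime.strptime(data["timestamp"], "%Y%m%d_%H%M%S.%f")
    return data["request_id"], timestamp, data["transcript"]
```

Applied to the sample response at the top of this page, the timestamp string "20251226_143052.123" parses to 26 December 2025, 14:30:52.123.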