POST /stt/v3
Speech to Text (REST)
curl --request POST \
  --url https://api.vachana.ai/stt/v3 \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-API-Key-ID: <api-key>' \
  --form audio_file='@example-file' \
  --form language_code=hi-IN
{
  "success": true,
  "request_id": "req_abc123",
  "timestamp": "20251226_143052.123",
  "transcript": "नमस्ते, आप कैसे हैं?"
}

Overview

The REST endpoint transcribes audio files of up to 60 seconds (30 seconds or less is ideal) and returns the transcript in a single synchronous response. It is best suited to batch processing and pre-recorded audio. For real-time transcription, see STT Realtime.
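
The duration limits above can be checked on the client before uploading. The following is a minimal sketch for WAV input only, using Python's standard `wave` module; the helper names (`wav_duration_seconds`, `check_duration`, `make_silent_wav`) are illustrative and not part of the Vachana API, and other formats (MP3, OGG, and so on) would need a different decoder.

```python
import io
import wave

MAX_SECONDS = 60.0    # hard limit stated in the overview
IDEAL_SECONDS = 30.0  # recommended duration stated in the overview


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_duration(seconds: float) -> str:
    """Classify a clip as 'ok', 'long' (over the ideal), or 'too_long'."""
    if seconds > MAX_SECONDS:
        return "too_long"
    if seconds > IDEAL_SECONDS:
        return "long"
    return "ok"


def make_silent_wav(seconds: float, rate: int = 8000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf
```

For example, `check_duration(wav_duration_seconds("clip.wav"))` could gate an upload, where `clip.wav` is a placeholder path; how the server itself treats clips longer than 60 seconds is not specified here.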

Language Codes

The Vachana API supports these 10 Indian languages, plus experimental Hinglish and auto-detect modes:

| Language | Code | Native Script | Example Text |
| --- | --- | --- | --- |
| Bengali | bn-IN | Bengali (বাংলা) | “আমি ভাত খাই” |
| English | en-IN | Latin | “I am going to the market” |
| Gujarati | gu-IN | Gujarati (ગુજરાતી) | “હું બજાર જાઉં છું” |
| Hindi | hi-IN | Devanagari (हिन्दी) | “मैं बाज़ार जा रहा हूँ” |
| Kannada | kn-IN | Kannada (ಕನ್ನಡ) | “ನಾನು ಮಾರುಕಟ್ಟೆಗೆ ಹೋಗುತ್ತೇನೆ” |
| Malayalam | ml-IN | Malayalam (മലയാളം) | “ഞാൻ ചന്തയിലേക്ക് പോകുന്നു” |
| Marathi | mr-IN | Devanagari (मराठी) | “मी बाजारात जातोय” |
| Punjabi | pa-IN | Gurmukhi (ਪੰਜਾਬੀ) | “ਮੈਂ ਬਾਜ਼ਾਰ ਜਾ ਰਿਹਾ ਹਾਂ” |
| Tamil | ta-IN | Tamil (தமிழ்) | “நான் சந்தைக்கு செல்கிறேன்” |
| Telugu | te-IN | Telugu (తెలుగు) | “నేను మార్కెట్‌కి వెళ్తున్నాను” |
| Hinglish (Latin) (experimental) | en-hi-IN-latn | Latin | “Main market ja raha hu” |
| Hinglish (experimental) | en-hi-in-cm | Latin + Devanagari (हिन्दी) | “मैं market जा रहा हूँ” |
| Auto-detect (experimental) | en-IN,hi-IN,ta-IN,te-IN,kn-IN,ml-IN,gu-IN,mr-IN,bn-IN,pa-IN | All supported | Automatically detects language |
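
A client can catch typos in a language code before making a request. The sketch below validates a `language_code` value against the codes in the table above. It assumes, based on the auto-detect row, that several codes are passed as one comma-separated string; `TABLE_CODES` and `split_language_code` are hypothetical names, not part of the API.

```python
# Codes copied from the language table; the two Hinglish entries are experimental.
TABLE_CODES = frozenset({
    "bn-IN", "en-IN", "gu-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN",
    "pa-IN", "ta-IN", "te-IN", "en-hi-IN-latn", "en-hi-in-cm",
})


def split_language_code(value: str) -> list[str]:
    """Split a language_code value into codes, rejecting unknown ones.

    Assumption: multiple codes are joined with commas, as in the
    auto-detect row of the table.
    """
    codes = [part.strip() for part in value.split(",") if part.strip()]
    if not codes:
        raise ValueError("language_code is empty")
    unknown = [code for code in codes if code not in TABLE_CODES]
    if unknown:
        raise ValueError(f"unsupported language code(s): {', '.join(unknown)}")
    return codes
```

The check is case-sensitive, so `hi-in` is rejected; whether the server is equally strict is not stated in this reference.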

Authorizations

X-API-Key-ID
string
header
required

API key for authentication. Sign up for Vachana to get an API key.

Body

multipart/form-data
audio_file
file
required

Audio file to transcribe. Supported formats: WAV, MP3, OGG, FLAC, AAC, M4A. Maximum duration: 60 seconds (30 seconds or less is ideal).

language_code
enum<string>
required

Language code for transcription. Use one of the supported language codes.

Supported values: bn-IN, en-IN, gu-IN, hi-IN, kn-IN, ml-IN, mr-IN, pa-IN, ta-IN, te-IN, en-hi-IN-latn

Example:

"hi-IN"

preferred_language
enum<string>

Optional preferred language for processing when multiple languages are specified. Must be one of the languages in language_code. When set, forces processing with the single-language model for the specified language, which may improve accuracy for predominantly single-language audio.

Available options: bn-IN, en-IN, gu-IN, hi-IN, kn-IN, ml-IN, mr-IN, pa-IN, ta-IN, te-IN, en-hi-IN-latn
Example:

"hi-IN"
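
The body fields above can be assembled into a request without third-party libraries. The following is a sketch using only Python's standard library. The URL and header names come from the curl example, while `encode_multipart` and `build_request` are hypothetical helpers. The `preferred_language` check assumes that multiple languages are passed in `language_code` as a comma-separated string, and the part's `application/octet-stream` content type is also an assumption.

```python
import urllib.request
import uuid

API_URL = "https://api.vachana.ai/stt/v3"  # endpoint from the curl example


def encode_multipart(fields, filename, audio):
    """Encode text fields plus one audio_file part; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        chunks.append(head.encode("utf-8"))
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"  # assumed part type
    )
    chunks.append(file_head.encode("utf-8") + audio + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, audio, filename, language_code, preferred_language=None):
    """Build (but do not send) a POST request for the STT endpoint."""
    fields = {"language_code": language_code}
    if preferred_language is not None:
        # preferred_language must be one of the languages in language_code.
        allowed = [code.strip() for code in language_code.split(",")]
        if preferred_language not in allowed:
            raise ValueError("preferred_language must be one of the codes in language_code")
        fields["preferred_language"] = preferred_language
    body, content_type = encode_multipart(fields, filename, audio)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "X-API-Key-ID": api_key},
    )
```

The returned object can be sent with `urllib.request.urlopen(request)`. Building the request separately from sending it keeps the validation testable without network access.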

Response

Successful transcription

success
boolean

Indicates whether the transcription was successful.

timestamp
string

Request timestamp in format YYYYMMDD_HHMMSS.mmm

transcript
string

The transcribed text from the audio
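
The response fields above map naturally onto a small typed structure. The following is a minimal sketch that parses the documented fields and converts `timestamp` from its `YYYYMMDD_HHMMSS.mmm` format into a `datetime`. The `Transcription` class and `parse_response` function are illustrative names. The time zone of the timestamp is not stated in this reference, so the parsed value is left naive.

```python
import json
from dataclasses import dataclass
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%d_%H%M%S.%f"  # YYYYMMDD_HHMMSS.mmm


@dataclass
class Transcription:
    success: bool
    timestamp: datetime  # naive: the reference does not state a time zone
    transcript: str


def parse_response(raw: str) -> Transcription:
    """Parse a JSON response body into a Transcription."""
    data = json.loads(raw)
    return Transcription(
        success=bool(data["success"]),
        timestamp=datetime.strptime(data["timestamp"], TIMESTAMP_FORMAT),
        transcript=data["transcript"],
    )
```

Applied to the sample response at the top of this page, `parse_response` yields a timestamp of 26 December 2025, 14:30:52.123. Fields not documented in this section are ignored by the sketch.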