Overview

Submit one or more audio files for transcription and receive a job_id immediately. Poll the status endpoint on a fixed interval until the job completes and transcripts are available. Ideal for long recordings, bulk files, or offline pipelines where you do not need a live response. For real-time transcription, see STT Realtime. For short clips under 60 seconds, see STT REST.

Endpoints

Operation | Method | URL
Submit Job | POST | https://api.vachana.ai/stt/v3/batch/submit
Check Job Status | GET | https://api.vachana.ai/stt/v3/batch/status/{job_id}

Limits & Specifications

Item | Limit
Max audio duration | Less than 1 hour per file
Max files per request | 10 files per API call
Max total payload size | 80 MB across all files and form fields combined
Minimum poll interval | 60 seconds between status calls for the same job_id
Speaker diarization | At most 2 speakers per file (two-party diarization). Audio with more than two distinct speakers is not supported.

Supported Audio Formats

AAC · WAV · FLAC · ALAC · OGG (Vorbis) · Opus
Use standard file extensions and MIME types (e.g. .m4a for AAC, .wav, .flac, .ogg).
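The limits above can be checked on the client before uploading, which avoids a round trip that would end in a 400. The following is a minimal sketch, not part of the platform SDK: the function names are hypothetical, "80 MB" is assumed to mean 80 * 1024 * 1024 bytes, and the `.aac` and `.opus` extensions are guesses (only `.m4a`, `.wav`, `.flac`, and `.ogg` are named in the text). It does not check audio duration, which would require decoding the file, and the size check covers file bytes only, so the real payload (with form fields and multipart framing) is slightly larger.

```python
from pathlib import Path

# Limits copied from the table above.
MAX_FILES = 10
# Assumption: "80 MB" is read as 80 * 1024 * 1024 bytes.
MAX_PAYLOAD_BYTES = 80 * 1024 * 1024
# .m4a, .wav, .flac and .ogg are named in the text; .aac and .opus are guesses.
ALLOWED_EXTENSIONS = {".m4a", ".wav", ".flac", ".ogg", ".aac", ".opus"}


def check_batch(files):
    """Return a list of problems for (name, size_in_bytes) pairs; empty means OK."""
    problems = []
    if not 1 <= len(files) <= MAX_FILES:
        problems.append(f"expected 1-{MAX_FILES} files, got {len(files)}")
    for name, size in files:
        if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported extension")
        if size == 0:
            problems.append(f"{name}: empty file")
    total = sum(size for _, size in files)
    if total > MAX_PAYLOAD_BYTES:
        problems.append(f"total size {total} bytes exceeds {MAX_PAYLOAD_BYTES}")
    return problems


def check_paths(paths):
    """Convenience wrapper that reads file sizes from disk."""
    return check_batch([(str(p), Path(p).stat().st_size) for p in paths])
```

Calling `check_paths(["/path/to/first.wav", "/path/to/second.wav"])` returns an empty list when the batch looks acceptable under these assumptions.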

Authentication

Send these headers on every request, for both submit and status calls.
Header | Required | Description
X-API-Key-ID | Yes | Your API key. Required for all requests.
X-API-Request-ID | No | A unique trace ID (e.g. UUID) you assign. Used to correlate your logs with platform logs or support. If omitted, the platform may generate one.
Do not set Content-Type: application/json on the submit request. Use multipart/form-data. curl sets the correct boundary automatically when you use -F / --form.

Submit a Transcription Job

POST /stt/v3/batch/submit

Upload audio files and kick off an asynchronous transcription job. The response returns a job_id immediately; the files are not yet transcribed at this point.

Request — Form Fields

Field | Required | Type | Description
audio_files | Yes | file | Audio files to transcribe. Add one audio_files field per file. Accepts 1–10 files, each under 1 hour. Formats: AAC, WAV, FLAC, ALAC, OGG, Opus. Total body must not exceed 80 MB.
language_code | Yes | string | BCP-47 language code for transcription (e.g. hi-IN, en-IN). See supported language codes.
is_multi_channel | No | boolean | Set to true if the audio is multi-channel (e.g. stereo or per-speaker tracks). Set to false for standard mono audio. Defaults to false.
format | No | string | Output format for transcripts. Set to transcribe to enable Inverse Text Normalization (ITN), which converts numbers, currency, dates, and phone numbers to written form. Set to verbatim for raw spoken-form output. Defaults to verbatim. ITN is currently supported for hi-IN and en-IN only.
itn_native_numerals | No | boolean | When format=transcribe, set to true to render digits in the native script of the target language (e.g. Devanagari numerals for Hindi). Has no effect when format=verbatim. Defaults to false. See the ITN section below for full details.

Supported Language Codes

The Vachana API supports these 10 language codes:

Language | Code | Native Script | Example Text
Bengali | bn-IN | Bengali (বাংলা) | “আমি ভাত খাই”
English | en-IN | Latin | “I am going to the market”
Gujarati | gu-IN | Gujarati (ગુજરાતી) | “હું બજાર જાઉં છું”
Hindi | hi-IN | Devanagari (हिन्दी) | “मैं बाज़ार जा रहा हूँ”
Kannada | kn-IN | Kannada (ಕನ್ನಡ) | “ನಾನು ಮಾರುಕಟ್ಟೆಗೆ ಹೋಗುತ್ತೇನೆ”
Malayalam | ml-IN | Malayalam (മലയാളം) | “ഞാൻ ചന്തയിലേക്ക് പോകുന്നു”
Marathi | mr-IN | Devanagari (मराठी) | “मी बाजारात जातोय”
Punjabi | pa-IN | Gurmukhi (ਪੰਜਾਬੀ) | “ਮੈਂ ਬਾਜ਼ਾਰ ਜਾ ਰਿਹਾ ਹਾਂ”
Tamil | ta-IN | Tamil (தமிழ்) | “நான் சந்தைக்கு செல்கிறேன்”
Telugu | te-IN | Telugu (తెలుగు) | “నేను మార్కెట్‌కి వెళ్తున్నాను”

Example — curl

curl --location --request POST 'https://api.vachana.ai/stt/v3/batch/submit' \
  --header 'X-API-Key-ID: <YOUR_API_KEY>' \
  --header 'X-API-Request-ID: 550e8400-e29b-41d4-a716-446655440000' \
  --form 'language_code=hi-IN' \
  --form 'is_multi_channel=false' \
  --form 'format=transcribe' \
  --form 'audio_files=@"/path/to/first.wav"' \
  --form 'audio_files=@"/path/to/second.wav"'
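For callers who are not using curl, the same request can be assembled by hand. The sketch below builds the multipart/form-data body with the Python standard library only and repeats the `audio_files` field once per file, as the form-field table requires. The helper names are hypothetical, and the per-file `Content-Type` is guessed from the file extension with a generic fallback; the source does not say which MIME types the service checks, so treat that part as an assumption.

```python
import mimetypes
import urllib.request
import uuid

SUBMIT_URL = "https://api.vachana.ai/stt/v3/batch/submit"


def build_multipart(fields, files):
    """Encode text fields and (filename, bytes) pairs as multipart/form-data.

    Every file is sent under the repeated field name "audio_files".
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode(),
        ]
    for filename, data in files:
        # Assumption: a MIME type guessed from the extension is acceptable.
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        parts += [
            f"--{boundary}".encode(),
            (
                'Content-Disposition: form-data; name="audio_files"; '
                f'filename="{filename}"'
            ).encode(),
            f"Content-Type: {mime}".encode(),
            b"",
            data,
        ]
    parts += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_submit_request(api_key, fields, files):
    """Build (but do not send) the POST request for the submit endpoint."""
    body, content_type = build_multipart(fields, files)
    return urllib.request.Request(
        SUBMIT_URL,
        data=body,
        method="POST",
        headers={
            "X-API-Key-ID": api_key,
            "X-API-Request-ID": str(uuid.uuid4()),
            "Content-Type": content_type,
        },
    )
```

Passing the returned request to `urllib.request.urlopen` and decoding the body with `json.load` would send the job. Keeping the build step separate from the send step makes the request easy to inspect before any audio leaves the machine.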

Response — 200 OK

{
  "job_id": "batch_7f3a92c1d4e8",
  "status": "submitted",
  "file_count": 2,
  "message": "Job accepted. Poll the status endpoint every 60 seconds for results."
}
Field | Type | Description
job_id | string | Identifier for this job. Use it in the status URL.
status | string | Initial value is always submitted.
file_count | integer | Number of files accepted into the job.
message | string | Short confirmation with polling instructions.

Errors

HTTP Status | When
400 | No files uploaded, empty file, more than 10 files, payload over 80 MB, unsupported format, or other client-side validation failure.
500 | Server error.

Check Job Status

GET /stt/v3/batch/status/{job_id}

Poll this endpoint to check progress and retrieve transcription results once the job finishes. Call it at most once every 60 seconds per job_id; do not poll more frequently.

Path Parameter

Parameter | Required | Description
job_id | Yes | The job_id returned from the Submit response.

Example — curl

curl --location --request GET 'https://api.vachana.ai/stt/v3/batch/status/{job_id}' \
  --header 'X-API-Key-ID: <YOUR_API_KEY>' \
  --header 'X-API-Request-ID: 550e8400-e29b-41d4-a716-446655440000'

Response — 200 OK

{
  "job_id": "batch_7f3a92c1d4e8",
  "status": "completed",
  "total_files": 2,
  "completed_files": 2,
  "failed_files": 0,
  "overall_progress": 100,
  "error": null,
  "results": [
    {
      "filename": "first.wav",
      "status": "completed",
      "full_transcript": "नमस्ते, आप कैसे हैं?",
      "total_duration": 45.3,
      "error": null,
      "segments": [
        {
          "segment_id": 0,
          "start_time": 0.0,
          "end_time": 3.2,
          "text": "नमस्ते, आप कैसे हैं?",
          "speaker_id": 1,
          "confidence": 0.97,
          "language_detected": "hi-IN",
          "sentiment": "Neutral",
          "emotion": "Neutral"
        }
      ]
    }
  ]
}

Job-Level Response Fields

Field | Type | Description
job_id | string | Job identifier.
status | string | submitted: accepted or in progress. processing: actively transcribing. completed: done. failed: job-level failure.
total_files | integer | Total number of files in the job.
completed_files | integer | Files finished successfully. Meaningful only when the job has reached a final state.
failed_files | integer | Files that failed. Meaningful only when the job has reached a final state.
overall_progress | integer | Approximate progress from 0 to 100 while the job is running.
results | array or null | Per-file results. null while the job is submitted or processing.
error | string or null | Top-level error message for the job, if any.

Per-File Result Fields — results[]

Field | Type | Description
filename | string | Original file name as submitted.
full_transcript | string | Complete transcribed text for the file.
segments | array | Time-aligned transcript segments (see below).
total_duration | number | Audio duration in seconds.
status | string | completed or failed for this individual file.
error | string or null | Error message for this file if it failed.

Per-Segment Fields — results[].segments[]

Field | Type | Description
segment_id | integer | Segment index (zero-based).
start_time | number | Segment start time in seconds.
end_time | number | Segment end time in seconds.
text | string | Transcribed text for this segment.
speaker_id | integer | Speaker identifier. Populated for multi-channel audio.
confidence | number | Confidence score for the segment transcript (0–1).
language_detected | string | BCP-47 code of the detected language for this segment.
sentiment | string | Sentiment label (e.g. Neutral, Positive, Negative).
emotion | string | Emotion label (e.g. Neutral, Happy, Sad).
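Once a job has finished, the per-file and per-segment fields above can be turned into something readable. The sketch below is one possible way to do that, with hypothetical helper names: it prints each segment with its time range and speaker, and lists files whose own status is `failed`. It treats a missing or null `segments` or `results` value as empty, which is an assumption about how a caller would want to handle those cases.

```python
def format_segments(file_result):
    """Render one results[] entry as timestamped, speaker-labelled lines."""
    lines = []
    for segment in file_result.get("segments") or []:
        lines.append(
            "[{start:.2f}-{end:.2f}] speaker {speaker}: {text}".format(
                start=segment["start_time"],
                end=segment["end_time"],
                speaker=segment.get("speaker_id", "?"),
                text=segment["text"],
            )
        )
    return lines


def failed_files(status):
    """Return (filename, error) pairs for files whose own status is failed."""
    return [
        (item["filename"], item.get("error"))
        for item in status.get("results") or []
        if item.get("status") == "failed"
    ]
```

For the sample response shown earlier, `format_segments` on the first result yields a single line covering 0.00 to 3.20 seconds for speaker 1, and `failed_files` returns an empty list.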

Errors

HTTP Status | When
404 | job_id not found: unknown ID, or the job is no longer available.
500 | Server error.

Inverse Text Normalization (ITN)

When format=transcribe is passed in the form body, ITN runs on every file’s transcript after recognition — converting spoken-form numbers, currency, dates, times, and phone numbers into the compact written form a reader expects.
ITN is currently supported for Hindi (hi-IN) and English (en-IN) only. Enabling ITN for other languages has no effect — transcripts are returned as-is.

What ITN Normalizes

ITN recognizes six categories of spoken expressions. Every matching span is transformed; all other words pass through unchanged.
Whole numbers and positional ranks are formatted using Indian comma grouping: the rightmost three digits form one group, and the remaining digits are grouped in twos to the left.
Spoken input (ASR) | Written output (ITN) | Format rule
दो हज़ार | 2,000 | Indian comma grouping
पाँच लाख बीस हज़ार | 5,20,000 | Lakh-scale grouping
five lakh | 5,00,000 | English lakh convention
पहला / twenty first | 1st / 21st | Ordinal suffix
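Indian comma grouping is easy to get wrong when reproducing it downstream, for example when comparing ITN output against expected values in tests. The function below is a small illustration of the grouping rule itself; it is not the platform's ITN code, and the name `indian_grouping` is hypothetical.

```python
def indian_grouping(n):
    """Format a non-negative integer with Indian digit grouping.

    The last three digits form one group; the digits to their left are
    grouped in twos.
    """
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    digits = str(n)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return ",".join(groups + [tail])
```

With this rule, 2000 becomes `2,000` and 520000 becomes `5,20,000`, matching the rows in the table above.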
All Indian currency expressions — including paise fractions and lakh/crore scales — are formatted with the ₹ symbol and Indian comma grouping.
Spoken input (ASR) | Written output (ITN) | Format rule
पाँच सौ रुपये | ₹500 | ₹ + amount
तीन रुपये पचास पैसे | ₹3.50 | ₹ + rupees.paise
I need five thousand rupees | ₹5,000 | English India pipeline
pay do lakh rupees | ₹2,00,000 | Code-mixed en/hi
Spoken input (ASR) | Written output (ITN) | Format rule
बीस जनवरी दो हज़ार पच्चीस | 20 जनवरी 2025 | DD Month YYYY (hi)
fifteenth january twenty twenty five | 15th January 2025 | Ordinal Month YYYY (en)
Indian time-of-day words (सुबह, दोपहर, शाम, रात) automatically map to 24-hour HH:MM output.
Spoken input (ASR) | Written output (ITN) | Format rule
सुबह पाँच बजे | सुबह 05:00 | सुबह = AM
शाम पाँच बजे | शाम 17:00 | शाम = evening (16–20h)
रात के दस बजे | रात 22:00 | रात = night (20–24h)
meeting at five fifteen in the evening | meeting 17:15 in the evening | English, 24-hour
10-digit streams → mobile number; 6-digit streams → PIN. Repeat prefixes (double/डबल) are expanded.
Spoken input (ASR) | Written output (ITN) | Format rule
नौ आठ सात छह पाँच चार तीन दो एक शून्य | 9876543210 | 10 digits → phone
एक एक शून्य शून्य शून्य एक | 110001 | 6 digits → PIN
one two three four five six | 123456 | English digit words
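The digit-stream behaviour described above (length decides phone versus PIN, and a "double" prefix repeats the next digit) can be illustrated with a toy version for English digit words. This is a sketch of the stated rule only, with hypothetical function names; it says nothing about how the platform actually implements ITN, and it does not cover Hindi words such as डबल.

```python
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def digits_from_words(spoken):
    """Turn English digit words into a digit string, expanding "double"."""
    out = []
    repeat = 1
    for word in spoken.lower().split():
        if word == "double":
            repeat = 2
            continue
        digit = DIGIT_WORDS.get(word)
        if digit is None:
            raise ValueError(f"not a digit word: {word!r}")
        out.append(digit * repeat)
        repeat = 1
    if repeat != 1:
        raise ValueError('"double" must be followed by a digit word')
    return "".join(out)


def classify_digit_stream(digits):
    """Label a digit string by length, following the rule stated above."""
    if len(digits) == 10:
        return "phone"
    if len(digits) == 6:
        return "pin"
    return "other"
```

Here `digits_from_words("one two three four five six")` gives `123456`, which `classify_digit_stream` labels as a PIN because it has six digits.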
A single file may contain segments with multiple entity types or blend Hindi and English. ITN normalizes each entity independently in one pass.
Spoken input (ASR) | Written output (ITN)
कल थ्री फिफ्टी पीएम को पाँच सौ रुपये transfer करना है | कल 15:50 को ₹500 transfer करना है
pay do lakh rupees by fifteenth march | pay ₹2,00,000 by 15th March

Native Script Digits — itn_native_numerals

By default, ITN outputs Western Arabic digits (0–9) regardless of language. When format=transcribe is set, you can additionally pass itn_native_numerals=true to render digits in the native script of the target language.
Language | Spoken input | false (default) | true (native script)
Hindi hi-IN | पाँच हज़ार रुपये | ₹5,000 | ₹५,०००
English en-IN | five thousand rupees | ₹5,000 | ₹5,000 (Latin, no change)
English always outputs Western Arabic digits. itn_native_numerals=true has no effect for en-IN.
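If a consumer needs to convert between the two digit styles after the fact, for instance to compare a transcript produced with `itn_native_numerals=true` against one produced without it, a character-level mapping is enough. The sketch below mirrors the table above for Hindi and leaves English untouched. The function name and its parameters are hypothetical and are not part of the API.

```python
# Western Arabic digits mapped to Devanagari digits (U+0966 to U+096F).
DEVANAGARI_DIGITS = str.maketrans("0123456789", "०१२३४५६७८९")


def render_digits(text, language_code, native_numerals=False):
    """Swap 0-9 for Devanagari digits when requested for Hindi.

    Other characters, including the rupee sign and commas, pass through
    unchanged. For any language other than hi-IN the text is returned as-is.
    """
    if native_numerals and language_code == "hi-IN":
        return text.translate(DEVANAGARI_DIGITS)
    return text
```

For example, `render_digits("₹5,000", "hi-IN", native_numerals=True)` returns `₹५,०००`, while the same call with `en-IN` returns the input unchanged.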

What ITN Does Not Change

ITN intentionally preserves idiomatic and ambiguous phrases.
  • दो तीन (meaning a few) stays as text, not 2 or 3
  • कर दो / ले दो (imperative verbs) are kept as words, not treated as cardinal 2

Flow Summary

  1. Submit: POST https://api.vachana.ai/stt/v3/batch/submit with X-API-Key-ID, optional X-API-Request-ID, and form fields audio_files, language_code, and optionally is_multi_channel, format, and itn_native_numerals.
  2. Save the job_id from the submit response.
  3. Poll: GET https://api.vachana.ai/stt/v3/batch/status/{job_id} (same auth headers) every 60 seconds until status is completed or failed and results is populated.