Overview
Submit one or more audio files for transcription and receive a job_id immediately. Poll the status endpoint on a fixed interval until the job completes and transcripts are available. Ideal for long recordings, bulk files, or offline pipelines where you do not need a live response. For real-time transcription, see STT Realtime. For short clips under 60 seconds, see STT REST.
Endpoints
| Operation | Method | URL |
|---|---|---|
| Submit Job | POST | https://api.vachana.ai/stt/v3/batch/submit |
| Check Job Status | GET | https://api.vachana.ai/stt/v3/batch/status/{job_id} |
Limits & Specifications
View limits and supported formats
| Item | Limit |
|---|---|
| Max audio duration | Less than 1 hour per file |
| Max files per request | 10 files per API call |
| Max total payload size | 80 MB across all files and form fields combined |
| Minimum poll interval | 60 seconds between status calls for the same job_id |
| Speaker diarization | At most 2 speakers per file (two-party diarization). Audio with more than two distinct speakers is not supported. |
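A client can check the count and size limits before uploading. The sketch below is a minimal pre-flight check, not part of the API: the function name `check_batch_limits` is hypothetical, and it assumes 1 MB means 1,000,000 bytes, which the table does not specify. It only sums file sizes, so the real request (which also carries form fields and multipart framing) will be slightly larger. Duration is not checked because that requires decoding the audio.

```python
MAX_FILES = 10                      # from the limits table
MAX_PAYLOAD_BYTES = 80 * 1_000_000  # assumption: MB means 10^6 bytes


def check_batch_limits(file_sizes):
    """Return a list of problems for a planned upload; empty means OK.

    file_sizes maps filename -> size in bytes.
    """
    problems = []
    if not file_sizes:
        problems.append("no files to upload")
    if len(file_sizes) > MAX_FILES:
        problems.append(f"{len(file_sizes)} files exceeds the limit of {MAX_FILES}")
    for name, size in file_sizes.items():
        if size == 0:
            problems.append(f"{name} is empty")
    total = sum(file_sizes.values())
    if total > MAX_PAYLOAD_BYTES:
        problems.append(f"total size {total} bytes exceeds {MAX_PAYLOAD_BYTES}")
    return problems
```

If the returned list is non-empty, splitting the batch across several submit calls is one way to stay within the limits.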
Supported Audio Formats
AAC · WAV · FLAC · ALAC · OGG (Vorbis) · Opus
Use standard file extensions and MIME types (e.g. .m4a for AAC, .wav, .flac, .ogg).
Authentication
Send these headers on every request, both submit and status calls.
| Header | Required | Description |
|---|---|---|
| X-API-Key-ID | Yes | Your API key. Required for all requests. |
| X-API-Request-ID | No | A unique trace ID (e.g. UUID) you assign. Used to correlate your logs with platform logs or support. If omitted, the platform may generate one. |
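As a small illustration, the two headers can be built once and reused for both calls. The helper name `build_headers` is hypothetical; the header names come from the table above, and the UUID is just one reasonable choice of trace ID.

```python
import uuid


def build_headers(api_key, request_id=None):
    """Headers for both submit and status calls."""
    return {
        "X-API-Key-ID": api_key,
        # Optional trace ID; a fresh UUID is generated when none is given.
        "X-API-Request-ID": request_id or str(uuid.uuid4()),
    }
```

Logging the generated X-API-Request-ID alongside your own request logs makes it easier to correlate a call with platform logs later.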
Submit a Transcription Job
POST /stt/v3/batch/submit
Upload audio files and kick off an asynchronous transcription job. The response returns a job_id immediately; the files are not yet transcribed at this point.
Request — Form Fields
| Field | Required | Type | Description |
|---|---|---|---|
| audio_files | Yes | file | Audio files to transcribe. Add one audio_files field per file. Accepts 1–10 files, each under 1 hour. Formats: AAC, WAV, FLAC, ALAC, OGG, Opus. Total body must not exceed 80 MB. |
| language_code | Yes | string | BCP-47 language code for transcription (e.g. hi-IN, en-IN). See supported language codes. |
| is_multi_channel | No | boolean | Set to true if the audio is multi-channel (e.g. stereo or per-speaker tracks). Set to false for standard mono audio. Defaults to false. |
| format | No | string | Output format for transcripts. Set to transcribe to enable Inverse Text Normalization (ITN): numbers, currency, dates, and phone numbers are converted to written form. Set to verbatim for raw spoken-form output. Defaults to verbatim. Currently supported for hi-IN and en-IN only. |
| itn_native_numerals | No | boolean | When format=transcribe, set to true to render digits in the native script of the target language (e.g. Devanagari numerals for Hindi). Has no effect when format=verbatim. Defaults to false. See the ITN section below for full details. |
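The form fields above could be assembled into a request along the following lines. This is a sketch using only the Python standard library, under two assumptions the page does not state explicitly: that the body is encoded as multipart/form-data, and that booleans are sent as the strings true and false. The function name `build_submit_request` is hypothetical. It only builds the request object and does not send anything.

```python
import uuid
import urllib.request

SUBMIT_URL = "https://api.vachana.ai/stt/v3/batch/submit"


def build_submit_request(api_key, files, language_code, *,
                         is_multi_channel=False, fmt="verbatim",
                         itn_native_numerals=False):
    """Build (but do not send) a submit request.

    files is a list of (filename, data_bytes, mime_type) tuples.
    Filenames are assumed to contain no double quotes.
    """
    boundary = uuid.uuid4().hex
    text_fields = {
        "language_code": language_code,
        # Assumption: booleans are serialized as "true" / "false".
        "is_multi_channel": "true" if is_multi_channel else "false",
        "format": fmt,
        "itn_native_numerals": "true" if itn_native_numerals else "false",
    }
    parts = []
    for name, value in text_fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode("utf-8")
        )
    # One audio_files field per file, as the table requires.
    for filename, data, mime in files:
        head = (
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="audio_files"; filename="{filename}"\r\n'
            f'Content-Type: {mime}\r\n\r\n'
        ).encode("utf-8")
        parts.append(head + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        SUBMIT_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "X-API-Key-ID": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the upload; the length of `req.data` is the full body size to compare against the 80 MB limit.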
Supported Language Codes
The Vachana API supports these 10 Indian languages
| Language | Code | Native Script | Example Text |
|---|---|---|---|
| Bengali | bn-IN | Bengali (বাংলা) | “আমি ভাত খাই” |
| English | en-IN | Latin | “I am going to the market” |
| Gujarati | gu-IN | Gujarati (ગુજરાતી) | “હું બજાર જાઉં છું” |
| Hindi | hi-IN | Devanagari (हिन्दी) | “मैं बाज़ार जा रहा हूँ” |
| Kannada | kn-IN | Kannada (ಕನ್ನಡ) | “ನಾನು ಮಾರುಕಟ್ಟೆಗೆ ಹೋಗುತ್ತೇನೆ” |
| Malayalam | ml-IN | Malayalam (മലയാളം) | “ഞാൻ ചന്തയിലേക്ക് പോകുന്നു” |
| Marathi | mr-IN | Devanagari (मराठी) | “मी बाजारात जातोय” |
| Punjabi | pa-IN | Gurmukhi (ਪੰਜਾਬੀ) | “ਮੈਂ ਬਾਜ਼ਾਰ ਜਾ ਰਿਹਾ ਹਾਂ” |
| Tamil | ta-IN | Tamil (தமிழ்) | “நான் சந்தைக்கு செல்கிறேன்” |
| Telugu | te-IN | Telugu (తెలుగు) | “నేను మార్కెట్కి వెళ్తున్నాను” |
Example — curl
Response — 200 OK
| Field | Type | Description |
|---|---|---|
| job_id | string | Identifier for this job. Use it in the status URL. |
| status | string | Initial value is always submitted. |
| file_count | integer | Number of files accepted into the job. |
| message | string | Short confirmation with polling instructions. |
Errors
| HTTP Status | When |
|---|---|
| 400 | No files uploaded, empty file, more than 10 files, payload over 80 MB, unsupported format, or other client-side validation failure. |
| 500 | Server error. |
Check Job Status
GET /stt/v3/batch/status/{job_id}
Poll this endpoint to check progress and retrieve transcription results once the job finishes. Call it at most once every 60 seconds per job_id; do not poll more frequently.
Path Parameter
| Parameter | Required | Description |
|---|---|---|
| job_id | Yes | The job_id returned from the Submit response. |
Example — curl
Response — 200 OK
Job-Level Response Fields
| Field | Type | Description |
|---|---|---|
| job_id | string | Job identifier. |
| status | string | One of: submitted (accepted or in progress), processing (actively transcribing), completed (done), failed (job-level failure). |
| total_files | integer | Total number of files in the job. |
| completed_files | integer | Files finished successfully. Meaningful only when the job has reached a final state. |
| failed_files | integer | Files that failed. Meaningful only when the job has reached a final state. |
| overall_progress | integer | Approximate progress from 0 to 100 while the job is running. |
| results | array or null | Per-file results. null while the job is submitted or processing. |
| error | string or null | Top-level error message for the job, if any. |
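A polling loop built on these fields might look like the sketch below. The names `wait_for_job`, `fetch_status`, and `max_polls` are hypothetical: `fetch_status` stands in for whatever function performs the GET call and returns the decoded JSON body, and the cap on polls is an added safeguard rather than anything the API requires. The 60-second wait matches the documented minimum poll interval.

```python
import time

POLL_INTERVAL_SECONDS = 60                   # documented minimum between status calls
TERMINAL_STATUSES = {"completed", "failed"}  # final job states from the table


def wait_for_job(job_id, fetch_status, *, sleep=time.sleep, max_polls=120):
    """Poll until the job reaches a final state and return the last response.

    fetch_status(job_id) must return the status response as a dict.
    """
    for attempt in range(max_polls):
        if attempt:
            # Wait only between calls, never before the first one.
            sleep(POLL_INTERVAL_SECONDS)
        job = fetch_status(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")
```

Injecting `sleep` keeps the loop easy to test; in production the default `time.sleep` is used.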
Per-File Result Fields — results[]
| Field | Type | Description |
|---|---|---|
| filename | string | Original file name as submitted. |
| full_transcript | string | Complete transcribed text for the file. |
| segments | array | Time-aligned transcript segments (see below). |
| total_duration | number | Audio duration in seconds. |
| status | string | completed or failed for this individual file. |
| error | string or null | Error message for this file if it failed. |
Per-Segment Fields — results[].segments[]
| Field | Type | Description |
|---|---|---|
| segment_id | integer | Segment index (zero-based). |
| start_time | number | Segment start time in seconds. |
| end_time | number | Segment end time in seconds. |
| text | string | Transcribed text for this segment. |
| speaker_id | integer | Speaker identifier. Populated for multi-channel audio. |
| confidence | number | Confidence score for the segment transcript (0–1). |
| language_detected | string | BCP-47 code of the detected language for this segment. |
| sentiment | string | Sentiment label (e.g. Neutral, Positive, Negative). |
| emotion | string | Emotion label (e.g. Neutral, Happy, Sad). |
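To show how the per-file and per-segment fields fit together, the sketch below groups segment text by speaker for each successfully transcribed file. The function name `text_by_speaker` is hypothetical, and it assumes the response has already been decoded from JSON into Python dicts and lists.

```python
from collections import defaultdict


def text_by_speaker(job):
    """Map filename -> {speaker_id: joined text} for completed files."""
    out = {}
    # results is null (None) until the job reaches a final state.
    for result in job.get("results") or []:
        if result["status"] != "completed":
            continue  # failed files carry an error message instead
        speakers = defaultdict(list)
        # Order segments by start time so each speaker's text reads in sequence.
        for seg in sorted(result["segments"], key=lambda s: s["start_time"]):
            speakers[seg.get("speaker_id")].append(seg["text"])
        out[result["filename"]] = {
            spk: " ".join(parts) for spk, parts in speakers.items()
        }
    return out
```

Files whose status is failed are skipped here; a real client would likely log their error field as well.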
Errors
| HTTP Status | When |
|---|---|
| 404 | job_id not found: unknown ID or the job is no longer available. |
| 500 | Server error. |
Inverse Text Normalization (ITN)
When format=transcribe is passed in the form body, ITN runs on every file’s transcript after recognition, converting spoken-form numbers, currency, dates, times, and phone numbers into the compact written form a reader expects.
ITN is currently supported for Hindi (hi-IN) and English (en-IN) only. Enabling ITN for other languages has no effect; transcripts are returned as-is.
What ITN Normalizes
ITN recognizes six categories of spoken expressions. Every matching span is transformed; all other words pass through unchanged.
1 — Cardinal & Ordinal Numbers
Whole numbers are formatted using Indian comma grouping: the rightmost three digits form one group, and the remaining digits are grouped in pairs. Positional ranks take an ordinal suffix.
| Spoken input (ASR) | Written output (ITN) | Format rule |
|---|---|---|
| दो हज़ार | 2,000 | Indian comma grouping |
| पाँच लाख बीस हज़ार | 5,20,000 | Lakh-scale grouping |
| five lakh | 5,00,000 | English lakh convention |
| पहला / twenty first | 1st / 21st | Ordinal suffix |
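The grouping rule can be reproduced client-side, for example when comparing transcripts against numeric values in your own records. The function below is an illustrative sketch with a hypothetical name (`indian_group`); it is not the platform's implementation.

```python
def indian_group(n):
    """Format an integer with Indian comma grouping, e.g. 520000 -> 5,20,000."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    # The last three digits form one group; the rest is split into pairs.
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups + [tail])


print(indian_group(2000))    # 2,000
print(indian_group(520000))  # 5,20,000
```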
2 — Currency & Money
All Indian currency expressions — including paise fractions and lakh/crore scales — are formatted with the ₹ symbol and Indian comma grouping.
| Spoken input (ASR) | Written output (ITN) | Format rule |
|---|---|---|
| पाँच सौ रुपये | ₹500 | ₹ + amount |
| तीन रुपये पचास पैसे | ₹3.50 | ₹ + rupees.paise |
| I need five thousand rupees | ₹5,000 | English India pipeline |
| pay do lakh rupees | ₹2,00,000 | Code-mixed en/hi |
3 — Dates
| Spoken input (ASR) | Written output (ITN) | Format rule |
|---|---|---|
| बीस जनवरी दो हज़ार पच्चीस | 20 जनवरी 2025 | DD Month YYYY (hi) |
| fifteenth january twenty twenty five | 15th January 2025 | Ordinal Month YYYY (en) |
4 — Times
Indian time-of-day words (सुबह, दोपहर, शाम, रात) automatically map to 24-hour HH:MM output.
| Spoken input (ASR) | Written output (ITN) | Format rule |
|---|---|---|
| सुबह पाँच बजे | सुबह 05:00 | सुबह = AM |
| शाम पाँच बजे | शाम 17:00 | शाम = evening (16–20h) |
| रात के दस बजे | रात 22:00 | रात = night (20–24h) |
| meeting at five fifteen in the evening | meeting 17:15 in the evening | en — 24-hour |
5 — Phone Numbers & PIN Codes
10-digit streams → mobile number; 6-digit streams → PIN. Repeat prefixes (double/डबल) are expanded.
| Spoken input (ASR) | Written output (ITN) | Format rule |
|---|---|---|
| नौ आठ सात छह पाँच चार तीन दो एक शून्य | 9876543210 | 10 digits → phone |
| एक एक शून्य शून्य शून्य एक | 110001 | 6 digits → PIN |
| one two three four five six | 123456 | English digit words |
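The length-based rule can be illustrated with a toy converter. Everything here is an assumption made for illustration: the word lists, the function name `digit_stream`, and the handling of "double" only approximate the behaviour described above, and the sketch ignores sentence context entirely (the real service keeps words such as दो unchanged when they are not digits).

```python
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "शून्य": "0", "एक": "1", "दो": "2", "तीन": "3", "चार": "4",
    "पाँच": "5", "छह": "6", "सात": "7", "आठ": "8", "नौ": "9",
}
REPEAT_WORDS = {"double", "डबल"}


def digit_stream(utterance):
    """Return (digits, kind) where kind is 'phone', 'pin', or 'other'.

    The utterance must consist only of digit words and repeat prefixes.
    """
    digits = []
    repeat = 1
    for word in utterance.split():
        if word in REPEAT_WORDS:
            repeat = 2  # the next digit is emitted twice
            continue
        digits.append(DIGIT_WORDS[word] * repeat)  # KeyError on a non-digit word
        repeat = 1
    joined = "".join(digits)
    kind = {10: "phone", 6: "pin"}.get(len(joined), "other")
    return joined, kind
```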
6 — Mixed & Code-Mixed Utterances
A single file may contain segments with multiple entity types or blend Hindi and English. ITN normalizes each entity independently in one pass.
| Spoken input (ASR) | Written output (ITN) |
|---|---|
| कल थ्री फिफ्टी पीएम को पाँच सौ रुपये transfer करना है | कल 15:50 को ₹500 transfer करना है |
| pay do lakh rupees by fifteenth march | pay ₹2,00,000 by 15th March |
Native Script Digits — itn_native_numerals
By default, ITN outputs Western Arabic digits (0–9) regardless of language. When format=transcribe is set, you can additionally pass itn_native_numerals=true to render digits in the native script of the target language.
| Language | Spoken input | false (default) | true — native script |
|---|---|---|---|
| Hindi hi-IN | पाँच हज़ार रुपये | ₹5,000 | ₹५,००० |
| English en-IN | five thousand rupees | ₹5,000 | ₹5,000 (Latin, no change) |
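If you request the default Western Arabic digits but later need a Devanagari rendering (or the reverse, for parsing), the mapping is a one-to-one digit substitution. The sketch below shows the idea on the client side; the function name `to_devanagari_digits` is hypothetical and this is not the service's own code.

```python
# Devanagari digits occupy U+0966 (zero) through U+096F (nine).
_DEVANAGARI = {ord(str(i)): chr(0x0966 + i) for i in range(10)}


def to_devanagari_digits(text):
    """Replace ASCII digits with Devanagari digits; other characters are kept."""
    return text.translate(_DEVANAGARI)


print(to_devanagari_digits("₹5,000"))  # ₹५,०००
```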
English always outputs Western Arabic digits. itn_native_numerals=true has no effect for en-IN.
What ITN Does Not Change
ITN intentionally preserves idiomatic and ambiguous phrases.
- दो तीन (meaning a few) stays as text, not 2 or 3
- कर दो / ले दो (imperative verbs) are kept as words, not treated as cardinal 2
Flow Summary
- Submit: POST https://api.vachana.ai/stt/v3/batch/submit with X-API-Key-ID, optional X-API-Request-ID, and form fields audio_files, language_code, and optionally is_multi_channel, format, and itn_native_numerals.
- Save the job_id from the submit response.
- Poll: GET https://api.vachana.ai/stt/v3/batch/status/{job_id} (same auth headers) every 60 seconds until status is completed or failed and results is populated.