Stream audio in real time and receive transcriptions as speech is detected. Best for live conversations and interactive applications. Realtime STT is multilingual and detects the spoken language automatically. For pre-recorded audio, use STT REST.
The client sends binary WebSocket frames at a steady cadence.
Each frame must be exactly 1024 bytes (512 × 16-bit samples = 32 ms of audio).
Frames should be sent in real time; buffering or bursting may degrade VAD accuracy.
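The frame arithmetic and pacing described above can be sketched in plain Python, with no SDK required. Here `send_frame` is a placeholder (an assumption for illustration) standing in for the actual binary WebSocket send:

```python
import time

SAMPLE_RATE = 16000        # 16 kHz mono PCM
BYTES_PER_SAMPLE = 2       # 16-bit samples
SAMPLES_PER_FRAME = 512
FRAME_BYTES = SAMPLES_PER_FRAME * BYTES_PER_SAMPLE  # 1024 bytes per frame
FRAME_SECONDS = SAMPLES_PER_FRAME / SAMPLE_RATE     # 0.032 s = 32 ms per frame

def send_frame(frame: bytes) -> None:
    """Placeholder for the real binary WebSocket send (hypothetical)."""

def stream_pcm(pcm: bytes) -> int:
    """Slice raw PCM into 1024-byte frames and pace them at 32 ms intervals."""
    frames_sent = 0
    for offset in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        send_frame(pcm[offset:offset + FRAME_BYTES])
        frames_sent += 1
        time.sleep(FRAME_SECONDS)  # real-time cadence; bursting can hurt VAD
    return frames_sent
```

Sleeping one frame duration between sends approximates real-time pacing; a trailing partial frame (fewer than 1024 bytes) is not sent, since each frame must be exactly 1024 bytes.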
Emitted when the VAD has detected the end of a speech segment and transcription has begun. Acts as a low-latency acknowledgement that audio was captured and is being processed.
The official Python SDK wraps the WebSocket connection, audio pacing, and event parsing into a clean async interface so you can focus on your application logic.
The recommended approach — use the async context manager and the stream_audio helper. It handles real-time pacing automatically so audio is sent at the correct cadence for VAD.
import asyncio

from gnani.stt import GnaniSTTStreamClient, StreamTranscriptEvent


async def main():
    async with GnaniSTTStreamClient(
        api_key="your-api-key",
        language_code="hi-IN",
        sample_rate=16000,
    ) as stream:
        with open("audio.pcm", "rb") as f:
            transcripts = await stream.stream_audio(
                f,
                on_transcript=lambda t: print(f"Transcript: {t.text}"),
                on_processing=lambda p: print("Processing..."),
                realtime_pace=True,  # sends frames at real-time cadence
            )
        print(f"Total segments: {len(transcripts)}")

asyncio.run(main())
If you need lower-level control — for example to handle each event type differently or interleave sending and receiving — iterate over the stream directly.
import asyncio

from gnani.stt import GnaniSTTStreamClient, StreamTranscriptEvent, StreamProcessingEvent


async def main():
    async with GnaniSTTStreamClient(
        api_key="your-api-key",
        language_code="hi-IN",
    ) as stream:
        # Send audio chunks
        with open("audio.pcm", "rb") as f:
            while chunk := f.read(1024):
                await stream.send_audio(chunk)
                await asyncio.sleep(0.032)  # 32 ms per frame

        # Process events
        async for event in stream:
            if isinstance(event, StreamTranscriptEvent):
                print(f"[Segment {event.segment_index}] {event.text}")
                print(f"  Duration: {event.audio_duration_ms}ms  Latency: {event.latency}ms")
            elif isinstance(event, StreamProcessingEvent):
                print("Processing speech...")

asyncio.run(main())
from gnani.stt import (
    GnaniSTTStreamClient,
    StreamConnectionError,
    StreamClosedError,
    StreamError,
)


async def send_with_error_handling(chunk: bytes):
    try:
        async with GnaniSTTStreamClient(api_key="your-api-key") as stream:
            await stream.send_audio(chunk)
    except StreamConnectionError as e:
        print(f"Could not connect: {e}")
    except StreamClosedError as e:
        print(f"Stream was already closed: {e}")
    except StreamError as e:
        print(f"Server error: {e.message} (at {e.timestamp})")
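Connection failures are often transient, so callers commonly wrap the connect step in a retry loop with exponential backoff. A minimal, SDK-independent sketch of that pattern follows; `TransientConnectionError` and `flaky_connect` are stand-ins invented for this example, not part of the SDK:

```python
import asyncio

class TransientConnectionError(Exception):
    """Stand-in for a retriable connection error (assumption for illustration)."""

async def connect_with_backoff(connect, max_attempts=4, base_delay=0.1):
    """Retry `connect()` on transient errors, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        try:
            return await connect()
        except TransientConnectionError:
            if attempt == max_attempts - 1:
                raise  # exhausted retries; surface the error to the caller
            await asyncio.sleep(base_delay * (2 ** attempt))

# Example: a fake connect that fails twice, then succeeds.
attempts = {"n": 0}

async def flaky_connect():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientConnectionError("connection refused")
    return "stream"

result = asyncio.run(connect_with_backoff(flaky_connect))
```

Only connection-phase errors are worth retrying this way; a closed stream or a server-reported error usually indicates a problem the caller should handle rather than retry blindly.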