Streaming transcription is designed for scenarios where audio is captured in real time: live dictation, call transcription, and real-time captioning pipelines. Rather than waiting until a recording is complete, you open a WebSocket connection and push audio frames as they arrive.
The session lifecycle is straightforward:
- Connect: authenticate via query parameters
- Send binary PCM frames: push raw audio as it is captured
- Send close_stream: signal that you are done sending audio
- Receive result: the server returns the final transcript and closes the connection
Connection
Connect to the following URL, passing authentication and stream parameters as query parameters:
wss://api.typelessapi.com/v1/transcribe/stream
| Parameter | Type | Default | Description |
|---|---|---|---|
| token | string | required | Your API key |
| model | string | required | Model tier; see Models & Pricing |
| sample_rate | integer | 16000 | 8000–48000 Hz |
| channels | integer | 1 | 1 or 2 |
| encoding | string | pcm16 | Only pcm16 (16-bit little-endian PCM) is supported |
| language | string | auto-detect | Optional ISO 639-1 hint |
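The query string can be assembled with a small helper instead of manual concatenation, which also percent-encodes the API key. The following is a minimal sketch: `build_stream_url` and its client-side range checks are illustrative names and choices, not part of the API, and they simply mirror the limits listed in the table above.

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.typelessapi.com/v1/transcribe/stream"


def build_stream_url(token, model, sample_rate=16000, channels=1,
                     encoding="pcm16", language=None):
    """Hypothetical helper: validate parameters locally, then build the URL."""
    if not 8000 <= sample_rate <= 48000:
        raise ValueError("sample_rate must be between 8000 and 48000 Hz")
    if channels not in (1, 2):
        raise ValueError("channels must be 1 or 2")
    if encoding != "pcm16":
        raise ValueError("only pcm16 is supported")

    params = {
        "token": token,
        "model": model,
        "sample_rate": sample_rate,
        "channels": channels,
        "encoding": encoding,
    }
    # language is optional; omitting it leaves the server on auto-detect.
    if language is not None:
        params["language"] = language
    return f"{BASE_URL}?{urlencode(params)}"
```

Calling `build_stream_url(os.environ["TYPELESS_API_KEY"], "typeless-asr-l2-v1")` would yield a URL equivalent to the hand-built one, with any reserved characters in the key escaped.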
Connection lifecycle
You must send the first audio frame within 10 seconds of connecting; keep_alive messages do not extend this deadline.
After the first frame, the connection closes after 60 seconds without client activity. Any audio frame or keep_alive message resets the idle timer.
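A client that pauses between utterances may want to track both deadlines locally. The sketch below models the two timers described above with plain timestamps. `SessionTimers`, the 5-second safety margin, and the `{"type": "keep_alive"}` payload are assumptions for illustration; the payload is guessed by analogy with close_stream, so check the protocol reference for the actual shape.

```python
import json

FIRST_FRAME_DEADLINE_S = 10.0  # first audio frame must arrive within this window
IDLE_TIMEOUT_S = 60.0          # idle limit once audio has started

# Assumption: keep_alive uses the same JSON envelope as close_stream.
KEEP_ALIVE_MESSAGE = json.dumps({"type": "keep_alive"})


class SessionTimers:
    """Hypothetical client-side bookkeeping for the two server deadlines."""

    def __init__(self, connected_at, margin_s=5.0):
        self.connected_at = connected_at
        self.margin_s = margin_s
        self.first_frame_sent = False
        self.last_activity = connected_at

    def on_audio_frame(self, now):
        self.first_frame_sent = True
        self.last_activity = now

    def on_keep_alive(self, now):
        # keep_alive resets only the idle timer, never the first-frame deadline.
        if self.first_frame_sent:
            self.last_activity = now

    def first_frame_overdue(self, now):
        return (not self.first_frame_sent
                and now - self.connected_at >= FIRST_FRAME_DEADLINE_S)

    def keep_alive_due(self, now):
        # Ask for a keep_alive slightly before the idle timeout would fire.
        return (self.first_frame_sent
                and now - self.last_activity >= IDLE_TIMEOUT_S - self.margin_s)
```

In a send loop, one would call `keep_alive_due(time.monotonic())` whenever no audio is available and, if it returns true, send `KEEP_ALIVE_MESSAGE` and record it with `on_keep_alive`.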
Example
import asyncio
import json
import os

import websockets


async def main():
    # Authentication and stream parameters travel in the query string.
    url = (
        "wss://api.typelessapi.com/v1/transcribe/stream"
        f"?token={os.environ['TYPELESS_API_KEY']}"
        "&model=typeless-asr-l2-v1"
        "&sample_rate=16000&channels=1&encoding=pcm16"
    )
    async with websockets.connect(url) as ws:
        # meeting.pcm: raw 16 kHz mono little-endian pcm16
        with open("meeting.pcm", "rb") as f:
            while chunk := f.read(3200):  # 100 ms per frame
                await ws.send(chunk)
                await asyncio.sleep(0.1)  # pace the upload at real-time speed

        # Signal end of audio, then wait for the final transcript.
        await ws.send(json.dumps({"type": "close_stream"}))
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "result":
                print(event["result"]["transcript"])
                break


asyncio.run(main())
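The 3200-byte chunk in the example follows from the stream parameters: 16,000 samples per second, 2 bytes per pcm16 sample, 1 channel, and 100 ms per frame. If you change the sample rate or channel count, the frame size changes with it. A small helper (the name `frame_bytes` is illustrative) makes the arithmetic explicit:

```python
BYTES_PER_SAMPLE = 2  # pcm16 is 16 bits per sample


def frame_bytes(sample_rate, channels=1, frame_ms=100):
    """Return the number of bytes in one frame of pcm16 audio."""
    return sample_rate * channels * BYTES_PER_SAMPLE * frame_ms // 1000
```

For example, `frame_bytes(16000)` gives the 3200 bytes used above, while a 48 kHz stereo stream sent in 20 ms frames would use `frame_bytes(48000, channels=2, frame_ms=20)`.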
Getting the result
Once you send close_stream, the server finishes processing all buffered audio and then delivers a single result message containing the complete transcript. Finalization typically completes within a few seconds, and can take up to about a minute when a large amount of audio is still being processed. The connection is closed by the server immediately after.
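Because finalization can take up to about a minute, it can help to bound the wait on the client side rather than block indefinitely. The sketch below wraps the receive loop in a timeout. `wait_for_result` and the 90-second default are illustrative choices, and `ws` is assumed to be any object that yields messages via `async for`, such as a `websockets` connection.

```python
import asyncio
import json


async def wait_for_result(ws, timeout_s=90.0):
    """Hypothetical helper: return the final transcript or raise on timeout."""

    async def _read():
        async for message in ws:
            if isinstance(message, bytes):
                continue  # only JSON text messages are parsed here
            event = json.loads(message)
            if event.get("type") == "result":
                return event["result"]["transcript"]
        # The iterator ended, so the connection closed without a result.
        raise ConnectionError("connection closed before a result arrived")

    return await asyncio.wait_for(_read(), timeout=timeout_s)
```

After sending close_stream, `transcript = await wait_for_result(ws)` would replace the inline loop in the example; a timeout surfaces as `asyncio.TimeoutError`.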
If you disconnect or cancel after transcription has started, you are still billed for the audio you sent, subject to a 15-second minimum.
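To get a rough sense of what an interrupted session costs, you can convert the bytes sent into seconds of audio and apply the minimum. This is an estimate only: the function name is illustrative, it assumes transcription has started, and it assumes no rounding, since the rounding rules are not stated here.

```python
def estimated_billable_seconds(bytes_sent, sample_rate=16000, channels=1,
                               minimum_s=15.0):
    """Rough estimate of billed audio for a session that has started."""
    bytes_per_second = sample_rate * channels * 2  # pcm16: 2 bytes per sample
    audio_s = bytes_sent / bytes_per_second
    # Assumption: billing is the larger of the audio sent and the minimum.
    return max(audio_s, minimum_s)
```

Under these assumptions, a session cancelled after 5 seconds of 16 kHz mono audio (160,000 bytes) would still count as 15 seconds.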
For the complete message protocol, see WS /v1/transcribe/stream reference.