Constraints
- Supported formats: wav, ogg, webm
- Max file size: 50 MB
- Max duration: 300 seconds (default — can be raised on request)
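Checking these limits before uploading avoids a wasted round trip. The following is a minimal client-side sketch, not part of the API. The function name `check_upload` is hypothetical, it judges the format by file extension only, and it assumes "50 MB" means 50 × 1024 × 1024 bytes, which the list above does not specify.

```python
from pathlib import Path

# Limits copied from the list above. Treating "50 MB" as 50 * 1024 * 1024
# bytes is an assumption; the text does not say whether MB is decimal or binary.
SUPPORTED_EXTENSIONS = {".wav", ".ogg", ".webm"}
MAX_FILE_BYTES = 50 * 1024 * 1024
MAX_DURATION_SECONDS = 300  # default limit; may be higher for your account


def check_upload(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(
            f"duration is {duration_seconds:.0f} s; limit is {MAX_DURATION_SECONDS} s"
        )
    return problems
```

If your duration limit has been raised, adjust `MAX_DURATION_SECONDS` to match.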
Make a request
Attach your audio file as a multipart audio field and specify which model to use. Make sure your API key is set in the environment first (see Authentication).
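As a rough illustration, the request could be assembled with Python's standard library as sketched below. The field names audio, model, and language come from this page. Everything else is an assumption: the endpoint URL, the environment variable name `TYPELESS_API_KEY`, and the Bearer authorization scheme are placeholders, so substitute the real values from the Authentication page.

```python
import os
import urllib.request
import uuid

# Hypothetical values: this page names neither the endpoint URL nor the
# environment variable, so replace both with the ones for your account.
API_URL = "https://api.example.com/v1/transcriptions"
API_KEY_ENV = "TYPELESS_API_KEY"


def build_transcription_request(audio_bytes, filename, model, language=None, api_key=None):
    """Build (but do not send) a multipart/form-data POST request."""
    api_key = api_key or os.environ[API_KEY_ENV]
    boundary = uuid.uuid4().hex

    # Plain text fields: model is required, language is optional.
    fields = {"model": model}
    if language is not None:
        fields["language"] = language

    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The audio file itself, sent as the multipart field named "audio".
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            # Bearer auth is an assumption; see the Authentication page.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the returned object to `urllib.request.urlopen(request, timeout=...)` and decode the body with `json.load`. An HTTP client library with built-in multipart support would shorten this considerably; the standard library is used here only to keep the sketch dependency-free.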
Response
A successful request returns a JSON object containing the transcript, detected language, and billing usage.
Choosing a language
The language parameter is optional. Pass an ISO 639-1 code, for example en for English or zh for Mandarin Chinese, to give the model a hint that improves accuracy, especially for shorter clips or accented speech. If you omit it, the API detects the language automatically and reports the result in detected_language on every response.
Choosing a model
The model field is required. Three tiers are available:
| Model | Tier |
|---|---|
| typeless-asr-l1-v1 | Economy: lowest cost, but slower and slightly lower quality |
| typeless-asr-l2-v1 | Recommended default: balanced quality, speed, and cost for most use cases |
| typeless-asr-l3-v1 | Best: highest accuracy and fastest turnaround; ideal for noisy audio, strong accents, or technical vocabulary |
Timeouts
The request is synchronous: your HTTP connection stays open until transcription is complete. Internally, the ASR stage has a 90-second timeout and the refinement stage has a 60-second timeout. If either limit is exceeded, the API returns SERVICE_UNAVAILABLE (503) and you are not charged for that request.
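Because a 503 is not charged, retrying a small number of times is one reasonable way to handle it. The sketch below is a hypothetical client-side helper, not something the API prescribes. It assumes the two stages run sequentially, so the client timeout is set above 90 + 60 seconds; the 30-second margin and the backoff delays are arbitrary choices.

```python
import time

SERVICE_UNAVAILABLE = 503

# Assumption: the two internal stages run one after the other, so the server
# may hold the connection for roughly 90 + 60 seconds. The extra 30 seconds
# for upload and network latency is a guess, not a documented figure.
CLIENT_TIMEOUT_SECONDS = 90 + 60 + 30


def send_with_retry(send, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call send() until it returns a status other than 503 or attempts run out.

    send is any zero-argument callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != SERVICE_UNAVAILABLE or attempt == max_attempts:
            return status, body
        # Exponential backoff between attempts: base_delay, then double each time.
        sleep(base_delay * 2 ** (attempt - 1))
```

Audio that times out once may well time out again, so keep `max_attempts` low and consider splitting long recordings instead of retrying indefinitely.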
Billing is per second of audio with a 15-second minimum per request. See Models & Pricing for rates.
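The 15-second minimum means very short clips cost the same as a 15-second clip. The arithmetic can be sketched as follows. Rounding fractional seconds up to the next whole second is an assumption, since the text only says billing is per second, and `rate_per_second` is a placeholder for the figure listed under Models & Pricing.

```python
import math

MINIMUM_BILLED_SECONDS = 15


def billed_seconds(duration_seconds: float) -> int:
    """Seconds charged for one request, applying the 15-second minimum.

    Rounding fractional seconds up is an assumption; the text only says
    billing is per second.
    """
    return max(MINIMUM_BILLED_SECONDS, math.ceil(duration_seconds))


def estimate_cost(duration_seconds: float, rate_per_second: float) -> float:
    """Estimated charge; rate_per_second comes from Models & Pricing."""
    return billed_seconds(duration_seconds) * rate_per_second
```

Under these assumptions a 4.2-second clip is billed as 15 seconds and a 61.3-second clip as 62 seconds, so batching many tiny clips into one request may reduce cost when your application allows it.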