Model
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| audio_url | string | Yes | - | URL of the audio file to transcribe |
| tag_audio_events | boolean | No | true | Tag audio events like laughter |
| include_subtitles | boolean | No | false | Include subtitle timing |
| keyterms | string | No | - | Comma-separated keywords for better accuracy |
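
The parameters above can be assembled into a request body along these lines. This is a minimal sketch, not the service's client code: the helper name `build_transcription_payload` and the example URL are hypothetical, and the endpoint, model name, and authentication are not covered here. It only shows how the documented defaults and the comma-separated `keyterms` string fit together.

```python
from typing import Any, Dict, List, Optional


def build_transcription_payload(
    audio_url: str,
    tag_audio_events: bool = True,     # documented default: true
    include_subtitles: bool = False,   # documented default: false
    keyterms: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a request body from the documented parameters (hypothetical helper)."""
    if not audio_url:
        raise ValueError("audio_url is required")

    payload: Dict[str, Any] = {
        "audio_url": audio_url,
        "tag_audio_events": tag_audio_events,
        "include_subtitles": include_subtitles,
    }
    if keyterms:
        # The table documents keyterms as one comma-separated string,
        # so the list is joined here. The exact separator handling on the
        # server side is an assumption.
        payload["keyterms"] = ",".join(term.strip() for term in keyterms)
    return payload


# Basic transcription: only the required parameter is set.
basic = build_transcription_payload("https://example.com/interview.mp3")

# With options: subtitles on, event tagging off, two key terms.
with_options = build_transcription_payload(
    "https://example.com/interview.mp3",
    tag_audio_events=False,
    include_subtitles=True,
    keyterms=["Kubernetes", "Anna Schmidt"],
)
```

Passing `keyterms` as a list and joining it in one place avoids stray spaces or trailing commas in the string that is sent.
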
Example - Basic Transcription
Example - With Options
Response
Completed Response
Word Object
Each word in the `words` array contains:
| Field | Type | Description |
|---|---|---|
| text | string | The transcribed text |
| start | number | Start time in seconds |
| end | number | End time in seconds |
| type | string | "word", "spacing", or audio event type |
| speaker_id | string | Speaker identifier for multi-speaker audio |
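
As an illustration of how these fields can be used, the sketch below merges consecutive entries with the same `speaker_id` into speaker turns. The sample data and the function name `transcript_by_speaker` are made up for the example, and it assumes that `"spacing"` entries carry the whitespace between words in their `text` field, which may not match the actual response.

```python
from typing import Any, Dict, List, Tuple


def transcript_by_speaker(words: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Merge consecutive entries that share a speaker_id into (speaker, text) turns."""
    turns: List[Tuple[str, List[str]]] = []
    for entry in words:
        speaker = entry.get("speaker_id")
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous entry: extend the current turn.
            turns[-1][1].append(entry["text"])
        else:
            # Speaker changed (or first entry): start a new turn.
            turns.append((speaker, [entry["text"]]))
    # Assumption: "spacing" entries supply the whitespace, so parts are
    # concatenated without adding separators.
    return [(speaker, "".join(parts).strip()) for speaker, parts in turns]


# Hypothetical sample shaped like the Word Object table above.
sample_words = [
    {"text": "Hello", "start": 0.0, "end": 0.4, "type": "word", "speaker_id": "speaker_0"},
    {"text": " ", "start": 0.4, "end": 0.5, "type": "spacing", "speaker_id": "speaker_0"},
    {"text": "there", "start": 0.5, "end": 0.9, "type": "word", "speaker_id": "speaker_0"},
    {"text": "Hi", "start": 1.2, "end": 1.4, "type": "word", "speaker_id": "speaker_1"},
]

turns = transcript_by_speaker(sample_words)
# turns == [("speaker_0", "Hello there"), ("speaker_1", "Hi")]
```
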
Pricing
| Unit | Price |
|---|---|
| Per second of audio | $0.001056 |
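
For budgeting, the per-second price can be multiplied by the audio duration. The sketch below uses `Decimal` to avoid floating-point drift. The function name `estimate_cost` is hypothetical, and it ignores any rounding, minimum charge, or billing granularity the service may apply.

```python
from decimal import Decimal

# Price per second of audio, taken from the pricing table above.
PRICE_PER_SECOND = Decimal("0.001056")


def estimate_cost(duration_seconds: float) -> Decimal:
    """Return the estimated price in dollars for the given audio duration."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Convert through str so a float like 90.5 is read as written.
    return PRICE_PER_SECOND * Decimal(str(duration_seconds))


one_minute = estimate_cost(60)    # 0.06336 dollars
one_hour = estimate_cost(3600)    # 3.8016 dollars
```
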
Notes
- Output is text data, not audio
- Supports multiple speakers with automatic speaker detection
- Use `keyterms` to improve accuracy for specific words or names
- Audio events (like laughter) are tagged when `tag_audio_events` is enabled
