- Text-to-video generation (omit start_frame_url)
- Start/end frame video generation
- Multi-shot video generation (2–6 shots)
- Inline character/object elements with voice support
- AI audio generation (on by default)
- Resolutions: 720p (std), 1080p (pro), and 4K (4k — $0.30/sec)
- Aspect ratios: 1:1, 16:9, 9:16
- Duration: 3–15 seconds
Model
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Conditional | - | Text prompt. Mutually exclusive with multi_shots |
| mode | string | No | "pro" | "std" (720p), "pro" (1080p), or "4k" (4K, $0.30/sec) |
| duration | integer | No | 5 | Video duration in seconds (3–15) |
| aspect_ratio | string | No | "16:9" | "1:1", "16:9", or "9:16". Safe to omit; the server defaults to "16:9" |
| native_audio | boolean | No | true | Generate AI audio from the video content |
| start_frame_url | string | No | - | First frame image URL. Omit for text-to-video |
| end_frame_url | string | No | - | Last frame image URL |
| elements | array | No | - | Character/object elements to include in the video |
| multi_shots | array | No | - | Multi-shot sequence (2–6 shots, each with prompt and duration). Mutually exclusive with prompt |
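As a rough illustration of how the parameters above fit together, the sketch below assembles a single-prompt request body and checks it against the documented ranges and defaults. The helper name `build_payload` is hypothetical, and the rule that `prompt` must be present whenever `multi_shots` is not used is an assumption drawn from the "Conditional" marking. No endpoint or client call is shown, since none is specified here.

```python
from typing import Any, Optional

ALLOWED_MODES = {"std", "pro", "4k"}
ALLOWED_ASPECT_RATIOS = {"1:1", "16:9", "9:16"}


def build_payload(
    prompt: Optional[str] = None,
    mode: str = "pro",
    duration: int = 5,
    aspect_ratio: str = "16:9",
    native_audio: bool = True,
    start_frame_url: Optional[str] = None,
    end_frame_url: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build a single-prompt request body."""
    # Assumption: without multi_shots, a non-empty prompt is required.
    if not prompt:
        raise ValueError("prompt is required when multi_shots is not used")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")

    payload: dict[str, Any] = {
        "prompt": prompt,
        "mode": mode,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "native_audio": native_audio,
    }
    # Frame URLs are left out when unset; omitting start_frame_url
    # corresponds to text-to-video generation.
    if start_frame_url:
        payload["start_frame_url"] = start_frame_url
    if end_frame_url:
        payload["end_frame_url"] = end_frame_url
    return payload
```

Calling `build_payload("A red kite over a beach")` yields a text-to-video body that relies on the documented defaults; passing `start_frame_url` and `end_frame_url` switches the same body to start/end frame generation.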
Elements
Elements are created automatically from your input, used during generation, then auto-deleted on completion.

| Parameter | Type | Default | Description |
|---|---|---|---|
| description | string | "" | Short description of the element (max 100 chars) |
| type | string | "image" | Element source type: "image" or "video" |
| image_urls | string[] | - | Source image URLs for an image element (max 4) |
| video_url | string | - | Source video URL for a video element |
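A minimal sketch of building one entry for the `elements` array, assuming the fields in the table above. The helper name `build_element` is hypothetical. The 100-character and 4-image limits come from the table; requiring at least one source URL for the chosen type is an assumption, not a documented rule.

```python
from typing import Any, Optional

MAX_DESCRIPTION_CHARS = 100
MAX_IMAGE_URLS = 4


def build_element(
    description: str = "",
    element_type: str = "image",
    image_urls: Optional[list[str]] = None,
    video_url: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build one item for the elements array."""
    if len(description) > MAX_DESCRIPTION_CHARS:
        raise ValueError("description must be at most 100 characters")

    if element_type == "image":
        urls = list(image_urls or [])
        # Assumption: an image element needs at least one source image.
        if not 1 <= len(urls) <= MAX_IMAGE_URLS:
            raise ValueError("image elements take 1 to 4 image_urls")
        return {"description": description, "type": "image", "image_urls": urls}

    if element_type == "video":
        # Assumption: a video element needs its video_url.
        if not video_url:
            raise ValueError("video elements require video_url")
        return {"description": description, "type": "video", "video_url": video_url}

    raise ValueError('type must be "image" or "video"')
```

The returned dictionaries can be collected into a list and sent as the `elements` parameter.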
Multi-Shot Mode
Generate multi-scene videos with 2–6 shots. When using multi_shots, the prompt field is not used.
Key Points:
- Each shot must have a prompt and duration (minimum 1 second)
- Total duration of all shots must be between 3 and 15 seconds
- multi_shots and prompt are mutually exclusive
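The rules above can be checked client-side before sending a request. The sketch below is one possible validator; the function name `validate_multi_shots` is hypothetical, and treating per-shot durations as whole seconds is an assumption (only the top-level duration is documented as an integer).

```python
from typing import Any, Optional


def validate_multi_shots(
    shots: list[dict[str, Any]], prompt: Optional[str] = None
) -> int:
    """Hypothetical helper: check a multi_shots list, return total seconds."""
    # multi_shots and prompt are mutually exclusive.
    if prompt is not None:
        raise ValueError("prompt must not be set when multi_shots is used")
    if not 2 <= len(shots) <= 6:
        raise ValueError("multi_shots must contain 2 to 6 shots")

    total = 0
    for index, shot in enumerate(shots, start=1):
        if not shot.get("prompt"):
            raise ValueError(f"shot {index} is missing a prompt")
        duration = shot.get("duration")
        # Assumption: shot durations are whole seconds.
        if not isinstance(duration, int) or duration < 1:
            raise ValueError(f"shot {index} needs a duration of at least 1 second")
        total += duration

    if not 3 <= total <= 15:
        raise ValueError("total duration of all shots must be 3 to 15 seconds")
    return total
```

For example, two shots of 2 and 3 seconds pass validation with a total of 5 seconds, while a single shot or a set of shots totalling 16 seconds is rejected.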
