- Text-to-video generation
- Image-to-video generation (start frame, or start + end frames)
- Multi-shot video generation (2–6 shots)
- Audio generation (on by default)
- Resolutions: 720p (std) and 1080p (pro)
- Aspect ratios: 1:1, 16:9, 9:16
- Duration: 3–15 seconds
Model
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `prompt` | string | Yes* | - | Text description of the video. *Required unless `multi_shot` is provided |
| `start_image_url` | string | No | - | URL of the start-frame image (image-to-video) |
| `end_image_url` | string | No | - | URL of the end-frame image. Use together with `start_image_url` |
| `duration` | integer | No | 5 | Video duration in seconds (3–15) |
| `mode` | string | No | "std" | "std" (720p) or "pro" (1080p) |
| `aspect_ratio` | string | No | "16:9" | "1:1", "16:9", or "9:16". Text-to-video only; image-to-video takes its aspect ratio from the input image |
| `audio` | boolean | No | true | Whether to generate audio for the output video |
| `multi_shot` | Shot[] | No | - | 2–6 shots with a total duration of 3–15 s. Mutually exclusive with `prompt` |
Each `Shot` has the shape `{ "prompt": string, "duration": int }`, where `duration` is 1–15 seconds.
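The parameter table maps directly onto a request body. The following is a minimal Python sketch that assembles a single-prompt (text-to-video or image-to-video) body and checks the documented ranges locally before anything is sent. `build_request` and the constant names are hypothetical, no endpoint or client library is assumed, and two checks are this sketch's own assumptions rather than documented service behavior: it rejects an end frame without a start frame, and it rejects an explicit `aspect_ratio` for image-to-video instead of guessing how the service treats it.

```python
from typing import Any, Optional

# Allowed values copied from the parameter table.
VALID_MODES = {"std", "pro"}                  # std = 720p, pro = 1080p
VALID_ASPECT_RATIOS = {"1:1", "16:9", "9:16"}


def build_request(
    prompt: str,
    start_image_url: Optional[str] = None,
    end_image_url: Optional[str] = None,
    duration: int = 5,
    mode: str = "std",
    aspect_ratio: Optional[str] = None,
    audio: bool = True,
) -> dict[str, Any]:
    """Return a text-to-video or image-to-video request body (hypothetical helper)."""
    if not prompt:
        raise ValueError("prompt is required when multi_shot is not used")
    if not isinstance(duration, int) or not 3 <= duration <= 15:
        raise ValueError("duration must be an integer between 3 and 15 seconds")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    # Assumption: an end frame is only sent together with a start frame.
    if end_image_url is not None and start_image_url is None:
        raise ValueError("end_image_url requires start_image_url")

    body: dict[str, Any] = {"prompt": prompt, "duration": duration,
                            "mode": mode, "audio": audio}

    if start_image_url is not None:
        # Image-to-video: the aspect ratio comes from the input image, so this
        # sketch refuses an explicit aspect_ratio (assumption, see lead-in).
        if aspect_ratio is not None:
            raise ValueError("aspect_ratio applies to text-to-video only")
        body["start_image_url"] = start_image_url
        if end_image_url is not None:
            body["end_image_url"] = end_image_url
    else:
        # Text-to-video: fall back to the documented default of "16:9".
        ratio = aspect_ratio or "16:9"
        if ratio not in VALID_ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {sorted(VALID_ASPECT_RATIOS)}")
        body["aspect_ratio"] = ratio
    return body
```

Calling `build_request("a paper boat drifting down a rainy street")` returns a body with the documented defaults filled in: `duration` 5, `mode` "std", `audio` true and `aspect_ratio` "16:9". Sending the defaults explicitly rather than omitting them is a design choice that makes the intended output visible in logs.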
Multi-Shot Mode
Generate multi-scene videos with 2–6 shots. When `multi_shot` is set, omit the top-level `prompt` field; each shot carries its own `prompt`.
Example Request:
- Each shot must have a `prompt` and a `duration` (minimum 1 second).
- The total duration of all shots must be between 3 and 15 seconds.
- `multi_shot` and `prompt` are mutually exclusive.
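The multi-shot rules above can be checked client-side before a request is sent. The Python sketch below is illustrative only: `validate_multi_shot` and `build_multi_shot_body` are hypothetical helpers, and the assumption that options such as `mode` or `audio` may accompany `multi_shot` is not stated in the source, so they are passed through unchecked.

```python
from typing import Any


def validate_multi_shot(shots: list[dict[str, Any]]) -> int:
    """Check a multi_shot list against the documented rules; return the total seconds."""
    if not 2 <= len(shots) <= 6:
        raise ValueError(f"multi_shot needs 2 to 6 shots, got {len(shots)}")
    total = 0
    for i, shot in enumerate(shots):
        prompt = shot.get("prompt")
        duration = shot.get("duration")
        if not isinstance(prompt, str) or not prompt:
            raise ValueError(f"shot {i}: prompt must be a non-empty string")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(duration, bool) or not isinstance(duration, int) or duration < 1:
            raise ValueError(f"shot {i}: duration must be an integer of at least 1 second")
        total += duration
    # The 3-15 s total also caps any single shot at 15 s.
    if not 3 <= total <= 15:
        raise ValueError(f"total duration is {total} s; it must be between 3 and 15 s")
    return total


def build_multi_shot_body(shots: list[dict[str, Any]], **options: Any) -> dict[str, Any]:
    """Assemble a multi-shot body; extra options are assumed valid and not checked."""
    if "prompt" in options:
        raise ValueError("multi_shot and prompt are mutually exclusive")
    validate_multi_shot(shots)
    return {"multi_shot": shots, **options}
```

For instance, two shots of 4 s and 3 s pass validation with a total of 7 s, while a single shot, a 16-second total, or a top-level `prompt` alongside `multi_shot` each raise `ValueError`.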
Examples
Example 1: Text-to-Video
Example 2: Image-to-Video with Start and End Frames
Example 3: Multi-Shot
Response
Pricing
| Mode | Price |
|---|---|
| std (720p) | $0.050/s |
| pro (1080p) | $0.070/s |
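The per-second rates make cost estimation a single multiplication. The Python sketch below assumes the rate is applied to the full requested duration (for multi-shot, the sum of the shot durations) and that nothing else, such as audio, changes the price; the table above does not say either way. `PRICE_PER_SECOND` and `estimate_cost` are hypothetical names, and `Decimal` is used to avoid binary floating-point rounding on currency.

```python
from decimal import Decimal

# Per-second rates in USD, copied from the pricing table.
PRICE_PER_SECOND = {"std": Decimal("0.050"), "pro": Decimal("0.070")}


def estimate_cost(duration_s: int, mode: str = "std") -> Decimal:
    """Estimate the price of one video, assuming the rate covers its whole duration."""
    if mode not in PRICE_PER_SECOND:
        raise ValueError(f"unknown mode: {mode!r}")
    return PRICE_PER_SECOND[mode] * duration_s
```

Under these assumptions, a 10-second `pro` video comes to 10 × $0.070 = $0.70, the default 5-second `std` video to $0.25, and a maximum-length 15-second `pro` video to $1.05.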
