Model
Modes
| Mode | Description |
|---|---|
| t2v | Generate video from a text prompt. Multi-shot structure can be requested in natural language. |
| i2v | Generate video from a first frame, from first + last frames, or by continuing an existing video. Optional driving audio. |
| r2v | Generate video from reference images/videos. Use “Image 1”/“Video 1” identifiers in the prompt. |
Common Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| mode | string | No | "t2v" | "t2v", "i2v", or "r2v" |
| prompt | string | Depends | "" | Text prompt (max 5000 chars). Required for T2V and R2V |
| negative_prompt | string | No | "" | What to avoid (max 500 chars) |
| resolution | string | No | "1080P" | "720P" or "1080P" |
| duration | integer | No | 5 | Output duration in seconds (2-15) |
| prompt_extend | boolean | No | true | LLM-based prompt rewriting. Improves short prompts but adds latency |
| watermark | boolean | No | false | Add “AI Generated” watermark in lower-right corner |
| seed | integer | No | null | Random seed (0-2147483647) |
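The limits in the table above can be checked client-side before a request is sent. The sketch below is a hypothetical helper (the name `validate_common` and its error strings are not part of the API); it only encodes the ranges and defaults listed in the table, and the service still performs its own validation.

```python
from typing import Any

# Limits taken from the Common Parameters table.
MODES = {"t2v", "i2v", "r2v"}
RESOLUTIONS = {"720P", "1080P"}
MAX_PROMPT_CHARS = 5000
MAX_NEGATIVE_PROMPT_CHARS = 500
MAX_SEED = 2_147_483_647


def validate_common(params: dict[str, Any]) -> list[str]:
    """Hypothetical client-side check; returns a list of problems (empty if none)."""
    errors: list[str] = []
    mode = params.get("mode", "t2v")  # documented default
    if mode not in MODES:
        errors.append(f"mode must be one of {sorted(MODES)}")

    prompt = params.get("prompt", "")
    if len(prompt) > MAX_PROMPT_CHARS:
        errors.append("prompt exceeds 5000 characters")
    if mode in {"t2v", "r2v"} and not prompt:
        errors.append(f"prompt is required for {mode}")

    if len(params.get("negative_prompt", "")) > MAX_NEGATIVE_PROMPT_CHARS:
        errors.append("negative_prompt exceeds 500 characters")
    if params.get("resolution", "1080P") not in RESOLUTIONS:
        errors.append("resolution must be '720P' or '1080P'")

    duration = params.get("duration", 5)
    if not isinstance(duration, int) or not 2 <= duration <= 15:
        errors.append("duration must be an integer from 2 to 15")

    seed = params.get("seed")  # None means no fixed seed
    if seed is not None and not 0 <= seed <= MAX_SEED:
        errors.append(f"seed must be between 0 and {MAX_SEED}")
    return errors


print(validate_common({"mode": "t2v", "prompt": "A paper boat drifting down a gutter"}))
print(validate_common({"mode": "r2v", "duration": 20}))
```

Returning a list rather than raising on the first problem lets a caller report every invalid field at once.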
T2V Mode
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| audio_url | string | No | null | Audio file URL. Formats: WAV, MP3. Duration: 2-30s. Max 15MB. Truncated to duration if longer |
| ratio | string | No | "16:9" | Aspect ratio: "16:9", "9:16", "1:1", "4:3", "3:4" |
Multi-shot
Control shot structure using natural language in the prompt:
- Single shot: “Generate a single-shot video”
- Multi-shot: “Generate a multi-shot video”, or describe each shot with timestamps (e.g., “Shot 1 [0-3 seconds] wide shot: Rainy New York street at night”)
- Default: if neither is specified, the model infers the shot structure from the prompt content
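Timestamped shot descriptions are easy to get wrong by hand, so a small helper can compute the ranges from per-shot lengths. This is a hypothetical sketch: the "Shot N [a-b seconds] ..." wording copies the example above, and rejecting shots that run past duration is an assumption rather than a documented rule.

```python
def build_multishot_prompt(shots: list[tuple[int, str]], duration: int) -> str:
    """Turn (length_seconds, description) pairs into timestamped shot lines.

    Hypothetical helper. Raising when the shots run past `duration`
    is an assumption, not a documented API rule.
    """
    lines = []
    start = 0
    for number, (length, description) in enumerate(shots, start=1):
        end = start + length
        lines.append(f"Shot {number} [{start}-{end} seconds] {description}")
        start = end
    if start > duration:
        raise ValueError(f"shots cover {start}s but duration is {duration}s")
    return "\n".join(lines)


prompt = build_multishot_prompt(
    [(3, "wide shot: Rainy New York street at night"),
     (2, "close-up: neon sign reflected in a puddle")],
    duration=5,
)
print(prompt)
```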
Example - Basic Text-to-Video
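A minimal sketch of a basic T2V request, assuming the parameters are sent as a flat JSON object. The endpoint, authentication and any request envelope are not described in this section, so the sketch only builds the body; the prompt and seed values are illustrative.

```python
import json

# Hypothetical flat JSON body for a basic T2V request. Only parameter names
# from the tables above are used; endpoint and wrapper fields are omitted
# because they are not documented here.
t2v_request = {
    "mode": "t2v",
    "prompt": "A red fox trotting through fresh snow at sunrise, cinematic lighting",
    "resolution": "1080P",
    "ratio": "16:9",
    "duration": 5,
    "prompt_extend": True,  # documented default; rewrites short prompts
    "seed": 42,             # fixed seed for reproducible runs
}

body = json.dumps(t2v_request)
print(body)
```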
Example - Multi-shot Narrative
Example - With Audio File
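When audio_url is set, the audio must be 2-30s long and is truncated to duration if it is longer. The sketch below encodes just those two documented rules; the function name and the placeholder URL are hypothetical, and what happens when the audio is shorter than the video is not specified here, so the helper simply reports the audio length in that case.

```python
import json

AUDIO_MIN_S, AUDIO_MAX_S = 2, 30  # documented audio_url duration range


def effective_audio_seconds(audio_seconds: float, duration: int) -> float:
    """Seconds of the audio track that fall inside the output video."""
    if not AUDIO_MIN_S <= audio_seconds <= AUDIO_MAX_S:
        raise ValueError("audio_url must point to a 2-30s WAV or MP3 file")
    # Audio longer than the requested video is truncated to `duration`.
    return min(audio_seconds, duration)


request = {
    "mode": "t2v",
    "prompt": "A street musician plays guitar under a bridge at dusk",
    "audio_url": "https://example.com/guitar.mp3",  # placeholder URL
    "duration": 10,
}
print(json.dumps(request), effective_audio_seconds(24.0, request["duration"]))
```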
I2V Mode
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| first_frame_url | string | Depends | null | First frame image URL. Formats: JPEG, JPG, PNG, BMP, WEBP. Resolution: 240-8000px. Max 20MB |
| last_frame_url | string | No | null | Last frame image URL. Same format, resolution, and size limits as first_frame_url. Requires first_frame_url |
| video_url | string | Depends | null | Video to continue (MP4/MOV, 2-10s, max 100MB). Mutually exclusive with first_frame_url/last_frame_url |
| audio_url | string | No | null | Driving audio URL for lip-sync / action timing. Formats: WAV, MP3. Duration: 2-30s. Max 15MB |
Validation Rules
- At least one of first_frame_url or video_url is required
- last_frame_url requires first_frame_url
- video_url (continuation) is mutually exclusive with first_frame_url/last_frame_url
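The three I2V rules above can be expressed as one check that also reports which kind of generation the inputs select. This is a hypothetical sketch; the returned labels ("first_frame", "first_last_frame", "continuation") are illustrative names, not values the API accepts or returns.

```python
from typing import Optional


def validate_i2v(first_frame_url: Optional[str] = None,
                 last_frame_url: Optional[str] = None,
                 video_url: Optional[str] = None) -> str:
    """Apply the I2V validation rules and return an illustrative sub-mode label."""
    # video_url (continuation) cannot be mixed with frame inputs.
    if video_url and (first_frame_url or last_frame_url):
        raise ValueError("video_url is mutually exclusive with first_frame_url/last_frame_url")
    # A last frame only makes sense together with a first frame.
    if last_frame_url and not first_frame_url:
        raise ValueError("last_frame_url requires first_frame_url")
    if video_url:
        return "continuation"
    if first_frame_url:
        return "first_last_frame" if last_frame_url else "first_frame"
    raise ValueError("at least one of first_frame_url or video_url is required")


print(validate_i2v(first_frame_url="https://example.com/start.png"))
```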
Example - First Frame to Video
Example - First + Last Frame
Example - Video Continuation
Continue an existing video clip. Here duration is the length of the final output, including the input clip: if the input is 3s and duration is 15, the model generates 12s of new content and the final output is 15s.
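The arithmetic above is new content = duration - input length, with the input clip limited to 2-10s and duration to 2-15s. A hypothetical helper makes this explicit; treating a duration that does not exceed the input length as an error is an assumption, since this section does not say what the service does in that case.

```python
def continuation_seconds(input_seconds: float, duration: int) -> float:
    """Seconds of new footage generated when continuing a clip via video_url.

    `duration` is the length of the final output, input clip included.
    """
    if not 2 <= input_seconds <= 10:
        raise ValueError("video_url clip must be 2-10s long")
    if not 2 <= duration <= 15:
        raise ValueError("duration must be 2-15s")
    new_seconds = duration - input_seconds
    # Assumption: a duration no longer than the input leaves nothing to generate.
    if new_seconds <= 0:
        raise ValueError("duration must exceed the input clip length")
    return new_seconds


print(continuation_seconds(3, 15))  # 12
```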
R2V Mode
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| reference_image_urls | string[] | Depends | null | Reference images for characters/objects/scenes. Each must contain a single subject. Formats: JPEG, JPG, PNG, BMP, WEBP. Max 20MB each |
| reference_video_urls | string[] | Depends | null | Reference videos for characters. Each must contain a single subject. Formats: MP4, MOV. Duration: 1-30s. Max 100MB each |
| first_frame_url | string | No | null | First frame image for scene control. Overrides ratio |
| reference_voice_url | string | No | null | Audio URL for voice timbre override. Formats: WAV, MP3. Duration: 1-10s. Max 15MB |
| ratio | string | No | "16:9" | Aspect ratio: "16:9", "9:16", "1:1", "4:3", "3:4". Ignored when first_frame_url is set |
Prompt Identifiers
Refer to images as “Image 1”, “Image 2”, ... and to videos as “Video 1”, “Video 2”, ... in the prompt. Images and videos are numbered separately, following their order in reference_image_urls and reference_video_urls. If there is only one reference, you can write “the reference image” or “the reference video” instead.
Validation Rules
- At least one reference image or video (reference_image_urls or reference_video_urls) is required
- Total images + videos must not exceed 5
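The numbering scheme and the two R2V validation rules can be combined into one helper that tells you which identifier to use for each URL. This is a hypothetical sketch (the function name is not part of the API); it assumes numbering starts at 1, as the “Image 1”/“Video 1” examples suggest.

```python
from typing import Optional

MAX_REFERENCES = 5  # images + videos combined


def r2v_identifiers(reference_image_urls: Optional[list[str]] = None,
                    reference_video_urls: Optional[list[str]] = None) -> dict[str, str]:
    """Map prompt identifiers ("Image 1", "Video 1", ...) to reference URLs."""
    images = reference_image_urls or []
    videos = reference_video_urls or []
    if not images and not videos:
        raise ValueError("at least one reference image or video is required")
    if len(images) + len(videos) > MAX_REFERENCES:
        raise ValueError("images + videos must not exceed 5")
    # Images and videos are numbered separately, in array order, starting at 1.
    ids = {f"Image {n}": url for n, url in enumerate(images, start=1)}
    ids.update({f"Video {n}": url for n, url in enumerate(videos, start=1)})
    return ids


print(r2v_identifiers(["https://example.com/hero.png", "https://example.com/dog.png"],
                      ["https://example.com/dance.mp4"]))
```

With these inputs a prompt would refer to the two images as “Image 1” and “Image 2” and to the clip as “Video 1”.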
Example - Multi-Reference
Example - Single Reference Image
Input Limits
| Asset | Formats | Resolution | Duration | File Size |
|---|---|---|---|---|
| Audio (T2V / I2V driving) | WAV, MP3 | — | 2-30s | 15MB |
| Voice timbre (R2V) | WAV, MP3 | — | 1-10s | 15MB |
| First/last frame image | JPEG, JPG, PNG, BMP, WEBP | 240-8000px, ratio 1:8 to 8:1 | — | 20MB |
| Input video (I2V continuation) | MP4, MOV | 240-4096px, ratio 1:8 to 8:1 | 2-10s | 100MB |
| Reference video (R2V) | MP4, MOV | 240-4096px, ratio 1:8 to 8:1 | 1-30s | 100MB |
| Reference image (R2V) | JPEG, JPG, PNG, BMP, WEBP | 240-8000px, ratio 1:8 to 8:1 | — | 20MB |
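The Input Limits table can be turned into data so that assets are checked before upload. This is a hypothetical sketch under stated assumptions: the dictionary keys are illustrative labels for the table rows, "MB" is taken to mean 1024 × 1024 bytes, and the pixel range is assumed to apply to both width and height. Callers supply the file extension, size, duration, and dimensions themselves.

```python
from dataclasses import dataclass
from typing import Optional

IMAGE_FORMATS = frozenset({"jpeg", "jpg", "png", "bmp", "webp"})
VIDEO_FORMATS = frozenset({"mp4", "mov"})
AUDIO_FORMATS = frozenset({"wav", "mp3"})
MB = 1024 * 1024  # assumption: the table's "MB" means MiB


@dataclass(frozen=True)
class AssetLimit:
    formats: frozenset
    max_mb: int
    seconds: Optional[tuple] = None  # (min, max) duration in seconds
    side_px: Optional[tuple] = None  # (min, max) for width and height (assumed)


# Keys are illustrative labels for the rows of the Input Limits table.
LIMITS = {
    "audio": AssetLimit(AUDIO_FORMATS, 15, seconds=(2, 30)),
    "voice_timbre": AssetLimit(AUDIO_FORMATS, 15, seconds=(1, 10)),
    "frame_image": AssetLimit(IMAGE_FORMATS, 20, side_px=(240, 8000)),
    "input_video": AssetLimit(VIDEO_FORMATS, 100, seconds=(2, 10), side_px=(240, 4096)),
    "reference_video": AssetLimit(VIDEO_FORMATS, 100, seconds=(1, 30), side_px=(240, 4096)),
    "reference_image": AssetLimit(IMAGE_FORMATS, 20, side_px=(240, 8000)),
}


def check_asset(kind: str, fmt: str, size_bytes: int, seconds: Optional[float] = None,
                width: Optional[int] = None, height: Optional[int] = None) -> list[str]:
    """Return the limit violations for one asset (empty list if it fits)."""
    limit = LIMITS[kind]
    problems = []
    if fmt.lower() not in limit.formats:
        problems.append(f"format {fmt} is not accepted")
    if size_bytes > limit.max_mb * MB:
        problems.append(f"file is larger than {limit.max_mb}MB")
    if limit.seconds is not None and seconds is not None:
        low, high = limit.seconds
        if not low <= seconds <= high:
            problems.append(f"duration must be {low}-{high}s")
    if limit.side_px is not None and width and height:
        low, high = limit.side_px
        if not (low <= width <= high and low <= height <= high):
            problems.append(f"width and height must be {low}-{high}px")
        # Aspect ratio must stay within 1:8 .. 8:1.
        if max(width, height) > 8 * min(width, height):
            problems.append("aspect ratio must be between 1:8 and 8:1")
    return problems


print(check_asset("input_video", "MP4", 50 * MB, seconds=12, width=1920, height=1080))
```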
