Unified Tasks API
Create and manage AI generation tasks with a unified interface
The /v1/tasks endpoint provides a unified interface for all AI generation models (video, image, audio).
This is the unified API for all AI generation models. Use this endpoint for all integrations.
Create Task
POST /v1/tasks
Creates a new generation task for any supported model.
Request
Request Parameters
Model identifier in provider/model-name format. See Available Models below.
Model-specific input parameters. See Input Parameters for details.
Optional webhook URL. When provided, the API sends a POST request to this URL when the task completes or fails. See Webhooks & Callbacks for payload formats and details.
Optional dry_run flag. When set to true, the request is validated and the cost is calculated without creating a task or deducting from your balance. Useful for previewing the price of a request before committing.
Response
Dry Run
To check the cost of a request without creating a task or deducting from your balance, set dry_run to true.
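
As a rough illustration of the create and dry-run requests described above, the sketch below builds (but does not send) a POST /v1/tasks request using only the Python standard library. Several details are assumptions rather than facts from this page: the host in BASE_URL, the Bearer Authorization header, and the body field names "model", "input", and "callback_url". Only dry_run is named in the text.

```python
import json
import urllib.request

# Assumptions (not confirmed by this page): the host, the Bearer auth header,
# and the body field names "model", "input", and "callback_url".
BASE_URL = "https://api.example.com"  # hypothetical host


def build_create_task_request(api_key, model, input_params,
                              callback_url=None, dry_run=False):
    """Build (but do not send) a POST /v1/tasks request."""
    body = {"model": model, "input": input_params}
    if callback_url is not None:
        body["callback_url"] = callback_url
    if dry_run:
        body["dry_run"] = True
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/tasks",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Price check only: dry_run=True asks for validation and cost, no task.
req = build_create_task_request(
    "YOUR_API_KEY",
    "google/veo-3.1-fast",
    {"prompt": "A paper boat drifting down a rainy street", "duration": 8},
    dry_run=True,
)
# To actually send it: urllib.request.urlopen(req)
```

Dropping dry_run=True from the call yields the same request without the flag, which would create a real task.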
Get Task Status
GET /v1/tasks/:task_id
Retrieves the status and output of a task.
Request
Path Parameters
task_id: The unique task ID returned by the create task endpoint.
Response (Processing)
Response (Completed)
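
Because a task moves from processing to a finished state, clients typically poll this endpoint. The loop below is a minimal sketch under stated assumptions: the response body is taken to have a "status" field with terminal values "completed" and "failed", and an "output" field, none of which is spelled out on this page. The HTTP call is injected as fetch_task so the loop can be exercised without a network.

```python
import time


def wait_for_task(fetch_task, task_id, interval_s=2.0, timeout_s=600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until the task reaches a terminal state or the timeout expires.

    fetch_task is any callable that performs GET /v1/tasks/:task_id and
    returns the decoded JSON body as a dict.
    Assumption: the body carries a "status" field whose terminal values are
    "completed" and "failed".
    """
    deadline = clock() + timeout_s
    while True:
        task = fetch_task(task_id)
        if task.get("status") in ("completed", "failed"):
            return task
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} still running after {timeout_s}s")
        sleep(interval_s)


# Usage with a stand-in for the HTTP call: two pending polls, then done.
_responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed", "output": {"url": "https://example.com/out.mp4"}},
])
result = wait_for_task(lambda _task_id: next(_responses), "task_123",
                       sleep=lambda _seconds: None)
```

When a webhook URL is supplied at creation time, polling like this is only needed as a fallback.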
Available Models
Video Generation
| Model | Description |
|---|---|
google/veo-3.1-fast | Google Veo 3.1 Fast |
google/veo-3.1-quality | Google Veo 3.1 Quality |
google/veo-3.1-lite | Google Veo 3.1 Lite |
google/veo-3.1-lite-relaxed | Google Veo 3.1 Lite Relaxed |
google/veo-3.1-extend | Google Veo 3.1 Extend |
google/veo-3.1-upscale | Google Veo 3.1 Upscale |
google/gemini-omni-flash-video | Google Gemini Omni Flash Video |
google/gemini-omni-flash-video-edit | Google Gemini Omni Flash Video Edit |
hailuo/minimax-2.0 | Minimax Hailuo 2.0 |
hailuo/minimax-2.3 | Minimax Hailuo 2.3 |
hailuo/minimax-2.3-fast | Minimax Hailuo 2.3 Fast |
kuaishou/kling-3.0-omni-video | Kling v3.0 Omni Video |
kuaishou/kling-3.0-omni-video-edit | Kling v3.0 Omni Video Edit |
kuaishou/kling-o1-video | Kling O1 Video |
kuaishou/kling-o1-video-edit | Kling O1 Video Edit |
kuaishou/kling-3.0-video | Kling v3.0 Video |
kuaishou/kling-3.0-turbo-video | Kling v3.0 Turbo Video |
kuaishou/kling-2.6-video | Kling v2.6 Video |
kuaishou/kling-2.5-turbo-video | Kling v2.5 Turbo Video |
kuaishou/kling-2.1-video | Kling v2.1 Video |
kuaishou/kling-2.1-master-video | Kling v2.1 Master Video |
kuaishou/kling-2.6-motion-control | Kling v2.6 Motion Control |
kuaishou/kling-3.0-motion-control | Kling v3.0 Motion Control |
xai/grok-imagine-video-extend | Grok Imagine Video Extend |
topaz-labs/video-upscale | Topaz Video Upscale |
Image Generation
| Model | Description |
|---|---|
google/nano-banana | Nano Banana |
google/nano-banana-pro | Nano Banana Pro |
openai/gpt-image-2 | GPT Image 2 (1K/2K/4K, multiple aspect ratios) |
black-forest-labs/flux.2-pro | Flux.2 Pro |
black-forest-labs/flux.2-flex | Flux.2 Flex |
black-forest-labs/flux.2-max | Flux.2 Max |
kuaishou/kling-o1-image | Kling O1 Image |
kuaishou/kling-3.0-omni-image | Kling v3.0 Omni Image |
kuaishou/kling-3.0-image | Kling v3.0 Image |
kuaishou/kling-2.1-image | Kling v2.1 Image |
topaz-labs/image-upscale | Topaz Image Upscale |
topaz-labs/image-generative | Topaz Image Generative |
alibaba/qwen-image-2.0-pro | Qwen Image 2.0 Pro (T2I + editing) |
alibaba/qwen-image-2.0 | Qwen Image 2.0 (T2I + editing) |
alibaba/qwen-image-max | Qwen Image Max (T2I + editing) |
alibaba/qwen-image-plus | Qwen Image Plus (T2I + editing) |
alibaba/qwen-image | Qwen Image (T2I + editing) |
alibaba/z-image-turbo | Z-Image Turbo (T2I only) |
alibaba/wan-2.7-pro-image | Wan 2.7 Pro Image (T2I + editing, up to 4K) |
alibaba/wan-2.7-image | Wan 2.7 Image (T2I + editing) |
alibaba/wan-2.6-image | Wan 2.6 Image (T2I + editing) |
alibaba/wan-2.5-image | Wan 2.5 Image (T2I + editing) |
alibaba/wan-2.2-image | Wan 2.2 Image (T2I only) |
alibaba/wan-2.2-flash-image | Wan 2.2 Flash Image (T2I only) |
xai/grok-imagine-image | Grok Imagine Image (T2I + editing) |
Audio Generation
| Model | Description |
|---|---|
suno-ai/music | Suno Music Generation |
suno-ai/add-vocals | Add Vocals to Track |
suno-ai/add-instrumental | Add Instrumental |
suno-ai/extend | Extend Audio |
suno-ai/cover | Create Cover |
suno-ai/stems | Extract Stems |
suno-ai/stems-all | Extract All Stems |
suno-ai/lyrics | Generate Lyrics |
suno-ai/wav | WAV Export |
elevenlabs/text-to-speech | ElevenLabs Text-to-Speech |
elevenlabs/text-to-dialogue | ElevenLabs Multi-Voice Dialogue |
elevenlabs/sound-effect | ElevenLabs Sound Effects |
elevenlabs/voice-isolation | ElevenLabs Voice Isolation |
elevenlabs/speech-to-text | ElevenLabs Speech-to-Text |
Model Parameters
Google Veo 3.1
Generate
Models: google/veo-3.1-fast, google/veo-3.1-quality, google/veo-3.1-lite, google/veo-3.1-lite-relaxed
Veo supports text-to-video, first-frame, first-and-last-frame, and reference-to-video workflows. Frame mode and reference mode are mutually exclusive.
| Mode | Fields | Availability |
|---|---|---|
| Frame mode | start_image_url [+ end_image_url] | All models |
| Reference mode | reference_image_urls, reference_characters [+ voice] | Image references are available on all models. Character references are available on Fast, Lite, and Lite Relaxed only. |
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text prompt for video generation. Use @ImageN or @CharacterN to point at specific references. |
aspect_ratio | string | No | "16:9" (default) or "9:16" |
duration | integer | No | 4, 6, or 8 seconds. Default 4. Must be 8 when any image or character reference is set. |
seed | integer | No | Reproducibility seed |
start_image_url | string | No | Public image URL used as the first frame. Cannot be combined with reference_image_urls or reference_characters. |
end_image_url | string | No | Public image URL used as the final frame. Requires start_image_url; cannot be used by itself. |
reference_image_urls | string[] | No | Image references for reference-to-video. Max 3 total expanded image URLs across reference_image_urls and character images. Cannot be combined with start/end frame fields. |
reference_characters | array | No | Character references. Max 3 total expanded image URLs across images and character image_urls. Not available on Quality. |
voice | string | No | Voice preset ID. Requires at least 1 image or character reference. See voices endpoint. |
Character items must be objects with image_urls, plus optional name and description. image_url and plain string character entries are not supported.
Rejected combinations: end_image_url without start_image_url; frame fields with reference fields; reference_characters on google/veo-3.1-quality; any image or character reference with duration other than 8; more than 3 total expanded image URLs; empty character image_urls; character image_url; plain string character entries.
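
The rejected combinations above can be mirrored in a client-side pre-check before a request is sent. The function below is only a sketch of that idea: veo_generate_errors is a hypothetical helper, the server remains the source of truth, and it assumes that an omitted duration falls back to the documented default of 4.

```python
def veo_generate_errors(model, inp):
    """Return a list of problems mirroring the documented rejected combinations."""
    errors = []
    frames = [k for k in ("start_image_url", "end_image_url") if inp.get(k)]
    images = inp.get("reference_image_urls") or []
    characters = inp.get("reference_characters") or []

    if inp.get("end_image_url") and not inp.get("start_image_url"):
        errors.append("end_image_url requires start_image_url")
    if frames and (images or characters):
        errors.append("frame fields cannot be combined with reference fields")
    if characters and model == "google/veo-3.1-quality":
        errors.append("reference_characters is not available on Quality")

    expanded = len(images)  # image URLs plus every character image URL
    for i, ch in enumerate(characters):
        if not isinstance(ch, dict):
            errors.append(f"character {i}: must be an object, not a plain string")
            continue
        if "image_url" in ch:
            errors.append(f"character {i}: use image_urls, not image_url")
        urls = ch.get("image_urls") or []
        if not urls:
            errors.append(f"character {i}: image_urls must not be empty")
        expanded += len(urls)
    if expanded > 3:
        errors.append("more than 3 total expanded image URLs")

    has_refs = bool(images or characters)
    # Assumption: an omitted duration means the documented default of 4.
    if has_refs and inp.get("duration", 4) != 8:
        errors.append("duration must be 8 when references are set")
    if inp.get("voice") and not has_refs:
        errors.append("voice requires at least one image or character reference")
    return errors


ok = veo_generate_errors("google/veo-3.1-fast", {
    "prompt": "@Character1 waves at the camera",
    "duration": 8,
    "reference_characters": [
        {"name": "Ava", "image_urls": ["https://example.com/ava.png"]},
    ],
})
bad = veo_generate_errors("google/veo-3.1-quality", {
    "prompt": "Sunset over a harbor",
    "end_image_url": "https://example.com/end.png",
})
```

Here ok is an empty list, while bad reports that end_image_url was given without start_image_url.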
Extend
Model: google/veo-3.1-extend
Extend a previously generated video. Aspect ratio is inherited from the source task.
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text prompt for the extended content |
task_id | string | Yes | Task ID of a completed generation |
model | string | Yes | One of: lite, fast, quality, lite-relaxed |
duration | integer | No | Must be 8 (only supported value for extend). Default 8. |
seed | integer | No | Reproducibility seed |
Upscale
Model: google/veo-3.1-upscale
Upscale a completed video to a higher resolution.
| Parameter | Type | Required | Description |
|---|---|---|---|
task_id | string | Yes | Task ID of a completed generation |
resolution | string | Yes | "1080p" or "4k" |
Google Gemini Omni Flash Video
Model: google/gemini-omni-flash-video
Generate 4, 6, 8, or 10 second clips in text-to-video, start-frame, or reference-to-video mode.
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Main video prompt. Use @ImageN or @CharacterN to point at specific references. |
seed | integer | No | Reproducibility seed. If omitted, one is generated automatically. |
aspect_ratio | string | No | "16:9" (default) or "9:16" |
duration | integer | No | 4, 6, 8, or 10 seconds. Default 4. |
start_image_url | string | No | Public image URL used as the first frame. Cannot be combined with reference_image_urls or reference_characters. |
reference_image_urls | string[] | No | Public image reference URLs. Max 7 total expanded image + character image URLs. Cannot be combined with start/end frame fields. |
reference_characters | array | No | Character references. Max 3 character items; expanded image URLs count toward the total 7 reference limit. |
voice | string | No | Request-level voice preset ID. Requires at least one image or character reference. See voices endpoint. |
Frame control is now available, but for now only the start frame (start_image_url) is supported. End frame support is not yet available on Google's side and is expected in an upcoming Google update.
Google Gemini Omni Flash Video Edit
Model: google/gemini-omni-flash-video-edit
Edit an existing uploaded video. Provide exactly one source video URL in reference_video_urls.
| Parameter | Type | Required | Description |
|---|---|---|---|
reference_video_urls | string[] | Yes | Public source video URL. Must contain exactly one URL. |
prompt | string | Yes | Edit instruction. Use @Video1 to refer to the source video and @ImageN/@CharacterN for extra references. |
reference_image_urls | string[] | No | Public image URLs used as edit references. |
reference_characters | array | No | Character references for the edit. Max 3 character items. Supports per-character voice or custom_voice. See voices endpoint. |
seed | integer | No | Reproducibility seed. If omitted, one is generated automatically. |
start_frame | integer | No | First source frame index included in the edit range. Default 0. |
end_frame | integer | No | Last source frame index included in the edit range. Defaults to the detected final frame when available. |
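Limits: one video reference, up to 3 character references, up to 7 total video + image + character references, uploaded source video up to 1 GB and up to 30 seconds.
Rejected combinations: task_id; missing, empty, or multiple reference_video_urls; end_frame lower than start_frame; more than 7 total references; empty character image_urls; character image_url; plain string character entries.
Duration is locked to the input video length.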
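
A few of the video-edit constraints listed here can be checked locally before uploading anything. The sketch below covers only the unambiguous ones; omni_edit_errors is a hypothetical helper. The 7-reference total and the 1 GB / 30 second source limits are deliberately left to the server, since this page does not spell out how character items count toward the total.

```python
def omni_edit_errors(inp):
    """Collect problems with a google/gemini-omni-flash-video-edit input dict."""
    errors = []
    videos = inp.get("reference_video_urls") or []
    if len(videos) != 1:
        errors.append("reference_video_urls must contain exactly one URL")
    if "task_id" in inp:
        errors.append("task_id is rejected; pass the source in reference_video_urls")

    characters = inp.get("reference_characters") or []
    if len(characters) > 3:
        errors.append("at most 3 character items")
    for i, ch in enumerate(characters):
        if not isinstance(ch, dict) or not ch.get("image_urls"):
            errors.append(f"character {i}: must be an object with non-empty image_urls")

    start = inp.get("start_frame", 0)  # documented default
    end = inp.get("end_frame")         # server defaults to the final frame
    if end is not None and end < start:
        errors.append("end_frame must not be lower than start_frame")
    return errors


good = omni_edit_errors({
    "reference_video_urls": ["https://example.com/source.mp4"],
    "prompt": "Make @Video1 look like a rainy night",
    "start_frame": 24,
    "end_frame": 96,
})
bad = omni_edit_errors({
    "reference_video_urls": [],
    "prompt": "Add fog",
    "start_frame": 10,
    "end_frame": 5,
})
```

In this run good is empty and bad contains two entries, one for the missing source video and one for the inverted frame range.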
Minimax Hailuo
Models: hailuo/minimax-2.0, hailuo/minimax-2.3, hailuo/minimax-2.3-fast
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes* | Max 2000 chars. *Required if no start_image_url |
start_image_url | string | Yes* | Image URL (auto-uploaded). *Required if no prompt; always required for minimax-2.3-fast |
end_image_url | string | No | End frame image URL (minimax-2.0 only, 768p/1080p) |
duration | integer | No | 6 or 10 seconds. 1080p only supports 6 |
resolution | string | No | "768p" (default), "1080p" |
prompt_optimization | boolean | No | Let MiniMax optimize prompt |
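
The conditional rules in the Hailuo table (prompt or start image, the 1080p duration cap, end frame on minimax-2.0 only) are easy to get wrong, so a small local check may help. This is a sketch with a hypothetical helper name, hailuo_errors. Because the default duration is not stated on this page, duration is validated only when it is set explicitly.

```python
def hailuo_errors(model, inp):
    """Return problems with a Minimax Hailuo input dict, per the table above."""
    errors = []
    prompt = inp.get("prompt")
    start = inp.get("start_image_url")
    resolution = inp.get("resolution", "768p")  # documented default
    duration = inp.get("duration")              # default not documented here

    if not prompt and not start:
        errors.append("provide prompt, start_image_url, or both")
    if model == "hailuo/minimax-2.3-fast" and not start:
        errors.append("start_image_url is required for minimax-2.3-fast")
    if prompt and len(prompt) > 2000:
        errors.append("prompt exceeds 2000 characters")
    if inp.get("end_image_url") and model != "hailuo/minimax-2.0":
        errors.append("end_image_url is only supported on minimax-2.0")
    if resolution not in ("768p", "1080p"):
        errors.append("resolution must be 768p or 1080p")
    if duration is not None and duration not in (6, 10):
        errors.append("duration must be 6 or 10")
    if resolution == "1080p" and duration == 10:
        errors.append("1080p only supports a 6 second duration")
    return errors


a = hailuo_errors("hailuo/minimax-2.3",
                  {"prompt": "A fox runs through fresh snow", "duration": 6})
b = hailuo_errors("hailuo/minimax-2.3-fast",
                  {"prompt": "A fox runs", "resolution": "1080p", "duration": 10})
```

The first call passes cleanly; the second is flagged for the missing start image and for requesting 10 seconds at 1080p.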
Kling v3.0 Omni Video
Model: kuaishou/kling-3.0-omni-video
| Parameter | Type | Required | Description |
|---|---|---|---|
video_mode | string | No | "elements" (default), "start_end_frame", "transform", "video_reference" |
prompt | string | Conditional | Text prompt. Mutually exclusive with multi_shots |
mode | string | No | "pro" (default). "std" (720p) or "pro" (1080p) |
duration | integer | No | 3–15 seconds (default 5) |
aspect_ratio | string | No | "16:9" (default), "9:16", "1:1", "auto" (start_end_frame only) |
native_audio | boolean | No | Generate AI audio (default false) |
keep_audio | boolean | No | Preserve audio from source video (default true) |
image_urls | string[] | No | Up to 7 reference image URLs. Use @Image1, @Image2 in prompt |
start_frame_url | string | No | First frame image URL (start_end_frame mode) |
end_frame_url | string | No | Last frame image URL (start_end_frame mode) |
video_url | string | No | Source video URL (transform/video_reference modes) |
multi_shots | array | No | 2–6 shots, each { "prompt": string, "duration": int }. Mutually exclusive with prompt |
elements | array | No | Character/object elements (IMAGE + VIDEO) |
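
Since prompt and multi_shots are mutually exclusive, it can help to build that part of the input through one function that enforces the choice. The helper below, kling_v3_prompt_fields, is hypothetical and reads "Conditional" as "exactly one of the two must be present", which is an interpretation rather than something this page states outright.

```python
def kling_v3_prompt_fields(prompt=None, multi_shots=None):
    """Return either {"prompt": ...} or {"multi_shots": [...]}, never both."""
    if (prompt is None) == (multi_shots is None):
        raise ValueError("set exactly one of prompt or multi_shots")
    if prompt is not None:
        return {"prompt": prompt}
    if not 2 <= len(multi_shots) <= 6:
        raise ValueError("multi_shots needs 2 to 6 shots")
    # Normalise each shot to the documented shape: prompt string, int duration.
    shots = [{"prompt": str(s["prompt"]), "duration": int(s["duration"])}
             for s in multi_shots]
    return {"multi_shots": shots}


inp = {
    "video_mode": "elements",
    "mode": "pro",
    "aspect_ratio": "16:9",
    **kling_v3_prompt_fields(multi_shots=[
        {"prompt": "Wide shot of a lighthouse at dusk", "duration": 4},
        {"prompt": "Close-up of the lamp switching on", "duration": 3},
    ]),
}
```

The same helper would apply to kuaishou/kling-3.0-video, whose table documents the same prompt and multi_shots rule.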
Kling O1 Video
Model: kuaishou/kling-o1-video
Same parameters as Kling v3.0 Omni Video, but without support for multi_shots or native_audio. Maximum duration is 10 seconds.
| Parameter | Type | Required | Description |
|---|---|---|---|
video_mode | string | No | "elements" (default), "start_end_frame", "transform", "video_reference" |
prompt | string | Yes | Text prompt |
mode | string | No | "pro" (default). "std" (720p) or "pro" (1080p) |
duration | integer | No | 3–10 seconds (default 5) |
aspect_ratio | string | No | "16:9" (default), "9:16", "1:1", "auto" (start_end_frame only) |
keep_audio | boolean | No | Preserve audio from source video (default true) |
image_urls | string[] | No | Up to 7 reference image URLs. Use @Image1, @Image2 in prompt |
start_frame_url | string | No | First frame image URL (start_end_frame mode) |
end_frame_url | string | No | Last frame image URL (start_end_frame mode) |
video_url | string | No | Source video URL (transform/video_reference modes) |
Kling v3.0 Omni Video Edit
Model: kuaishou/kling-3.0-omni-video-edit
| Parameter | Type | Required | Description |
|---|---|---|---|
video_url | string | Yes | Source video URL to edit |
prompt | string | Yes | Text prompt describing the edit |
video_mode | string | No | "reference" (default) or "transform" |
keep_audio | boolean | No | Preserve original audio (default false) |
mode | string | No | "std" (default) or "pro" |
aspect_ratio | string | No | "16:9" (default), "9:16", "1:1" |
image_urls | string[] | No | Up to 4 reference image URLs. Use @Image1, @Image2 in prompt |
elements | array | No | Up to 4 character/object elements |
Kling O1 Video Edit
Model: kuaishou/kling-o1-video-edit
Same parameters as Kling v3.0 Omni Video Edit, but without support for elements.
| Parameter | Type | Required | Description |
|---|---|---|---|
video_url | string | Yes | Source video URL to edit |
prompt | string | Yes | Text prompt describing the edit |
video_mode | string | No | "reference" (default) or "transform" |
keep_audio | boolean | No | Preserve original audio (default false) |
mode | string | No | "std" (default) or "pro" |
aspect_ratio | string | No | "16:9" (default), "9:16", "1:1" |
image_urls | string[] | No | Up to 4 reference image URLs. Use @Image1, @Image2 in prompt |
Kling v3.0 Video
Model: kuaishou/kling-3.0-video
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Conditional | Text prompt. Mutually exclusive with multi_shots |
mode | string | No | "pro" (default). "std" (720p) or "pro" (1080p) |
duration | integer | No | 3–15 seconds (default 5) |
aspect_ratio | string | No | "16:9" (default), "9:16", "1:1" |
native_audio | boolean | No | Generate AI audio (default true) |
start_frame_url | string | Yes | First frame image URL |
end_frame_url | string | No | Last frame image URL |
elements | array | No | Character/object elements |
multi_shots | array | No | 2–6 shots, each { "prompt": string, "duration": int }. Mutually exclusive with prompt |
Kling v3.0 Turbo Video
Model: kuaishou/kling-3.0-turbo-video
Faster variant of Kling v3.0. Text-to-video or optional start-frame image-to-video only. No native audio, multi-shot, end frame, or 4K.
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text prompt |
mode | string | No | "pro" (default). "std" (720p) or "pro" (1080p) |
duration | integer | No | 3–15 seconds (default 5) |
aspect_ratio | string | No | "16:9" (default), "9:16", "1:1" |
start_frame_url | string | No | First frame image URL. Omit for text-to-video |
Kling v2.6 Video
Model: kuaishou/kling-2.6-video
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text prompt |
mode | string | No | "pro" (default). "std" (720p) or "pro" (1080p) |
duration | integer | No | 5 or 10 seconds |
native_audio | boolean | No | Enable AI audio generation (default false). Requires pro mode |
start_frame_url | string | Yes | First frame image URL |
end_frame_url | string | No | Last frame image URL (not available with native_audio) |
voices | array | No | Voice references (max 5, requires native_audio). Each: { "voice_id": int } or { "voice_url": string } |
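
The audio-related fields of Kling v2.6 depend on each other (native_audio needs pro mode, excludes end_frame_url, and gates voices). A minimal local check, written as a hypothetical helper kling_26_errors, could look like this; the voice_id value in the usage is made up for illustration.

```python
def kling_26_errors(inp):
    """Return problems with a kuaishou/kling-2.6-video input dict."""
    errors = []
    audio = bool(inp.get("native_audio", False))  # documented default: false
    voices = inp.get("voices") or []

    for field in ("prompt", "start_frame_url"):
        if not inp.get(field):
            errors.append(f"{field} is required")
    if inp.get("duration") not in (None, 5, 10):
        errors.append("duration must be 5 or 10")
    if audio and inp.get("mode", "pro") != "pro":
        errors.append("native_audio requires pro mode")
    if audio and inp.get("end_frame_url"):
        errors.append("end_frame_url is not available with native_audio")
    if voices and not audio:
        errors.append("voices requires native_audio")
    if len(voices) > 5:
        errors.append("at most 5 voices")
    for i, voice in enumerate(voices):
        # Each entry is {"voice_id": int} or {"voice_url": string}.
        if ("voice_id" in voice) == ("voice_url" in voice):
            errors.append(f"voice {i}: set exactly one of voice_id or voice_url")
    return errors


good = kling_26_errors({
    "prompt": "She turns and greets the camera",
    "start_frame_url": "https://example.com/first.png",
    "native_audio": True,
    "voices": [{"voice_id": 12}],  # hypothetical voice ID
})
bad = kling_26_errors({
    "prompt": "She turns and greets the camera",
    "start_frame_url": "https://example.com/first.png",
    "mode": "std",
    "native_audio": True,
    "end_frame_url": "https://example.com/last.png",
})
```

The second input trips both audio rules: std mode with native_audio, and an end frame alongside native_audio.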
Kling v2.5 Turbo Video
Model: kuaishou/kling-2.5-turbo-video
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text prompt |
mode | string | No | "pro" (default). "std" (720p) or "pro" (1080p) |
duration | integer | No | 5 or 10 seconds |
aspect_ratio | string | No | "16:9" (default), "9:16", "1:1". Ignored when start_frame_url is set |
start_frame_url | string | No | First frame image URL |
end_frame_url | string | No | Last frame image URL |
sound_effects | object | No | { "sound": string, "music": string, "asmr_mode": boolean }. Omit to disable audio |
Kling v2.1 Video
Model: kuaishou/kling-2.1-video
Image-to-video only.
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text prompt |
start_frame_url | string | Yes | First frame image URL |
end_frame_url | string | No | Last frame image URL |
duration | integer | No | 5 or 10 seconds |
mode | string | No | "pro" (default). "std" or "pro" |
sound_effects | object | No | { "sound": string, "music": string, "asmr_mode": boolean }. Omit to disable audio |
Kling v2.1 Master Video
Model: kuaishou/kling-2.1-master-video
Pro-only. No end frame support.
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text prompt |
duration | integer | No | 5 or 10 seconds |
start_frame_url | string | No | First frame image URL (optional) |
sound_effects | object | No | { "sound": string, "music": string, "asmr_mode": boolean }. Omit to disable audio |
Kling v3.0 Motion Control
Model: kuaishou/kling-3.0-motion-control
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text prompt describing the motion |
image_url | string | Yes | Character/subject image URL |
video_url | string | Yes | Motion reference video URL |
mode | string | No | "std" (default) or "pro" |
keep_audio | boolean | No | Preserve audio from motion video (default true) |
character_orientation | string | No | "video" (default) or "image" |
elements | array | No | Additional character/object elements |
Kling v2.6 Motion Control
Model: kuaishou/kling-2.6-motion-control
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text prompt describing the motion |
image_url | string | Yes | Character/subject image URL |
video_url | string | Yes | Motion reference video URL |
mode | string | No | "std" (default) or "pro" |
keep_audio | boolean | No | Preserve audio from motion video (default true) |
character_orientation | string | No | "video" (default) or "image" |
Grok Imagine Video Extend
Model: xai/grok-imagine-video-extend
Extend a previously generated video via HTTP streaming. Two mutually exclusive modes:
| Mode | How to activate | Behaviour |
|---|---|---|
| Preset | Provide video_preset | The preset controls the video style; prompt, extend_at, extend_duration are ignored |
| Custom | Omit video_preset | You control timing and prompt; prompt, extend_at, extend_duration are required |
| Parameter | Type | Required | Description |
|---|---|---|---|
task_id | string | Yes | Task ID of a completed video generation |
video_preset | string | No | "spicy" or "normal". Enables preset mode |
prompt | string | No | Text prompt to guide the extension. Required in custom mode |
extend_at | float | No | Second to start the extension from. Required in custom mode |
extend_duration | integer | No | 6 or 10 seconds. Required in custom mode |
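
The two modes described above differ only in which fields are sent, so one builder can cover both. The sketch below uses a hypothetical function, grok_extend_input: in preset mode it drops the fields the table says are ignored, and in custom mode it insists on all three required fields.

```python
def grok_extend_input(task_id, video_preset=None, prompt=None,
                      extend_at=None, extend_duration=None):
    """Build the input dict for xai/grok-imagine-video-extend."""
    if video_preset is not None:
        if video_preset not in ("spicy", "normal"):
            raise ValueError('video_preset must be "spicy" or "normal"')
        # Preset mode: prompt, extend_at, and extend_duration are ignored,
        # so they are left out of the payload entirely.
        return {"task_id": task_id, "video_preset": video_preset}
    if prompt is None or extend_at is None or extend_duration is None:
        raise ValueError("custom mode requires prompt, extend_at, and extend_duration")
    if extend_duration not in (6, 10):
        raise ValueError("extend_duration must be 6 or 10")
    return {
        "task_id": task_id,
        "prompt": prompt,
        "extend_at": float(extend_at),
        "extend_duration": extend_duration,
    }


preset = grok_extend_input("task_123", video_preset="normal", prompt="ignored")
custom = grok_extend_input("task_123",
                           prompt="The camera keeps pulling back over the bay",
                           extend_at=4.5, extend_duration=6)
```

Leaving ignored fields out of preset requests keeps the payload honest about what will actually influence the result.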
GPT Image
Model: openai/gpt-image-2
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text description |
image_urls | array | No | Reference image URLs for image editing mode |
aspect_ratio | string | No | 1:1, 3:2, 2:3, 16:9. Default: 1:1 |
resolution | string | No | 1K, 2K, 4K. Default: 1K |
Nano Banana
Models: google/nano-banana, google/nano-banana-pro
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text description |
aspect_ratio | string | Yes | 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9 |
image_urls | array | No | Reference images |
resolution | string | No | Pro only: 1k, 2k, 4k |
Flux.2
Models: black-forest-labs/flux.2-pro, black-forest-labs/flux.2-flex, black-forest-labs/flux.2-max
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text description |
image_urls | array | No | Reference images (Pro/Max: 8, Flex: 10) |
aspect_ratio | string | No | auto, 1:1, 4:3, 16:9, 3:2, 2:3, 9:16, 3:4 (Max also: 5:4, 21:9) |
quality | string | No | 1K or 2K |
steps | integer | No | Flex only: 1-50 (more = higher quality) |
cfg | number | No | Flex only: 1.5-10 (higher = follows prompt more strictly) |
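
Several Flux.2 options depend on the variant (reference image limits, the extra Max aspect ratios, and the Flex-only steps and cfg). The table can be encoded as data and checked locally, as in this sketch; flux2_errors is a hypothetical helper, and the range bounds are assumed to be inclusive.

```python
PRO = "black-forest-labs/flux.2-pro"
FLEX = "black-forest-labs/flux.2-flex"
MAX = "black-forest-labs/flux.2-max"
IMAGE_LIMIT = {PRO: 8, MAX: 8, FLEX: 10}
COMMON_RATIOS = {"auto", "1:1", "4:3", "16:9", "3:2", "2:3", "9:16", "3:4"}
MAX_ONLY_RATIOS = {"5:4", "21:9"}


def flux2_errors(model, inp):
    """Return problems with a Flux.2 input dict for the given model variant."""
    if model not in IMAGE_LIMIT:
        return [f"unknown Flux.2 model: {model}"]
    errors = []
    if not inp.get("prompt"):
        errors.append("prompt is required")
    if len(inp.get("image_urls") or []) > IMAGE_LIMIT[model]:
        errors.append(f"at most {IMAGE_LIMIT[model]} reference images")
    ratios = COMMON_RATIOS | (MAX_ONLY_RATIOS if model == MAX else set())
    ratio = inp.get("aspect_ratio")
    if ratio is not None and ratio not in ratios:
        errors.append(f"aspect_ratio {ratio} is not supported on this model")
    if inp.get("quality") not in (None, "1K", "2K"):
        errors.append("quality must be 1K or 2K")
    if model != FLEX:
        errors += [f"{name} is Flex only" for name in ("steps", "cfg") if name in inp]
    else:
        steps, cfg = inp.get("steps"), inp.get("cfg")
        if steps is not None and not 1 <= steps <= 50:
            errors.append("steps must be between 1 and 50")
        if cfg is not None and not 1.5 <= cfg <= 10:
            errors.append("cfg must be between 1.5 and 10")
    return errors


flex_ok = flux2_errors(FLEX, {"prompt": "Isometric city at night",
                              "steps": 30, "cfg": 4.5})
pro_bad = flux2_errors(PRO, {"prompt": "Isometric city at night",
                             "aspect_ratio": "21:9", "steps": 30})
```

The Pro request fails twice: 21:9 is listed for Max only, and steps is a Flex-only parameter.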
Qwen Image 2.0 Pro
Model: alibaba/qwen-image-2.0-pro ($0.0525/image)
Best quality. Text rendering, realistic textures. Automatically switches between T2I and editing based on whether image_urls is provided.
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text prompt (max 800 chars) |
aspect_ratio | string | No | 1:1 (default), 16:9, 9:16, 4:3, 3:4 |
image_urls | string[] | No | Omit for T2I. Provide image URLs for editing |
negative_prompt | string | No | What to avoid (max 500 chars) |
prompt_extend | boolean | No | Smart prompt rewriting (default true) |
seed | integer | No | Seed for reproducibility |
Qwen Image 2.0
Model: alibaba/qwen-image-2.0 ($0.0245/image)
Faster version of 2.0 Pro. Same capabilities and parameters.
Qwen Image Max
Model: alibaba/qwen-image-max (T2I 0.0525/image)
Highest realism, fewest AI artifacts. Editing uses a specialized edit model under the hood (industrial design, geometric reasoning, character consistency). Same parameters as Qwen Image 2.0 Pro.
Qwen Image Plus
Model: alibaba/qwen-image-plus (T2I 0.021/image)
Diverse artistic styles, fast. Editing uses a specialized edit model under the hood. Same parameters as Qwen Image 2.0 Pro.
Qwen Image
Model: alibaba/qwen-image (T2I 0.0315/image)
Older base model. Editing uses a specialized edit model under the hood. Same parameters as Qwen Image 2.0 Pro.
Z-Image Turbo
Model: alibaba/z-image-turbo (0.021 with prompt rewriting)
Lightweight fast T2I only. Chinese and English text rendering.
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text prompt (max 800 chars) |
aspect_ratio | string | No | 1:1 (default), 2:3, 3:2, 3:4, 4:3, 9:16, 16:9 |
prompt_extend | boolean | No | Prompt rewriting (default false, doubles cost) |
seed | integer | No | Seed for reproducibility |
Wan 2.7 Pro Image
Model: alibaba/wan-2.7-pro-image ($0.0525/image)
Highest quality. Thinking mode for T2I. Supports editing with up to 9 images. Up to 4K resolution for T2I.
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text prompt (max 5000 chars) |
aspect_ratio | string | No | 1:1 (default), 16:9, 9:16, 4:3, 3:4, 3:2, 2:3. Editing preserves input ratio |
image_urls | string[] | No | Omit for T2I. Up to 9 images for editing |
thinking_mode | boolean | No | Better quality, slower (default true). T2I only |
seed | integer | No | Seed for reproducibility |
Wan 2.7 Image
Model: alibaba/wan-2.7-image ($0.021/image)
Faster variant of 2.7 Pro. Same capabilities, max 2K resolution. Same parameters as Wan 2.7 Pro Image.
Wan 2.6 Image
Model: alibaba/wan-2.6-image ($0.021/image)
Automatically selects T2I or editing mode based on image_urls. Supports style transfer with 1–4 reference images.
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text prompt (max 2000 chars) |
aspect_ratio | string | No | 1:1 (default), 2:3, 3:2, 3:4, 4:3, 9:16, 16:9 |
image_urls | string[] | No | Omit for T2I. 1–4 images for editing/style transfer |
negative_prompt | string | No | What to avoid (max 500 chars) |
prompt_extend | boolean | No | Smart prompt rewriting (default true) |
seed | integer | No | Seed for reproducibility |
Wan 2.5 Image
Model: alibaba/wan-2.5-image ($0.021/image)
Automatically selects T2I or editing mode based on image_urls. Supports 1–3 reference images. Same parameters as Wan 2.6 Image.
Wan 2.2 Image
Model: alibaba/wan-2.2-image ($0.035/image)
T2I only. Does not accept image_urls.
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text prompt (max 500 chars) |
aspect_ratio | string | No | 1:1 (default), 3:4, 4:3, 9:16, 16:9 |
negative_prompt | string | No | What to avoid |
seed | integer | No | Seed for reproducibility |
Wan 2.2 Flash Image
Model: alibaba/wan-2.2-flash-image ($0.0175/image)
Fast T2I only. Cheapest Wan image model. Same parameters as Wan 2.2 Image.
Grok Imagine Image
Model: xai/grok-imagine-image (Pro mode: $0.025/image)
Generate and edit images using xAI’s Grok Imagine model. When image_urls is provided, the model runs in edit mode.
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Text description or edit instruction |
aspect_ratio | string | No | "1:1" (default), "2:3", "3:2", "9:16", "16:9" |
image_urls | string[] | No | 1–5 reference image URLs (triggers edit mode) |
enable_pro | boolean | No | Enable pro mode for higher quality results |
upsample_prompt | boolean | No | Let AI enhance your prompt for better results |
enable_nsfw | boolean | No | Enable NSFW content generation |
Suno Music
Model: suno-ai/music
| Parameter | Type | Required | Description |
|---|---|---|---|
mv | string | Yes | Model version: chirp-v3-5, chirp-v4, chirp-auk, chirp-bluejay, chirp-crow |
custom | boolean | Yes | false for simple mode, true for custom mode |
gpt_description_prompt | string | No | Simple mode: song description with lyrics |
prompt | string | No | Custom mode: detailed lyrics/prompt |
tags | string | No | Custom mode: genre/style tags |
title | string | No | Song title |
make_instrumental | boolean | No | Generate instrumental only |
negative_tags | string | No | Custom mode: styles to avoid |
persona_id | string | No | Custom voice ID from Suno voice creation; music uses that voice for vocals |
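
The table splits several fields by mode: gpt_description_prompt belongs to simple mode, while prompt, tags, and negative_tags belong to custom mode. A small builder can keep the two from being mixed. This is a sketch with a hypothetical helper, suno_music_input; rejecting wrong-mode fields is a client-side choice made here, and the API itself may simply ignore them.

```python
MODEL_VERSIONS = {"chirp-v3-5", "chirp-v4", "chirp-auk",
                  "chirp-bluejay", "chirp-crow"}
SIMPLE_ONLY = {"gpt_description_prompt"}
CUSTOM_ONLY = {"prompt", "tags", "negative_tags"}


def suno_music_input(mv, custom, **fields):
    """Build a suno-ai/music input dict, refusing fields from the other mode."""
    if mv not in MODEL_VERSIONS:
        raise ValueError(f"unknown model version: {mv}")
    wrong = (SIMPLE_ONLY if custom else CUSTOM_ONLY) & set(fields)
    if wrong:
        mode = "custom" if custom else "simple"
        raise ValueError(f"not valid in {mode} mode: {sorted(wrong)}")
    return {"mv": mv, "custom": custom, **fields}


simple = suno_music_input(
    "chirp-v4", False,
    gpt_description_prompt="An upbeat synth-pop song about morning coffee",
    make_instrumental=False,
)
detailed = suno_music_input(
    "chirp-v4", True,
    prompt="[Verse]\nSteam on the window, sun on the floor",
    tags="synth-pop, upbeat",
    title="First Cup",
)
```

Fields without a mode prefix in the table, such as title, make_instrumental, and persona_id, pass through in either mode.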
Suno Audio Operations
Models: suno-ai/add-vocals, suno-ai/add-instrumental, suno-ai/extend, suno-ai/cover
| Parameter | Type | Required | Description |
|---|---|---|---|
mv | string | Yes | Model version |
clip_id | string | Yes* | Existing clip ID |
audio_url | string | Yes* | Audio file URL (alternative to clip_id) |
custom | boolean | Yes | Simple or custom mode |
gpt_description_prompt | string | No | Simple mode description |
prompt | string | No | Custom mode prompt |
continue_at | number | No | Extend: time in seconds to continue from |
start_s | number | No | Start time for overlay |
end_s | number | No | End time for overlay |
Suno Stems
Models: suno-ai/stems, suno-ai/stems-all
| Parameter | Type | Required | Description |
|---|---|---|---|
clip_id | string | Yes | Clip ID to extract stems from |
title | string | No | Title for extraction |
Suno Lyrics
Model: suno-ai/lyrics
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Description of lyrics to generate |
mv | string | Yes | Lyrics model: remi-v1 or default |
Rate Limit Error Response
Error Responses
| Code | Description |
|---|---|
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Invalid or missing API key |
| 402 | Payment Required - Insufficient balance |
| 404 | Not Found - Task or model not found |
| 429 | Too Many Requests - Rate limited |
| 500 | Internal Server Error |
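
One way to act on these codes is to retry only the transient ones. The sketch below is a minimal example under assumptions: treating 429 and 500 as retryable and using exponential backoff are choices made for illustration, not behaviour documented here, and the "id" field in the usage is a made-up response shape. The HTTP call is injected as send so the logic can run without a network.

```python
import time

ERROR_MEANINGS = {
    400: "Bad Request - Invalid parameters",
    401: "Unauthorized - Invalid or missing API key",
    402: "Payment Required - Insufficient balance",
    404: "Not Found - Task or model not found",
    429: "Too Many Requests - Rate limited",
    500: "Internal Server Error",
}
# Assumption: only rate limiting and server errors are worth retrying.
RETRYABLE = {429, 500}


class TaskApiError(Exception):
    def __init__(self, status):
        self.status = status
        super().__init__(f"{status}: {ERROR_MEANINGS.get(status, 'Unknown error')}")


def call_with_retry(send, max_attempts=4, base_delay_s=1.0, sleep=time.sleep):
    """Call send() until it succeeds, retrying 429/500 with exponential backoff.

    send is any zero-argument callable returning (status_code, decoded_body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status < 400:
            return body
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise TaskApiError(status)
        sleep(base_delay_s * 2 ** attempt)  # 1s, 2s, 4s, ...


# Usage with a stand-in transport: one rate-limit reply, then success.
_replies = iter([(429, None), (200, {"id": "task_123"})])
waits = []
body = call_with_retry(lambda: next(_replies), sleep=waits.append)
```

Errors such as 400, 401, 402, and 404 are raised immediately, since repeating the same request would not change the outcome.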
