December 5, 2025 (March 9, 2026)

Table of contents

  1. Available Video Models
  2. Quality & Duration Matrix
  3. Audio Features by Model
  4. v5.5/v5.6 Native Audio
  5. Feature Comparison
  6. Video Endpoint Compatibility
  7. Video Endpoint Constraints
  8. Quality Tier Requirements
  9. Available Image Models
  10. Image Aspect Ratios
  11. Image Endpoints
  12. Unlimited Image Generation (Relax Mode)

Video Models

Available Video Models

Model   | Description
v5.6    | Latest model with audio generation
v5.5    | Audio generation
v5      | Lip sync TTS, sound effects
v5-fast | Fast generation, no audio features

Quality & Duration Matrix

All models support duration 1-10 seconds, except:

  • 1080p is limited to 1-8 seconds maximum
  • create-frames endpoint is limited to 1-8 seconds

Quality        | Duration Range | Notes
360p           | 1-10s          | All tiers
540p (default) | 1-10s          | All tiers
720p           | 1-10s          | Standard+
1080p          | 1-8s           | Pro/Premium
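
The duration limits in the matrix above can be checked client-side before a request is sent. The following is a minimal sketch, not part of the API: the function names `max_duration` and `is_valid_duration` are hypothetical helpers, and the sketch assumes durations are whole seconds.

```python
# Maximum duration in seconds per quality, copied from the matrix above.
MAX_DURATION_BY_QUALITY = {"360p": 10, "540p": 10, "720p": 10, "1080p": 8}

# The create-frames endpoint is capped at 8 seconds regardless of quality.
CREATE_FRAMES_MAX_DURATION = 8


def max_duration(quality: str = "540p", endpoint: str = "create") -> int:
    """Return the longest allowed duration (seconds) for a quality/endpoint pair."""
    if quality not in MAX_DURATION_BY_QUALITY:
        raise ValueError(f"unknown quality: {quality!r}")
    limit = MAX_DURATION_BY_QUALITY[quality]
    if endpoint == "create-frames":
        limit = min(limit, CREATE_FRAMES_MAX_DURATION)
    return limit


def is_valid_duration(duration: int, quality: str = "540p",
                      endpoint: str = "create") -> bool:
    """True if the duration is at least 1 second and within the limit."""
    return 1 <= duration <= max_duration(quality, endpoint)
```

For example, `is_valid_duration(9, "1080p")` is rejected because 1080p stops at 8 seconds, while `is_valid_duration(9, "720p")` passes.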

Audio Features by Model

Model   | Audio Feature            | Usage
v5.6    | Integrated audio         | audio: true
v5.5    | Integrated audio         | audio: true
v5      | Lip sync + Sound effects | lip_sync_tts_prompt + sound_effect_prompt
v5-fast | None                     | No audio support

v5.5/v5.6 Native Audio

Models v5.5 and v5.6 feature integrated audio generation: voice, lip sync, and background music are generated together with the video in a single step.

{
  "model": "v5.6",
  "audio": true,
  "prompt": "A woman says hello and waves at the camera"
}

v5.5/v5.6 Audio                 | v5 Audio
Use audio: true                 | Use lip_sync_tts_prompt + sound_effect_prompt
Voice integrated with video     | Separate lipsync step available
Background music auto-generated | Manual via sound_effect_prompt
Lipsync endpoint not supported  | Lipsync endpoint supported
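
The difference between the two audio styles can be sketched as a small payload builder. This is an illustration only: `build_audio_payload` is a hypothetical helper, and it assumes that `lip_sync_tts_prompt` and `sound_effect_prompt` take plain strings, which this page does not confirm.

```python
NATIVE_AUDIO_MODELS = {"v5.6", "v5.5"}


def build_audio_payload(model, prompt, tts_prompt=None, sfx_prompt=None):
    """Build a request body with the audio fields that match the model."""
    payload = {"model": model, "prompt": prompt}
    if model in NATIVE_AUDIO_MODELS:
        # Voice, lip sync, and music come from the main prompt in one step.
        payload["audio"] = True
    elif model == "v5":
        # v5 uses separate prompts (assumed here to be plain strings).
        if tts_prompt:
            payload["lip_sync_tts_prompt"] = tts_prompt
        if sfx_prompt:
            payload["sound_effect_prompt"] = sfx_prompt
    elif model == "v5-fast":
        if tts_prompt or sfx_prompt:
            raise ValueError("v5-fast has no audio support")
    else:
        raise ValueError(f"unknown model: {model!r}")
    return payload
```

Calling `build_audio_payload("v5.6", "A woman says hello and waves at the camera")` produces the same body as the JSON sample above.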

Feature Comparison

Feature       | v5.6  | v5.5  | v5    | v5-fast
Native Audio  | Yes   | Yes   | -     | -
Lip Sync TTS  | -     | -     | Yes   | -
Sound Effects | -     | -     | Yes   | -
Duration      | 1-10s | 1-10s | 1-10s | 1-10s
Preview Mode  | Yes   | Yes   | Yes   | Yes

Video Endpoint Compatibility

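Endpoint                                  | v5.6 | v5.5 | v5  | v5-fast
POST videos/create                        | Yes  | Yes  | Yes | Yes
POST videos/create-frames                 | Yes  | Yes  | Yes | Yes
POST videos/create-transition (2-frame)   | Yes  | Yes  | Yes | -
POST videos/create-transition (3+ frame)  | -    | -    | Yes | -
POST videos/extend                        | -    | Yes  | Yes | -
POST videos/modify                        | -    | Yes  | -   | -
POST videos/upscale                       | Yes  | Yes  | Yes | Yes
POST videos/lipsync                       | -    | -    | Yes | -
POST videos/create-fusion                 | -    | -    | Yes | -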

Video Endpoint Constraints

Endpoint                     | Supported Models        | Max Duration   | Max Quality
create                       | v5.6, v5.5, v5, v5-fast | 10s (8s@1080p) | 1080p
create-frames                | v5.6, v5.5, v5, v5-fast | 8s             | 1080p
create-transition (2-frame)  | v5.6, v5.5, v5          | 8s             | 1080p
create-transition (3+ frame) | v5 only                 | 8s             | 1080p
extend                       | v5.5, v5                | 10s (8s@1080p) | 1080p
modify                       | v5.5 only               | source video   | 720p
lipsync                      | v5 only                 | source video   | source
fusion                       | v5 only                 | 10s (8s@1080p) | 1080p
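
The "Supported Models" column above can be encoded as a lookup so that an unsupported model is caught before the request is made. This is a sketch under assumptions: the keys `"create-transition:2"` and `"create-transition:3+"` and the helper names are hypothetical, chosen only to separate the two transition variants.

```python
ALL_MODELS = {"v5.6", "v5.5", "v5", "v5-fast"}

# Supported models per endpoint, copied from the constraints table.
ENDPOINT_MODELS = {
    "create": ALL_MODELS,
    "create-frames": ALL_MODELS,
    "create-transition:2": {"v5.6", "v5.5", "v5"},
    "create-transition:3+": {"v5"},
    "extend": {"v5.5", "v5"},
    "modify": {"v5.5"},
    "lipsync": {"v5"},
    "fusion": {"v5"},
}


def supported_models(endpoint, frame_count=2):
    """Return the set of models listed for an endpoint."""
    if endpoint == "create-transition":
        if frame_count < 2:
            raise ValueError("a transition needs at least 2 frames")
        endpoint += ":2" if frame_count == 2 else ":3+"
    if endpoint not in ENDPOINT_MODELS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    return ENDPOINT_MODELS[endpoint]


def is_supported(model, endpoint, frame_count=2):
    """True if the model is listed for the endpoint."""
    return model in supported_models(endpoint, frame_count)
```

For instance, `is_supported("v5.6", "create-transition", frame_count=3)` is false because transitions with three or more frames are listed for v5 only.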

Quality Tier Requirements

  • 360p: All subscription tiers
  • 540p: All subscription tiers (default)
  • 720p: Standard or higher
  • 1080p: Pro/Premium
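
These tier requirements can be expressed as a minimum-tier lookup. The sketch below rests on several assumptions that this page does not state: the tier ordering, the placeholder name `"basic"` for any tier below Standard, and the reading of "Pro/Premium" as "Pro or higher".

```python
# Assumed ordering, lowest to highest. "basic" is a hypothetical stand-in
# for any tier below Standard.
TIER_ORDER = ["basic", "standard", "pro", "premium", "ultra"]

# Minimum tier per quality, from the list above ("Pro/Premium" is read
# here as "Pro or higher", which is an assumption).
MIN_TIER_BY_QUALITY = {
    "360p": "basic",
    "540p": "basic",
    "720p": "standard",
    "1080p": "pro",
}


def can_use_quality(tier, quality):
    """True if the tier ranks at or above the minimum for the quality."""
    if tier not in TIER_ORDER:
        raise ValueError(f"unknown tier: {tier!r}")
    if quality not in MIN_TIER_BY_QUALITY:
        raise ValueError(f"unknown quality: {quality!r}")
    required = MIN_TIER_BY_QUALITY[quality]
    return TIER_ORDER.index(tier) >= TIER_ORDER.index(required)


def best_quality(tier):
    """Return the highest quality the tier may request."""
    allowed = [q for q in MIN_TIER_BY_QUALITY if can_use_quality(tier, q)]
    return allowed[-1]  # dict order runs from lowest to highest quality
```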

Image Models

Available Image Models

Model             | Qualities                  | Max Refs | Est. Time | Default
qwen-image        | 720p, 1080p                | 3        | ~3s       | default
nano-banana       | 1080p                      | 3        | ~10s      |
seedream-4.0      | 1080p, 1440p, 2160p        | 6        | ~10s      |
seedream-4.5      | 1440p, 2160p               | 6        | ~15s      |
nano-banana-2     | 512p, 1080p, 1440p, 2160p  | 9        | ~30s      |
seedream-5.0-lite | 1440p, 1800p               | 6        | ~30s      |
nano-banana-pro   | 1080p, 1440p, 2160p        | 9        | ~60s      |
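
One way to use the table above is to pick the quickest model that satisfies a given quality and reference-image count. The sketch below is illustrative: `fastest_model` is a hypothetical helper, and the estimated times are the approximate figures from the table, not guarantees.

```python
# (name, supported qualities, max reference images, estimated seconds)
IMAGE_MODELS = [
    ("qwen-image", {"720p", "1080p"}, 3, 3),
    ("nano-banana", {"1080p"}, 3, 10),
    ("seedream-4.0", {"1080p", "1440p", "2160p"}, 6, 10),
    ("seedream-4.5", {"1440p", "2160p"}, 6, 15),
    ("nano-banana-2", {"512p", "1080p", "1440p", "2160p"}, 9, 30),
    ("seedream-5.0-lite", {"1440p", "1800p"}, 6, 30),
    ("nano-banana-pro", {"1080p", "1440p", "2160p"}, 9, 60),
]


def fastest_model(quality, refs=0):
    """Return the model with the lowest estimated time, or None if none fits."""
    candidates = [
        m for m in IMAGE_MODELS
        if quality in m[1] and refs <= m[2]
    ]
    if not candidates:
        return None
    # min() keeps the first entry on ties, so table order breaks ties.
    return min(candidates, key=lambda m: m[3])[0]
```

With these figures, a 1080p request with five reference images resolves to seedream-4.0, since the two faster 1080p models accept at most three references.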

Image Aspect Ratios

All image models support: 1:1, 16:9, 9:16, 4:3, 3:4, 5:4, 4:5, 3:2, 2:3, 21:9.

All models except qwen-image also support auto (default). qwen-image defaults to 1:1.
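
The aspect-ratio rules above can be sketched as a small resolver that applies the per-model default and rejects unsupported values. `resolve_aspect_ratio` is a hypothetical helper name, not an API function.

```python
FIXED_RATIOS = {
    "1:1", "16:9", "9:16", "4:3", "3:4",
    "5:4", "4:5", "3:2", "2:3", "21:9",
}


def resolve_aspect_ratio(model, ratio=None):
    """Return the ratio to send, applying the model's default when omitted."""
    if ratio is None:
        # qwen-image defaults to 1:1; every other model defaults to auto.
        return "1:1" if model == "qwen-image" else "auto"
    if ratio == "auto":
        if model == "qwen-image":
            raise ValueError("qwen-image does not support auto")
        return ratio
    if ratio not in FIXED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {ratio!r}")
    return ratio
```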

Image Endpoints

All image models are supported by the following endpoints:

Unlimited Image Generation (Relax Mode)

Pro and higher subscription plans include unlimited image generation in Relax Mode for select models. Generation times range from about 3s to about 60s, depending on the model. The number of models with unlimited access increases with higher tiers:

Plan    | Price      | Unlimited Models
Pro     | $30/month  | qwen-image
Premium | $60/month  | qwen-image plus selected other models
Ultra   | $199/month | All models