A tiny, agent-native CLI for generating images, video and text with dead-simple commands, stdin support and predictable artifact outputs. Uses Vercel AI SDK and AI Gateway for unified access to hundreds of models.
npm install -g ai-cli

Requires Node.js 20+ and an AI Gateway API key or a provider-specific key (e.g. OPENAI_API_KEY).
ai image "a cute dog"
ai video "a spinning triangle"
ai text "explain quantum computing"
ai models # list available models

ai image "a dragon" | ai video "animate this"
cat notes.txt | ai text "summarize this"
git diff | ai text "explain these changes"

All commands support:
-m, --model <id> Model ID (creator/model-name), comma-separated for multi-model
-o, --output <path> Output file path or directory
-n, --count <n> Number of generations per model (default: 1)
-p, --concurrency <n> Max parallel generations (default: 4, video: 2)
-q, --quiet Suppress progress output
--json Output metadata as JSON
Model IDs can be specified as creator/model-name or just model-name (resolved against models fetched from the gateway):
ai text -m gpt-5.5 "hello" # resolves to openai/gpt-5.5
ai image -m flux-2-pro "a sunset" # resolves to bfl/flux-2-pro

ai image supports:
--size <WxH> Image size (e.g. 1024x1024)
--aspect-ratio <W:H> Aspect ratio (e.g. 16:9)
--quality <level> Quality (standard, hd)
--style <style> Style (vivid, natural)
--no-preview Disable inline image preview

ai video supports:
--aspect-ratio <W:H> Aspect ratio (e.g. 16:9)
--duration <seconds> Duration in seconds
--no-preview Disable inline video frame preview

ai text supports:
-f, --format <fmt> Output format: md, txt (default: md)
-s, --system <prompt> System prompt
--max-tokens <n> Maximum tokens to generate
-t, --temperature <n> Temperature (0-2)

ai models supports:
--type <type> Filter by type: text, image, video
--creator <name> Filter by creator (e.g. openai, google)
--json Output as JSON (includes descriptions)
All model types (text, image, video) are fetched live from the AI Gateway.
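
The bare-name resolution described earlier (for example gpt-5.5 resolving to openai/gpt-5.5) can be pictured as a suffix match against the fetched list. The following is only a sketch under assumptions: `resolve_model_id` is a hypothetical helper, the ID list is passed in by hand rather than fetched from the gateway, and the first match wins if two creators share a model name. The CLI's actual resolution logic may differ.

```shell
# Hypothetical helper, not part of ai-cli.
# Usage: resolve_model_id <name> <full-id>...
# Prints the first full ID that equals <name> or whose part after
# the first "/" equals <name>; returns 1 if nothing matches.
resolve_model_id() {
  name=$1
  shift
  for id in "$@"; do
    case $name in
      */*)
        # Already in creator/model-name form: require an exact match.
        [ "$id" = "$name" ] && { echo "$id"; return 0; }
        ;;
      *)
        # Bare name: compare against the part after the creator prefix.
        [ "${id#*/}" = "$name" ] && { echo "$id"; return 0; }
        ;;
    esac
  done
  echo "unknown model: $name" >&2
  return 1
}

# The ID list below is a hand-written stand-in for the gateway's list.
resolve_model_id flux-2-pro openai/gpt-5.5 bfl/flux-2-pro
```

With the two IDs shown, the final line prints bfl/flux-2-pro.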
Generate with multiple models by comma-separating -m:
ai image "a sunset" -m "openai/gpt-image-1,xai/grok-imagine-image,bfl/flux-2-pro"

Combine with -n to generate multiple per model:
ai image "a sunset" -n 2 -m "openai/gpt-image-1,bfl/flux-2-pro" # 4 images total

When running in a terminal that supports the Kitty graphics protocol (Kitty, Ghostty, WezTerm, Warp, iTerm2), generated images and videos are displayed inline automatically. Video previews decode an H.264 keyframe from the midpoint of the video using openh264 compiled to WebAssembly, so no native dependencies are required. Use --no-preview to disable this, or set AI_CLI_PREVIEW=1 to force it on in undetected terminals.
- text: saves to output.md (interactive), stdout when piped
- image/video: saves to file (interactive), raw binary stdout when piped
- -o <dir>: saves inside the directory with auto-generated names
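
The interactive/piped split above is the usual pattern of checking whether stdout is a terminal. The sketch below restates the first two rules in shell form; `output_target` is a hypothetical helper, and the assumption that the CLI uses a TTY check on stdout is not confirmed by this document.

```shell
# Hypothetical helper, not part of ai-cli.
# Prints where output of the given kind (text | image | video) would go,
# following the rules listed above.
output_target() {
  kind=$1
  if [ -t 1 ]; then
    # stdout is a terminal: treat the run as interactive.
    case $kind in
      text) echo "output.md" ;;
      *)    echo "file" ;;
    esac
  else
    # stdout is a pipe or a redirect.
    case $kind in
      text) echo "stdout" ;;
      *)    echo "raw binary stdout" ;;
    esac
  fi
}

# Run directly in a terminal this reports the interactive target;
# append "| cat" to see the piped one.
output_target text
```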
| Variable | Description |
|---|---|
| AI_GATEWAY_API_KEY | AI Gateway authentication key |
| OPENAI_API_KEY | Provider-specific key (or other provider keys) |
| AI_CLI_TEXT_MODEL | Default text model (overrides openai/gpt-5.5) |
| AI_CLI_IMAGE_MODEL | Default image model (overrides openai/gpt-image-2) |
| AI_CLI_VIDEO_MODEL | Default video model (overrides bytedance/seedance-2.0) |
| AI_CLI_OUTPUT_DIR | Default output directory for generated files |
| AI_CLI_PREVIEW | Set to 1 to force inline image preview, 0 to disable |
| NO_COLOR | Disable ANSI color output |
| FORCE_COLOR | Force color output even when not a TTY |
The -m flag always takes priority over AI_CLI_*_MODEL env vars. The -o flag always takes priority over AI_CLI_OUTPUT_DIR.
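
The precedence for the text model (flag, then environment variable, then built-in default) maps directly onto nested shell parameter expansion. This is a sketch of the rule only: `resolve_text_model` is a hypothetical helper, and the model names in the comments other than openai/gpt-5.5 are made up for illustration.

```shell
# Hypothetical helper, not part of ai-cli.
# $1 is the value given to -m, or an empty string if the flag was omitted.
# Precedence: -m flag > AI_CLI_TEXT_MODEL > built-in default.
resolve_text_model() {
  flag_model=$1
  echo "${flag_model:-${AI_CLI_TEXT_MODEL:-openai/gpt-5.5}}"
}

# Examples (example/... IDs are placeholders, not real models):
#   resolve_text_model ""                  -> openai/gpt-5.5 when AI_CLI_TEXT_MODEL is unset
#   resolve_text_model example/flag-model  -> example/flag-model, whatever the env var holds
```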
Requests that exceed the timeout are aborted automatically:
| Command | Timeout |
|---|---|
| text | 120 seconds |
| image | 120 seconds |
| video | 300 seconds |
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | All generations failed |
| 2 | Partial failure (some succeeded, some failed) |
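
In scripts, the three exit codes let a caller tell a total failure from a partial one. A minimal sketch, assuming a wrapper named `report_outcome` (hypothetical, not part of the CLI) that runs any command and writes a one-line summary to stderr so the command's own stdout stays clean:

```shell
# Hypothetical wrapper, not part of ai-cli.
# Runs the given command, reports the outcome on stderr using the
# exit codes from the table above, and returns the same exit code.
report_outcome() {
  status=0
  "$@" || status=$?
  case $status in
    0) echo "success" >&2 ;;
    1) echo "all generations failed" >&2 ;;
    2) echo "partial failure" >&2 ;;
    *) echo "unexpected exit code $status" >&2 ;;
  esac
  return "$status"
}

# Example (requires ai-cli and a configured key):
#   report_outcome ai image "a sunset" -n 2 -m "openai/gpt-image-1,bfl/flux-2-pro"
```

On exit code 2 a script may still want to collect whichever outputs were written before deciding to retry.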