bonsai-cli

One-line text-to-image on Apple Silicon, powered by Bonsai-Image 4B (ternary 1.58-bit, MLX). Pure Python single file, runs through uv with no manual setup.

English · 中文 · AGENTS.md · skills/

Example output: a tiny bonsai on a ceramic plate (1024×1024, generated by this tool in ~165s on M-series)

# Run directly from GitHub — no clone, no download, no install
uv run https://raw.githubusercontent.com/vra/bonsai-cli/main/bonsai.py "an icy bonsai tree in a rainy forest with snowy mountains, photorealistic"

Or, if you already cloned the repo:

uv run bonsai.py "an icy bonsai tree in a rainy forest with snowy mountains, photorealistic"

That's it. First run takes ~15 min (env + 3.9 GB model download). After that each image takes seconds to a couple of minutes depending on size.


Why this exists

Bonsai-Image 4B by Prism ML is a 1.58-bit text-to-image diffusion transformer optimized for Apple Silicon — 1.21 GB transformer, ~3.9 GB total payload, 4 sampling steps. It's tiny, fast, and runs entirely on your Mac.

The official way to run it is to clone Bonsai-Image-Demo, run setup.sh, download the model, then run generate.sh. That works, but on a typical macOS install setup.sh will refuse to proceed because it strictly requires the full Xcode app + Metal Toolchain to be present — even though that's only needed for users compiling MLX from source, not for running the prebuilt MLX wheel.

bonsai-cli is a thin Python wrapper that:

  1. Verifies you're on Apple Silicon
  2. Clones the official demo repo to ~/.bonsai-image/
  3. Bypasses the Metal Toolchain check and installs everything via uv directly
  4. Patches vendor/image-studio/pyproject.toml so MLX comes from the PyPI wheel instead of a git source (which would force a source build that does need the Toolchain)
  5. Downloads the model
  6. Calls the official scripts/generate.sh with your prompt
  7. Saves the PNG to ~/Pictures/bonsai/ and opens it in Preview

Everything heavy (MLX, mflux, diffusion code) is upstream code — this repo is ~250 lines of glue.

Requirements

  • macOS 13.5+ on Apple Silicon (M1 / M2 / M3 / M4). Intel Macs are not supported by MLX.
  • Xcode Command Line Tools (xcode-select --install) — not the full Xcode app.
  • uv: brew install uv or curl -LsSf https://astral.sh/uv/install.sh | sh
  • Free disk: ~6 GB (the model is ~3.9 GB and the venv ~2 GB, plus vendored repos).
  • An internet connection on first run for the HuggingFace download.
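
The requirements above can be checked up front before anything is downloaded. The following is a minimal sketch of such a preflight check, not the code in bonsai.py: the function name `preflight`, its parameters, and the 6 GB threshold (taken from the disk figure in Troubleshooting) are assumptions made for illustration.

```python
import platform
import shutil
from pathlib import Path

# Assumption: 6 GB of headroom, matching the "at least 6 GB" advice in Troubleshooting.
MIN_FREE_BYTES = 6 * 1024**3


def preflight(system=None, machine=None, uv_path=None, free_bytes=None):
    """Return a list of problems; an empty list means the machine looks usable.

    Each argument defaults to a live lookup, and can be overridden for testing.
    Pass uv_path="" to simulate a machine without uv.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    uv_path = shutil.which("uv") if uv_path is None else uv_path
    if free_bytes is None:
        free_bytes = shutil.disk_usage(Path.home()).free

    problems = []
    # Apple Silicon reports Darwin / arm64 (a Rosetta-translated Python reports x86_64).
    if system != "Darwin" or machine != "arm64":
        problems.append(f"needs macOS on Apple Silicon, found {system}/{machine}")
    if not uv_path:
        problems.append("uv not found on PATH")
    if free_bytes < MIN_FREE_BYTES:
        problems.append(f"only {free_bytes / 1024**3:.1f} GB free, need about 6 GB")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

Returning a list rather than exiting on the first failure lets a caller report every missing requirement in one pass.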

Install

No install — it's a single self-contained script with PEP 723 inline dependencies. Pick one:

# A. Run straight from GitHub (no files on disk except ~/.bonsai-image/)
uv run https://raw.githubusercontent.com/vra/bonsai-cli/main/bonsai.py "your prompt"

# B. Clone the repo
git clone https://github.com/vra/bonsai-cli.git
cd bonsai-cli

# C. Or just download the one file
curl -O https://raw.githubusercontent.com/vra/bonsai-cli/main/bonsai.py
chmod +x bonsai.py

Optional, install as a global command:

mkdir -p ~/.local/bin && ln -sf "$PWD/bonsai.py" ~/.local/bin/bonsai
# then anywhere:
bonsai "your prompt"
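
For readers unfamiliar with the PEP 723 inline dependencies mentioned at the top of this section, the sketch below shows what such a header looks like and how a tool can read it. The example header is hypothetical (the actual metadata in bonsai.py may differ); the regular expression is the one given in the PEP 723 reference implementation, and `read_script_metadata` is an illustrative name.

```python
import re
from typing import Optional

# Hypothetical header for illustration; bonsai.py's real metadata may differ.
EXAMPLE_SCRIPT = '''\
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = []
# ///
print("hello from an inline-metadata script")
'''

# Regular expression from the PEP 723 reference implementation.
PEP723_BLOCK = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def read_script_metadata(source: str) -> Optional[str]:
    """Return the TOML text of the `script` metadata block, or None if absent."""
    for match in PEP723_BLOCK.finditer(source):
        if match.group("type") == "script":
            # Strip the leading "# " (or bare "#") from every comment line.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:]
                for line in match.group("content").splitlines(keepends=True)
            )
    return None


if __name__ == "__main__":
    print(read_script_metadata(EXAMPLE_SCRIPT))
```

uv reads a block like this when it runs a script and builds a matching environment, which is why no separate install step is needed.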

Usage

# Zero-install: run directly from GitHub
uv run https://raw.githubusercontent.com/vra/bonsai-cli/main/bonsai.py "a tiny bonsai on a ceramic plate, soft morning light"

# Simplest, if you have the file locally
uv run bonsai.py "a tiny bonsai on a ceramic plate, soft morning light"

# Custom size (width and height must both be multiples of 32)
uv run bonsai.py "..." --size 1248x832

# Reproducible
uv run bonsai.py "..." --seed 9909

# Smaller / faster 1-bit variant
uv run bonsai.py "..." --variant binary

# Save somewhere specific, skip auto-open
uv run bonsai.py "..." -o ~/Desktop/out.png --no-open

# Full help
uv run bonsai.py --help

Flags

| Flag | Default | Notes |
| --- | --- | --- |
| prompt | (positional) | English works best (text encoder is Qwen3-4B). |
| --size WxH | 1024x1024 | Both dimensions must be multiples of 32. |
| --seed N | random | Reproducible runs. |
| --steps N | 4 | FlowMatchEuler default; rarely worth changing. |
| --variant ternary\|binary | ternary | ternary = 1.58-bit, better quality. binary = 1-bit, smaller / faster. |
| -o, --output PATH | ~/Pictures/bonsai/<ts>_<slug>.png | Auto-creates the parent directory. |
| --no-open | off | Don't auto-open the result in Preview. |
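
The --size rule (both dimensions multiples of 32) is easy to check before invoking the tool. This is a minimal sketch under the assumption that sizes are written as WxH with a lowercase or uppercase x; `parse_size` is an illustrative helper, not a function exported by bonsai.py.

```python
import re

SIZE_RE = re.compile(r"^(\d+)x(\d+)$")


def parse_size(text, multiple=32):
    """Parse 'WxH' into (width, height), rejecting values the flag would refuse."""
    match = SIZE_RE.match(text.strip().lower())
    if match is None:
        raise ValueError(f"expected WxH, got {text!r}")
    width, height = int(match.group(1)), int(match.group(2))
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width % multiple or height % multiple:
        raise ValueError(f"width and height must both be multiples of {multiple}")
    return width, height


if __name__ == "__main__":
    print(parse_size("1248x832"))
```

Validating early avoids paying the model-load time only to hit the size error.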

What happens under the hood

~/.bonsai-image/
└── Bonsai-Image-Demo/        ← git clone --depth 1 of PrismML-Eng/Bonsai-Image-Demo
    ├── .venv/                ← uv venv --python 3.11
    ├── vendor/
    │   ├── image-studio/     ← git clone of PrismML-Eng/image-studio (patched)
    │   └── mflux-prism/      ← git clone of PrismML-Eng/mflux-prism
    ├── models/
    │   └── bonsai-image-4B-ternary-mlx/   ← ~3.9 GB HuggingFace download
    └── scripts/generate.sh   ← what we ultimately call

The patch we apply to vendor/image-studio/pyproject.toml:

# before:
[tool.uv.sources]
mflux = { git = "https://github.com/PrismML-Eng/mflux-prism.git", rev = "" }
mlx   = { git = "https://github.com/PrismML-Eng/mlx.git",          rev = "" }

# after:
[tool.uv.sources]
mflux = { path = "../mflux-prism", editable = true }
# (mlx line removed — uv falls back to the PyPI wheel which ships prebuilt
#  Metal shaders, so no Xcode/Metal Toolchain needed)

This is the single change that lets the whole pipeline install on a vanilla CLT-only Mac.
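
The before/after above can be expressed as a small text transformation. The sketch below is one way to do it, not necessarily how bonsai.py implements the patch: it works line by line instead of parsing TOML, and the names `patch_uv_sources` and `needs_patch` are hypothetical.

```python
import re

# Replacement line, copied from the "after" state shown above.
MFLUX_LOCAL = 'mflux = { path = "../mflux-prism", editable = true }'

MLX_GIT = re.compile(r"\s*mlx\s*=\s*\{\s*git\s*=")
MFLUX_GIT = re.compile(r"\s*mflux\s*=\s*\{\s*git\s*=")


def needs_patch(pyproject_text: str) -> bool:
    """True if an `mlx = { git = ... }` source is still present."""
    return any(MLX_GIT.match(line) for line in pyproject_text.splitlines())


def patch_uv_sources(pyproject_text: str) -> str:
    """Drop the mlx git source and point mflux at the local checkout.

    Simplification: matches lines anywhere in the file, not only inside
    the [tool.uv.sources] table.
    """
    patched = []
    for line in pyproject_text.splitlines():
        if MLX_GIT.match(line):
            continue  # removing the line lets uv resolve mlx from PyPI
        if MFLUX_GIT.match(line):
            patched.append(MFLUX_LOCAL)
            continue
        patched.append(line)
    return "\n".join(patched) + "\n"


if __name__ == "__main__":
    before = (
        "[tool.uv.sources]\n"
        'mflux = { git = "https://github.com/PrismML-Eng/mflux-prism.git", rev = "" }\n'
        'mlx   = { git = "https://github.com/PrismML-Eng/mlx.git",          rev = "" }\n'
    )
    print(patch_uv_sources(before))
```

Because already-patched text passes through unchanged, a transformation like this is safe to run on every invocation.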

Performance

Reference numbers from a recent run on the author's machine:

| Phase | Time |
| --- | --- |
| First-run env setup (one-time) | ~1 min |
| Model download, ~3.9 GB (one-time) | ~13 min |
| First image, 1024×1024 (cold MLX kernel compile) | ~165 s |
| Subsequent images, same shape | much faster (kernel cache warm) |

If you plan to generate many images in a row, the upstream serve.sh keeps weights warm and is significantly faster. This wrapper is designed for one-shot use from the terminal.

# Long-running studio with web UI on :3000
~/.bonsai-image/Bonsai-Image-Demo/scripts/serve.sh

Troubleshooting

Error: ... metal shader compiler isn't available

This should not happen, because the wrapper bypasses this check. If it does, ~/.bonsai-image/Bonsai-Image-Demo/vendor/image-studio/pyproject.toml was not patched. Delete ~/.bonsai-image/ and rerun.

uv sync fails resolving setuptools / cmake

This means MLX is being built from source instead of installed as a wheel. Confirm that vendor/image-studio/pyproject.toml no longer contains mlx = { git = ... }. If it does, the patch step was skipped: delete ~/.bonsai-image/Bonsai-Image-Demo/vendor/image-studio/ and rerun.

HuggingFace download is slow or stalls

hf-transfer is enabled by default and usually saturates the link. If the download stalls, set BONSAI_DISABLE_HF_TRANSFER=1 and rerun; the download then falls back to the slower requests backend.

size must be a multiple of 32

This applies to both width and height. Use, for example, 1024x1024, 1248x832, or 512x768.

Want a clean reset

rm -rf ~/.bonsai-image

Out of disk space

The model is ~3.9 GB and the .venv is ~2 GB. Free up at least 6 GB before running.

For AI agents

This repo ships first-class instructions for autonomous AI coding agents to integrate bonsai-cli as a tool:

  • AGENTS.md — instructions in the agents.md convention: when to invoke, canonical CLI surface, output parsing, failure-pattern table, environment variables, and four ready-made integration recipes (Python subprocess, LangChain @tool, Claude Code skill, Bash).
  • skills/bonsai-generate/ — a Claude Code skill with full preflight checks, prompt-engineering decision tree, size lookup table, latency expectations, and an error-symptom → fix table.

To wire the skill into Claude Code, drop it into either location:

# Project-local (shared with collaborators via this repo):
ln -s "$PWD/skills/bonsai-generate" .claude/skills/bonsai-generate

# Or user-global:
cp -r skills/bonsai-generate ~/.claude/skills/

Once present, Claude Code auto-discovers it — users can say things like "generate an image of a tiny bonsai on a ceramic plate" and the skill takes over.

For other agent frameworks (LangChain, AutoGen, Cursor, Aider, etc.), see the recipes in AGENTS.md.
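
As a rough illustration of the Python subprocess route, the sketch below builds a command line from the flags documented in this README and runs it. It is not the recipe from AGENTS.md: `build_command` and `generate` are hypothetical names, and the script path defaults to a local bonsai.py.

```python
import os
import subprocess


def build_command(prompt, script="bonsai.py", size=None, seed=None,
                  output=None, open_result=False):
    """Assemble the argument list for `uv run bonsai.py ...`."""
    cmd = ["uv", "run", script, prompt]
    if size:
        cmd += ["--size", size]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if output:
        # No shell is involved, so expand "~" ourselves.
        cmd += ["-o", os.path.expanduser(output)]
    if not open_result:
        cmd.append("--no-open")  # agents usually do not want Preview to pop up
    return cmd


def generate(prompt, disable_hf_transfer=False, **options):
    """Run one generation and return the CompletedProcess; raises on failure."""
    env = dict(os.environ)
    if disable_hf_transfer:
        env["BONSAI_DISABLE_HF_TRANSFER"] = "1"
    return subprocess.run(
        build_command(prompt, **options),
        env=env, check=True, capture_output=True, text=True,
    )


if __name__ == "__main__":
    print(build_command("a tiny bonsai on a ceramic plate", seed=9909))
```

Passing the prompt as a single list element avoids shell quoting problems with prompts that contain commas, quotes, or spaces.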

Project structure

bonsai-cli/
├── bonsai.py                       # The whole tool (PEP 723 inline deps)
├── README.md                       # English (this file)
├── README.zh-CN.md                 # 中文
├── AGENTS.md                       # Instructions for AI coding agents
├── skills/
│   └── bonsai-generate/
│       └── SKILL.md                # Claude Code skill (drop into .claude/skills/)
├── assets/
│   └── example.png                 # Sample output for the README
├── LICENSE                         # Apache 2.0 (matches upstream)
└── .gitignore

Credits

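This tool is a wrapper. All credit for the actual model and inference pipeline goes to the upstream projects it builds on: Bonsai-Image 4B and Bonsai-Image-Demo by Prism ML, the PrismML-Eng image-studio and mflux-prism repositories, and MLX.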

If this wrapper saves you time, please star the upstream repos first.

License

Apache 2.0, matching the upstream Bonsai-Image-Demo. See LICENSE.

This wrapper is community-maintained and not affiliated with Prism ML or Apple.
