FastAPI backend for text-to-image generation powered by FLUX.2 [klein] 4B.
- 🎨 Text-to-image generation with FLUX.2 [klein] 4B
- ⚡ Sub-2-second inference (4 steps, step-distilled)
- 🌐 Built-in web demo at /
- 📊 Request queue with async processing
- 🔧 Configurable via environment variables
```bash
git clone https://github.com/jamarju/flux2-api-v3.git
cd flux2-api-v3
uv run flux2-api
```

Open http://localhost:8000 for the web demo.
POST `/generate`: generate an image from a text prompt.
Request:
```json
{
  "prompt": "A cat holding a sign that says hello world",
  "seed": 42,
  "width": 1024,
  "height": 1024,
  "num_inference_steps": 4
}
```

All parameters except `prompt` are optional.
Response: PNG binary image
Response Headers:

| Header | Description |
|---|---|
| `X-Seed` | Seed used for generation |
| `X-Generation-Time-Ms` | Inference time in milliseconds |
| `X-Queue-Wait-Ms` | Queue wait time in milliseconds |
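The endpoint can also be called programmatically. Below is an illustrative sketch using only the Python standard library; `build_payload` and `generate_image` are hypothetical helper names (not part of this package), and the defaults mirror the documented ones.

```python
import json
import urllib.request

# Documented defaults for the optional parameters (see the request schema above).
DEFAULTS = {"width": 1024, "height": 1024, "num_inference_steps": 4}


def build_payload(prompt: str, **overrides) -> dict:
    """Build a /generate request body, merging overrides over the defaults."""
    return {"prompt": prompt, **DEFAULTS, **overrides}


def generate_image(prompt: str, url: str = "http://localhost:8000/generate", **overrides):
    """POST a prompt and return the PNG bytes plus the seed/timing headers."""
    data = json.dumps(build_payload(prompt, **overrides)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        meta = {
            "seed": resp.headers.get("X-Seed"),
            "generation_ms": resp.headers.get("X-Generation-Time-Ms"),
            "queue_wait_ms": resp.headers.get("X-Queue-Wait-Ms"),
        }
        return resp.read(), meta
```

Usage (with the server running): `png, meta = generate_image("A sunset over the ocean", seed=42)`, then write `png` to a file.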
Example with curl:

```bash
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A sunset over the ocean"}' \
  -o output.png
```

Health check endpoint.
Response:

```json
{"status": "ok", "model_loaded": true}
```

Environment variables (prefix `FLUX2_`):
| Variable | Default | Description |
|---|---|---|
| `FLUX2_HOST` | `0.0.0.0` | Server bind address |
| `FLUX2_PORT` | `8000` | Server port |
| `FLUX2_DEVICE` | `cuda` | PyTorch device |
| `FLUX2_MODEL_ID` | `black-forest-labs/FLUX.2-klein-4B` | HuggingFace model ID |
| `FLUX2_DEFAULT_STEPS` | `4` | Default inference steps |
| `FLUX2_DEFAULT_WIDTH` | `1024` | Default image width |
| `FLUX2_DEFAULT_HEIGHT` | `1024` | Default image height |
| `FLUX2_GUIDANCE_SCALE` | `1.0` | Guidance scale |
| `FLUX2_DTYPE` | `bfloat16` | Model dtype (`bfloat16`, `float16`, `float32`) |
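For example, settings can be overridden inline when launching the server. The specific values below are illustrative, not recommendations:

```shell
# Bind to localhost only, serve on port 9000, and run in float16
FLUX2_HOST=127.0.0.1 FLUX2_PORT=9000 FLUX2_DTYPE=float16 uv run flux2-api
```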
- Python ≥ 3.12
- NVIDIA GPU with ≥ 13 GB VRAM (RTX 3090/4070+)
- uv package manager
- CUDA toolkit
See docs/cloudflare-tunnel.md for instructions on exposing the service to the internet via Cloudflare Tunnel.
```bash
# Unit tests (requires GPU)
uv run pytest

# Smoke test (starts a real server)
uv run pytest -m smoke
```

For the full report, see benchmarks/results/REPORT.md.