A CLI for running HuggingFace models, optimized for AMD ROCm.
What this is: A convenient wrapper for common AI tasks (image/video/speech generation, transcription). Not a replacement for transformers or diffusers, but a simpler interface when you just want to run a model without writing Python.
Who it's for: AMD GPU owners frustrated with CUDA-first tooling, and anyone who wants a unified CLI for multiple AI modalities.
| Document | Description |
|---|---|
| Configuration Guide | Config files, locations, and examples |
| Environment Variables | Complete environment variable reference |
| GPU Setup Guide | AMD ROCm, NVIDIA CUDA, Apple MPS setup |
| .env.example | Copy this to .env for quick setup |
- Text-to-Image: Z-Image-Turbo, Stable Diffusion XL, FLUX
- Image-to-Image: Qwen Image Edit (advanced editing with multi-image support), FLUX.2 Klein (fast multi-ref), SDXL Refiner
- Text-to-Video: LTX-2, HunyuanVideo-1.5, CogVideoX, Wan2.2
- Text-to-Speech: Bark, MMS-TTS, GLM-TTS
- Speech-to-Text: Whisper (with timestamps and SRT export)
- Plus: Text generation, classification, translation, and more via transformers pipelines
- Interactive Wizard (`-I`): Full guided experience - select task, model, input, output, and all options
- File Picker (`@` syntax): Interactive file selection with multiple modes (`@`, `@?`, `@.`, `@~`, `@*.ext`, `@@`)
- Interactive Input: Guided JSON builder for complex inputs (image-to-image, etc.)
- History Tracking: View and re-run previous commands with `hftool history`
- Dry-Run Mode: Preview operations without executing (`--dry-run`)
- Configuration Files: Save preferences in TOML config files
- Shell Completions: Tab completion for bash, zsh, and fish
- Better Error Messages: Actionable suggestions when things go wrong
- Progress Bars: Visual feedback during model loading and generation
- Model Management: Download, list, and clean up models with simple commands
- Auto-Setup: Detects your hardware and helps install the right PyTorch version
Works on AMD ROCm, NVIDIA CUDA, Apple MPS, and CPU.
curl -fsSL https://raw.githubusercontent.com/zb-ss/hftool/master/install.sh | bash

This auto-detects your GPU, builds a Docker image, and creates a wrapper at `~/.local/bin/hftool`. See Docker Install for details and options.
pip install hftool

On first run, hftool will detect if PyTorch is missing or misconfigured and offer to install it for you:
============================================================
hftool - First Time Setup
============================================================
Detected hardware:
[✓] AMD GPU detected: Radeon RX 7900 XTX
Select PyTorch version to install:
[1] NVIDIA GPU (CUDA)
[2] AMD GPU (ROCm 6.2) (recommended)
[3] Apple Silicon (MPS)
[4] CPU only
[5] Skip (install manually later)
Your choice [2]:
You can also run the setup wizard manually at any time:
hftool setup

# Text-to-Image (Z-Image, SDXL, FLUX)
pip install "hftool[with_t2i]"
# Text-to-Video (HunyuanVideo, CogVideoX, Wan2.2)
pip install "hftool[with_t2v]"
# Text-to-Speech (Bark, MMS-TTS)
pip install "hftool[with_tts]"
# Speech-to-Text (Whisper)
pip install "hftool[with_stt]"
# All features
pip install "hftool[all]"For enhanced user experience features:
# Interactive file picker and JSON builder
pip install InquirerPy
# Or for pipx:
pipx runpip hftool install InquirerPy

Note: Without InquirerPy, the `@` file picker and `--interactive` mode will not work, but all other features remain functional.
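After setup, you can sanity-check that PyTorch sees your GPU before running any tasks. This uses standard PyTorch calls only (ROCm builds also report through `torch.cuda`):

```bash
python -c "import torch; print('GPU available:', torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'running on CPU')"
```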
- Python: >= 3.10
- PyTorch: >= 2.0 with CUDA/ROCm support
- ffmpeg: Required for video output and MP3 audio conversion
# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Arch Linux
sudo pacman -S ffmpeg
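To confirm ffmpeg is on your PATH after installing:

```bash
ffmpeg -version
# Prints version and build configuration if ffmpeg is installed correctly
```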
git clone https://github.com/zb-ss/hftool
cd hftool
# Install PyTorch first (see Quick Install above for your platform)
pip install torch torchvision torchaudio # or with ROCm/CPU index
# Then install hftool in dev mode
pip install -e ".[dev]" # Includes pytest# Install hftool
pipx install "hftool[all]"

Important for AMD GPU users: The install above pulls in CUDA PyTorch by default. Replace it with ROCm PyTorch:
# AMD ROCm - uninstall CUDA version and install ROCm version:
pipx runpip hftool uninstall torch torchvision torchaudio -y
pipx runpip hftool install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2

For other platforms:
# NVIDIA (already installed by default, but to reinstall):
pipx runpip hftool install torch torchvision torchaudio
# CPU only:
pipx runpip hftool uninstall torch torchvision torchaudio -y
pipx runpip hftool install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

Docker provides the easiest way to use hftool with full GPU support, especially for AMD users who want to keep their system clean for gaming.
Benefits:
- ROCm 7.1.1 isolated from system (won't affect gaming drivers)
- All dependencies pre-installed (no pip conflicts)
- Works on any Linux with Docker
Option 1: One-liner install (recommended)
curl -fsSL https://raw.githubusercontent.com/zb-ss/hftool/master/install.sh | bash
# With options:
curl -fsSL ... | bash -s -- --platform rocm # Force platform
curl -fsSL ... | bash -s -- --install-dir /usr/local/bin  # Custom dir

Option 2: Manual setup via pip
# Install hftool (thin CLI wrapper)
pip install hftool
# Run the setup wizard
hftool docker setup

The setup wizard will detect your GPU, build a Docker image, and configure hftool to run commands in the container.
Manual Docker commands:
# Check Docker status
hftool docker status
# Build the image manually
hftool docker build
# Run commands in Docker
hftool docker run -- -t t2i -i "A cat" -o cat.png
# GPU selection (AMD ROCm - uses device passthrough for reliable isolation)
hftool docker run --gpu 1 -- -t t2v -i "A cat" -o cat.mp4 # Use specific GPU
hftool docker run --gpu auto -- -t t2i -i "A cat" -o cat.png # Auto-select non-display GPU
hftool docker run --gpu 0,1 -- -t t2v -m ltx2 -i "A cat" -o cat.mp4 # Multi-GPU
# Output files auto-open on host after generation completes
# Use --no-open to disable
hftool docker run -- -t t2i -i "A cat" -o cat.png --no-openDocker GPU Selection (AMD ROCm):
For multi-GPU AMD systems, hftool uses device passthrough to pass only selected GPU(s) to the container. This is more reliable than environment variable isolation:
# Interactive GPU selection (shown when multiple GPUs detected)
hftool docker run -- -t t2i -i "A cat" -o cat.png
# Available GPUs:
# [0] AMD Radeon RX 7900 XTX, 24.0GB (display)
# [1] AMD Radeon RX 7900 XTX, 24.0GB
# GPU> 1
# Explicit selection
hftool docker run --gpu 1 -- -t t2v -i "A cat" -o cat.mp4

| Option | Description |
|---|---|
| `--gpu auto` | Select best non-display GPU |
| `--gpu 0` | Use specific GPU by index |
| `--gpu 0,1` | Use multiple GPUs (multi-GPU mode) |
| (no option) | Interactive selection if multiple GPUs |
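For reference, passing a single GPU through is conceptually similar to starting the container with only that GPU's device nodes exposed. A rough sketch - the image name `hftool-rocm` and the `renderD129` node are illustrative, since actual names and node numbers vary by system:

```bash
# Rough equivalent of `hftool docker run --gpu 1 ...` on ROCm:
# /dev/kfd is the compute interface; /dev/dri/renderD* is the per-GPU render node.
docker run --rm \
  --device=/dev/kfd \
  --device=/dev/dri/renderD129 \
  hftool-rocm -t t2i -i "A cat" -o cat.png
```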
Environment variables passed to Docker:
These environment variables are automatically passed through to the container:
| Variable | Description |
|---|---|
| `HFTOOL_MODELS_DIR` | Custom models directory (mounted to `/models`) |
| `HSA_OVERRIDE_GFX_VERSION` | AMD GPU architecture (e.g., `11.0.0` for RX 7900) |
| `HF_TOKEN` | HuggingFace token for gated models |
| `HFTOOL_DEBUG` | Enable debug output |
| `HFTOOL_LOG_FILE` | Log file path (directory is mounted) |
See docker/README.md for detailed Docker documentation.
# Full interactive wizard - guided experience for beginners
hftool -I
# Or specify everything on command line
hftool -t t2i -i "A cat in space" -o cat.png
# Interactive file selection
hftool -t asr -i @ -o transcript.txt
# Preview before running
hftool -t t2i -i "A cat" --dry-run
# Reproducible generation with seed
hftool -t t2i -i "A cat" -o cat.png --seed 42
# Re-run previous command
hftool history --rerun 5
# Install shell completions for tab completion
hftool completion --install

New features:
- Auto-open: Generated images, audio, and video files automatically open when complete!
- File picker: Use `@` to interactively select input files
- History: View and re-run previous commands with `hftool history`
- Dry-run: Preview operations without executing with `--dry-run`
- Config files: Save preferences in `~/.hftool/config.toml`
When you run a task for the first time, hftool will prompt you to download the required model:
============================================================
Model not found: Z-Image Turbo
============================================================
Task: text-to-image
Model: Z-Image Turbo
Repo: Tongyi-MAI/Z-Image-Turbo
Size: ~6.0 GB
Location: /home/user/.hftool/models/Tongyi-MAI--Z-Image-Turbo
Download this model now? [Y/n]:
hftool supports persistent configuration via TOML files for convenience.
# Create default config with helpful comments
hftool config init
# Or manually create ~/.hftool/config.toml

# ~/.hftool/config.toml
[defaults]
device = "cuda" # Device to use: auto, cuda, mps, cpu
dtype = "bfloat16" # Data type: bfloat16, float16, float32
auto_open = true # Auto-open output files
verbose = false # Verbose output
[text-to-image]
model = "z-image-turbo" # Default model for this task
num_inference_steps = 9
guidance_scale = 0.0
width = 1024
height = 1024
[text-to-speech]
model = "bark-small"
sample_rate = 24000
[automatic-speech-recognition]
model = "whisper-large-v3"
return_timestamps = true
[aliases]
# Custom model aliases for convenience
fast-image = "Tongyi-MAI/Z-Image-Turbo"
quality-image = "black-forest-labs/FLUX.1-dev"
my-whisper = "openai/whisper-large-v3"
[paths]
models_dir = "~/.hftool/models"
output_dir = "~/ai-outputs"
history_file = "~/.hftool/history.json"Settings are applied in this order (highest to lowest):
- CLI arguments - `hftool -t t2i --device cuda`
- Environment variables - `HFTOOL_DEVICE=cuda`
- Project config - `./.hftool/config.toml` (current directory)
- User config - `~/.hftool/config.toml` (home directory)
- Built-in defaults
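A quick worked example of the precedence rules, assuming your user config sets `device = "cuda"`:

```bash
# ~/.hftool/config.toml sets device = "cuda"
HFTOOL_DEVICE=cpu hftool -t t2i -i "A cat" -o a.png            # runs on cpu: env var beats config
HFTOOL_DEVICE=cpu hftool -t t2i -i "A cat" -o b.png -d cuda    # runs on cuda: CLI flag beats env var
```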
# View current configuration
hftool config show
# Create default config file
hftool config init
# Edit config in your $EDITOR
hftool config edit

# With config file setting device=cuda and model=z-image-turbo
hftool -t t2i -i "A cat in space" -o cat.png
# Uses cuda device and z-image-turbo from config
# Override config with CLI args
hftool -t t2i -i "A cat" -o cat.png --device cpu -m sdxl
# Uses cpu device and sdxl model (CLI overrides config)

hftool includes a powerful file picker that makes it easy to select input files without typing full paths.
Use @ in the -i / --input parameter to trigger the file picker:
| Syntax | Description | Example |
|---|---|---|
| `@` | Interactive file picker (current directory) | `hftool -t asr -i @ -o transcript.txt` |
| `@?` | Interactive with fuzzy search (shows all files) | `hftool -t t2i -i @? -o output.png` |
| `@.` | Pick from current directory | `hftool -t asr -i @. -o transcript.txt` |
| `@~` | Pick from home directory | `hftool -t t2i -i @~ -o output.png` |
| `@/path/` | Pick from specific directory | `hftool -t asr -i @/recordings/ -o transcript.txt` |
| `@*.ext` | Files matching glob pattern | `hftool -t asr -i @*.wav -o transcript.txt` |
| `@@` | Recent files from history | `hftool -t t2i -i @@ -o output.png` |
When @? is used or no matching files are found, hftool enters interactive mode:
? Select a file:
recording1.wav
recording2.wav
> recording3.wav
music.mp3
podcast.wav
Use arrow keys to select, Enter to confirm, Ctrl+C to cancel.
# Pick a WAV file interactively
hftool -t asr -i @ -o transcript.txt
# Select from all files with fuzzy search
hftool -t t2i -i @? -o output.png
# Pick from a specific directory
hftool -t asr -i @/home/user/recordings/ -o transcript.txt
# Use glob pattern to filter
hftool -t asr -i @*.wav -o transcript.txt
# Recent files from history
hftool -t t2i -i @@ -o output.png

Note: The file picker requires the optional InquirerPy dependency:
pip install InquirerPy
# Or for pipx:
pipx runpip hftool install InquirerPy

For tasks that require complex JSON input (like image-to-image), use `--interactive` or `-i @?` to launch an interactive builder:
# Interactive mode for image-to-image
hftool -t i2i --interactive -o output.png
# Or trigger with @?
hftool -t i2i -i @? -o output.png

The interactive builder guides you through entering parameters:
? image: photo.jpg
? prompt: turn this into a watercolor painting
? seed (optional): 42
? true_cfg_scale (optional): 4.0
? num_inference_steps (optional): 50
Supports:
- Image file selection with file picker
- Multi-image inputs (enter comma-separated paths)
- Optional parameter skipping (press Enter to use defaults)
- Parameter validation and type conversion
hftool tracks all commands you run and allows you to view and re-run them:
# Show recent commands
hftool history
# Show last 20 commands
hftool history -n 20
# Output as JSON
hftool history --json

Example output:
Recent command history:
================================================================================
[5] ✓ 2024-01-15 14:32:15 - text-to-image
Model: z-image-turbo
Input: A cat in space
Output: cat.png
Seed: 42
Rerun: hftool history --rerun 5
[4] ✗ 2024-01-15 14:28:10 - automatic-speech-recognition
Model: whisper-large-v3
Input: recording.wav
Output: transcript.txt
Error: Model not downloaded
Rerun: hftool history --rerun 4
# Re-run command #5
hftool history --rerun 5
# With confirmation prompt
hftool history --rerun 5
# Shows: Re-running command #5 from 2024-01-15 14:32:15:
# hftool -t text-to-image -i "A cat in space" -o cat.png --seed 42
# Continue? [Y/n]:

# Clear all history
hftool history --clear

History is stored in `~/.hftool/history.json` by default. Customize with:
# ~/.hftool/config.toml
[paths]
history_file = "~/custom/path/history.json"Or via environment variable:
export HFTOOL_HISTORY_FILE=~/custom/path/history.json

Preview operations without executing them. Useful for:
- Checking model requirements before downloading
- Estimating VRAM usage
- Validating parameters
# Preview text-to-image generation
hftool -t t2i -i "A cat in space" -o cat.png --dry-runExample output:
============================================================
Dry-Run Mode: text-to-image
============================================================
Task: text-to-image
Model: Z-Image Turbo (Tongyi-MAI/Z-Image-Turbo)
Size: ~6.0 GB
Device: cuda
Dtype: bfloat16
VRAM: ~10-12 GB estimated
Input: "A cat in space"
Output: cat.png
Parameters:
num_inference_steps: 9
guidance_scale: 0.0
width: 1024
height: 1024
seed: 42
Dependencies:
✓ torch
✓ diffusers
✓ transformers
Status: Model downloaded
Would run: hftool -t text-to-image -i "A cat in space" -o cat.png --seed 42
Use dry-run to:
- Verify dependencies before attempting generation
- Check disk space requirements
- Estimate VRAM usage for your GPU
- Preview parameters from config file
Enable tab completion for faster CLI usage:
# Auto-detect shell and install
hftool completion --install
# Show completion script for bash
hftool completion bash
# Install for specific shell
hftool completion zsh --install

After installation, restart your shell or run:
- bash: `source ~/.bashrc`
- zsh: `source ~/.zshrc`
- fish: Completions load automatically
Completions include:
- Task names and aliases (t2i, text-to-image, etc.)
- Model names (z-image-turbo, whisper-large-v3, etc.)
- Device options (auto, cuda, mps, cpu)
- File picker syntax (@, @?, @~, etc.)
Check your system setup and troubleshoot issues:
# Run all diagnostic checks
hftool doctor
# Output as JSON
hftool doctor --json

Checks performed:
- Python version (requires 3.10+)
- PyTorch installation and GPU detection
- ffmpeg availability (for video/audio tasks)
- Network connectivity to HuggingFace Hub
- Optional feature dependencies
- Configuration file status
Exit codes: 0=OK, 1=warnings, 2=errors
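Since the exit codes are stable, `doctor` works well as a preflight check in scripts; a minimal sketch:

```bash
# Abort a provisioning script when doctor reports errors (exit code 2);
# warnings (exit code 1) are tolerated here.
hftool doctor
status=$?
if [ "$status" -ge 2 ]; then
    echo "hftool doctor found errors (exit $status)" >&2
    exit 1
fi
```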
# List all models
hftool models
# List models for a specific task
hftool models -t text-to-image
hftool models -t t2i # (using alias)
# Show only downloaded models
hftool models --downloaded
# Output as JSON
hftool models --json

# Download default model for a task
hftool download -t text-to-image
hftool download -t t2i # (using alias)
# Download specific model by short name
hftool download -t t2i -m sdxl
# Download by HuggingFace repo_id
hftool download -m openai/whisper-large-v3
# Download all default models for all tasks
hftool download --all
# Re-download (force)
hftool download -t t2i -f
# Resume interrupted download (default)
hftool download -t t2i
# Disable resume
hftool download -t t2i --no-resume

Note: Downloads automatically resume if interrupted. Use `hftool status` to see partial downloads.
# Show downloaded models and disk usage
hftool status

# Interactive selection (default) - shows numbered list to choose from
hftool clean
# Delete specific model by name
hftool clean -m whisper-large-v3
# Delete multiple models at once
hftool clean -m whisper-large-v3 -m z-image-turbo
# Delete all downloaded models
hftool clean --all
# Skip confirmation prompts
hftool clean --all -y

Interactive selection example:
Downloaded models:
------------------------------------------------------------
[ 1] Whisper Large v3 (automatic-speech-recognition)
openai/whisper-large-v3 - 3.1 GB
[ 2] Z-Image Turbo (text-to-image)
Tongyi-MAI/Z-Image-Turbo - 6.0 GB
------------------------------------------------------------
Enter model numbers to delete (comma-separated, ranges with -, or 'all'):
Examples: 1,3,5 or 1-3 or 1,3-5,7 or all
Selection []: 1,2
By default, models are stored in ~/.hftool/models/. You can customize this:
# Set custom location via environment variable
export HFTOOL_MODELS_DIR=/path/to/models
# Or use one-time
HFTOOL_MODELS_DIR=/mnt/storage hftool -t t2i -i "A cat" -o cat.png

Using a .env file (recommended):
Create a .env file in your project directory or ~/.hftool/.env:
# .env
HFTOOL_MODELS_DIR=/data/models
HFTOOL_AUTO_DOWNLOAD=1
HFTOOL_AUTO_OPEN=0
HFTOOL_DEBUG=0          # Set to 1 to show all warnings

hftool automatically loads .env files on startup.
Some models like FLUX.2-klein-9B require accepting a license agreement and HuggingFace authentication:
# Option 1: Login with huggingface-cli (recommended)
pip install huggingface_hub
huggingface-cli login
# Follow prompts to enter your token
# Option 2: Set environment variable
export HF_TOKEN=your_token_here
# Option 3: Add to .env file
echo "HF_TOKEN=your_token_here" >> ~/.hftool/.envSteps for gated models:
- Visit the model page (e.g., https://huggingface.co/black-forest-labs/FLUX.2-klein-9B)
- Accept the license agreement
- Create an access token at https://huggingface.co/settings/tokens
- Login with `huggingface-cli login` or set `HF_TOKEN`
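To confirm authentication is set up before downloading, `huggingface-cli whoami` prints the account your token resolves to:

```bash
huggingface-cli whoami
# Prints your HuggingFace username when the token is valid
```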
hftool will automatically detect your token and show a warning if authentication is missing for gated models.
Important: Token permissions for gated repos
If you get errors like "cannot find the requested files" or "check your internet connection" when downloading gated models, your token may lack the required permissions.
When creating your token at https://huggingface.co/settings/tokens:
- Recommended: Use a "Read" token (classic type) - works with all repos
- Fine-grained tokens: Must have "Access to public gated repos" enabled
To check/fix your token:
- Go to https://huggingface.co/settings/tokens
- Click on your token to view its permissions
- Ensure it has access to gated repositories
By default, hftool suppresses noisy warnings from dependencies (torch, diffusers, transformers). To see all warnings for debugging:
# Via environment variable
HFTOOL_DEBUG=1 hftool -t i2i -i '{"image": "photo.jpg", "prompt": "..."}'
# Or in .env file
HFTOOL_DEBUG=1

File Logging: Save all warnings and debug info to a log file:
# Via environment variable
HFTOOL_LOG_FILE=~/.hftool/hftool.log hftool -t i2i ...
# Or in .env file (recommended)
HFTOOL_LOG_FILE=~/.hftool/hftool.log

The log file captures all warnings, errors, and debug info even when `HFTOOL_DEBUG=0`. Useful for troubleshooting issues without cluttering the terminal.
To skip interactive prompts and auto-download models:
export HFTOOL_AUTO_DOWNLOAD=1

By default, generated images, audio, and video files automatically open in your system's default application when complete. Control this with:
# Always open (even text files)
hftool -t t2i -i "A cat" -o cat.png --open
# Never open
hftool -t t2i -i "A cat" -o cat.png --no-open
# Or set via environment variable
export HFTOOL_AUTO_OPEN=1 # Always open
export HFTOOL_AUTO_OPEN=0  # Never open

Default behavior: Auto-opens image, audio, and video files. Text output is printed to console.
hftool -t <task> -i <input> [-m <model>] [-o <output>] [-- extra_args]

hftool --list-tasks

| Alias | Full Name |
|---|---|
| `t2i` | text-to-image |
| `i2i`, `img2img` | image-to-image |
| `t2v` | text-to-video |
| `i2v` | image-to-video |
| `tts` | text-to-speech |
| `asr`, `stt` | automatic-speech-recognition |
| `llm` | text-generation |
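Aliases and full names are interchangeable; for example, these two commands do the same thing:

```bash
hftool -t asr -i recording.wav -o transcript.txt
hftool -t automatic-speech-recognition -i recording.wav -o transcript.txt
```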
Generate images with Z-Image-Turbo (state-of-the-art open-source model):
# Basic usage (uses default model)
hftool -t t2i -i "A cat wearing a space helmet" -o cat_space.png
# With specific model
hftool -t t2i -m Tongyi-MAI/Z-Image-Turbo \
-i "A photorealistic sunset over mountains" \
-o sunset.png
# With custom parameters (Z-Image-Turbo uses 9 steps, guidance_scale=0)
hftool -t t2i -m Tongyi-MAI/Z-Image-Turbo \
-i "A renaissance painting of a robot" \
-o robot.png \
-- --num_inference_steps 9 --guidance_scale 0.0 --height 1024 --width 1024

Other supported models:
- `stabilityai/stable-diffusion-xl-base-1.0`
- `black-forest-labs/FLUX.1-schnell`
Transform existing images with Qwen Image Edit (default), FLUX.2 Klein, or SDXL:
# Basic image editing with Qwen Image Edit (default)
hftool -t i2i \
-i '{"image": "photo.jpg", "prompt": "turn this into a watercolor painting"}' \
-o watercolor.png
# Multi-image editing - combine multiple images (Qwen feature)
hftool -t i2i \
-i '{"image": ["person1.jpg", "person2.jpg"], "prompt": "Both people standing together in a park"}' \
-o combined.png
# With custom parameters
hftool -t i2i \
-i '{"image": "portrait.jpg", "prompt": "as a Renaissance painting"}' \
-o renaissance.png \
-- --seed 42 --true_cfg_scale 4.0 --num_inference_steps 50
# FLUX.2 Klein - fast multi-reference editing (sub-second generation)
hftool -t i2i -m flux2-klein \
-i '{"image": "person.jpg", "prompt": "the person from image 1 as an astronaut on Mars"}' \
-o astronaut.png
# FLUX.2 Klein with multiple reference images
hftool -t i2i -m flux2-klein \
-i '{"image": ["cat.jpg", "dog.jpg"], "prompt": "the cat from image 1 and dog from image 2 playing together"}' \
-o pets.png \
-- --seed 42
# Style transfer with SDXL Refiner (smaller model, faster)
hftool -t i2i -m sdxl-refiner \
-i '{"image": "landscape.jpg", "prompt": "professional photography, enhanced colors"}' \
-o enhanced.png \
-- --strength 0.3

Supported models:
- `Qwen/Qwen-Image-Edit-2511` (default, 25 GB) - Advanced editing with character consistency, multi-image support
- `black-forest-labs/FLUX.2-klein-9B` (29 GB) - Fast multi-reference editing in 4 steps (non-commercial license)
- `stabilityai/stable-diffusion-xl-refiner-1.0` (6.2 GB) - Fast refinement and subtle changes
- `stabilityai/stable-diffusion-xl-base-1.0` (6.5 GB) - Stronger style transfer
Input format: JSON with image (path or list of paths) and prompt (edit description)
Qwen Image Edit parameters (pass after --):
| Parameter | Default | Description |
|---|---|---|
| `--seed` | random | Random seed for reproducibility |
| `--true_cfg_scale` | 4.0 | True CFG scale (higher = stronger prompt adherence) |
| `--num_inference_steps` | 40 | Number of denoising steps |
| `--guidance_scale` | 1.0 | Standard CFG guidance scale |
| `--negative_prompt` | " " | What to avoid in generation |
FLUX.2 Klein parameters (pass after --):
| Parameter | Default | Description |
|---|---|---|
| `--seed` | random | Random seed for reproducibility |
| `--num_inference_steps` | 4 | Number of denoising steps (optimized for 4) |
| `--guidance_scale` | 1.0 | CFG guidance scale |
| `--height` | 1024 | Output image height |
| `--width` | 1024 | Output image width |
FLUX.2 Klein tips:
- Reference images in prompts using "image 1", "image 2", etc.
- Supports up to 10 reference images per generation
- Requires ~29GB VRAM (RTX 4090 and above)
- Non-commercial license - requires accepting terms at HuggingFace
- Requires diffusers from main branch (auto-installed on first use)
SDXL Refiner/Base parameters:
| Parameter | Default | Description |
|---|---|---|
| `--seed` | random | Random seed for reproducibility |
| `--strength` | 0.3-0.7 | How much to change the image (0.0-1.0) |
| `--num_inference_steps` | 30 | Number of denoising steps |
| `--guidance_scale` | 7.5 | CFG guidance scale |
Qwen Image Edit features:
- Character consistency: Preserves identity in imaginative edits
- Multi-image input: Combine multiple images into one scene
- Industrial design: Batch product design and material replacement
- Geometric reasoning: Generate auxiliary construction lines
Memory requirements: Qwen Image Edit requires ~25GB VRAM. For GPUs with less memory:
# Use multi-GPU (distributes across available GPUs)
hftool -t i2i -i '{"image": "photo.jpg", "prompt": "..."}' -o out.png --gpu all
# Use CPU offload (slower but works on 16-24GB GPUs)
HFTOOL_CPU_OFFLOAD=1 hftool -t i2i -i '{"image": "photo.jpg", "prompt": "..."}' -o out.png
# Use sequential CPU offload (most memory efficient, slowest)
HFTOOL_CPU_OFFLOAD=2 hftool -t i2i -i '{"image": "photo.jpg", "prompt": "..."}' -o out.png

Note: Qwen Image Edit requires diffusers >= 0.36.0. Upgrade with:
pip install --upgrade "diffusers>=0.36.0"
# Or for pipx:
pipx runpip hftool install --upgrade "diffusers>=0.36.0"

Generate videos with LTX-2, HunyuanVideo, or other models:
# LTX-2 (fast, high quality - requires diffusers main branch)
hftool -t t2v -m ltx2 -i "A cat playing with a ball in slow motion" -o cat.mp4
# LTX-2 Image-to-Video (animate an image)
hftool -t i2v -m ltx2-i2v \
-i '{"image": "photo.jpg", "prompt": "The person waves hello"}' \
-o animated.mp4
# HunyuanVideo-1.5 (480p, ~2.5 second video)
hftool -t t2v -i "A person walking on a beach at sunset" -o beach.mp4
# With specific model and parameters
hftool -t t2v -m hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v \
-i "A timelapse of clouds moving over a city" \
-o clouds.mp4 \
-- --num_frames 61 --num_inference_steps 30
# HunyuanVideo Image-to-Video
hftool -t i2v -m hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v \
-i '{"image": "photo.jpg", "prompt": "The person waves hello"}' \
-o animated.mp4

Supported models:
- `Lightricks/LTX-2` (ltx2, ltx2-i2v) - Fast, high quality. Requires diffusers main branch
- `hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v` - High quality 480p
- `THUDM/CogVideoX-5b`
- `Wan-AI/Wan2.1-T2V-1.3B`
LTX-2 parameters (pass after --):
| Parameter | Default | Description |
|---|---|---|
| `--seed` | random | Random seed for reproducibility |
| `--num_inference_steps` | 50 | Number of denoising steps |
| `--guidance_scale` | 3.0 | CFG guidance scale |
| `--num_frames` | 97 | Number of frames to generate |
| `--height` | 512 | Video height (must be divisible by 32) |
| `--width` | 768 | Video width (must be divisible by 32) |
Note: Requires system ffmpeg for video encoding. LTX-2 requires diffusers from main branch (auto-installed on first use).
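Putting those parameters together, a run that overrides the defaults (keeping height and width divisible by 32):

```bash
hftool -t t2v -m ltx2 -i "A cat playing with a ball" -o cat.mp4 \
    -- --num_frames 97 --height 512 --width 768 --seed 42
```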
Generate speech with Bark:
# Basic usage (uses bark-small by default)
hftool -t tts -i "Hello, this is a test of the text to speech system." -o hello.wav
# With full Bark model (higher quality, larger)
hftool -t tts -m suno/bark \
-i "Welcome to hftool, your command-line AI assistant." \
-o welcome.wav
# Output as MP3 (requires ffmpeg)
hftool -t tts -i "This will be saved as MP3." -o output.mp3Supported models:
- `suno/bark-small` (default, 1.5 GB, fast)
- `suno/bark` (5 GB, full quality, multi-language, sound effects)
- `facebook/mms-tts-eng` (0.3 GB, lightweight)
GLM-TTS requires manual installation:
# Clone the repository
git clone https://github.com/zai-org/GLM-TTS.git
cd GLM-TTS && pip install -r requirements.txt
# Set environment variable
export GLMTTS_PATH=/path/to/GLM-TTS
# Run
hftool -t tts -m zai-org/GLM-TTS -i "你好世界" -o hello_chinese.wav

Transcribe audio with Whisper:
# Basic transcription
hftool -t asr -i recording.wav -o transcript.txt
# With specific model
hftool -t asr -m openai/whisper-large-v3 -i podcast.mp3 -o transcript.txt
# With timestamps (outputs JSON)
hftool -t asr -i interview.wav -o transcript.json \
-- --return_timestamps true
# Generate SRT subtitles
hftool -t asr -i video_audio.wav -o subtitles.srt \
-- --return_timestamps true --format srt

Supported models:

- `openai/whisper-large-v3` (best quality)
- `openai/whisper-medium`
- `openai/whisper-small` (fastest)
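hftool handles one input per invocation, so batch transcription is a natural fit for a shell loop; a sketch:

```bash
# Transcribe every WAV in a directory, writing foo.wav -> foo.txt
for f in recordings/*.wav; do
    hftool -t asr -i "$f" -o "${f%.wav}.txt"
done
```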
Run language models:
# Basic generation
hftool -t llm -m meta-llama/Llama-3.2-1B-Instruct \
-i "Explain quantum computing in simple terms:" \
-o response.txt \
-- --max_new_tokens 200

# Image Classification
hftool -t image-classification -m google/vit-base-patch16-224 \
-i photo.jpg -o result.json
# Object Detection
hftool -t object-detection -m facebook/detr-resnet-50 \
-i street.jpg -o detections.json
# Summarization
hftool -t summarization -m facebook/bart-large-cnn \
-i article.txt -o summary.txt
# Translation
hftool -t translation -m Helsinki-NLP/opus-mt-en-de \
-i "Hello, how are you?" -o translation.txtUsage: hftool [OPTIONS] COMMAND [ARGS]...
Options:
-t, --task TEXT Task to perform
-m, --model TEXT Model name/path (uses task default if omitted)
-i, --input TEXT Input data: text, file path, @ reference, @? for interactive
-o, --output-file TEXT Output file path (auto-generated if omitted)
-d, --device TEXT Device: auto, cuda, mps, cpu (default: auto)
-g, --gpu TEXT GPU selection: auto, all, 0, 1, 0,1 (multi-GPU)
--dtype TEXT Data type: bfloat16, float16, float32
--seed INTEGER Random seed for reproducible generation
--interactive Interactive mode for complex inputs (JSON builder)
--dry-run Preview operation without executing
--open / --no-open Open output with default app (auto for media files)
--list-tasks List all available tasks and aliases
-v, --verbose Show detailed progress
--help Show this message and exit
Commands:
setup Run interactive PyTorch setup wizard
config View and manage configuration (show, init, edit)
docker Manage Docker-based execution (setup, status, build, run)
models List available models for tasks
download Download models from HuggingFace Hub
status Show download status and disk usage
clean Delete downloaded models
history View and manage command history (--rerun, --clear)
run Run a task (alternative to -t flag)
| Variable | Description | Default |
|---|---|---|
| `HFTOOL_MODELS_DIR` | Custom models storage directory | `~/.hftool/models/` |
| `HFTOOL_AUTO_DOWNLOAD` | Auto-download models without prompting | `0` (disabled) |
| `HFTOOL_AUTO_OPEN` | Auto-open output files | auto (media files only) |
| `HFTOOL_GPU` | GPU selection: `auto`, `all`, `0`, `1`, `0,1` | (none) |
| `HFTOOL_MULTI_GPU` | Multi-GPU mode: `1`/`balanced` enables, `0` disables | auto-detect |
| `HFTOOL_CPU_OFFLOAD` | CPU offload level: `0` disabled, `1` model, `2` sequential | (none) |
| `HFTOOL_ROCM_PATH` | Path to ROCm libraries (e.g., Ollama's bundled ROCm) | (none) |
| `HSA_OVERRIDE_GFX_VERSION` | AMD GPU architecture override (e.g., `11.0.0` for RX 7900) | (none) |
| `HF_TOKEN` | HuggingFace token for gated models | (none) |
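Several of these combine naturally in a `.env` file (values here are illustrative):

```bash
# ~/.hftool/.env
HFTOOL_MODELS_DIR=/data/models
HFTOOL_GPU=auto
HFTOOL_CPU_OFFLOAD=1            # model-level CPU offload
HSA_OVERRIDE_GFX_VERSION=11.0.0
HF_TOKEN=your_token_here
```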
Use -- to pass additional arguments to the underlying model:
hftool -t t2i -i "A cat" -o cat.png \
-- --num_inference_steps 20 --guidance_scale 7.5 --seed 42

hftool is optimized for AMD GPUs with ROCm 6.x:
| Task | Model | VRAM Required | Notes |
|---|---|---|---|
| Text-to-Image | Z-Image-Turbo | ~10-12 GB | Comfortable on RX 7900 XTX |
| Image-to-Image | Qwen Image Edit | ~20-24 GB | Use CPU offload on 24GB cards |
| Image-to-Image | FLUX.2 Klein | ~29 GB | RTX 4090+, non-commercial |
| Image-to-Image | SDXL Refiner | ~8-10 GB | Fast, lower VRAM |
| Text-to-Video | LTX-2 | ~40 GB | Use --gpu all for multi-GPU |
| Text-to-Video | HunyuanVideo 480p | ~20-24 GB | Use CPU offload |
| Text-to-Video | HunyuanVideo 720p | ~30-40 GB | Requires multi-GPU |
| Text-to-Speech | Bark | ~2-4 GB | Easy |
| Speech-to-Text | Whisper-large-v3 | ~4-6 GB | Easy |
If you have Ollama installed, you can use its bundled ROCm libraries instead of installing ROCm system-wide (which can interfere with gaming GPU drivers).
Step 1: Install PyTorch ROCm in your hftool environment:
# If using pipx:
pipx runpip hftool uninstall torch torchvision torchaudio -y
pipx runpip hftool install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
# If using pip:
pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2

Step 2: Add ROCm configuration to your .env file (~/.hftool/.env or project directory):
# Use Ollama's bundled ROCm libraries
HFTOOL_ROCM_PATH=/usr/local/lib/ollama/rocm
# Set your GPU architecture (required for AMD GPUs)
# RDNA3: gfx1100 (RX 7900 XTX/XT), gfx1101 (RX 7800/7700), gfx1102 (RX 7600)
# RDNA2: gfx1030 (RX 6900/6800), gfx1031 (RX 6700), gfx1032 (RX 6600)
HSA_OVERRIDE_GFX_VERSION=11.0.0

Step 3: Verify GPU detection:
hftool -t t2i -i "test" -o test.png -v
# Should show "Using device: cuda" or similarWorks with CUDA 11.8+ and modern NVIDIA GPUs.
Basic support for M1/M2/M3 Macs. Some models may require --dtype float32.
Works but slow. Use smaller models:
- `openai/whisper-small` for ASR
- `suno/bark-small` for TTS
For systems with multiple GPUs (e.g., dual RX 7900 XTX), hftool can automatically detect which GPU has your display connected and route compute workloads to the other GPU. This prevents VRAM conflicts that can crash your desktop compositor.
Check your GPUs:
hftool doctor
# Shows:
# GPU 0: AMD Radeon RX 7900 XTX [DISPLAY]
# GPU 1: AMD Radeon RX 7900 XTX <- recommended

GPU Selection Options:
| Option | Description |
|---|---|
| `--gpu auto` | Smart selection - uses GPU without display (default behavior) |
| `--gpu 0` | Use specific GPU by index |
| `--gpu 1` | Use specific GPU by index |
| `--gpu 0,1` | Use multiple specific GPUs |
| `--gpu all` | Use all GPUs with model parallelism (distributes model across GPUs) |
How --gpu all works:
When you select --gpu all, hftool uses device_map="balanced" to automatically distribute model layers across all available GPUs. This is essential for large models that don't fit in a single GPU's VRAM. The centralized multi-GPU logic in hftool/core/device.py ensures consistent behavior across all task types (text-to-image, text-to-video, image-to-image, etc.).
Examples:
# Auto-select compute GPU (avoids display GPU)
hftool -t t2v -i "A cat running" -o cat.mp4 --gpu auto
# Use specific GPU
hftool -t t2v -i "A cat running" -o cat.mp4 --gpu 1
# Use all GPUs for large models like LTX-2 or HunyuanVideo
hftool -t t2v -m ltx2 -i "A cat running" -o cat.mp4 --gpu all
# Docker with specific GPU
hftool docker run --gpu 1 -- -t t2v -i "A cat" -o cat.mp4
# Environment variable (useful in .env file)
HFTOOL_GPU=1 hftool -t t2v -i "A cat" -o cat.mp4
# Force multi-GPU mode via environment variable
HFTOOL_MULTI_GPU=1 hftool -t t2v -m ltx2 -i "A cat" -o cat.mp4

Interactive Mode: When using `hftool -I`, the wizard will show GPU selection with display detection for multi-GPU systems.
hftool/
├── cli.py # CLI entry point with subcommands
├── core/
│ ├── device.py # ROCm/CUDA/MPS/CPU detection
│ ├── registry.py # Task registry and configuration
│ ├── models.py # Model registry with download metadata
│ └── download.py # Model download manager
├── tasks/
│ ├── base.py # Abstract base task class
│ ├── text_to_image.py
│ ├── image_to_image.py
│ ├── text_to_video.py
│ ├── text_to_speech.py
│ ├── speech_to_text.py
│ └── transformers_generic.py
├── io/
│ ├── input_loader.py # Input handling
│ └── output_handler.py # Output handling (ffmpeg)
└── utils/
└── deps.py # Dependency checking
pip install -e ".[dev]"
pytest tests/ -v

MIT License
- Z-Image - State-of-the-art text-to-image
- Qwen Image Edit - Advanced image editing with character consistency
- LTX-2 - Fast, high-quality video generation
- HunyuanVideo-1.5 - High-quality video generation
- Bark - High-quality TTS with sound effects
- Whisper - Speech recognition