llama-cpp-hf-setup

An interactive terminal workflow that clones llama.cpp, builds it with CMake, downloads GGUF weights from Hugging Face with `hf download`, and runs `llama-cli` or `llama-server`. Model sources and filename patterns live in `models.json` next to the script; edit that file to add or remove entries.

Main menu

| Option | What it does |
| --- | --- |
| Full Installation | Pick install root → clone → build → download a model from `models.json` |
| Build llama.cpp only | Compile only (default root `~/llama-cpp-hf`) |
| Download model only | Download GGUF into an existing `llama.cpp` tree |
| Run model | Pick a local `.gguf`, then chat or start the API server |
| Exit | Quit |

The download step installs `huggingface_hub` and `hf_transfer` via pip when needed. `hf download` works anonymously for public repos; for gated or private models, authenticate first with `hf auth login` (`huggingface-cli login` is the legacy equivalent) or set `HF_TOKEN`.
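
As a rough illustration of what the download step amounts to, the sketch below wraps a single `hf download` call. It is not the script's actual code: the function name `download_gguf` and the `HF_BIN` override are hypothetical, and the file name in the usage note is a placeholder.

```shell
# Hypothetical wrapper around `hf download REPO FILE --local-dir DIR`.
# HF_BIN is an illustrative override (not part of the script) so the call
# can be previewed with HF_BIN=echo instead of touching the network.
download_gguf() {
  repo=$1   # Hugging Face repo id, e.g. org/repo-GGUF
  file=$2   # file name inside that repo
  dest=$3   # local directory that receives the file
  "${HF_BIN:-hf}" download "$repo" "$file" --local-dir "$dest"
}
```

For example, `download_gguf org/repo-GGUF model-Q4_K_M.gguf ~/llama-cpp-hf/llama.cpp/models` would fetch one file into the default model directory. For gated repos, export `HF_TOKEN` before calling it.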

Requirements (checked at startup)

git — clone llama.cpp
cmake — configure and compile
pip / pip3 — install Python packages
python3 — parse models.json and helpers

If anything is missing, the script suggests example installs (e.g. brew install … on macOS). You also need free disk space for sources, build artifacts, and models.
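
The startup check can be pictured as a loop over the required commands. This is a minimal sketch under that assumption, not the script's own implementation; the helper name `missing_tools` is hypothetical.

```shell
# Hypothetical helper: print each command name that is NOT found on PATH.
missing_tools() {
  for tool in "$@"; do
    # `command -v` exits non-zero when the command cannot be resolved
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools named in this README.
missing=$(missing_tools git cmake python3)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line
  echo "Missing tools:" $missing
fi

# Either pip or pip3 satisfies the pip requirement.
if ! command -v pip >/dev/null 2>&1 && ! command -v pip3 >/dev/null 2>&1; then
  echo "Missing tools: pip or pip3"
fi
```

Running this by hand before the script is a quick way to see what needs installing.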

Quick start

cd /path/to/llama-cpp-hf-setup
chmod +x setup-llama-cpp-hf.sh
./setup-llama-cpp-hf.sh

Run the script from this directory so it finds models.json beside it.

Defaults

| Item | Default |
| --- | --- |
| Install root | `~/llama-cpp-hf` |
| llama.cpp checkout | `$INSTALL_DIR/llama.cpp` |
| GGUF download dir | `$INSTALL_DIR/llama.cpp/models/` |
| Default context size (when offered) | `65536` (adjust in the run flow) |
| Default API port | `8080` |
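
To show how the context-size and port defaults might map onto a `llama-server` command line, here is a small sketch that only prints the command. The function name `server_cmd` and the model path are hypothetical; `-m`, `-c`, and `--port` are standard `llama-server` options.

```shell
# Print (not run) a server command using the defaults from the table above.
server_cmd() {
  model=$1
  ctx=${2:-65536}   # default context size
  port=${3:-8080}   # default API port
  echo "build/bin/llama-server -m $model -c $ctx --port $port"
}

server_cmd models/example.gguf             # uses both defaults
server_cmd models/example.gguf 8192 9000   # overrides both
```

A smaller context size reduces memory use, which is worth trying if the default does not fit on your machine. Once the server is up, `curl http://localhost:8080/health` is a quick way to confirm it is listening.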

`models.json`

Top level: a JSON object with version and a models array. Each entry describes how to resolve a Hugging Face repo and target filename.

| Field | Meaning |
| --- | --- |
| `id` | Internal id |
| `label` | Long label for display |
| `list_name` | Short menu name (optional; derived from `label` if omitted) |
| `hf_repo` | Fixed repo, e.g. `org/repo-GGUF` |
| `hf_repo_pattern` | Template with `{size}`, e.g. `unsloth/Qwen3.5-{size}-GGUF` |
| `artifact_pattern` | Filename template with `{size}` and `{quant}` |
| `sizes` | List of `{ "value", "label" }` choices |
| `quants` | List of `{ "value", "label" }` quantization names (e.g. `Q4_K_M`) |
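
The fields above could combine as in the following sketch. Everything in the sample entry is illustrative: the `version` value, the `id`, the labels, the `9B` size, and the `artifact_pattern` are assumptions, and `expand_pattern` is a hypothetical helper rather than the script's own resolver.

```shell
# Print a hypothetical models.json with a single entry.
sample_models_json() {
  cat <<'EOF'
{
  "version": 1,
  "models": [
    {
      "id": "qwen3.5",
      "label": "Qwen3.5 (Unsloth GGUF)",
      "list_name": "Qwen3.5",
      "hf_repo_pattern": "unsloth/Qwen3.5-{size}-GGUF",
      "artifact_pattern": "Qwen3.5-{size}-{quant}.gguf",
      "sizes": [{ "value": "9B", "label": "9B" }],
      "quants": [{ "value": "Q4_K_M", "label": "Q4_K_M" }]
    }
  ]
}
EOF
}

# Replace {size} and {quant} in a template. Values must not contain
# "/" or "&", since they are passed straight into a sed replacement.
expand_pattern() {
  template=$1
  size=$2
  quant=$3
  printf '%s\n' "$template" | sed -e "s/{size}/$size/g" -e "s/{quant}/$quant/g"
}

expand_pattern 'unsloth/Qwen3.5-{size}-GGUF' 9B Q4_K_M   # repo to download from
expand_pattern 'Qwen3.5-{size}-{quant}.gguf' 9B Q4_K_M   # file name inside the repo
```

After editing the real file, `python3 -m json.tool models.json` is a simple way to confirm it is still valid JSON.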

If several entries could match the same file basename, the first matching entry in the `models` array wins. To add more sources, append new objects to `models`; repo and file names must match what is published on Hugging Face.
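
The first-match rule can be sketched as an ordered scan. This assumes each `artifact_pattern` is compared to the basename as a glob with `{size}` and `{quant}` replaced by `*`; the script may match differently, and `first_match` and the `id|glob` pair format are hypothetical.

```shell
# Print the id of the first entry whose glob matches the given basename.
# Each entry is an "id|glob" pair; entries are tried in argument order.
first_match() {
  name=$1
  shift
  for entry in "$@"; do
    id=${entry%%|*}    # text before the first "|"
    glob=${entry#*|}   # text after the first "|"
    case $name in
      $glob) echo "$id"; return 0 ;;   # unquoted so it is treated as a pattern
    esac
  done
  return 1
}

# Both globs match this name, so the earlier entry is reported.
first_match Qwen3.5-9B-Q4_K_M.gguf 'qwen|Qwen3.5-*-*.gguf' 'any|*.gguf'
```

The practical consequence is that specific entries should come before broad ones in the array.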

Build and run notes

NVIDIA GPU: If nvidia-smi is present, you can enable CUDA (-DGGML_CUDA=ON) during install.
Apple Silicon: On macOS, llama-cli is invoked with --n-gpu-layers 99 so layers run on Metal instead of CPU-only.
Existing build: If build/bin/llama-cli and llama-server exist, you can skip a full rebuild or rebuild from scratch.
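
The three notes above roughly correspond to the checks sketched below. The commands are printed rather than executed, the function names are hypothetical, and the script prompts before enabling CUDA, whereas this sketch simply adds the flag whenever `nvidia-smi` is found.

```shell
# Print the configure command; CUDA is added only when nvidia-smi is on PATH.
configure_cmd() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "cmake -B build -DGGML_CUDA=ON"
  else
    echo "cmake -B build"
  fi
}

# Print the chat command; on macOS, offload layers to the GPU via Metal.
cli_cmd() {
  if [ "$(uname -s)" = "Darwin" ]; then
    echo "build/bin/llama-cli -m $1 --n-gpu-layers 99"
  else
    echo "build/bin/llama-cli -m $1"
  fi
}

# Succeed only when both binaries from an earlier build are present
# under the given llama.cpp checkout directory.
have_build() {
  [ -x "$1/build/bin/llama-cli" ] && [ -x "$1/build/bin/llama-server" ]
}

configure_cmd
echo "cmake --build build --config Release"
cli_cmd models/example.gguf
if have_build .; then
  echo "existing build found"
else
  echo "no existing build"
fi
```

If you build manually, run the two `cmake` commands from inside the `llama.cpp` checkout.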

Troubleshooting

No hf command: The download step installs huggingface_hub; if it still fails, run pip install -U huggingface_hub hf_transfer and ensure hf is on PATH.
401/403 on download: Log in to Hugging Face or set HF_TOKEN.
CMake / link errors: The script unsets some environment variables that break linking; if you build manually, avoid conflicting LDFLAGS and similar.
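
For manual builds, one way to rule out a polluted environment is to run the build commands with the usual flag variables removed. Which variables the script itself unsets is not documented here, so the list below is an assumption, and `clean_env` is a hypothetical helper.

```shell
# Run a command with common compiler/linker flag variables removed from
# its environment. The variable list is a guess, not taken from the script.
clean_env() {
  env -u LDFLAGS -u CPPFLAGS -u CFLAGS -u CXXFLAGS "$@"
}
```

For instance, `clean_env cmake -B build` followed by `clean_env cmake --build build --config Release` configures and compiles without inheriting those variables from the current shell.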

License

This repository contains only the installer script and sample config. llama.cpp and model weights are governed by their respective upstream and Hugging Face terms—check those before use.
