phainia/llama-cpp-hf-setup

llama-cpp-hf-setup

Interactive terminal workflow that clones llama.cpp, builds it with CMake, downloads GGUF weights from Hugging Face using hf download, and runs llama-cli or llama-server. Model sources and filename patterns live in models.json next to the script; edit that file to add or remove entries.

Main menu

| Option | What it does |
| --- | --- |
| Full Installation | Pick install root → clone → build → download a model from models.json |
| Build llama.cpp only | Compile only (default root ~/llama-cpp-hf) |
| Download model only | Download GGUF into an existing llama.cpp tree |
| Run model | Pick a local .gguf, then chat or start the API server |
| Exit | Quit |

The download step installs huggingface_hub and hf_transfer via pip if they are missing. hf download needs credentials only for gated or private models: log in first (hf auth login, or huggingface-cli login on older huggingface_hub releases) or set HF_TOKEN.
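As a rough illustration of the download step, the sketch below assembles an hf download invocation without running it. The helper names are hypothetical, and it is an assumption (not confirmed by this README) that the script enables hf_transfer through the HF_HUB_ENABLE_HF_TRANSFER environment variable.

```python
import os


def hf_download_command(repo_id, filename, local_dir):
    """Build the argument list for `hf download` (nothing is executed here)."""
    return ["hf", "download", repo_id, filename, "--local-dir", str(local_dir)]


def download_env(base_env=None):
    """Copy the environment and turn on hf_transfer unless already configured.

    Assumption: the installer may do something similar; this is only a sketch.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.setdefault("HF_HUB_ENABLE_HF_TRANSFER", "1")
    return env


def has_token(env):
    """True when HF_TOKEN is set to a non-empty value (needed for gated repos)."""
    return bool(env.get("HF_TOKEN"))
```

The resulting list can be handed to subprocess.run(cmd, env=download_env(), check=True) once hf is on PATH.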

Requirements (checked at startup)

  • git — clone llama.cpp
  • cmake — configure and compile
  • pip / pip3 — install Python packages
  • python3 — parse models.json and helpers

If anything is missing, the script suggests example installs (e.g. brew install … on macOS). You also need free disk space for sources, build artifacts, and models.
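The startup check can be approximated as follows. This is a minimal sketch rather than the script's actual logic: the REQUIRED table and the missing_tools helper are illustrative names, and the script itself is a Bash program.

```python
import shutil

# Each requirement is satisfied if any one of its candidate executables is on PATH.
REQUIRED = {
    "git": ["git"],
    "cmake": ["cmake"],
    "pip": ["pip", "pip3"],
    "python3": ["python3"],
}


def missing_tools(which=shutil.which):
    """Return the names of requirements for which no candidate executable is found."""
    return [
        name
        for name, candidates in REQUIRED.items()
        if not any(which(candidate) for candidate in candidates)
    ]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All requirements found.")
```

Passing the lookup function as a parameter keeps the check easy to exercise without touching the real PATH.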

Quick start

cd /path/to/llama-cpp-hf-setup
chmod +x setup-llama-cpp-hf.sh
./setup-llama-cpp-hf.sh

Run the script from this directory so it finds models.json beside it.

Defaults

| Item | Default |
| --- | --- |
| Install root | ~/llama-cpp-hf |
| llama.cpp checkout | $INSTALL_DIR/llama.cpp |
| GGUF download dir | $INSTALL_DIR/llama.cpp/models/ |
| Default context size (when offered) | 65536 (adjust in the run flow) |
| Default API port | 8080 |
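To make the defaults concrete, the sketch below derives the directory layout from an install root and assembles a llama-server command line with the default context size and port. The function names are hypothetical, and placing binaries under build/bin inside the checkout is an assumption based on the build notes in this README.

```python
from pathlib import Path

DEFAULT_INSTALL_DIR = "~/llama-cpp-hf"
DEFAULT_CTX = 65536
DEFAULT_PORT = 8080


def layout(install_dir=DEFAULT_INSTALL_DIR):
    """Derive the checkout, models, and binary directories from the install root."""
    root = Path(install_dir).expanduser()
    checkout = root / "llama.cpp"
    return {
        "root": root,
        "checkout": checkout,
        "models": checkout / "models",
        "bin": checkout / "build" / "bin",  # assumed location of built binaries
    }


def server_command(paths, model_name, ctx=DEFAULT_CTX, port=DEFAULT_PORT):
    """Build a llama-server argument list (nothing is executed here)."""
    return [
        str(paths["bin"] / "llama-server"),
        "-m", str(paths["models"] / model_name),
        "-c", str(ctx),
        "--port", str(port),
    ]
```

With these defaults the server would listen on port 8080; a smaller ctx value reduces memory use for the KV cache.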

models.json

Top level: a JSON object with version and a models array. Each entry describes how to resolve a Hugging Face repo and target filename.

| Field | Meaning |
| --- | --- |
| id | Internal id |
| label | Long label for display |
| list_name | Short menu name (optional; derived from label if omitted) |
| hf_repo | Fixed repo, e.g. org/repo-GGUF |
| hf_repo_pattern | Template with {size}, e.g. unsloth/Qwen3.5-{size}-GGUF |
| artifact_pattern | Filename template with {size} and {quant} |
| sizes | List of { "value", "label" } choices |
| quants | List of { "value", "label" } quantization names (e.g. Q4_K_M) |

If several entries could match the same basename, the first matching entry in the array wins. To add sources, append new objects to the models array; repo and file names must match what is published on Hugging Face.
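The following sketch shows how an entry could be resolved to a repo and filename, and how a first-match lookup over the array might work. The sample entry is hypothetical: its id, artifact_pattern, and the 9B size are invented for illustration and are not taken from the shipped models.json. The resolve and first_match helpers are likewise illustrative, not the script's own code.

```python
import json

# Hypothetical models.json content using the fields described above.
SAMPLE_MODELS_JSON = """
{
  "version": 1,
  "models": [
    {
      "id": "qwen-example",
      "label": "Qwen example (hypothetical entry)",
      "hf_repo_pattern": "unsloth/Qwen3.5-{size}-GGUF",
      "artifact_pattern": "Qwen3.5-{size}-{quant}.gguf",
      "sizes": [{"value": "9B", "label": "9B"}],
      "quants": [{"value": "Q4_K_M", "label": "Q4_K_M"}]
    }
  ]
}
"""


def resolve(entry, size, quant):
    """Return (repo, filename) for one entry by filling {size} and {quant}."""
    repo = entry.get("hf_repo") or entry["hf_repo_pattern"].replace("{size}", size)
    filename = (
        entry["artifact_pattern"].replace("{size}", size).replace("{quant}", quant)
    )
    return repo, filename


def first_match(models, basename):
    """Return the first entry that can produce this basename, or None."""
    for entry in models:
        for size in entry.get("sizes", [{"value": ""}]):
            for quant in entry.get("quants", [{"value": ""}]):
                if resolve(entry, size["value"], quant["value"])[1] == basename:
                    return entry
    return None


if __name__ == "__main__":
    models = json.loads(SAMPLE_MODELS_JSON)["models"]
    print(resolve(models[0], "9B", "Q4_K_M"))
```

Plain string replacement is used instead of str.format so that any other braces in a pattern are left untouched.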

Build and run notes

  • NVIDIA GPU: If nvidia-smi is present, you can enable CUDA (-DGGML_CUDA=ON) during install.
  • Apple Silicon: On macOS, llama-cli is invoked with --n-gpu-layers 99 so that model layers are offloaded to the GPU through Metal rather than running on the CPU.
  • Existing build: If build/bin/llama-cli and llama-server exist, you can skip a full rebuild or rebuild from scratch.
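The notes above can be sketched as a few small helpers that decide on CUDA, detect an existing build, and assemble the CMake commands. This is an illustration under assumptions: the helper names are hypothetical, and the exact CMake arguments the script passes (other than -DGGML_CUDA=ON) are not stated in this README.

```python
import platform
import shutil
from pathlib import Path


def cuda_available(which=shutil.which):
    """Treat the presence of nvidia-smi on PATH as a hint that CUDA can be enabled."""
    return which("nvidia-smi") is not None


def cmake_commands(checkout, use_cuda):
    """Return (configure, build) argument lists for an out-of-tree build."""
    build_dir = Path(checkout) / "build"
    configure = ["cmake", "-S", str(checkout), "-B", str(build_dir)]
    if use_cuda:
        configure.append("-DGGML_CUDA=ON")
    build = ["cmake", "--build", str(build_dir), "--config", "Release"]
    return configure, build


def has_existing_build(checkout):
    """True when both llama-cli and llama-server already exist under build/bin."""
    bin_dir = Path(checkout) / "build" / "bin"
    return all((bin_dir / name).exists() for name in ("llama-cli", "llama-server"))


def gpu_layer_args(system_name=None):
    """On macOS, request GPU offload of up to 99 layers; otherwise add nothing."""
    system_name = platform.system() if system_name is None else system_name
    return ["--n-gpu-layers", "99"] if system_name == "Darwin" else []
```

Returning argument lists instead of running the commands keeps the decision logic separate from process execution.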

Troubleshooting

  • No hf command: The download step installs huggingface_hub; if it still fails, run pip install -U huggingface_hub hf_transfer and ensure hf is on PATH.
  • 401/403 on download: Log in to Hugging Face or set HF_TOKEN.
  • CMake / link errors: The script unsets some environment variables that break linking; if you build manually, avoid conflicting LDFLAGS and similar.
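For manual builds, one way to avoid conflicting flags is to run CMake with a cleaned copy of the environment. The README does not say which variables the script unsets, so the list below is purely an assumption for illustration, as is the clean_build_env helper.

```python
import os

# Assumed list of variables that commonly interfere with linking;
# the installer's actual list is not documented here.
SUSPECT_VARS = ("LDFLAGS", "CPPFLAGS", "CFLAGS", "CXXFLAGS", "LIBRARY_PATH")


def clean_build_env(env=None, drop=SUSPECT_VARS):
    """Return a copy of the environment without the suspect build variables."""
    source = os.environ if env is None else env
    return {key: value for key, value in source.items() if key not in drop}
```

The cleaned mapping can be passed as env= to subprocess.run when invoking cmake, leaving the parent shell's settings untouched.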

License

This repository contains only the installer script and sample config. llama.cpp and model weights are governed by their respective upstream and Hugging Face terms; check those before use.

About

Interactive Bash installer to build llama.cpp, download Hugging Face GGUF models via hf download, and run llama-cli or llama-server—configured by models.json.
