### On the command line, including multiple files at once
I recommend using the huggingface-hub Python library:

```shell
pip3 install huggingface-hub
```
Then you can download any individual model file to the current directory, at high
speed, with a command like this:
```shell
huggingface-cli download TheBloke/KafkaLM-70B-German-V0.1-GGUF kafkalm-70b-german-v0.1.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
```
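
To download multiple files at once, as the section heading suggests, recent huggingface-hub releases let you filter by glob pattern instead of naming each file. The sketch below assumes you want every Q4_K variant; adjust the `--include` pattern to your needs:

```shell
# Example only: fetch all files whose names match the Q4_K pattern
huggingface-cli download TheBloke/KafkaLM-70B-German-V0.1-GGUF --include "*Q4_K*gguf" --local-dir . --local-dir-use-symlinks False
```

On fast connections it may also be worth installing the optional accelerator with `pip3 install hf_transfer` and setting `HF_HUB_ENABLE_HF_TRANSFER=1` in the environment before downloading.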
### Example llama.cpp command
Make sure you are using llama.cpp from commit d0cee0d or later.
```shell
./main -ngl 35 -m kafkalm-70b-german-v0.1.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|system|>\n{system_message}</s>\n<|user|>\n{prompt}</s>\n<|assistant|>"
```
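
For clarity, here is the same command with the `{system_message}` and `{prompt}` placeholders filled in; the German system message and question are illustrative values of my own, not part of the model card:

```shell
# Placeholders replaced with example values (assumed; adjust as needed)
./main -ngl 35 -m kafkalm-70b-german-v0.1.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|system|>\nDu bist ein hilfreicher KI-Assistent.</s>\n<|user|>\nWas ist die Hauptstadt von Deutschland?</s>\n<|assistant|>"
```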
Change `-ngl 35` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
Change `-c 4096` to the desired sequence length. For extended sequence models (e.g. 8K, 16K, 32K), the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically. Note that longer sequence lengths require much more resources, so you may need to reduce this value.
If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
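
Putting that substitution together with the command above, an interactive invocation would look roughly like this (a sketch; all other flags are unchanged):

```shell
# Interactive instruct mode instead of a one-shot -p prompt
./main -ngl 35 -m kafkalm-70b-german-v0.1.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -i -ins
```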
For other parameters and how to use them, please refer to the llama.cpp documentation.