GGUF Command


On the command line, including multiple files at once

I recommend using the huggingface-hub Python library:

pip3 install huggingface-hub
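
Optionally, large downloads can be sped up with the hf_transfer backend. This is only a sketch, and it assumes your installed huggingface_hub version supports the HF_HUB_ENABLE_HF_TRANSFER environment variable:

pip3 install hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1   # enables the accelerated downloader for subsequent huggingface-cli calls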

Then you can download any individual model file to the current directory, at high
speed, with a command like this:

huggingface-cli download TheBloke/KafkaLM-70B-German-V0.1-GGUF kafkalm-70b-german-v0.1.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
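
To download several files at once (as the heading above mentions), recent huggingface_hub releases let huggingface-cli download filter by glob pattern. A sketch, assuming the --include flag is available in your version:

huggingface-cli download TheBloke/KafkaLM-70B-German-V0.1-GGUF --include "*Q4_K_M*.gguf" --local-dir . --local-dir-use-symlinks False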

More advanced huggingface-cli download usage is covered in the huggingface-hub documentation.


Example llama.cpp command
Make sure you are using llama.cpp from commit d0cee0d or later.

./main -ngl 35 -m kafkalm-70b-german-v0.1.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|system|>\n{system_message}</s>\n<|user|>\n{prompt}</s>\n<|assistant|>"

Change -ngl 35 to the number of layers to offload to GPU. Remove it if you don't
have GPU acceleration.
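
For example, a CPU-only run is the same command as above with only the -ngl flag removed:

./main -m kafkalm-70b-german-v0.1.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|system|>\n{system_message}</s>\n<|user|>\n{prompt}</s>\n<|assistant|>"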

Change -c 4096 to the desired sequence length. For extended sequence models (e.g.
8K, 16K, 32K) the necessary RoPE scaling parameters are read from the GGUF file
and set by llama.cpp automatically. Note that longer sequence lengths require much
more resources, so you may need to reduce this value.

If you want to have a chat-style conversation, replace the -p <PROMPT> argument
with -i -ins, as in the example below.
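
A sketch of such an interactive run, reusing the settings from the command above with -p swapped for -i -ins:

./main -ngl 35 -m kafkalm-70b-german-v0.1.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -i -ins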

For other parameters and how to use them, please refer to the llama.cpp
documentation.
