
Test Deepseek R1 Qwen 14B on Pi 5 with AMD W7700 #9

@geerlingguy

Description


I set up the AMD W7700 on a Pi 5 16GB with kernel 6.12.y, following the guide in geerlingguy/raspberry-pi-pcie-devices#680.

Then I downloaded the DeepSeek R1 Distill Qwen 14B model and tried running it:

# Follow https://www.jeffgeerling.com/blog/2024/llms-accelerated-egpu-on-raspberry-pi-5

cd models
wget https://huggingface.co/bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF/resolve/main/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf

export GGML_VK_FORCE_MAX_ALLOCATION_SIZE=2147483647
cd ../
./build/bin/llama-cli -m "models/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf" -p "Why is the sky blue?" -n 50 -e -ngl 33 -t 4

But I'm only seeing part of the model being offloaded to the GPU:

load_tensors: offloading 33 repeating layers to GPU
load_tensors: offloaded 33/49 layers to GPU
load_tensors:      Vulkan0 model buffer size =  5155.22 MiB
load_tensors:   CPU_Mapped model buffer size =  3410.82 MiB

And token generation performance is pretty bad, with the GPU only utilizing about 10% of its capacity while the CPU runs at 350%+.
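A back-of-envelope check from the load_tensors numbers above (a sketch, not measured: it assumes per-layer VRAM cost is roughly uniform, though the output layer is larger, and that the W7700's 16 GB of VRAM is available) suggests the full model should fit on the GPU, so a larger -ngl may be worth a try:

```shell
#!/bin/sh
# Rough VRAM estimate from the log above: 5155 MiB covered 33 layers.
OFFLOADED_MIB=5155
LAYERS_OFFLOADED=33
TOTAL_LAYERS=49

PER_LAYER=$(( OFFLOADED_MIB / LAYERS_OFFLOADED ))   # ~156 MiB per layer
FULL_MIB=$(( PER_LAYER * TOTAL_LAYERS ))            # ~7644 MiB for all layers

echo "Estimated buffer for all ${TOTAL_LAYERS} layers: ${FULL_MIB} MiB"

# That is well under 16 GB, so re-running with -ngl set above the layer
# count (llama.cpp clamps it to the model's actual layer count) might
# offload everything:
#   ./build/bin/llama-cli -m "models/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf" \
#     -p "Why is the sky blue?" -n 50 -e -ngl 99 -t 4
```

If the Vulkan allocator still refuses the larger buffers, the GGML_VK_FORCE_MAX_ALLOCATION_SIZE cap set earlier may be the limiting factor rather than total VRAM.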
