I set up an AMD W7700 on a Raspberry Pi 5 (16GB) running kernel 6.12.y, following the guide in geerlingguy/raspberry-pi-pcie-devices#680.
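As a sanity check (my own step, not part of the guide), I first confirmed the card is visible to the Vulkan backend; `vulkaninfo` comes from the standard vulkan-tools package:

```bash
# List Vulkan-visible devices and pick out the device name lines;
# the W7700 should show up here before llama.cpp's Vulkan backend can use it.
vulkaninfo --summary | grep -i deviceName
```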
Then I downloaded DeepSeek-R1-Distill-Qwen-14B (Q4_K_M) and tried running it:
```bash
# Follow https://www.jeffgeerling.com/blog/2024/llms-accelerated-egpu-on-raspberry-pi-5
cd models
wget https://huggingface.co/bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF/resolve/main/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf
# Cap the Vulkan backend's per-buffer allocation size (value explained below)
export GGML_VK_FORCE_MAX_ALLOCATION_SIZE=2147483647
cd ../
./build/bin/llama-cli -m "models/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf" -p "Why is the sky blue?" -n 50 -e -ngl 33 -t 4
```
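My understanding (an assumption on my part, going by the variable's name) is that `GGML_VK_FORCE_MAX_ALLOCATION_SIZE` caps how large any single Vulkan buffer allocation can be; the value works out to one byte under 2 GiB:

```bash
# 2147483647 bytes = 2^31 - 1, i.e. just under 2 GiB
echo $(( (1 << 31) - 1 ))             # 2147483647
echo $(( 2147483647 / 1024 / 1024 ))  # ~2047 MiB
```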
But I'm only seeing part of the model being offloaded to the GPU:
```
load_tensors: offloading 33 repeating layers to GPU
load_tensors: offloaded 33/49 layers to GPU
load_tensors: Vulkan0 model buffer size = 5155.22 MiB
load_tensors: CPU_Mapped model buffer size = 3410.82 MiB
```
And token generation performance is pretty bad, with the GPU only utilizing about 10% of its capacity while the CPU sits at 350%+.
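That 33/49 lines up with the `-ngl 33` flag I copied from the blog post, so I suspect the 16 layers left on the CPU explain the load pattern. A sketch of what I'd try next (hedged, not yet tested on my setup): in llama.cpp, any `-ngl` value at or above the model's layer count offloads everything, so `-ngl 99` should push all 49 layers to the GPU, and `radeontop` is one way to watch the AMD card's utilization while it runs:

```bash
# Offload all 49 layers (48 repeating + output); -ngl 99 means "as many as possible"
./build/bin/llama-cli -m "models/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf" \
  -p "Why is the sky blue?" -n 50 -e -ngl 99 -t 4

# In another terminal: GPU utilization and VRAM use on the AMD card
radeontop
```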