
v_llama_cpp

v_llama_cpp is a V binding for llama.cpp, letting you use llama.cpp functionality directly in V projects.

What is llama.cpp?

llama.cpp is an LLM (Large Language Model) inference framework implemented in C++, with the following main features:

  • Pure CPU Inference: Run large models without a GPU
  • Quantization Support: Supports INT4, INT5, INT8 and other quantization formats, significantly reducing memory requirements (see the rough arithmetic after this list)
  • Cross-Platform: Works on Windows, Linux, macOS, and even mobile devices
  • Efficient Performance: Optimized for ordinary hardware, runs on regular laptops
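
As a rough illustration of why quantization matters (back-of-the-envelope figures that ignore runtime overhead such as the KV cache):

7B parameters × 2 bytes (FP16)   ≈ 14 GB
7B parameters × 0.5 bytes (INT4) ≈ 3.5 GB

So a model that is out of reach at FP16 on a 16 GB laptop becomes comfortably runnable once quantized to 4 bits.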

Simply put, llama.cpp allows you to run large models like DeepSeek, Qwen, and ChatGLM locally on consumer-grade hardware.

Installation

Manual Setup

It is recommended to download the source code using git:

# Download from GitHub
git clone https://github.com/sakana-ctf/v_llama_cpp
# For users in China, download from Gitee
git clone https://gitee.com/sakana_ctf/v_llama_cpp

Run the install script to build the binding and check the llama.cpp environment; if llama.cpp is not found, the script will attempt to install it:

v install.vsh

Note: Installing llama.cpp with V may require root privileges. In that case, run sudo v install.vsh

Uninstall

A convenience script is provided to uninstall the current repository:

v uninstall.vsh

If you had configured v_llama_cpp before updating, the install script will uninstall the previous version first and then reinstall it.

Direct Installation [future]

Direct installation using the following command is planned for the future:

v install v_llama_cpp

Usage

Example

Several basic examples are provided in the ./examples/ folder. Below is the simplest one, ./examples/ez_simple.v:

module main

import os
import v_llama_cpp {
	ModelUrl,
}

fn main() {
	// Mirror URLs for the model file: ModelScope (for users in China)
	// and Hugging Face; the sha256 checksum verifies the download.
	model_url := ModelUrl{
		url:    [
			'https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/resolve/master/DeepSeek-R1-Distill-Qwen-1.5B-Q2_K.gguf',
			'https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/resolve/main/DeepSeek-R1-Distill-Qwen-1.5B-Q2_K.gguf',
		]
		sha256: '6b01273c847100f7e594c34869670430fc3597b3897f839664ed4ba4588f5c54'
	}
	model_path := './DeepSeek-R1-Distill-Qwen-1.5B-Q2_K.gguf'
	// Download the model to model_path if needed, then load it.
	mut ctx := model_url.ez_load_model(model_path, -1, 2048, 512) or {
		println('load model failed.')
		return
	}
	input_buffer := os.input('>')
	// DeepSeek-R1 chat template: a user turn followed by the assistant's <think> block.
	prompt := '<|User|>${input_buffer}<|Assistant|><think>\n'
	print('deepseek: ')
	// Stream the reply; print_token is invoked once per generated token.
	ctx.ez_response(prompt, 512, 256, print_token) or { println('response failed.') }
}

// Callback used by ez_response to emit each token as it arrives.
fn print_token(token string) {
	print(token)
}
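
To build and run the example from the repository root (assuming the llama.cpp environment from the installation step is in place):

v run examples/ez_simple.v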

The model file will be downloaded automatically to './DeepSeek-R1-Distill-Qwen-1.5B-Q2_K.gguf', next to the program. It is recommended to obtain model files from sources such as ModelScope (https://www.modelscope.cn) or Hugging Face (https://huggingface.co), as in the example above.
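
Because the last argument of ez_response is an ordinary fn (string) callback, you are not limited to printing tokens as they arrive. The following minimal sketch (illustrative only; collect is not part of the binding's API) accumulates the streamed tokens into a buffer using a V closure:

module main

import strings

fn main() {
	// Shared buffer for the streamed tokens.
	mut sb := strings.new_builder(1024)
	// Closure capturing `sb` by reference; same signature as print_token above.
	collect := fn [mut sb] (token string) {
		sb.write_string(token)
	}
	// With a loaded context you would pass it in place of print_token:
	// ctx.ez_response(prompt, 512, 256, collect) or { println('response failed.') }
	collect('hello ') // stand-ins for tokens streamed by ez_response
	collect('world')
	println(sb.str()) // prints: hello world
}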
