Tags: xyc/llama.cpp
Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (ggml-org#6183)

* k_cache: be able to use Q5_0
* k_cache: be able to use Q5_1 on CUDA
* k_cache: be able to use Q5_0 on Metal
* k_cache: be able to use Q5_1 on Metal
* k_cache: be able to use IQ4_NL - just CUDA for now
* k_cache: be able to use IQ4_NL on Metal
* k_cache: add newly added supported types to llama-bench and CUDA supports_op

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
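As a usage sketch, the new K-cache types plug into the existing selection mechanism: the `type_k` field of `llama_context_params`. The following is a minimal sketch assuming the llama.h C API around this change; verify the exact function and field names against the header you build with.

```cpp
// Minimal sketch: requesting a Q5_0-quantized K cache via the llama.cpp C API.
// Assumes the llama.h API of this era (llama_context_params::type_k);
// check the header for the exact names at your commit.
#include "llama.h"
#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file(argv[1], mparams);
    if (model == NULL) {
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    cparams.type_k = GGML_TYPE_Q5_0; // one of the newly supported K-cache types

    llama_context * ctx = llama_new_context_with_model(model, cparams);
    if (ctx == NULL) {
        llama_free_model(model);
        return 1;
    }

    // ... run inference as usual; the K cache is now stored as Q5_0 ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

In `llama-bench`, the same choice is exposed through the `-ctk` / `--cache-type-k` option (e.g. `-ctk q5_0`), which this commit extends to accept the new types.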
llava : add MobileVLM_V2 backup (ggml-org#6175)

* Add MobileVLM_V2 backup
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update examples/llava/convert-image-encoder-to-gguf.py

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* clip : fix whitespace
* fix definition mistake in clip.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
cuda : refactor to remove global resources (ggml-org#6170)
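The pattern behind this refactor is worth sketching: device state that used to live in file-scope globals is gathered into a context object owned by each backend instance, so separate backends no longer share mutable global state and teardown becomes deterministic. The struct and names below are illustrative only, not the actual ggml-cuda code.

```cpp
// Illustrative sketch of the refactoring pattern (hypothetical names, not
// the real ggml-cuda types): per-device handles move from globals into a
// context whose lifetime is tied to the backend instance.
#include <cuda_runtime.h>
#include <cublas_v2.h>

struct cuda_backend_ctx {
    int            device;
    cudaStream_t   stream = nullptr;
    cublasHandle_t cublas = nullptr;

    explicit cuda_backend_ctx(int dev) : device(dev) {
        cudaSetDevice(device);
        cudaStreamCreate(&stream);
        cublasCreate(&cublas);
        cublasSetStream(cublas, stream);
    }

    ~cuda_backend_ctx() {
        // release per-instance resources instead of leaking process-wide globals
        if (cublas) cublasDestroy(cublas);
        if (stream) cudaStreamDestroy(stream);
    }

    // non-copyable: each backend instance owns its handles exclusively
    cuda_backend_ctx(const cuda_backend_ctx &) = delete;
    cuda_backend_ctx & operator=(const cuda_backend_ctx &) = delete;
};
```

Tying handles to an owning object also makes multi-device and multi-context use safe, since nothing is initialized lazily behind a shared global.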
Server: version bump for httplib and json (ggml-org#6169)

* server: version bump for httplib and json
* fix build
* bring back content_length