Stars
turbo-tan / llama.cpp-tq3
Forked from ggml-org/llama.cpp
llama.cpp fork with TQ3_1S/4S CUDA kernels: 3.5-bit WHT quantization achieving Q4s quality at 10% smaller size. Based on RaBitQ-inspired Walsh-Hadamard transform. Enables 27B models on 16GB GPUs w…
Infinitic is an open source orchestration framework for application teams to build durable and flexible backend processes.