Working on LLM inference systems, KV cache compression, and kernel-level optimizations (TurboQuant).
Stars: 3 results for forked starred repositories
TheTom / llama-cpp-turboquant
Forked from ggml-org/llama.cpp
LLM inference in C/C++
LLAMA Turboquant implementation with CUDA support
miolini / autoresearch-macos
Forked from karpathy/autoresearch
AI agents running research on single-GPU nanochat training automatically, adapted for macOS