Stars
Get your documents ready for gen AI
High-performance FlashAttention-2 for AMD, Intel, and Apple GPUs. Drop-in replacement for PyTorch SDPA. Triton backend for ROCm (MI300X, RDNA3), Vulkan backend for consumer GPUs. No CUDA required.
Verify Precision of all Kimi K2 API Vendor
Set up root-on-zfs using whole disk, with dracut and zfsbootmenu
tiny, portable SOCKS5 server with very moderate resource usage
With this program you can bind applications to a specific network interface / network adapter. This is very useful if you have multiple (internet) connections and want your program to use a specifi…
Example playbooks to setup your OpenWRT-router with ansible
OpenWrt configuration for router + dumb access points with Ansible playbook for centralised management
MCP server that interacts with Obsidian via the Obsidian rest API community plugin
Small, pragmatic helpers for building command-line apps with System.CommandLine and the Microsoft.Extensions hosting/DI stack.
llama.cpp fork with additional SOTA quants and improved performance
Muffin is a fast, simple and asyncronous web-framework for Python 3
the little async embedded visualization framework that could (Python)
AI-powered code assistant for Vim. OpenAI and ChatGPT plugin for Vim and Neovim.
Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing.
Single-file, pure CUDA C implementation for running inference on Qwen3 0.6B GGUF. No Dependencies.
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
RLHF (Supervised fine-tuning, reward model, and PPO) step-by-step in 3 Jupyter notebooks
Replace 'hub' with 'ingest' in any GitHub URL to get a prompt-friendly extract of a codebase
NVIDIA Linux open GPU with P2P support