Stars
Fast fp16-fp8 mixed precision matmul on RDNA3/3.5 GPUs without native fp8
FSampler is a training‑free, sampler‑agnostic acceleration layer for diffusion sampling.
Enable true multi-GPU capability in ComfyUI using XDiT XFuser and FSDP managed by Ray
Adds customizable sliders for Custom (OpenAI-compatible) source of Chat Completion API.
llama.cpp fork with additional SOTA quants and improved performance
An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs
Open-Sora: Democratizing Efficient Video Production for All
A fast inference library for running LLMs locally on modern consumer-class GPUs
Large Language Model Text Generation Inference
YaRN: Efficient Context Window Extension of Large Language Models
A single Gradio + React WebUI with extensions for ACE-Step, OmniVoice, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice, ParlerTTS, Stable Audio, MMS, StyleTTS2, MAGNet,…
A webui for different audio related Neural Networks
Open-source desktop app for local LLMs. Text, vision, tool-calling, OpenAI/Anthropic-compatible API. 100% private.
JonathanFly / bark
Forked from suno-ai/bark
🚀 BARK INFINITY GUI CMD 🎶 Powered Up Bark Text-prompted Generative Audio Model
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
[OBSOLETE] Extensions API for SillyTavern.
LostRuins / koboldcpp
Forked from ggml-org/llama.cpp
Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
Free Download Manager Add-On. Provides support for downloading videos from various sites.
sterlind / GPTQ-for-LLaMa
Forked from qwopqwop200/GPTQ-for-LLaMa
4 bits quantization of LLaMa using GPTQ
wawawario2 / long_term_memory
Forked from oobabooga/textgen
A gradio web UI for running Large Language Models like GPT-J 6B, OPT, GALACTICA, LLaMA, and Pygmalion.
An extension to Oobabooga to add a simple memory function for chat
0cc4m / GPTQ-for-LLaMa
Forked from qwopqwop200/GPTQ-for-LLaMa
4 bits quantization of LLMs using GPTQ