Stars
charlievieth / go-sqlite3
Forked from mattn/go-sqlite3
sqlite3 driver for go using database/sql
A framework for efficient model inference with omni-modality models
Turso is an in-process SQL database, compatible with SQLite.
Execution Time Analysis, Reroute Enhancement, Remote Python Logs, For ComfyUI developers.
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
High-performance C++/CUDA SDK for running Audio2Emotion and Audio2Face inference with integrated post-processing.
ComfyUI-TBG-SAM3 A plug-and-play ComfyUI extension providing production-ready nodes for Meta’s SAM3 (Segment Anything Model 3) for text- or point-based segmentation, exhaustive mask generation, and…
TTS model capable of streaming conversational audio in realtime.
[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
Model Compression Toolbox for Large Language Models and Diffusion Models
SD.Next: All-in-one WebUI for AI generative image and video creation
https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
CAFxX / httpcompression
Forked from nytimes/gziphandler
Go middleware to compress HTTP responses with Gzip, Deflate, Brotli, Zstandard, XZ/LZMA2, LZ4, and more.
deepbeepmeep / Wan2GP
Forked from Wan-Video/Wan2.1
A fast AI Video Generator for the GPU Poor. Supports Wan 2.1/2.2, Qwen Image, Hunyuan Video, LTX Video and Flux.
You like pytorch? You like micrograd? You love tinygrad! ❤️
Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability.
Specify a github or local repo, github pull request, arXiv or Sci-Hub paper, Youtube transcript or documentation URL on the web and scrape into a text file and clipboard for easier LLM ingestion
A fast inference library for running LLMs locally on modern consumer-class GPUs
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Generate ARKit expression from audio in realtime
A lightweight WebGL Render for LAM and LAM_Audio2Expression