Stars
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Real-Time VLAs via Future-state-aware Asynchronous Inference.
DFlash: Block Diffusion for Flash Speculative Decoding
CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
A retargetable MLIR-based machine learning compiler and runtime toolkit.
Development repository for the Triton language and compiler
TokenSpeed is a speed-of-light LLM inference engine.
FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.
High-performance, light-weight C++ LLM and VLM Inference Software for Physical AI
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
All information and news with respect to Falcon-H1 series
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
Interactive 3D visualization of dense decoder-only LLM inference. Companion to the AI Inference Engineer 2026 course.
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
Turn your PC, Mac, or Linux box into an AI server. LLM inference, chat UI, voice, agents, workflows, RAG, and image generation.
Rust home automation runtime for Genie: local device graph, deterministic actuation safety, audit logs, and AI-native home-control APIs.
Jetson Orin-tuned LLM inference runtime for gemma 4, qwen 3.5 — memory-first, power-aware, zero-allocation. C++17 + CUDA.
🧃 Token weight loss. Lean output compaction for terminal-heavy agent workflows. Works as a native CLI tool or as an extension to popular coding and agent frameworks.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
ai-hpc / jetson-esp-hosted
Forked from espressif/esp-hosted
Hosted Solution (Jetson Linux) with ESP32 (Wi-Fi + BT + BLE)
GeniePod Home V1 hardware: MVP testing build, wiring, BOM, and planned interface-board/enclosure docs.
🦞 Low-latency, limited-context AI harness for private on-device homes.
Install and run NemoClaw on NVIDIA Jetson Orin with a patched OpenShell cluster image and streamlined onboarding.
High-performance C++/CUDA GPU-accelerated STARK prover for Triton VM