QKV Core: Breaking the 4GB VRAM Barrier πŸš€

Run modern 7B LLMs (like Qwen 2.5, Llama 3) on legacy 4GB GPUs (GTX 1050/1650) without crashing.


πŸ“° In the Media & Articles

Read the engineering story behind QKV Core on Medium.

🧐 The Problem: It's Not Size, It's Fragmentation

Millions of developers are stuck on "GPU Poor" hardware like the NVIDIA GTX 1050 (4GB). When you try to load a standard quantized 7B model (e.g., Q4_K_M at ~4.3GB, or even smaller), you often hit the "OOM Cliff": the CUDA context, KV cache, and activation buffers all claim a share of VRAM, and fragmented allocations can leave no contiguous block large enough, so the load fails even when the file nominally fits.
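
To see how little headroom actually remains, query free vs. total VRAM before loading. A minimal sketch using PyTorch's stock CUDA API (nothing QKV-Core-specific):

import torch

# Ask the CUDA driver for (free, total) device memory in bytes
free, total = torch.cuda.mem_get_info()
print(f"Total VRAM: {total / 1e9:.2f} GB")
print(f"Free VRAM:  {free / 1e9:.2f} GB")

# A ~4.3 GB Q4_K_M file needs more than 4.3 GB at runtime: the CUDA
# context, KV cache, and activation buffers all claim a share, and
# fragmentation can deny even a "fitting" model a contiguous block.
if free / 1e9 < 4.3:
    print("OOM Cliff: the weights alone exceed free VRAM")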

πŸš€ Features

  • Transformer Architecture: Full implementation of GPT-style transformer models
  • Training & Fine-tuning: Support for full training, incremental training, and fine-tuning
  • Parameter-Efficient Methods: LoRA and QLoRA for efficient fine-tuning (see the sketch after this list)
  • RLHF & DPO: Reinforcement Learning from Human Feedback and Direct Preference Optimization
  • Model Formats: Support for PyTorch (.pt) and GGUF formats
  • Hugging Face Integration: Download and convert models from Hugging Face Hub
  • Web UI: Comprehensive Gradio-based interface for all operations
  • CLI Interface: Command-line tools for training and inference
  • Research Features: Implementation of cutting-edge techniques (FlashAttention, Mamba SSM, etc.)
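
For the LoRA bullet above, here is what parameter-efficient fine-tuning looks like in miniature. This sketch uses the Hugging Face peft API purely for illustration; QKV Core's own training entry points live in training/ and cli/:

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a small base model; swap in your own checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Rank-8 adapters on the attention projection; only these weights train
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # a fraction of a percent is trainable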

πŸ“¦ Installation

Prerequisites

  • Python 3.10+ (3.10–3.12 recommended)
  • PyTorch 2.0+
  • CUDA Toolkit (optional, for GPU acceleration)

Quick Install

# Clone the repository
git clone https://github.com/QKV-Core/QKV-Core.git
cd QKV-Core

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Install GGUF support (optional, for GGUF models)
# See GGUF_INSTALL.md for platform-specific instructions
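
After installing, a quick sanity check (plain PyTorch, nothing repo-specific) confirms your GPU is visible before you try to load a model:

import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))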

🎯 Quick Start

Web Interface

python launch_web_ui.py

Then open your browser to http://localhost:7861

Command Line Interface

# Train a tokenizer
python cli/run.py train-tokenizer --corpus data/sample_corpus.txt --output tokenizer/my_tokenizer.pkl

# Train a model
python cli/run.py train --data data/sample_corpus.txt --tokenizer tokenizer/my_tokenizer.pkl

# Chat with a model
python debug_chat.py
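
If you installed GGUF support, partial GPU offload is what makes 7B models viable on 4GB cards. A minimal sketch using llama-cpp-python as an illustration (an assumption, not necessarily the loader this repo wires up; the model path below is hypothetical):

from llama_cpp import Llama

# Offload only as many layers as fit in 4 GB; the rest stay in system RAM
llm = Llama(
    model_path="models/qwen2.5-7b-instruct-q4_k_m.gguf",  # hypothetical path
    n_gpu_layers=20,  # tune down if you still hit OOM
    n_ctx=2048,
)
out = llm("Explain KV-cache fragmentation in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])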

πŸ“š Documentation

πŸ—οΈ Project Structure

QKV-Core/
β”œβ”€β”€ core/              # Core transformer implementation
β”œβ”€β”€ models/            # Inference engines
β”œβ”€β”€ training/          # Training implementations
β”œβ”€β”€ web_ui/            # Gradio web interface
β”œβ”€β”€ cli/               # Command-line interface
β”œβ”€β”€ utils/             # Utility modules
└── docs/              # Documentation

🀝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

πŸ“„ License

See the LICENSE file for details.

πŸ™ Acknowledgments

Built on the fundamental Query-Key-Value attention mechanism that powers transformer architectures. QKV Core brings production-grade AI capabilities to your fingertips.
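
For reference, the Query-Key-Value computation the name refers to, in a few lines of PyTorch:

import math
import torch

def attention(Q, K, V):
    # softmax(Q K^T / sqrt(d)) V — scaled dot-product attention
    d = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)
    weights = torch.softmax(scores, dim=-1)
    return weights @ V

Q = K = V = torch.randn(1, 8, 64)  # (batch, seq, head_dim)
print(attention(Q, K, V).shape)    # torch.Size([1, 8, 64])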


QKV Core - Where Query, Key, and Value Create Intelligence πŸš€



Built with ❀️ for the Open Source AI Community by Hüseyin Kama
