Run modern 7B LLMs (like Qwen 2.5, Llama 3) on legacy 4GB GPUs (GTX 1050/1650) without crashing.
Read the engineering story behind QKV Core:
- Part 1: Breaking the 4GB VRAM Barrier - How I ran 7B LLMs on a GTX 1050 without crashing.
- Part 2: Inside the Architecture - (Coming Soon)
Millions of developers are stuck on "GPU Poor" hardware like the NVIDIA GTX 1050 (4GB). When you try to load a standard quantized 7B model (e.g., Q4_K_M, ~4.3GB, or even smaller), you often hit the "OOM Cliff": the weights alone nearly fill VRAM, and the CUDA context plus KV cache push the load over the edge into an out-of-memory crash.
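A quick back-of-the-envelope budget shows why. The numbers below are illustrative, not QKV Core's actual accounting:

# Why a ~4.3 GB Q4_K_M file OOMs on a 4 GB card (illustrative numbers)
weights_gb = 4.3        # quantized weight file size (Q4_K_M)
cuda_context_gb = 0.3   # CUDA context + framework overhead (driver dependent)
kv_cache_gb = 0.5       # KV cache for a ~2k-token context (model dependent)
needed_gb = weights_gb + cuda_context_gb + kv_cache_gb
print(f"~{needed_gb:.1f} GB needed vs 4.0 GB available")  # ~5.1 GB -> OOM

QKV Core is built to run inside that 4 GB budget anyway. What it offers: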
- Transformer Architecture: Full implementation of GPT-style transformer models
- Training & Fine-tuning: Support for full training, incremental training, and fine-tuning
- Parameter-Efficient Methods: LoRA and QLoRA for efficient fine-tuning (see the sketch after this list)
- RLHF & DPO: Reinforcement Learning from Human Feedback and Direct Preference Optimization
- Model Formats: Support for PyTorch (.pt) and GGUF formats
- Hugging Face Integration: Download and convert models from Hugging Face Hub
- Web UI: Comprehensive Gradio-based interface for all operations
- CLI Interface: Command-line tools for training and inference
- Research Features: Implementation of cutting-edge techniques (FlashAttention, Mamba SSM, etc.)
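The parameter-efficient path is easiest to see in code. Here is a minimal LoRA layer in PyTorch, as a sketch only; QKV Core's actual implementation lives under training/ and may differ:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # y = W x + (alpha / r) * B A x, with W frozen and only A, B trained
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

Only lora_a and lora_b receive gradients, which is why a 7B model can be fine-tuned on hardware that could never hold full optimizer states for every weight.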
- Python 3.10+ (3.10, 3.11, or 3.12 recommended)
- PyTorch 2.0+
- CUDA Toolkit (optional, for GPU acceleration)
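Before installing, you can sanity-check your environment with standard PyTorch calls (nothing QKV Core specific):

import sys, torch
assert sys.version_info >= (3, 10), "Python 3.10+ required"
print("PyTorch:", torch.__version__)            # expect 2.0 or newer
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1e9:.1f} GB VRAM")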
# Clone the repository
git clone https://github.com/QKV-Core/QKV-Core.git
cd QKV-Core
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Install GGUF support (optional, for GGUF models)
# See GGUF_INSTALL.md for platform-specific instructions

# Launch the web UI
python launch_web_ui.py

Then open your browser to http://localhost:7861.
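With GGUF support installed, the standard way to fit a 7B model on 4 GB is partial GPU offload. The sketch below uses llama-cpp-python with a hypothetical model path; your backend and paths may differ (see GGUF_INSTALL.md):

from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="models/qwen2.5-7b-instruct-q4_k_m.gguf",  # hypothetical path
    n_ctx=2048,
    n_gpu_layers=20,  # offload only what fits in 4 GB; lower this if you still OOM
)
print(llm("Hello, world!", max_tokens=32)["choices"][0]["text"])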
# Train a tokenizer
python cli/run.py train-tokenizer --corpus data/sample_corpus.txt --output tokenizer/my_tokenizer.pkl
# Train a model
python cli/run.py train --data data/sample_corpus.txt --tokenizer tokenizer/my_tokenizer.pkl
# Chat with a model
python debug_chat.py

- CONTRIBUTING.md: Comprehensive contribution guidelines
- GGUF_INSTALL.md: GGUF model installation guide
- docs/RESEARCH_IMPLEMENTATIONS.md: Research paper implementations
QKV-Core/
├── core/       # Core transformer implementation
├── models/     # Inference engines
├── training/   # Training implementations
├── web_ui/     # Gradio web interface
├── cli/        # Command-line interface
├── utils/      # Utility modules
└── docs/       # Documentation
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
See the LICENSE file for details.
Built on the fundamental Query-Key-Value attention mechanism that powers transformer architectures. QKV Core brings production-grade AI capabilities to your fingertips.
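Stripped of optimizations, that mechanism is a few lines of PyTorch (a minimal sketch of scaled dot-product attention, not QKV Core's production kernels):

import math
import torch

def attention(q, k, v):
    # softmax(Q K^T / sqrt(d)) V -- the core operation of every transformer layer
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 64)   # (batch, seq_len, head_dim)
out = attention(q, k, v)            # -> (1, 8, 64)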
QKV Core - Where Query, Key, and Value Create Intelligence