
QKV Core: Breaking the 4GB VRAM Barrier 🚀

Run modern 7B-class LLMs (such as Qwen 2.5 and Llama 3) on legacy 4GB GPUs (GTX 1050/1650) without crashing.


📰 In the Media & Articles

Read the engineering story behind QKV Core:

🧐 The Problem: It's Not Size, It's Fragmentation

Millions of developers are stuck on "GPU Poor" hardware such as the NVIDIA GTX 1050 (4GB). When you try to load a standard quantized 7B model (e.g., a Q4_K_M file of ~4.3GB, or an even smaller quantization), you often hit the "OOM Cliff": the load fails with an out-of-memory (OOM) error.
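The heading's claim, that a load can fail because of fragmentation rather than raw size, is easy to see with a toy model. The sketch below is a hypothetical first-fit allocator over a 4 GB pool measured in MB. It is not QKV Core code, and PyTorch's real CUDA caching allocator is considerably more sophisticated, but the failure mode is the same in spirit: plenty of memory is free in total, yet no single contiguous hole fits the next tensor.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    """One contiguous region of the toy pool, in MB."""
    offset: int
    size: int
    free: bool


class ToyAllocator:
    """Hypothetical first-fit allocator over a fixed-size pool (sizes in MB)."""

    def __init__(self, capacity_mb: int) -> None:
        # Start with a single free block spanning the whole pool.
        self.blocks = [Block(0, capacity_mb, True)]

    def alloc(self, size_mb: int) -> int | None:
        """Return the offset of the first free hole that fits, or None ("OOM")."""
        for i, blk in enumerate(self.blocks):
            if blk.free and blk.size >= size_mb:
                remainder = blk.size - size_mb
                blk.size, blk.free = size_mb, False
                if remainder:
                    # Split the hole: the tail stays free for later requests.
                    self.blocks.insert(i + 1, Block(blk.offset + size_mb, remainder, True))
                return blk.offset
        return None  # free MB may remain, but no single hole is large enough

    def free(self, offset: int) -> None:
        """Release the block at offset and coalesce adjacent free blocks."""
        for blk in self.blocks:
            if blk.offset == offset:
                blk.free = True
        merged: list[Block] = []
        for blk in self.blocks:
            if merged and merged[-1].free and blk.free:
                merged[-1].size += blk.size  # merge with the previous hole
            else:
                merged.append(blk)
        self.blocks = merged

    def total_free(self) -> int:
        return sum(b.size for b in self.blocks if b.free)

    def largest_hole(self) -> int:
        return max((b.size for b in self.blocks if b.free), default=0)


pool = ToyAllocator(4096)                      # a 4 GB card, in MB
chunks = [pool.alloc(512) for _ in range(8)]   # fill it with 512 MB tensors
for off in chunks[::2]:                        # free every other tensor
    pool.free(off)
print(pool.total_free(), pool.largest_hole())  # 2048 512
print(pool.alloc(1024))                        # None: 2 GB free, but no 1 GB hole
```

Half of the card is free, yet a 1 GB request still fails; this is the kind of situation that allocator settings such as expandable segments are meant to mitigate.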

🚀 Features

  • Transformer Architecture: Full implementation of GPT-style transformer models
  • Training & Fine-tuning: Support for full training, incremental training, and fine-tuning
  • Parameter-Efficient Methods: LoRA and QLoRA for efficient fine-tuning
  • RLHF & DPO: Reinforcement Learning from Human Feedback and Direct Preference Optimization
  • Model Formats: Support for PyTorch (.pt) and GGUF formats
  • Hugging Face Integration: Download and convert models from Hugging Face Hub
  • Web UI: Comprehensive Gradio-based interface for all operations
  • CLI Interface: Command-line tools for training and inference
  • Research Features: Implementation of cutting-edge techniques (FlashAttention, Mamba SSM, etc.)
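To see why the LoRA/QLoRA option matters on small GPUs, here is a back-of-the-envelope count of trainable parameters for a single square projection. It uses the standard LoRA factorization (a frozen W plus a low-rank update B·A of rank r); the hidden size of 4096 and rank 8 are illustrative assumptions, not QKV Core defaults.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the dense d_out x d_in weight W itself is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for LoRA: W stays frozen, only B (d_out x r) and A (r x d_in) train."""
    return d_out * rank + rank * d_in


d = 4096  # illustrative hidden size for a 7B-class model (assumption)
full = full_finetune_params(d, d)
lora = lora_params(d, d, rank=8)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

QLoRA additionally stores the frozen W in 4-bit form, which shrinks the memory taken by the base model itself rather than only the optimizer state.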

📦 Installation

Prerequisites

  • Python 3.10+ (3.10, 3.11, or 3.12 recommended)
  • PyTorch 2.0+
  • CUDA Toolkit (optional, for GPU acceleration)
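A quick way to confirm the prerequisites before installing is a small pre-flight script like the one below. It is a hypothetical helper, not shipped with the repository: it checks the Python version, reports the installed PyTorch version, and, if CUDA is available, prints the GPU name and its total VRAM.

```python
from __future__ import annotations

import sys


def python_ok(version: tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """True for Python 3.10 or newer (the minimum listed above)."""
    return tuple(version[:2]) >= (3, 10)


def torch_status() -> str:
    """Summarise the PyTorch build and GPU without failing if torch is absent."""
    try:
        import torch
    except ImportError:
        return "PyTorch: not installed"
    major = int(torch.__version__.split(".")[0])
    verdict = "OK" if major >= 2 else "needs 2.0+"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} ({verdict}), CUDA unavailable: CPU only"
    props = torch.cuda.get_device_properties(0)
    vram_gib = props.total_memory / 2**30
    return f"PyTorch {torch.__version__} ({verdict}), {props.name} with {vram_gib:.1f} GiB VRAM"


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "OK" if python_ok() else "needs 3.10+")
    print(torch_status())
```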

Quick Install

# Clone the repository
git clone https://github.com/QKV-Core/QKV-Core.git
cd QKV-Core

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Install GGUF support (optional, for GGUF models)
# See GGUF_INSTALL.md for platform-specific instructions
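Before pointing the tools at a downloaded model, it can help to confirm the file really is GGUF. The sketch below reads only the fixed header defined by the GGUF format (the magic bytes `GGUF`, a uint32 version, then the tensor and metadata key/value counts). It is independent of QKV Core's own loader, and the function names are illustrative.

```python
import struct
from typing import NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def parse_gguf_header(data: bytes) -> GGUFHeader:
    """Decode the fixed-size header at the start of a GGUF file (little-endian)."""
    if data[:4] != b"GGUF":
        raise ValueError("not a GGUF file (magic bytes missing)")
    (version,) = struct.unpack_from("<I", data, 4)
    # GGUF v1 stored both counts as uint32; v2 and later use uint64.
    fmt = "<II" if version == 1 else "<QQ"
    tensor_count, kv_count = struct.unpack_from(fmt, data, 8)
    return GGUFHeader(version, tensor_count, kv_count)


def read_gguf_header(path: str) -> GGUFHeader:
    """Read the first 24 bytes of a file and parse them as a GGUF header."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(24))
```

Calling `read_gguf_header("path/to/model.gguf")` (a placeholder path) either returns the version and counts or raises, which catches truncated downloads or mislabeled files early.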

🎯 Quick Start

Web Interface

python launch_web_ui.py

Then open your browser to http://localhost:7861

Command Line Interface

# Train a tokenizer
python cli/run.py train-tokenizer --corpus data/sample_corpus.txt --output tokenizer/my_tokenizer.pkl

# Train a model
python cli/run.py train --data data/sample_corpus.txt --tokenizer tokenizer/my_tokenizer.pkl

# Chat with a model
python debug_chat.py
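The two training commands above can also be chained from Python. This hypothetical wrapper passes exactly the flags shown above; it assumes it is run from the repository root so that `cli/run.py` resolves, and it stops at the first step that exits with an error.

```python
import subprocess
import sys


def build_commands(corpus: str, tokenizer: str) -> list:
    """Return the argv lists for tokenizer training followed by model training."""
    return [
        [sys.executable, "cli/run.py", "train-tokenizer", "--corpus", corpus, "--output", tokenizer],
        [sys.executable, "cli/run.py", "train", "--data", corpus, "--tokenizer", tokenizer],
    ]


def run_pipeline(corpus: str = "data/sample_corpus.txt",
                 tokenizer: str = "tokenizer/my_tokenizer.pkl") -> None:
    """Run each step in order; check=True raises CalledProcessError on a non-zero exit."""
    for argv in build_commands(corpus, tokenizer):
        print("$", " ".join(argv))
        subprocess.run(argv, check=True)
```

Calling `run_pipeline()` reproduces the two shell commands with the sample corpus; pass other paths to train on your own data.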

📚 Documentation

πŸ—οΈ Project Structure

QKV-Core/
├── core/              # Core transformer implementation
├── models/            # Inference engines
├── training/          # Training implementations
├── web_ui/            # Gradio web interface
├── cli/               # Command-line interface
├── utils/             # Utility modules
└── docs/              # Documentation

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

📄 License

See LICENSE file for details.

πŸ™ Acknowledgments

Built on the fundamental Query-Key-Value attention mechanism that powers transformer architectures. QKV Core brings production-grade AI capabilities to your fingertips.


QKV Core - Where Query, Key, and Value Create Intelligence 🚀

Running on a 4GB Pascal GPU (Pascal 4GB BitNet flow) ⚙️

This project includes features and helper scripts for running a hybrid BitNet model on low-VRAM GPUs (4GB Pascal-class cards). Key options:

  • Use the model factory to create a smaller model for development: create_transformer_model(config, mode='small').
  • Enable accelerate offloading by setting the environment variable USE_ACCELERATE=1; optionally enable bitsandbytes quantization with USE_BNB=1.
  • If you hit out-of-memory errors caused by fragmentation, try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True in the environment.
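The same options can be set from a Python launcher. This is a minimal sketch, assuming the runner reads these variables from the process environment. `PYTORCH_CUDA_ALLOC_CONF` is applied with `setdefault` so that a value exported in the shell wins, and it is set at the top of the script because PyTorch's caching allocator reads it when CUDA is first initialized. The `env_flag` helper is hypothetical; the README only documents the value `1`, so also accepting `true`/`yes`/`on` is an assumption.

```python
import os

# Set before importing torch (or at least before the first CUDA allocation):
# PyTorch's caching allocator reads this when CUDA is initialized.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret a switch such as USE_ACCELERATE=1 as a boolean (hypothetical helper)."""
    value = os.environ.get(name)
    if value is None:
        return default
    # The README documents "1"; also accepting true/yes/on is an assumption.
    return value.strip().lower() in {"1", "true", "yes", "on"}


use_accelerate = env_flag("USE_ACCELERATE")
use_bnb = env_flag("USE_BNB")
print(f"accelerate offload: {use_accelerate}, bitsandbytes quantization: {use_bnb}")

# ...then import torch and build the model, e.g. with the repository's
# create_transformer_model(config, mode='small') for a small development model.
```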

Quick example (POSIX shell):

# Optional: install offload/quantization libraries
pip install accelerate bitsandbytes

# Run the BitNet hybrid runner with accelerate offload enabled
USE_ACCELERATE=1 USE_BNB=0 python -m qkv_core.run_bitnet_hybrid

# For development / testing on a machine without GPU (fast fallback)
python -m qkv_core.run_bitnet_hybrid
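If you load a model through Hugging Face `transformers`/`accelerate` directly rather than through `qkv_core.run_bitnet_hybrid`, offloading is usually steered by a `max_memory` mapping that caps GPU usage and spills the remaining layers to CPU RAM. The helper below is a hypothetical sketch for a 4GB card; the 0.75 GiB headroom and 12 GiB CPU budget are assumptions to tune, not values taken from QKV Core.

```python
from __future__ import annotations


def max_memory_budget(gpu_total_gib: float = 4.0,
                      reserve_gib: float = 0.75,
                      cpu_gib: float = 12.0) -> dict[int | str, int]:
    """Build a max_memory mapping (device -> bytes) for accelerate-style offload.

    Key 0 is the first CUDA device and "cpu" is system RAM; values are byte
    counts. reserve_gib is headroom for the CUDA context, activations and the
    KV cache; the 0.75 GiB default is a guess, not a measured value.
    """
    gib = 2**30
    gpu_bytes = int(max(gpu_total_gib - reserve_gib, 0.0) * gib)
    return {0: gpu_bytes, "cpu": int(cpu_gib * gib)}


print(max_memory_budget())  # {0: 3489660928, 'cpu': 12884901888}
```

For example, `AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", max_memory=max_memory_budget())` would cap GPU 0 at about 3.25 GiB and place the remaining layers in CPU memory (this requires `accelerate` to be installed).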

Notes:

  • Offload/dispatch behavior depends on accelerate version and available system resources.
  • bitsandbytes is Linux-focused and may not be available on Windows; the code will skip quantization if not available.
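The note above says quantization is skipped when bitsandbytes is unavailable. The README does not show how that check is implemented; one plausible sketch first looks for the package with `importlib.util.find_spec` and then tries to import it, since an installed package can still fail to load its CUDA binaries on some machines. The function names are hypothetical.

```python
import importlib
import importlib.util
import warnings


def bnb_available() -> bool:
    """True if bitsandbytes is installed and actually imports in this environment."""
    if importlib.util.find_spec("bitsandbytes") is None:
        return False
    try:
        importlib.import_module("bitsandbytes")
    except Exception:  # e.g. a missing or incompatible CUDA library
        return False
    return True


def resolve_quantization(requested: bool) -> bool:
    """Honour USE_BNB=1 only when bitsandbytes works; otherwise warn and fall back."""
    if requested and not bnb_available():
        warnings.warn("USE_BNB=1 but bitsandbytes is unavailable; loading without quantization")
        return False
    return requested
```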


Built with ❤️ for the Open Source AI Community by Hüseyin Kama

About

Adaptive Hybrid Quantization Framework for deploying 7B+ LLMs on low-VRAM devices (e.g., GTX 1050). Features surgical block alignment and Numba-accelerated inference.
