🔥 Inferno - Your Personal AI Infrastructure

Run any AI model locally with enterprise-grade performance and privacy


Inferno is a production-ready AI inference server that runs entirely on your hardware. Think of it as your private ChatGPT that works offline, supports any model format, and gives you complete control over your AI infrastructure.

🎯 Why Inferno?

🔒 Privacy First

  • 100% Local: All processing happens on your hardware
  • No Cloud Dependency: Works completely offline
  • Your Data Stays Yours: Zero telemetry or external data transmission

🚀 Universal Model Support

  • GGUF Models: Native support for Llama, Mistral, CodeLlama, and more
  • ONNX Models: Run models from PyTorch, TensorFlow, scikit-learn
  • Format Conversion: Convert between GGUF ↔ ONNX ↔ PyTorch ↔ SafeTensors
  • Auto-Optimization: Automatic quantization and hardware optimization

⚡ Enterprise Performance

  • GPU Acceleration: Metal (Apple Silicon, 13x speedup ✅), NVIDIA, AMD, Intel support
  • Smart Caching: Remember previous responses for instant results
  • Batch Processing: Handle thousands of requests efficiently
  • Load Balancing: Distribute work across multiple models/GPUs

🔧 Developer Friendly

  • OpenAI-Compatible API: Drop-in replacement for ChatGPT API
  • REST & WebSocket: Standard APIs plus real-time streaming
  • Multiple Languages: Python, JavaScript, Rust, cURL examples
  • Docker Ready: One-command deployment
  • Smart CLI: Typo detection, helpful error messages, setup guidance

📦 Installation

Choose your preferred installation method:

🍎 macOS

Desktop App (NEW in v0.5.0) - Recommended for macOS users

A native macOS application with Metal GPU capability detection, optimized for Apple Silicon (M1/M2/M3/M4).

  1. Visit Releases
  2. Download Inferno.dmg (universal binary for Intel & Apple Silicon)
  3. Open the DMG and drag Inferno to Applications
  4. Launch from Applications folder

Features:

  • 🎨 Native macOS UI with vibrancy effects
  • 🔔 System tray integration with live metrics
  • ⚡ Metal GPU acceleration with 13x speedup (Phases 2.1-2.3 ✅)
  • 🍎 Apple Silicon optimization (M1/M2/M3/M4 detection)
  • 🔄 Automatic model downloads and updates
  • 📊 Real-time performance monitoring with GPU metrics
  • 🔐 Built-in security and API key management
  • 🧠 Neural Engine detection for AI workloads

Build from source:

# Clone and build
git clone https://github.com/ringo380/inferno.git
cd inferno
./scripts/build-desktop.sh --release --universal

# Development mode with hot reload
cd dashboard && npm run tauri dev

CLI Tools (for automation and scripting)

Homebrew

# Add tap and install
brew tap ringo380/tap
brew install inferno

# Or directly
brew install ringo380/tap/inferno

# Start as service
brew services start inferno

Quick Install Script

curl -sSL https://github.com/ringo380/inferno/releases/latest/download/install-inferno.sh | bash

🐳 Docker

GitHub Container Registry

# Pull the latest image
docker pull ghcr.io/ringo380/inferno:latest

# Run with GPU support
docker run --gpus all -p 8080:8080 ghcr.io/ringo380/inferno:latest

# With custom models directory
docker run -v /path/to/models:/home/inferno/.inferno/models \
           -p 8080:8080 ghcr.io/ringo380/inferno:latest

Docker Compose

version: '3.8'
services:
  inferno:
    image: ghcr.io/ringo380/inferno:latest
    ports:
      - "8080:8080"
    volumes:
      - ./models:/home/inferno/.inferno/models
      - ./config:/home/inferno/.inferno/config
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

📦 Package Managers

Cargo (Rust)

# From crates.io
cargo install inferno

# From GitHub Packages
cargo install --registry github inferno

NPM (Desktop App)

# From GitHub Packages
npm install @ringo380/inferno-desktop

# From npm registry
npm install inferno-desktop

🐧 Linux

Binary Download

# Download for your architecture
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-linux-x86_64
# or
wget https://github.com/ringo380/inferno/releases/latest/download/inferno-linux-aarch64

# Make executable and move to PATH
chmod +x inferno-linux-*
sudo mv inferno-linux-* /usr/local/bin/inferno

🪟 Windows

Binary Download

  1. Download inferno-windows-x86_64.exe from Releases
  2. Add to your PATH or run directly

Via Cargo

cargo install inferno

🔨 Build from Source

# Clone the repository
git clone https://github.com/ringo380/inferno.git
cd inferno

# Build release binary
cargo build --release

# Install globally (optional)
cargo install --path .

# Build desktop app (optional)
cd desktop-app && npm install && npm run build

⬆️ Upgrading

Automatic Updates (Built-in)

inferno upgrade check     # Check for updates
inferno upgrade install   # Install latest version

Package Managers

# Homebrew
brew upgrade inferno

# Docker
docker pull ghcr.io/ringo380/inferno:latest

# Cargo
cargo install inferno --force

# NPM
npm update @ringo380/inferno-desktop

Note: DMG and installer packages automatically detect existing installations and preserve your settings during upgrade.

πŸ” Verify Installation

# Check version
inferno --version

# Verify GPU support
inferno gpu status

# Run health check
inferno doctor

🚀 Quick Start

# List available models
inferno models list

# Run inference
inferno run --model MODEL_NAME --prompt "Your prompt here"

# Start HTTP API server
inferno serve

# Launch terminal UI
inferno tui

# Launch desktop app (if installed from DMG)
open /Applications/Inferno.app
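
With the API server running via inferno serve, any OpenAI-compatible client can talk to it. Below is a minimal sketch using the official openai Python package; the port (8080) is taken from the sample configuration later in this README, the /v1 route prefix follows the usual OpenAI convention and is an assumption, and MODEL_NAME is a placeholder for a model reported by inferno models list.

# quickstart_chat.py (hypothetical example, not part of the repository)
from openai import OpenAI

# Point the standard OpenAI client at the local Inferno server.
# The base URL and API key handling are assumptions; adjust to your setup.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="MODEL_NAME",  # placeholder: any model listed by `inferno models list`
    messages=[{"role": "user", "content": "Summarize what Inferno does in one sentence."}],
)
print(response.choices[0].message.content)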

✨ Key Features

🧠 AI Backends

  • ✅ Real GGUF Support: Full llama.cpp integration
  • ✅ Real ONNX Support: Production ONNX Runtime with GPU acceleration
  • ✅ Model Conversion: Real-time format conversion with optimization
  • ✅ Quantization: Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, F16, F32 support

🏢 Enterprise Features

  • ✅ Authentication: JWT tokens, API keys, role-based access
  • ✅ Monitoring: Prometheus metrics, OpenTelemetry tracing
  • ✅ Audit Logging: Encrypted logs with multi-channel alerting
  • ✅ Batch Processing: Cron scheduling, retry logic, job dependencies
  • ✅ Caching: Multi-tier caching with compression and persistence
  • ✅ Load Balancing: Distribute inference across multiple backends

🔌 APIs & Integration

  • ✅ OpenAI Compatible: Use existing ChatGPT client libraries (see the streaming sketch after this list)
  • ✅ REST API: Standard HTTP endpoints for all operations
  • ✅ WebSocket: Real-time streaming and bidirectional communication
  • ✅ CLI Interface: 40+ commands for all AI/ML operations
  • ✅ Desktop App: Cross-platform Tauri application
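
For incremental, token-by-token output, the same OpenAI-compatible surface can be exercised with the standard streaming flag. This is a minimal sketch under the same assumptions as the Quick Start example (local server on port 8080, placeholder model name); it assumes the chat completions route honors stream=True in the usual OpenAI fashion.

# streaming_chat.py (hypothetical sketch; assumes the OpenAI-style stream=True flag is supported)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="MODEL_NAME",  # placeholder
    messages=[{"role": "user", "content": "Write a haiku about local inference."}],
    stream=True,
)

# Print tokens as they arrive instead of waiting for the full response.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()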

πŸ—οΈ Architecture

Built with a modular, trait-based architecture supporting pluggable backends:

src/
├── main.rs           # CLI entry point
├── lib.rs            # Library exports
├── config.rs         # Configuration management
├── backends/         # AI model execution backends
├── cli/              # 40+ CLI command modules
├── api/              # HTTP/WebSocket APIs
├── batch/            # Batch processing system
├── models/           # Model discovery and metadata
└── [Enterprise]      # Advanced production features

🔧 Configuration

Create inferno.toml:

# Basic settings
models_dir = "/path/to/models"
log_level = "info"

[server]
bind_address = "0.0.0.0"
port = 8080

[backend_config]
gpu_enabled = true
context_size = 4096
batch_size = 64

[cache]
enabled = true
compression = "zstd"
max_size_gb = 10

πŸ› οΈ Development

See CLAUDE.md for comprehensive development documentation.

# Run tests
cargo test

# Format code
cargo fmt

# Run linter
cargo clippy

# Full verification
./verify.sh

📄 License

Licensed under either of:

  • Apache License, Version 2.0
  • MIT License

🔥 Ready to take control of your AI infrastructure? 🔥

Built with ❤️ by the open source community
