🦄 Unicorn Amanuensis

Professional AI Transcription Suite with Hardware Optimization

Free your GPU for what matters most 🚀

Features • Quick Start • Hardware Support • API • Why Unicorn Amanuensis?

🎨 Beautiful Themes

Light • Dark • Magic Unicorn Themes

🎯 What This Delivers

This intelligently leverages ALL available hardware:

🎮 Intel iGPU → Run Whisper with native SYCL acceleration (11.2x realtime, 65% less power!)
🔥 whisper.cpp Intel → New! Native Intel iGPU implementation with SYCL + MKL optimization
🚀 NVIDIA GPU → Optional high-performance mode when GPU is available
💎 AMD NPU → Utilize Ryzen AI for power-efficient transcription (220x speedup!)
💪 CPU → Universal fallback with optimized performance

🚀 NPU Performance

Whisper Base: 220x speedup (16.2s for 1 hour audio)
Whisper Medium: 125x speedup (28.8s for 1 hour audio)
Whisper Large: 67x speedup (54s for 1 hour audio)
Power: 5-15W (vs 45-125W CPU/GPU)
Custom MLIR-AIE2 kernels for AMD Phoenix NPU

Real-World Impact

Traditional Setup:                  With Unicorn Amanuensis:
┌─────────────────┐                ┌─────────────────┐
│   NVIDIA GPU    │                │   NVIDIA GPU    │
├─────────────────┤                ├─────────────────┤
│ LLM:       22GB │                │ LLM:       30GB │ ← More context!
│ Whisper:    8GB │                │ Free:       2GB │
│ Free:       2GB │                └─────────────────┘
└─────────────────┘                ┌─────────────────┐
                                   │   Intel iGPU    │
                                   ├─────────────────┤
                                   │ Whisper:    3GB │ ← Offloaded!
                                   └─────────────────┘

✨ Features

Unicorn Amanuensis is a professional transcription service powered by WhisperX, offering state-of-the-art speech recognition with advanced features like word-level timestamps and speaker diarization. Designed for both API integration and standalone use, it provides OpenAI-compatible endpoints for seamless integration with existing applications.

🎵 Professional Transcription

Whisper Large v3 with all the bells and whistles
Intel iGPU SYCL - Native GPU acceleration (11.2x realtime)
whisper.cpp Integration - Direct C++ implementation for maximum performance
Speaker Diarization - Know who said what
Word-Level Timestamps - Perfect sync for subtitles
100+ Languages - Global language support
VAD Integration - Smart voice activity detection

🔧 Hardware Optimization

Intel iGPU SYCL - Native C++ implementation with MKL optimization
whisper.cpp Integration - Direct Intel GPU access via Level Zero API
Auto-Detection - Automatically finds and uses best available hardware
Manual Selection - Choose which hardware to use via simple script
Hot-Swapping - Switch between hardware without restarting
Quantization - INT8/INT4 models for faster inference

🌐 Enterprise Ready

OpenAI-Compatible API - Drop-in replacement at /v1/audio/transcriptions
Batch Processing - Handle multiple files efficiently
Queue Management - Built-in job queue with status tracking
Real-Time Streaming - Live transcription support
Docker Deployment - One-command deployment

🎨 Modern Web Interface

Professional UI - Clean, modern design with theme support
Dark/Light/Unicorn Themes - Match your style
Real-Time Progress - Visual feedback during processing
Audio Waveform - See what you're transcribing
Export Options - JSON, SRT, VTT, TXT formats

🎯 Why Unicorn Amanuensis?

Save GPU Memory

Free 6-10GB VRAM for your AI models
Run larger LLMs with longer context
Enable multi-model pipelines

Reduce Costs

No need for expensive GPU upgrades
Utilize existing iGPU/NPU hardware
Lower power consumption

Increase Flexibility

Run transcription alongside other AI workloads
Scale horizontally across different hardware
Deploy on diverse infrastructure

Production Ready

Battle-tested in enterprise environments
Used by Magic Unicorn's UC-1 Pro platform
Handles millions of minutes monthly

🚀 Quick Start

Option 1: Docker (Recommended)

# Clone the repository
git clone https://github.com/Unicorn-Commander/Unicorn-Amanuensis.git
cd Unicorn-Amanuensis

# Auto-detect and run on best available hardware
docker-compose up

# Or choose specific hardware
docker-compose --profile igpu up    # Intel iGPU
docker-compose --profile cuda up    # NVIDIA GPU
docker-compose --profile npu up     # AMD NPU
docker-compose --profile cpu up     # CPU only

Option 2: Select Hardware Manually

# Interactive hardware selection
./select-gpu.sh

# This will:
# 1. Detect available hardware
# 2. Let you choose which to use
# 3. Start the optimized container

Option 3: Bare Metal

# Install dependencies
pip install -r requirements.txt

# For Intel iGPU optimization
pip install openvino-toolkit

# Run with auto-detection
python whisperx/server.py

# Or specify hardware
WHISPER_DEVICE=igpu python whisperx/server.py

Access the service at:

Web Interface: http://localhost:9000
API Endpoint: http://localhost:9000/v1/audio/transcriptions
API Documentation: http://localhost:9000/docs

🖥️ Hardware Support

Intel iGPU (Arc, Iris Xe, UHD)

3-5x faster than CPU with OpenVINO optimization
INT8 quantization for maximum speed
Supports Arc A-series, Iris Xe, UHD 600+
~3GB memory usage for Large v3

NVIDIA GPU (RTX, Tesla, A100)

Fastest performance with CUDA acceleration
FP16/INT8 optimization
Batch processing support
6-10GB VRAM for Large v3

AMD NPU (Ryzen AI)

Power efficient with 16 TOPS performance
ONNX Runtime optimization
Perfect for laptops
Coming soon: INT4 quantization

CPU (Universal)

Works everywhere
AVX2/AVX512 optimization
Multi-threading support
~8-16GB RAM for Large v3

📡 API Usage

OpenAI-Compatible Endpoint

import requests

# Works with any OpenAI client
response = requests.post(
    "http://localhost:9000/v1/audio/transcriptions",
    files={"file": open("audio.mp3", "rb")},
    data={
        "model": "whisper-1",
        "response_format": "json",
        "timestamp_granularities": ["word"],
        "diarize": True
    }
)

print(response.json())

Response with Speaker Diarization

{
  "text": "Full transcription here...",
  "segments": [
    {
      "speaker": "SPEAKER_01",
      "text": "Hello, how can I help you today?",
      "start": 0.0,
      "end": 2.5,
      "words": [
        {"word": "Hello,", "start": 0.0, "end": 0.5},
        {"word": "how", "start": 0.6, "end": 0.8},
        {"word": "can", "start": 0.9, "end": 1.1},
        {"word": "I", "start": 1.2, "end": 1.3},
        {"word": "help", "start": 1.4, "end": 1.7},
        {"word": "you", "start": 1.8, "end": 2.0},
        {"word": "today?", "start": 2.1, "end": 2.5}
      ]
    }
  ]
}

📊 Performance Benchmarks

Hardware	Model	Speed (RTF)*	Memory	Power	Notes
AMD Phoenix NPU	Base	0.0045x	1GB	10W	220x speedup! ✨
AMD Phoenix NPU	Medium	0.008x	2GB	12W	125x speedup ✨
AMD Phoenix NPU	Large	0.015x	3GB	15W	67x speedup ✨
Intel Arc A770	Large v3	0.15x	3GB	35W	OpenVINO
Intel Iris Xe	Large v3	0.25x	3GB	15W	OpenVINO
NVIDIA RTX 4090	Large v3	0.05x	8GB	100W	CUDA
Intel i7-13700K	Large v3	0.80x	16GB	65W	CPU only

*RTF = Real-Time Factor (lower is better, 0.5 = 2x faster than real-time) ✨ Custom MLIR-AIE2 kernels with AMD Phoenix NPU

🔧 Configuration

Environment Variables

# Model Configuration
WHISPER_MODEL=large-v3        # Model size (tiny, base, small, medium, large-v3)
WHISPER_DEVICE=auto           # Device (auto, cuda, igpu, npu, cpu)
WHISPER_BATCH_SIZE=16         # Batch size for processing
WHISPER_COMPUTE_TYPE=int8     # Precision (fp32, fp16, int8)

# API Configuration  
API_PORT=9000                 # API server port
API_HOST=0.0.0.0             # API host binding
MAX_WORKERS=4                 # Concurrent workers

# Feature Flags
ENABLE_DIARIZATION=true       # Speaker diarization
ENABLE_VAD=true              # Voice activity detection
ENABLE_WORD_TIMESTAMPS=true   # Word-level timing

📊 Model Selection

Model	Size	Accuracy	Speed	Memory	NPU Speed**	Best For
`tiny`	74M	Good	Fastest	1GB	N/A	Quick drafts, real-time
`base`	139M	Better	Fast	1GB	220x ⚡	Balanced performance
`small`	483M	Great	Balanced	2GB	N/A	Daily use
`medium`	1.5GB	Excellent	Moderate	5GB	125x ⚡	Professional work
`large-v3`	3GB	Best	Slower	10GB	67x ⚡	Maximum accuracy

With AMD Phoenix NPU acceleration (custom MLIR-AIE2 kernels)

🌐 Web Interface

Access the professional web interface at http://localhost:9000

Features:

Modern UI with Dark/Light/Unicorn themes
Drag-and-drop file upload with progress tracking
Real-time transcription with live updates
Speaker labels with color-coded identification
Export formats: TXT, SRT, VTT, JSON with timestamps
Audio waveform visualization
Search & highlight within transcripts

🔌 Integration Examples

UC-1 Pro Integration

Unicorn Amanuensis is the official STT engine for UC-1 Pro:

# In UC-1 Pro docker-compose.yml
services:
  unicorn-amanuensis:
    image: unicorncommander/unicorn-amanuensis:igpu
    ports:
      - "9000:9000"
    environment:
      - WHISPER_MODEL=large-v3
      - WHISPER_DEVICE=igpu  # Frees up RTX 5090 for LLM

Open-WebUI Integration

# Add to Open-WebUI .env
AUDIO_STT_ENGINE=openai
AUDIO_STT_OPENAI_API_KEY=dummy-key
AUDIO_STT_OPENAI_API_BASE_URL=http://localhost:9000/v1
AUDIO_STT_MODEL=whisper-1

Python Client Example

from openai import OpenAI

# Initialize client
client = OpenAI(
    api_key="dummy",
    base_url="http://localhost:9000/v1"
)

# Transcribe with speaker diarization
audio_file = open("meeting.mp3", "rb")
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="verbose_json",
    timestamp_granularities=["word", "segment"],
    parameters={"diarize": True}
)

# Process results
for segment in transcript.segments:
    print(f"[{segment.speaker}]: {segment.text}")

🚀 Docker Images

Pre-built images optimized for each hardware type:

# Intel iGPU (recommended for most users)
docker pull unicorncommander/unicorn-amanuensis:igpu

# NVIDIA GPU
docker pull unicorncommander/unicorn-amanuensis:cuda

# AMD NPU
docker pull unicorncommander/unicorn-amanuensis:npu

# CPU
docker pull unicorncommander/unicorn-amanuensis:cpu

# Latest auto-detect
docker pull unicorncommander/unicorn-amanuensis:latest

🗺️ Roadmap

🤝 Contributing

We welcome contributions! Areas we're especially interested in:

Hardware Optimization: New accelerator support
Model Quantization: Faster inference techniques
Language Support: Improved accuracy for specific languages
UI Enhancements: Better visualization and UX
Integration Examples: Connecting with other services

See CONTRIBUTING.md for guidelines.

📝 License

MIT License - See LICENSE file for details.

🙏 Acknowledgments

OpenAI Whisper team for the incredible models
Intel OpenVINO team for iGPU optimization tools
WhisperX team for enhanced features
SpeechBrain team for commercial-friendly speaker diarization
The Magic Unicorn community for testing and feedback

🔗 Part of the Unicorn Ecosystem

UC-1 Pro - Enterprise AI Platform
Unicorn Orator - Professional TTS
Center-Deep - AI-Powered Search
Kokoro TTS - Lightweight TTS

🌟 Star History

Built with 💜 by Magic Unicorn
Unconventional Technology & Stuff Inc.

🦄 Free Your GPU • Transcribe Everything • Deploy Anywhere 🦄

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
.github/workflows		.github/workflows
docs/images		docs/images
hardware-detect		hardware-detect
npu-models		npu-models
npu		npu
scripts		scripts
whisper-cpp-igpu		whisper-cpp-igpu
whisperx		whisperx
.env.template		.env.template
.gitattributes		.gitattributes
.gitignore		.gitignore
AMD_NPU_SETUP.md		AMD_NPU_SETUP.md
ARCHITECTURE.md		ARCHITECTURE.md
BATCH30_EXECUTIVE_SUMMARY.md		BATCH30_EXECUTIVE_SUMMARY.md
BATCH_NPU_INTEGRATION_COMPLETE.md		BATCH_NPU_INTEGRATION_COMPLETE.md
BATCH_PROCESSOR_INTEGRATION_REPORT.md		BATCH_PROCESSOR_INTEGRATION_REPORT.md
CLAUDE.md		CLAUDE.md
Dockerfile.comprehensive		Dockerfile.comprehensive
Dockerfile.igpu		Dockerfile.igpu
Dockerfile.igpu-final		Dockerfile.igpu-final
Dockerfile.igpu-int8		Dockerfile.igpu-int8
Dockerfile.igpu-openvino		Dockerfile.igpu-openvino
Dockerfile.lightweight		Dockerfile.lightweight
Dockerfile.production		Dockerfile.production
Dockerfile.stt-igpu		Dockerfile.stt-igpu
Dockerfile.sycl-only		Dockerfile.sycl-only
Dockerfile.uc1-production		Dockerfile.uc1-production
FINAL_STATUS_AND_PATH_FORWARD.md		FINAL_STATUS_AND_PATH_FORWARD.md
FINAL_VERIFICATION_REPORT.txt		FINAL_VERIFICATION_REPORT.txt
GUI_BEFORE_AFTER_20251101.txt		GUI_BEFORE_AFTER_20251101.txt
GUI_UPDATE_SUMMARY_20251101.txt		GUI_UPDATE_SUMMARY_20251101.txt
HARDWARE_ACCELERATION_GUIDE.md		HARDWARE_ACCELERATION_GUIDE.md
IGPU_MASTER_CHECKLIST.md		IGPU_MASTER_CHECKLIST.md
IMPLEMENTATION_COMPLETE.md		IMPLEMENTATION_COMPLETE.md
INSTALL-NPU.md		INSTALL-NPU.md
INTEGRATION_SUMMARY.md		INTEGRATION_SUMMARY.md
MLIR_AIE_XCLBIN_COMPILATION_PIPELINE.md		MLIR_AIE_XCLBIN_COMPILATION_PIPELINE.md
MLIR_COMPILATION_BLOCKERS.md		MLIR_COMPILATION_BLOCKERS.md
MLIR_COMPILATION_REPORT.md		MLIR_COMPILATION_REPORT.md
MLIR_COMPILATION_SUCCESS.md		MLIR_COMPILATION_SUCCESS.md
MLIR_PIPELINE_DOCUMENTATION_INDEX.md		MLIR_PIPELINE_DOCUMENTATION_INDEX.md
MODELS.md		MODELS.md
NPU-MODEL-SETUP.md		NPU-MODEL-SETUP.md
NPU-SETUP.md		NPU-SETUP.md
NPU_ACCELERATION_PROGRESS.md		NPU_ACCELERATION_PROGRESS.md
NPU_CUSTOM_KERNEL_ROADMAP.md		NPU_CUSTOM_KERNEL_ROADMAP.md
NPU_FULL_INTEGRATION_PLAN.md		NPU_FULL_INTEGRATION_PLAN.md
NPU_INSTALLATION_GUIDE.md		NPU_INSTALLATION_GUIDE.md
NPU_INTEGRATION_COMPLETE_NOV1_2025.md		NPU_INTEGRATION_COMPLETE_NOV1_2025.md
NPU_IS_WORKING.md		NPU_IS_WORKING.md
NPU_MEL_INTEGRATION_FIX_REPORT.md		NPU_MEL_INTEGRATION_FIX_REPORT.md
NPU_OPTIMIZATION_STRATEGY.md		NPU_OPTIMIZATION_STRATEGY.md
NPU_RUNTIME_DOCUMENTATION.md		NPU_RUNTIME_DOCUMENTATION.md
NPU_SERVER_STATUS_OCT31.md		NPU_SERVER_STATUS_OCT31.md
PATH_TO_AWESOMENESS_OCT30.md		PATH_TO_AWESOMENESS_OCT30.md
PRODUCTION_ROADMAP.md		PRODUCTION_ROADMAP.md
QUICK-START-NPU-CONSOLIDATED.md		QUICK-START-NPU-CONSOLIDATED.md
QUICK-START-NPU.md		QUICK-START-NPU.md
QUICK_TEST_GUIDE.md		QUICK_TEST_GUIDE.md
README-NPU-INSTALL.md		README-NPU-INSTALL.md
README.md		README.md
READY_TO_TEST.md		READY_TO_TEST.md
SERVER_UPDATES_NOV1.md		SERVER_UPDATES_NOV1.md
SETUP_IGPU_ACCELERATION.md		SETUP_IGPU_ACCELERATION.md
START_NPU_TEST_SERVER.sh		START_NPU_TEST_SERVER.sh
START_SERVER_BATCH20.sh		START_SERVER_BATCH20.sh
TEST_INSTRUCTIONS.md		TEST_INSTRUCTIONS.md
UC1-PRO-DEPLOYMENT.md		UC1-PRO-DEPLOYMENT.md
Unicorn_Amanuensis_screenshot.png		Unicorn_Amanuensis_screenshot.png
WHISPERX_OPENVINO_SETUP.md		WHISPERX_OPENVINO_SETUP.md
WHISPER_ENCODER_ARCHITECTURE_ANALYSIS.md		WHISPER_ENCODER_ARCHITECTURE_ANALYSIS.md
WHISPER_MODEL_GUIDE.md		WHISPER_MODEL_GUIDE.md
XCLBIN_COMPILATION_QUICK_REFERENCE.md		XCLBIN_COMPILATION_QUICK_REFERENCE.md
XCLBIN_GENERATION_BLOCKER.md		XCLBIN_GENERATION_BLOCKER.md
XCLBIN_PIPELINE_SUMMARY.md		XCLBIN_PIPELINE_SUMMARY.md
XRT-NPU-INSTALLATION.md		XRT-NPU-INSTALLATION.md
XRT_API_BREAKTHROUGH_DOCS.md		XRT_API_BREAKTHROUGH_DOCS.md
benchmark-gpus.sh		benchmark-gpus.sh
conversion_base.log		conversion_base.log
conversion_large.log		conversion_large.log
convert_to_openvino.py		convert_to_openvino.py
docker-compose.hardware.yml		docker-compose.hardware.yml
docker-compose.uc1-pro.yml		docker-compose.uc1-pro.yml
docker-compose.whisperx.yml		docker-compose.whisperx.yml
docker-compose.yml		docker-compose.yml
download-npu-models.sh		download-npu-models.sh
download_onnx_models.sh		download_onnx_models.sh
install-bare-metal.sh		install-bare-metal.sh
install-docker-npu.sh		install-docker-npu.sh
install-npu-host.sh		install-npu-host.sh
install.sh		install.sh
npu_server_enhanced.py		npu_server_enhanced.py
publish-dockerhub.sh		publish-dockerhub.sh
quantize_base.log		quantize_base.log
quantize_simple.py		quantize_simple.py
quantize_to_int8.py		quantize_to_int8.py
quick_test_npu.sh		quick_test_npu.sh
rebuild-npu.sh		rebuild-npu.sh
requirements.txt		requirements.txt
select-gpu.sh		select-gpu.sh

Unicorn-Commander/Unicorn-Amanuensis

Folders and files

Latest commit

History

Repository files navigation