Turn local LLMs and AI agents into real-time browser voice assistants.
Qantara lets you talk by voice to Ollama, local LLM servers, and local AI agents through your browser. It handles microphone capture, speech recognition, turn-taking, interruption, text-to-speech, and the live connection to whichever local backend you choose — all running on your local network with no cloud dependency for speech processing.
Version `0.2.8` — MCP voice client + server. `0.2.6` was the first public release.
Demo media needed: the README is ready for a 30-second GIF showing Docker startup, browser setup, an Ollama conversation, and barge-in. See docs/DEMO_PLAN.md.
```bash
git clone https://github.com/nawaf1-art/Qantara.git
cd Qantara
docker compose up
```

Open http://localhost:8765. Use Demo to test the browser voice UI without a backend, or choose OpenAI-Compatible for Ollama, llama.cpp, LM Studio, Jan, vLLM, and similar local `/v1/chat/completions` servers.
First Docker startup downloads and builds local speech dependencies, so expect roughly 5–10 minutes and 8–10 GB of disk on a fresh machine. Subsequent starts are much faster. For the full install path, see docs/QUICKSTART.md.
Most voice interfaces are push-to-talk wrappers. Qantara is built for full-duplex conversation:
- Always listening — continuous microphone input, even while the assistant is speaking
- Barge-in — interrupt the assistant mid-sentence, naturally
- Local-first — STT and TTS run on your machine, not in the cloud
- Backend-agnostic — works with Ollama, llama.cpp, vLLM, LM Studio, Jan, LiteLLM, any OpenAI-compatible local server, and optional local agent bridges such as OpenClaw
Qantara is a voice channel, not a replacement for the local LLM or agent runtime behind it.
| Area | Status | Notes |
|---|---|---|
| Browser microphone voice UI | Stable | Vanilla JS/WebAudio client, no build step |
| WebSocket PCM voice pipeline | Stable | PCM16 mono 16 kHz audio path |
| Local STT/TTS | Stable | faster-whisper STT; Piper/Kokoro provider paths |
| Barge-in / interruption | Stable | Playback cancel path and active-turn handling |
| OpenAI-compatible local backends | Stable | Recommended path for Ollama, llama.cpp, LM Studio, Jan, LiteLLM, vLLM |
| MCP client and MCP voice server | Experimental | New in 0.2.8; automated smoke coverage exists, real desktop client testing is still recommended |
| OpenClaw bridge | Advanced | Optional local-agent bridge, host setup required |
| Home Assistant / Wyoming | Experimental | LAN satellite path; validate in your own HA environment |
| Screenshot + voice multimodal | Planned | Not implemented yet |
See the complete feature matrix for status labels and limitations.
Qantara is for developers building:
- A local AI voice assistant for Ollama.
- A private voice interface for local LLMs and OpenAI-compatible servers.
- A browser voice gateway for AI agents and MCP-backed workflows.
- A voice layer for OpenClaw-style local agent systems.
- A home or lab AI assistant that stays on the LAN.
- A developer testbed for real-time voice-agent behavior, including barge-in.
See docs/USE_CASES.md for practical workflows.
Qantara is early and pre-1.0. It is not the right fit yet for production call centers, medical or emergency use, fully managed cloud hosting, non-technical users expecting a polished commercial app, or environments that require audited enterprise compliance.
Qantara ships with no telemetry, no analytics, and no outbound connections to Qantara-controlled servers. Audio frames, transcripts, and conversation history never leave the machines you configure. The gateway connects only to the backends you select and to the HuggingFace / model-download endpoints the first time you use an STT or TTS model. There is no account, no key, no phone-home.
Defaults reflect this: no analytics SDKs in the browser client, no Google Fonts or other external CDNs, `/api/configure` and `/api/test-url` refuse non-private URLs, and Docker Compose binds to 127.0.0.1 by default. See SECURITY.md and docs/SUPPLY_CHAIN.md for the full trust boundary.
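The non-private-URL refusal is a simple, auditable rule: a URL is accepted only if its host resolves to a loopback or private-range address. A minimal sketch of that kind of check (illustrative only, not Qantara's exact code):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_private_url(url: str) -> bool:
    """Accept only URLs whose host resolves to a loopback/private IP (IPv4, for brevity)."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except (OSError, ValueError):
        return False
    return addr.is_loopback or addr.is_private

print(is_private_url("http://192.168.1.10:11434/v1"))  # True: RFC 1918 range
print(is_private_url("https://example.com/v1"))        # False: public address
```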
Two other shapes of project exist in this space:
- Speech-native models (OpenAI Realtime, Gemini Live, MiniCPM-o, Moshi) — these are the model: audio in, audio out, no separate STT/TTS. They replace the brain, not the transport. Qantara can use their text interfaces today; direct speech-native audio adapters are planned for a later `0.3.x` line.
- Heavy frameworks (Pipecat, LiveKit Agents) — vendor-agnostic orchestration with dozens of provider integrations and WebRTC infrastructure. Powerful, but many days to wire up.
Qantara's niche is the middle: a real full-duplex voice stack for local LLMs and agents that you can read, run, and ship in an afternoon. One docker compose up, no cloud accounts, no build step.
| | Qantara | Pipecat | LiveKit Agents | HA Voice | Ollama-voice scripts |
|---|---|---|---|---|---|
| Full-duplex + barge-in | ✅ | ✅ | ✅ | ❌ | ❌ |
| Browser client included | ✅ | Partial | Partial | ✅ | ❌ |
| Local-first default | ✅ | Optional | Cloud-first | ✅ | ✅ |
| No JS build tooling | ✅ | n/a | n/a | n/a | n/a |
| Swap LLM backend | ✅ | ✅ | ✅ | Limited | ❌ |
| Works without GPU | ✅ | ✅ | ✅ | ✅ | ✅ |
| First conversation | ~10 min on first Docker run; seconds after setup | Hours–days | Hours–days | ~1 hour | Minutes |
| Core code to read | ~4.5k Python LOC + vanilla JS client | ~50k | Large | Ecosystem | ~500 |
Comparisons reflect common configurations as of 2026-04; each of these projects is actively evolving.
Measured on 2026-04-24 with scripts/bench_launch.py --arabic on Linux 6.17 / Python 3.12. These are local gateway and TTS timings; LLM response time depends on the selected backend and model.
| Metric | Median | p95 | Notes |
|---|---|---|---|
| Gateway barge-in cancel path | 0.09 ms | 0.11 ms | Loopback adapter; budget is < 100 ms |
| Piper English TTS synthesis (lessac) | 1533 ms | 1541 ms | Short launch phrase, full synthesis |
| Piper Arabic TTS synthesis (ar_JO-kareem-medium) | 1801 ms | 1832 ms | Short Arabic launch phrase, full synthesis |
See docs/BENCHMARKS.md for methodology and how to refresh these numbers.
```bash
docker compose up
```

Open http://localhost:8765 — the setup page will guide you through backend selection.

If port 8765 is in use: `QANTARA_PORT=9765 docker compose up`
If you want Docker to expose Qantara to your LAN instead of loopback only, set a strong local token too:
```bash
QANTARA_AUTH_TOKEN="$(openssl rand -hex 24)" \
QANTARA_DOCKER_BIND=0.0.0.0 \
docker compose up
```

Then open http://<your-lan-ip>:8765 and enter that token on the setup page.
First-run note. The initial `docker compose up` downloads the Ollama image, a ~2 GB LLM (qwen2.5:3b), and builds the Qantara image with Python/ML speech dependencies. Expect 5–10 minutes and roughly 8–10 GB of disk on the first run, plus extra temporary Docker build cache. Subsequent runs start in seconds.

Docker supports Ollama and OpenAI-compatible backends out of the box. OpenClaw is an advanced optional bridge that requires the `openclaw` CLI on your host, so it is not available inside the container. Use the Manual install path only if you already run OpenClaw agents.
```bash
python3 -m venv .venv
./.venv/bin/pip install -r gateway/transport_spike/requirements.txt
make spike-run-venv
```

This installs the full local gateway runtime stack, including STT/TTS dependencies. Open http://localhost:8765 — choose your backend and start talking.
For LAN microphone testing from another device, run Qantara with HTTPS/WSS and bind it explicitly:
```bash
QANTARA_AUTH_TOKEN="$(openssl rand -hex 24)" \
QANTARA_SPIKE_HOST=0.0.0.0 \
QANTARA_SPIKE_PORT=8899 \
QANTARA_TLS_CERT=ops/certs/qantara-cert.pem \
QANTARA_TLS_KEY=ops/certs/qantara-key.pem \
make spike-run-venv
```

Open https://<your-lan-ip>:8899/spike and enter the token on the setup page if prompted. Browsers require HTTPS or localhost for microphone access.
When you open Qantara, the setup page auto-detects available backends:
- OpenAI-Compatible (recommended) — connects directly to any `/v1/chat/completions` server. Covers Ollama, llama.cpp, vLLM, LiteLLM, Jan, LM Studio. Fastest path; see the probe sketch after this list.
- Ollama (bridge) — uses a session bridge process. Works but slower than the direct OpenAI path.
- OpenClaw (advanced, optional) — shown only when the host CLI and gateway are healthy. Use it when you already want Qantara to speak through existing OpenClaw agents.
- Any MCP server (advanced) — calls a configured MCP chat tool over stdio or streamable HTTP.
- Custom URL — point at any server implementing the Qantara session contract.
- Demo — no backend needed, test the voice interface.
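Before choosing the OpenAI-Compatible option, it can help to confirm which local servers are actually up. A quick probe sketch — the ports are common defaults (Ollama 11434, LM Studio 1234, llama.cpp server 8080) and an assumption about your setup; Qantara's own auto-detection may differ:

```python
import json
import urllib.request

# Common local OpenAI-compatible endpoints; adjust ports for your machine.
CANDIDATES = {
    "Ollama": "http://127.0.0.1:11434/v1/models",
    "LM Studio": "http://127.0.0.1:1234/v1/models",
    "llama.cpp": "http://127.0.0.1:8080/v1/models",
}

for name, url in CANDIDATES.items():
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            count = len(json.load(resp).get("data", []))
            print(f"{name}: up ({count} model(s) available)")
    except (OSError, ValueError):
        print(f"{name}: not reachable")
```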
After selecting a backend, Qantara shows a full-screen dark voice mode:
- Central glowing orb that responds to audio amplitude
- Ephemeral captions showing the conversation
- Minimal controls: mic, end call, settings, debug toggle
- Stats bar with latency and backend info
- All debug tools accessible behind a toggle
- STT: faster-whisper (local, CPU)
- TTS: Kokoro 82M, Piper, and Chatterbox provider paths
- Arabic TTS: Piper `ar_JO-kareem-medium` with a faster 1.3x baseline for natural pacing
- Audio-driven animated SVG avatar with amplitude-driven mouth motion, eye blink, and breathing
- Full-duplex (listen while speaking)
- Barge-in with immediate playback cancel (see the control-flow sketch after this list)
- VAD-based endpointing with auto-submit
- Multilingual assistant mode with language-aware voice routing
- Speaking-state hold to prevent flickering
- Playback debounce for smooth state transitions
- Multi-device mesh — run Qantara on multiple devices; the closest-mic node answers. See docs/MESH.md.
- Home Assistant — experimental Wyoming satellite path for HA Assist workflows. See docs/HOMEASSISTANT.md.
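The barge-in path above is the heart of the full-duplex behavior: VAD keeps scoring mic frames while TTS audio is playing, and a sustained speech onset cancels playback and opens a new user turn. A self-contained toy sketch of that control flow (names, thresholds, and structure are illustrative assumptions, not Qantara's internals):

```python
import asyncio

SPEECH_THRESHOLD = 0.6   # assumed VAD probability threshold
ONSET_FRAMES = 5         # consecutive speech frames before trusting the onset

class Playback:
    """Stand-in for the gateway's TTS playback handle."""
    def __init__(self):
        self.active = True

    async def cancel(self):
        self.active = False
        print("playback cancelled (barge-in)")

async def mic_frames():
    """Stand-in mic: 20 frames of silence, then sustained speech."""
    for i in range(40):
        yield 0.1 if i < 20 else 0.9   # fake per-frame VAD probabilities
        await asyncio.sleep(0)         # yield control, as a real audio loop would

async def duplex_loop():
    playback = Playback()
    speechy = 0
    async for prob in mic_frames():    # real input: PCM16 mono 16 kHz frames + VAD
        speechy = speechy + 1 if prob > SPEECH_THRESHOLD else 0
        if playback.active and speechy >= ONSET_FRAMES:
            await playback.cancel()    # stop TTS output immediately...
            # ...and route the following frames into STT as a new user turn.

asyncio.run(duplex_loop())
```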
- OpenAI-compatible — direct `/v1/chat/completions`, voice-optimized system prompt, conversation history, SSE streaming (sketched after this list)
- MCP client — agent-style chat tool adapter over stdio or streamable HTTP
- Session HTTP — Qantara's own session contract (used by Ollama and optional OpenClaw bridges)
- Mock — synthetic responses for testing
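Under the hood, the OpenAI-compatible adapter is ordinary `/v1/chat/completions` SSE streaming. A standalone sketch of that pattern against a local Ollama — the URL, port, and model name are examples for a typical setup (qwen2.5:3b is the model the Docker stack pulls by default):

```python
import json
import urllib.request

# Stream tokens from a local OpenAI-compatible server (here: Ollama's /v1 API).
req = urllib.request.Request(
    "http://127.0.0.1:11434/v1/chat/completions",
    data=json.dumps({
        "model": "qwen2.5:3b",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "stream": True,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=60) as resp:
    for raw in resp:                          # SSE: one "data: {...}" line per chunk
        line = raw.decode().strip()
        if not line.startswith("data:") or line.endswith("[DONE]"):
            continue
        delta = json.loads(line[5:])["choices"][0]["delta"]
        print(delta.get("content") or "", end="", flush=True)
print()
```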
`mcp_server.py` exposes Qantara browser voice control as MCP tools. A local MCP client can call `voice_get_status`, `voice_speak`, `voice_interrupt`, and `voice_set_voice`; Qantara still handles TTS and browser playback over its WebSocket path.
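As a rough illustration, a script using the official `mcp` Python SDK could drive those tools like this. The tool names come from the paragraph above, but the launch command and the `voice_speak` argument shape are assumptions — check the tool schemas the server reports for the real contract:

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch Qantara's MCP server over stdio (command/args assumed for illustration).
    params = StdioServerParameters(command="python", args=["mcp_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            status = await session.call_tool("voice_get_status", {})
            print(status)
            # The argument name "text" is an assumption; inspect the server's
            # reported tool schema for the real shape.
            await session.call_tool("voice_speak", {"text": "Hello from MCP"})

asyncio.run(main())
```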
`scripts/fetch_piper_voices.sh` downloads the launch Piper voices for English, Arabic, Spanish, and French. The voice registry reports installed voices through `/api/tts`; the language catalog reports launch-language TTS availability through `/api/languages`.
- Abstract base classes for STT and TTS
- Add a new provider by implementing a single file
- Factory selects the provider via `QANTARA_STT_PROVIDER` / `QANTARA_TTS_PROVIDER` (a sketch of a provider file follows this list)
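A hypothetical provider file might look like the sketch below. The base-class and method names here are stand-ins, not Qantara's actual interface — see the existing files under `providers/` for the real contract:

```python
# providers/tts/my_tts.py — hypothetical provider file; class and method
# names are illustrative, not Qantara's actual interface.
from abc import ABC, abstractmethod

class TTSProvider(ABC):
    """Stand-in for Qantara's real TTS base class."""
    @abstractmethod
    def synthesize(self, text: str, voice: str) -> bytes:
        """Return PCM16 mono audio for `text`."""

class MyTTS(TTSProvider):
    def synthesize(self, text: str, voice: str) -> bytes:
        # Call your engine here and return raw PCM16 frames.
        return b"\x00\x00" * 16000   # placeholder: 1 s of silence at 16 kHz

# Selected via the factory's env vars, e.g. QANTARA_TTS_PROVIDER=my_tts
# (module naming here is hypothetical).
```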
- Browser setup page with auto-detection
- CLI entry point: `python cli.py --backend ollama`
- Config file: `qantara.yml`
- Docker Compose with Ollama included
```
Browser (mic + speaker)
   │
   ├── WebSocket (PCM audio) ──▶ Qantara Gateway
   │                               ├── Voice Activity Detection
   │                               ├── STT (faster-whisper)
   │                               ├── Session Management
   │                               ├── TTS (Kokoro / Piper)
   │                               └── Adapter Layer
   │                                       │
   │                          ┌────────────┼────────────┐
   │                          ▼            ▼            ▼
   │                    OpenAI-compat   Optional     Custom
   │                    (Ollama,        OpenClaw     Backend
   │                     llama.cpp,     bridge
   │                     vLLM, etc.)
   │
   └── Dark Voice Mode ◀── streaming response + captions
```
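Because the transport is plain WebSocket + PCM16, a non-browser client can speak to the gateway too. A rough sketch using the `websockets` package — it assumes a token-less local run, and while the `/ws` path appears in the security notes below, the exact message framing is an assumption (read the gateway code for the real protocol):

```python
import asyncio
import websockets  # pip install websockets

async def stream_silence():
    # 20 ms of PCM16 mono at 16 kHz = 320 samples = 640 bytes per frame.
    frame = b"\x00\x00" * 320
    async with websockets.connect("ws://127.0.0.1:8765/ws") as ws:
        for _ in range(50):               # ~1 second of audio
            await ws.send(frame)          # binary framing is an assumption; the
            await asyncio.sleep(0.02)     # real protocol may use an envelope

asyncio.run(stream_silence())
```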
```
qantara/
├── adapters/                    # Backend adapter framework
│   ├── base.py                  # Abstract adapter interface
│   ├── factory.py               # Adapter selection
│   ├── openai_compatible.py     # Direct OpenAI-compat adapter
│   ├── session_gateway_http.py  # Session contract adapter
│   └── mock_adapter.py          # Test adapter
├── client/
│   ├── setup/                   # Browser setup page
│   └── transport-spike/         # Voice conversation UI
├── gateway/
│   ├── transport_spike/         # Gateway server, STT, TTS
│   ├── ollama_session_backend/  # Ollama bridge
│   └── openclaw_session_backend/ # OpenClaw bridge
├── providers/                   # STT/TTS provider plugins
│   ├── stt/faster_whisper.py
│   ├── tts/kokoro.py
│   └── tts/piper.py
├── identity/                    # Avatar, voice, and mouth-motion schemas
├── cli.py                       # CLI launcher
├── config.py                    # Config file loader
├── Dockerfile                   # Docker image
├── docker-compose.yml           # Full stack
└── qantara.example.yml          # Example config
```
| Layer | Technology |
|---|---|
| Gateway | Python 3, aiohttp (async) |
| STT | faster-whisper / CTranslate2 |
| TTS | Kokoro 82M via the kokoro Python package, Piper, Chatterbox |
| Transport | WebSocket, PCM16 mono, 16/24 kHz |
| Browser | Vanilla JS, WebAudio API, no frameworks |
| Docker | Python 3.12 slim + Ollama |
| Version | Status | Description |
|---|---|---|
| 0.1.2 | ✅ Done | Provider plugin system |
| 0.1.3 | ✅ Done | Kokoro TTS (783 ms warm) |
| 0.1.4 | ✅ Done | Backend setup experience |
| 0.1.5 | ✅ Done | Docker one-command setup |
| 0.1.6 | ✅ Done | OpenAI-compatible adapter |
| 0.1.7 | ✅ Done | Enhanced setup page |
| 0.1.8 | ✅ Done | Dark conversation view |
| 0.1.9-pre | ✅ Done | Contributor onboarding |
| 0.2.1 | ✅ Done | [Tier 1] Interaction polish + interruption-safe barge-in |
| 0.2.2 | ✅ Done | [Tier 1] Multi-device mesh + Wyoming (Home Assistant) + mobile UX pass |
| 0.2.4 | ✅ Done | Multilingual assistant + directional + live conversation translator (EN/AR/ES/FR/JA) |
| 0.2.5 | ✅ Done | Chatterbox TTS (expressive voice) |
| 0.2.6 | ✅ Released | Public launch |
| 0.2.7 | ✅ Released | Post-launch hardening patch |
| 0.2.8 | ✅ Released | MCP voice client + server |
| 0.3.2 | Planned | Speech-native adapter (OpenAI Realtime, Gemini Live, MiniCPM-o) |
| 0.3.4 | Planned | Identity-aware sessions (voice fingerprinting) |
| 0.3.5 | Planned | Screenshot + voice multimodal |
| 0.3.x | Planned | Ambient announcements, hybrid routing, multi-participant rooms |
See ROADMAP.md for full details.
Qantara is a pre-1.0 public project. See CONTRIBUTING.md for how to file issues, propose features, and submit patches. Early contributions are welcome.
Agents and automated tooling — see AGENTS.md for coding conventions and patterns.
Common issues (ports, mic permissions, backend detection, TLS, slow first response) are covered in docs/TROUBLESHOOTING.md.
Start with the documentation map. The main public guides are:
- Installation and first run
- Configuration
- Architecture
- MCP bridge
- Developer onboarding
- Release checklist
- Publishing readiness audit
Qantara is designed to run on your local network, not the public internet.
- The browser setup page's URL probe (`/api/test-url`) and backend configuration endpoint (`/api/configure`) restrict outbound URLs to private/loopback IPs — public URLs are rejected.
- If you set `QANTARA_AUTH_TOKEN`, it must be at least 24 characters. Browsers unlock Qantara through `/api/auth/login` and an HttpOnly local cookie; API clients may use `Authorization: Bearer <token>` (see the sketch after this list).
- Token auth protects `/ws`, `/api/configure`, `/api/translation_mode`, `/api/warmup`, `/api/test-url`, `/api/discovery/scan`, backend discovery endpoints, and mesh status endpoints.
- If you set `QANTARA_ADMIN_TOKEN`, `/api/admin/runtime` requires `Authorization: Bearer <token>`. If you leave it unset, that endpoint is disabled and returns `404`.
- Selecting the Ollama bridge, or the advanced optional OpenClaw bridge, spawns a local bridge subprocess on a dynamically allocated port. The gateway trusts the bridge binary; run Qantara only on machines you control.
- Native runs bind to `127.0.0.1:8765` by default. To expose a native run to your LAN, set `QANTARA_SPIKE_HOST=0.0.0.0` explicitly and consider running behind TLS (`QANTARA_TLS_CERT` / `QANTARA_TLS_KEY`).
- Docker publishes `127.0.0.1:8765` on the host by default even though the container listens on `0.0.0.0`. To publish on all host interfaces, set `QANTARA_DOCKER_BIND=0.0.0.0`.
- Mesh and Wyoming bind to loopback by default. To make them reachable across your LAN, explicitly set `QANTARA_MESH_HOST=0.0.0.0` or `QANTARA_WYOMING_HOST=0.0.0.0`, and use them only on a trusted LAN.
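For non-browser API clients, the bearer-token flow from the list above looks roughly like this. The endpoint and header come from this section; the HTTP method and empty JSON body for `/api/warmup` are assumptions for illustration:

```python
import os
import urllib.request

# Token must match the QANTARA_AUTH_TOKEN (>= 24 chars) the gateway started with.
token = os.environ["QANTARA_AUTH_TOKEN"]

# /api/warmup is one of the token-protected endpoints listed above; the POST
# method and empty body here are assumptions.
req = urllib.request.Request(
    "http://127.0.0.1:8765/api/warmup",
    data=b"{}",
    method="POST",
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    },
)
with urllib.request.urlopen(req, timeout=30) as resp:
    print(resp.status, resp.read().decode())
```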
If you find a security issue, please use GitHub private vulnerability reporting rather than opening a public issue — see SECURITY.md.
Apache 2.0 — see LICENSE for details.