Voice Loop for Codex

Voice Loop for Codex is a local, wake-word-driven voice interface for Codex. It listens through your microphone, transcribes speech locally with whisper.cpp, sends the text to a normal Codex app-server session, and reads Codex's replies aloud with Kokoro.

What makes it different from a simple speech-to-text wrapper:

Local STT and TTS: whisper.cpp for transcription, Kokoro for speech.
Wake word by default: say "Jarvis" to start, then speak naturally during the orange-border cooldown.
Wake-word voice matching: the wake-word audio seeds a local speaker profile, so follow-up speech during cooldown has to sound like the same speaker before it can interrupt or reach Codex.
Barge-in: interrupt Codex while it is speaking; any non-empty transcript cancels the response in progress.
Echo control: WebRTC AEC uses the actual playback stream as a speaker reference so the assistant is less likely to hear itself.
Spoken-output-aware prompting: Codex is told that messages are dictated and replies are spoken.
Interruption recovery: interrupted turns include the spoken cutoff so Codex does not assume unheard text was conveyed.
Prototype local web display: large chat bubbles, wake-state border, streaming text, tool-call waiting indicators, and interrupted text fading. Microphone recording and response playback still run from the CLI.
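
The interruption-recovery idea above can be sketched as follows. This is a minimal illustration, not the project's code: split_at_cutoff, interruption_note, the InterruptedTurn type, and the wording of the note are all hypothetical, and it assumes the player can report roughly how many characters of the reply were spoken before barge-in.

```python
from dataclasses import dataclass


@dataclass
class InterruptedTurn:
    heard: str    # text that was played before the cutoff
    unheard: str  # text that was generated but never spoken


def split_at_cutoff(reply: str, chars_spoken: int) -> InterruptedTurn:
    """Split a reply at the playback cutoff, backing up to a word boundary."""
    cut = max(0, min(chars_spoken, len(reply)))
    # If the cutoff lands inside a word, do not count that word as heard.
    if 0 < cut < len(reply) and not reply[cut].isspace():
        boundary = reply.rfind(" ", 0, cut)
        cut = boundary if boundary != -1 else 0
    return InterruptedTurn(heard=reply[:cut].rstrip(), unheard=reply[cut:].lstrip())


def interruption_note(turn: InterruptedTurn) -> str:
    """Build a hypothetical note to attach to the next dictated message."""
    if not turn.unheard:
        return ""  # the whole reply was spoken, so nothing needs correcting
    return (
        "[The previous reply was interrupted. "
        f'The user heard only: "{turn.heard}"]'
    )


turn = split_at_cutoff("Run the tests first, then push the branch.", 17)
print(interruption_note(turn))
```

Backing up to a word boundary is a conservative choice: it is safer to tell the assistant that slightly less was heard than to claim a half-spoken word was conveyed.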

Quick Start

This project targets macOS on Apple Silicon first. The author has tested it only on a 14-inch MacBook Pro (2026, M5 Max).

git clone <repo-url>
cd voice-loop
./scripts/bootstrap.sh
./run.sh

The default run is equivalent to:

./run.sh --codex-new --effort medium --wake-word-mode openwakeword --wake-word "jarvis" --wake-word-cooldown-seconds 5
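
To make the "equivalent to" claim concrete, here is a small argparse sketch in which the flags above are simply the parser defaults. It is an assumption about how such a CLI could be wired, not the project's actual parser; build_parser is a hypothetical name, and only the flag names and values shown in this README are used.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser whose defaults mirror the documented default run."""
    parser = argparse.ArgumentParser(prog="run.sh")
    # BooleanOptionalAction provides both --codex-new and --no-codex-new.
    parser.add_argument("--codex-new", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--effort", default="medium")
    parser.add_argument(
        "--wake-word-mode",
        choices=["openwakeword", "transcript"],
        default="openwakeword",
    )
    parser.add_argument("--wake-word", default="jarvis")
    parser.add_argument("--wake-word-cooldown-seconds", type=float, default=5.0)
    return parser


defaults = build_parser().parse_args([])
explicit = build_parser().parse_args([
    "--codex-new", "--effort", "medium",
    "--wake-word-mode", "openwakeword",
    "--wake-word", "jarvis",
    "--wake-word-cooldown-seconds", "5",
])
# Passing no flags yields the same settings as spelling every default out.
print(defaults == explicit)
```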

The first run may download model files from Hugging Face and openWakeWord. Setting HF_TOKEN is optional, but an authenticated token can raise your Hugging Face rate limits.

In the default openWakeWord mode, the detected wake-word audio also seeds SpeechBrain speaker matching. Follow-up speech during playback or the five-second cooldown can omit "Jarvis", but it must match that speaker profile before the client pauses playback, transcribes, or sends anything to Codex.
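
The gating described above can be pictured with the sketch below. It is illustrative only: FollowUpGate, the 0.6 similarity threshold, and the plain-list embeddings are assumptions, and the real client obtains embeddings from SpeechBrain rather than from hand-written vectors. How the actual cooldown window is extended between turns is not modeled here.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class FollowUpGate:
    """Decides whether follow-up speech may interrupt playback or reach Codex."""

    def __init__(self, cooldown_seconds: float = 5.0, threshold: float = 0.6):
        self.cooldown_seconds = cooldown_seconds
        self.threshold = threshold  # hypothetical value, not the project's setting
        self.profile: Optional[List[float]] = None
        self.window_ends = 0.0

    def on_wake_word(self, embedding: Sequence[float], now: float) -> None:
        """Seed the speaker profile from the wake-word audio and open the window."""
        self.profile = list(embedding)
        self.window_ends = now + self.cooldown_seconds

    def accepts(self, embedding: Sequence[float], now: float, playing: bool = False) -> bool:
        """Accept speech only inside playback or cooldown, and only from the same voice."""
        if self.profile is None:
            return False  # no wake word heard yet
        if not playing and now > self.window_ends:
            return False  # cooldown expired, so the wake word is required again
        return cosine_similarity(self.profile, embedding) >= self.threshold


gate = FollowUpGate()
gate.on_wake_word([1.0, 0.0], now=10.0)
print(gate.accepts([0.9, 0.1], now=12.0))  # similar voice inside the window
print(gate.accepts([0.0, 1.0], now=12.0))  # different voice inside the window
```

Checking the speaker before transcription means a mismatched voice costs one embedding comparison instead of a full whisper.cpp pass.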

Prerequisites

Bootstrap installs or verifies the local developer dependencies it can manage:

Homebrew
Python 3.12
CMake, Git, Make, pkg-config
PortAudio and espeak-ng
whisper.cpp, built from source under third_party/
Whisper base.en and Silero VAD models

You must also have the Codex CLI installed and authenticated. The runtime talks to Codex with:

codex app-server --listen stdio://
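
If you want to confirm from Python that this command can be launched, a minimal sketch might look like the following. The helper names are hypothetical, and the messages exchanged over stdin and stdout are deliberately left out because this README does not describe them.

```python
import shutil
import subprocess

# The exact command quoted above, split into arguments.
CODEX_COMMAND = ["codex", "app-server", "--listen", "stdio://"]


def codex_available() -> bool:
    """Return True when the codex executable can be found on PATH."""
    return shutil.which(CODEX_COMMAND[0]) is not None


def start_app_server() -> subprocess.Popen:
    """Start the app server with pipes attached to its stdin and stdout."""
    if not codex_available():
        raise FileNotFoundError("codex CLI not found on PATH; install and authenticate it first")
    return subprocess.Popen(
        CODEX_COMMAND,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )


print("codex on PATH:", codex_available())
```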

Check your setup at any time:

./scripts/doctor.sh

For local development and tests, bootstrap with ./scripts/bootstrap.sh --dev.

Common Commands

./run.sh --list-devices
./run.sh --self-test third_party/whisper.cpp/samples/jfk.wav
./run.sh --no-web-client
./run.sh --wake-word-mode transcript --wake-word "Codex"
./run.sh --wake-word-cooldown-seconds 0
./run.sh --no-speaker-match
./run.sh --no-codex-new
./run.sh --model gpt-5-codex --effort high
./run.sh --kokoro-speed 1.0
./run.sh --no-playback-alignment

Advanced flags and tuning notes live in docs/configuration.md.

Privacy and Model Behavior

Microphone audio is processed locally for VAD, wake-word detection, speaker matching, and transcription. Kokoro TTS also runs locally after model files are downloaded. The dictated transcript is sent to Codex because Codex is the assistant backend.

Generated state is intentionally ignored by Git:

.venv/
third_party/
.codex-voice/
.openwakeword/
.speechbrain/
generated .wav and transcript artifacts

Project Layout

src/codex_voice_loop/   Python package and CLI
models/                 Redistributable packaged wake-word models
scripts/                Bootstrap and doctor scripts
docs/                   Runtime configuration and troubleshooting
tests/                  Fast behavior tests
third_party/            Ignored upstream checkouts created by bootstrap

License

The Voice Loop for Codex source code is released under the MIT License. The included models/jarvis.onnx wake-word model is a separate openWakeWord library asset, licensed for personal and non-commercial use by default; commercial use requires an openWakeWord commercial wake-word license. See LICENSE and THIRD_PARTY_NOTICES.md.
