CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

AI-DX is an autonomous amateur radio operator that conducts QSOs (radio conversations) using the GPT-4o Realtime API for end-to-end audio processing (VAD, STT, LLM, TTS) and wfweb for radio audio I/O and PTT control. It runs on macOS Apple Silicon (M4 tested).

Build & Run Commands

# Install dependencies
uv sync

# Production (requires wfweb + radio hardware)
uv run python radio_operator.py

# Demo mode (mic + speakers, no radio hardware)
uv run python radio_operator.py --demo
uv run python radio_operator.py -d

# Suppress Rich UI, log to console instead
uv run python radio_operator.py --no-ui

# Play RX/TX audio locally (production only)
uv run python radio_operator.py --monitor-audio

There are no tests, linter, or CI configured. The test_tools/ directory contains manual testing utilities, not automated test suites.

Architecture

Entry point: radio_operator.py — contains RadioOperator, TxBuffer, AudioMonitor, MicCapture, QSO state machine, weather fetching, phonetic spelling, and the main worker loop.
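As an illustration of the phonetic spelling mentioned above, here is a minimal sketch that spells a callsign with the ICAO/NATO alphabet. The name `spell_phonetically` and the choice to speak digits as plain English words are assumptions for this example, not the helper in radio_operator.py.

```python
# ICAO/NATO spelling alphabet; digits use plain English words (an assumption).
NATO = {
    "A": "Alfa", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliett",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
    "0": "Zero", "1": "One", "2": "Two", "3": "Three", "4": "Four",
    "5": "Five", "6": "Six", "7": "Seven", "8": "Eight", "9": "Nine",
}


def spell_phonetically(callsign: str) -> str:
    """Spell a callsign word by word; unknown characters pass through unchanged."""
    return " ".join(NATO.get(ch, ch) for ch in callsign.upper())


print(spell_phonetically("W1AW"))  # Whiskey One Alfa Whiskey
```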

Production pipeline:

wfweb WebSocket (RX audio, 48kHz)
  → RealtimeSession.push_audio()   [resample 48→24kHz]
  → GPT-4o Realtime API            [server VAD + STT + LLM + TTS]
  → TxBuffer                       [resample 24→48kHz, real-time pacing]
  → wfweb WebSocket (TX audio + PTT)
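
The resampling steps in the pipeline above can be illustrated with a short sketch. The function name `resample_pcm16` is hypothetical, and linear interpolation is an assumption chosen for brevity; the repository's actual resampler may differ, and a production downsampler would normally low-pass filter first to avoid aliasing.

```python
from array import array


def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Resample mono 16-bit PCM by linear interpolation (illustrative only).

    Samples are read in native byte order, which is little-endian on
    Apple Silicon and x86.
    """
    samples = array("h")
    samples.frombytes(pcm)
    if not samples or src_rate == dst_rate:
        return pcm
    out_len = len(samples) * dst_rate // src_rate
    out = array("h")
    for i in range(out_len):
        # Position of output sample i on the input sample axis.
        pos = i * src_rate / dst_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out.tobytes()
```

Calling `resample_pcm16(frame, 48_000, 24_000)` halves the sample count of an RX frame, and `resample_pcm16(chunk, 24_000, 48_000)` doubles it on the TX side.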

Demo pipeline (--demo):

MicCapture (default mic, 16kHz)
  → RealtimeSession.push_audio()   [resample 16→24kHz]
  → GPT-4o Realtime API
  → TxBuffer
  → AudioMonitor (local speaker playback, 24kHz)

Key classes in radio_operator.py:

  • TxBuffer — queues 24kHz PCM16 from GPT-4o Realtime, streams to wfweb with real-time pacing; calls _on_tx_start/_on_tx_end callbacks; handles demo mode (no wfweb) transparently
  • AudioMonitor — optional local playback via sounddevice; used for --monitor-audio (48kHz) and demo mode (24kHz)
  • MicCapture — demo mode only; captures default mic at 16kHz, gates on is_transmitting
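
The real-time pacing that TxBuffer performs can be sketched as follows. This is a minimal sketch, not the class itself: `paced_chunks` and its injectable `clock` and `sleep` parameters are hypothetical names, and it assumes mono PCM16 at 24 kHz as described above.

```python
import time
from typing import Callable, Iterable, Iterator

SAMPLE_RATE = 24_000  # Hz, per the TxBuffer description
BYTES_PER_SAMPLE = 2  # PCM16, mono (assumed)


def paced_chunks(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield each chunk no earlier than the moment its audio should start."""
    start = clock()
    played = 0.0  # seconds of audio already released
    for chunk in chunks:
        delay = start + played - clock()
        if delay > 0:
            sleep(delay)
        yield chunk
        played += len(chunk) / (SAMPLE_RATE * BYTES_PER_SAMPLE)
```

Scheduling against a fixed start time, rather than sleeping for one chunk duration per iteration, keeps small timing errors from accumulating over a long transmission.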

Key modules:

  • ai/realtime_client.py (RealtimeSession): asyncio WebSocket client for GPT-4o Realtime API; runs event loop in background daemon thread; exposes sync push_audio(), send_text(), reconnect(), close()
  • audio/wfweb_client.py (WfwebClient): WebSocket client for wfweb browser protocol; handles RX audio frames, TX audio frames, PTT, and status/meter callbacks
  • core/config.py (AppConfig): composed of AudioConfig, RadioConfig, WfwebConfig, RealtimeConfig; loaded from env / .env
  • core/operator_profiles.py + core/operator_profiles_base.py — system prompt templates per operator style (CALLING_CQ, CONTESTING, MONITORING, SWL)
  • core/band_utils.py — frequency → band name mapping
  • core/adif_logger.py — ADIF QSO log writer
  • ui/radio_ui.py — Rich terminal UI at 10 FPS; panels: header, PTT/TX-RX, S-meter/TX-meters, comms log, QSO bar
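
To make the frequency → band name mapping concrete, here is a minimal sketch. The function name `band_for_frequency` and the table layout are hypothetical, and the band edges are the common HF amateur allocations (US upper limits), not values read from core/band_utils.py.

```python
from typing import Optional

# (lower_hz, upper_hz, band name). Edges are assumed, based on the usual
# HF amateur allocations; the real module may use different limits.
BANDS = [
    (1_800_000, 2_000_000, "160m"),
    (3_500_000, 4_000_000, "80m"),
    (7_000_000, 7_300_000, "40m"),
    (10_100_000, 10_150_000, "30m"),
    (14_000_000, 14_350_000, "20m"),
    (18_068_000, 18_168_000, "17m"),
    (21_000_000, 21_450_000, "15m"),
    (24_890_000, 24_990_000, "12m"),
    (28_000_000, 29_700_000, "10m"),
]


def band_for_frequency(freq_hz: int) -> Optional[str]:
    """Return the band containing freq_hz, or None if it is out of band."""
    for lower, upper, name in BANDS:
        if lower <= freq_hz <= upper:
            return name
    return None
```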

LLM integration: GPT-4o Realtime API over WebSocket (wss://api.openai.com/v1/realtime). Voice activity detection runs server-side (server_vad mode); there is no client-side VAD. Contact tracking uses GPT-4o function calling via the update_contact tool. CQ calls are injected as text through send_text() with ephemeral=True. reconnect() is called between QSOs to clear conversation history.
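
A session configuration along these lines might look like the sketch below. Everything here is an assumption to check against the Realtime API version in use. The payload layout follows the beta-era `session.update` event, and newer API versions may nest turn detection differently. `build_session_update` is a hypothetical helper, the `callsign` tool parameter is invented for illustration, and treating VAD_SILENCE_DURATION as seconds is a guess.

```python
import json


def build_session_update(vad_threshold: float, vad_silence_sec: float, voice: str) -> str:
    """Build a session.update message enabling server VAD and one tool."""
    payload = {
        "type": "session.update",
        "session": {
            "voice": voice,
            "turn_detection": {
                "type": "server_vad",
                "threshold": vad_threshold,
                # Assumes the configured value is in seconds.
                "silence_duration_ms": int(round(vad_silence_sec * 1000)),
            },
            "tools": [
                {
                    "type": "function",
                    "name": "update_contact",
                    "description": "Record details of the current contact.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "callsign": {"type": "string"},  # hypothetical field
                            "closing": {"type": "boolean"},
                        },
                    },
                }
            ],
        },
    }
    return json.dumps(payload)
```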

QSO state machine (in RadioOperator): CALLING_CQ → IN_QSO → QSO_ENDED → CALLING_CQ (or MONITORING/SWL variants). Contacts are logged to ADIF via an update_contact(closing=true) tool call.
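
The QSO cycle can be sketched as a small transition table. The names `QsoState`, `ALLOWED`, and `advance` are hypothetical, and the sketch covers only the CALLING_CQ cycle; the MONITORING and SWL variants are left out because their transitions are not described here.

```python
from enum import Enum, auto


class QsoState(Enum):
    CALLING_CQ = auto()
    IN_QSO = auto()
    QSO_ENDED = auto()


# Each state maps to the set of states it may move to next.
ALLOWED = {
    QsoState.CALLING_CQ: {QsoState.IN_QSO},
    QsoState.IN_QSO: {QsoState.QSO_ENDED},
    QsoState.QSO_ENDED: {QsoState.CALLING_CQ},
}


def advance(current: QsoState, target: QsoState) -> QsoState:
    """Return target if the transition is legal, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

In this sketch, a closing update_contact call would be the natural trigger for the IN_QSO → QSO_ENDED step, and the reconnect between QSOs would accompany the return to CALLING_CQ.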

Configuration

All configuration is via environment variables or .env file. Key settings:

OPENAI_API_KEY=sk-...              # Required
CALLSIGN=W1AW                      # Required
WFWEB_URL=wss://192.168.x.x:8080  # Required for production
YOUR_NAME=Hiram
LOCATION="Newington, CT"
ANTENNA="Dipole"
POWER=100W
TRANSCEIVER="IC-7300"
OPERATOR_STYLE=CALLING_CQ          # CALLING_CQ | CONTESTING | MONITORING | SWL
REALTIME_MODEL=gpt-realtime-1.5
REALTIME_VOICE=ash
VAD_THRESHOLD=0.5
VAD_SILENCE_DURATION=0.6
CQ_INTERVAL_SEC=30
LOG_LEVEL=INFO

See README.md for the complete reference.
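
As a rough illustration of environment-driven configuration, the sketch below reads a few of the settings listed above into a frozen dataclass. `RealtimeSettings` and `load_realtime_settings` are hypothetical names rather than the classes in core/config.py, the defaults simply mirror the sample values shown here, and the sketch does not parse a .env file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RealtimeSettings:
    """Hypothetical subset of the Realtime-related settings."""
    api_key: str
    model: str
    voice: str
    vad_threshold: float
    vad_silence_duration: float


def load_realtime_settings(env: Mapping[str, str] = os.environ) -> RealtimeSettings:
    """Read settings from a mapping, failing fast on the required key."""
    if "OPENAI_API_KEY" not in env:
        raise RuntimeError("OPENAI_API_KEY is required")
    return RealtimeSettings(
        api_key=env["OPENAI_API_KEY"],
        # Defaults below are assumptions copied from the sample values.
        model=env.get("REALTIME_MODEL", "gpt-realtime-1.5"),
        voice=env.get("REALTIME_VOICE", "ash"),
        vad_threshold=float(env.get("VAD_THRESHOLD", "0.5")),
        vad_silence_duration=float(env.get("VAD_SILENCE_DURATION", "0.6")),
    )
```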

Important Constraints

  • Tested on macOS (Apple Silicon M4); the current architecture has no platform-specific dependencies
  • Requires Python >=3.10, <3.14
  • No local STT, TTS, or VAD — all handled server-side by GPT-4o Realtime
  • No Hamlib; production audio I/O goes entirely through the wfweb WebSocket (sounddevice is used only for demo mode and optional --monitor-audio playback)
  • radio_operator.py is large; most application logic lives there
  • .env contains API keys — never commit it
  • Demo mode logs to logs/demo_YYYYMMDD_HHMMSS.{log,adi} — never touches logs/contacts.adi
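
The demo log naming pattern above can be reproduced with a short sketch. `demo_log_paths` is a hypothetical helper; only the logs/demo_YYYYMMDD_HHMMSS.{log,adi} pattern comes from this file.

```python
from datetime import datetime
from pathlib import Path


def demo_log_paths(now: datetime, log_dir: Path = Path("logs")) -> tuple[Path, Path]:
    """Return the (.log, .adi) paths for a demo session started at `now`."""
    stem = now.strftime("demo_%Y%m%d_%H%M%S")
    return log_dir / f"{stem}.log", log_dir / f"{stem}.adi"
```

Because each demo run gets its own timestamped pair of files, it cannot collide with logs/contacts.adi.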