All notable changes to this project are documented here.
- `--monitor-audio` flag: plays RX and TX audio locally through the computer's speakers via sounddevice, so you can follow the conversation without a radio headset
- HF radio-themed terminal UI: compact header + S-meter (RX) / power+SWR (TX) + mode/frequency panel, updated at 10 FPS from live wfweb meter data
- Callsign validation in the `update_contact` tool handler: rejects strings without a digit (e.g. a name accidentally passed as a callsign)
- `AudioConfig` stripped to the two fields that are actually used: `VAD_THRESHOLD` and `VAD_SILENCE_DURATION` (server-VAD tuning for GPT-4o Realtime)
- `RadioConfig` Hamlib PTT fields removed (`PTT_ENABLED`, `PTT_RIG_MODEL`, etc.)
- `AgentConfig` removed entirely (LangChain legacy, no longer used)
- `vad_threshold` removed from the `RadioUI` constructor (was accepted but never used)
- `VAD_SILENCE_DURATION` env var name corrected in README (was incorrectly documented as `SILENCE_DURATION_SEC`)
- ADIF log frequency race condition: `_on_status` could fire before `adif_logger` was created; fixed by caching `_current_freq_mhz` and reading it at logger init time
- S-meter, power, and SWR bars now use physical units from wfweb (dB rel. S9, watts, SWR ratio) instead of wrongly assuming 0–255 raw values
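
As an illustration of the physical-units change above, here is a minimal sketch of turning a "dB relative to S9" reading into an S-meter label. The function name `format_s_meter` and the 6 dB per S-unit step are assumptions for illustration (the latter is the common amateur-radio convention); the project's actual rendering code is not shown in this changelog.

```python
# Hypothetical helper: not taken from the project source.
DB_PER_S_UNIT = 6  # assumed convention: one S-unit per 6 dB


def format_s_meter(db_rel_s9: float) -> str:
    """Format a signal level given in dB relative to S9 as an S-meter label."""
    over = round(db_rel_s9)
    if over > 0:
        # Above S9, readings are conventionally shown as "S9+N dB".
        return f"S9+{over}dB"
    # At or below S9, step down one S-unit per 6 dB and clamp at S0.
    s_units = max(0, 9 + round(db_rel_s9 / DB_PER_S_UNIT))
    return f"S{s_units}"
```

Working from physical units means the label follows from the reading directly, with no scaling step that depends on a guessed raw range.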
- GPT-4o Realtime function calling for contact tracking: model calls `update_contact(callsign, name, qth, notes, closing)` incrementally during each QSO instead of emitting structured text
- `CONTACT_TRACKING` prompt section in `operator_profiles_base.py` instructs the model when and how to call the tool
- Tool result submission via `function_call_output` conversation items
- `on_speech_started` / `on_speech_stopped` callbacks in `RealtimeSession`, wired to UI speech indicator
- `[QSO_LOG]` text block parsing removed: `_parse_and_update_qso_log`, `_strip_spoken_metadata`, `_end_qso_from_transcript`, `_QSO_CLOSING_PHRASES`, `QSO_LOGGING_FORMAT`
- CQ timer now resets on QSO close, preventing an immediate re-CQ before the delay has elapsed
- Message truncation (`[:120]`) removed from all transcript log sites
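
The function-calling entries above can be sketched as the two JSON events involved: a session update that registers the tool, and a `function_call_output` item that returns the handler's result. The event shapes follow the OpenAI Realtime API; the description text, the parameter types (in particular `closing` as a boolean), and the helper names are assumptions, not the project's actual definitions.

```python
import json

# Tool schema for update_contact. Field types and the description are assumed.
UPDATE_CONTACT_TOOL = {
    "type": "function",
    "name": "update_contact",
    "description": "Record details of the current QSO as they become known.",
    "parameters": {
        "type": "object",
        "properties": {
            "callsign": {"type": "string"},
            "name": {"type": "string"},
            "qth": {"type": "string"},
            "notes": {"type": "string"},
            "closing": {"type": "boolean"},
        },
        # Nothing is required, so the model can call the tool incrementally.
        "required": [],
    },
}


def session_update_event() -> dict:
    """Build the event that registers the tool with the session."""
    return {
        "type": "session.update",
        "session": {"tools": [UPDATE_CONTACT_TOOL], "tool_choice": "auto"},
    }


def function_call_output_event(call_id: str, result: dict) -> dict:
    """Build the event that submits a tool result back to the model."""
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,
            "output": json.dumps(result),  # output is sent as a JSON string
        },
    }
```

Leaving every field optional is one way to support the incremental calls described above: the model can send a callsign first and add name or QTH in later calls.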
Complete replacement of the local audio pipeline with a GPT-4o Realtime + wfweb architecture:
| Before (Phase A) | After (Phase B) |
|---|---|
| sounddevice (PortAudio) | wfweb WebSocket binary frames |
| Silero VAD (local) | GPT-4o Realtime server_vad |
| Whisper MLX / faster-whisper | GPT-4o Realtime STT |
| LangChain agent + LLM | GPT-4o Realtime LLM |
| Piper TTS | GPT-4o Realtime TTS |
| Hamlib PTT | wfweb {"cmd":"setPTT"} |
- `audio/wfweb_client.py`: WebSocket client speaking the wfweb browser protocol (RX audio, TX audio, PTT, status/meter messages)
- `TxBuffer`: streams 24 kHz GPT-4o Realtime audio to wfweb in real-time-paced 20 ms chunks after resampling to 48 kHz
- `RealtimeSession`: asyncio GPT-4o Realtime WebSocket session running in a background thread
- `WfwebConfig`: `WFWEB_URL`, `WFWEB_CONNECT_TIMEOUT`
- SWL (Short Wave Listener) receive-only operator mode
- MONITORING operator mode (listens for direct calls, IDs every 5 minutes)
- sounddevice, PortAudio, Hamlib, Silero VAD, Whisper, Piper TTS, and LangChain removed from the runtime path
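
The `TxBuffer` entry above describes resampling 24 kHz audio to 48 kHz and sending it in real-time-paced 20 ms chunks. A minimal stdlib-only sketch of that idea follows. All names here are hypothetical, 16-bit mono PCM is assumed, and the linear-interpolation upsampler is a stand-in for whatever resampler the project actually uses.

```python
import time
from array import array
from typing import Callable, Iterable, Iterator

DST_RATE = 48_000
CHUNK_MS = 20
CHUNK_SAMPLES = DST_RATE * CHUNK_MS // 1000  # 960 samples per 20 ms chunk


def upsample_2x(samples: array) -> array:
    """Double the sample rate by linear interpolation (no filtering)."""
    out = array("h")
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) // 2)  # midpoint between neighbouring samples
    return out


def chunks_20ms(samples: array) -> Iterator[array]:
    """Split 48 kHz audio into 20 ms chunks, zero-padding the last one."""
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = samples[start:start + CHUNK_SAMPLES]
        if len(chunk) < CHUNK_SAMPLES:
            chunk.extend([0] * (CHUNK_SAMPLES - len(chunk)))
        yield chunk


def send_paced(chunks: Iterable[array], send: Callable[[bytes], None],
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> None:
    """Send one chunk every 20 ms, using deadlines so timing does not drift."""
    deadline = clock()
    for chunk in chunks:
        send(chunk.tobytes())
        deadline += CHUNK_MS / 1000
        delay = deadline - clock()
        if delay > 0:
            sleep(delay)
```

Pacing against an absolute deadline rather than sleeping a fixed 20 ms per chunk keeps the send time of each chunk from accumulating into drift.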
Initial working version (as demonstrated in the companion video).
- Audio I/O: sounddevice (PortAudio)
- VAD: Silero VAD (local, PyTorch)
- STT: Whisper MLX (Apple Silicon) or faster-whisper
- LLM: LangChain multi-provider agent (Anthropic, OpenAI, Groq, Ollama, …)
- TTS: Piper TTS
- PTT: Hamlib via rigctld
- QSO state machine: CALLING_CQ → IN_QSO → QSO_ENDED
- ADIF contact logging
- Rich terminal UI
- CALLING_CQ and CONTESTING operator profiles
- Weather lookup injected into system prompt
- Station skip / hijack detection
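
The QSO state machine listed above (CALLING_CQ → IN_QSO → QSO_ENDED) could be modelled roughly as follows. This is a sketch only: the `QsoState` and `advance` names are hypothetical, and the transition from QSO_ENDED back to CALLING_CQ is an assumption that the changelog does not state.

```python
from enum import Enum, auto


class QsoState(Enum):
    CALLING_CQ = auto()
    IN_QSO = auto()
    QSO_ENDED = auto()


# Allowed transitions. The last entry (returning to CQ) is assumed.
ALLOWED = {
    QsoState.CALLING_CQ: {QsoState.IN_QSO},
    QsoState.IN_QSO: {QsoState.QSO_ENDED},
    QsoState.QSO_ENDED: {QsoState.CALLING_CQ},
}


def advance(current: QsoState, target: QsoState) -> QsoState:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```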