Skip to content

Jaskey15/voice-agents

Repository files navigation

Voice Agents

AI-powered voice agents for phone-based conversations using Twilio ConversationRelay, OpenRouter LLMs, and ElevenLabs text-to-speech.

Agents

  • Sales Rep (Gage) — Sales representative at Ceramic Coat Texas handling inbound calls about ceramic coating, PPF, window tinting, and paint correction. Fully configured with detailed service knowledge, pricing, and booking flows.
  • Booking Agent (Sarah) — Receptionist at Precision Auto Detailing handling appointment scheduling. Has tool calling definitions (check_availability, book_appointment) but tool execution is not yet fully implemented.

Each agent runs as an independent FastAPI server on its own port.

Architecture

Caller (phone)
    |
Twilio (ConversationRelay)
    |  STT: Twilio transcribes caller speech automatically
    |  TTS: ElevenLabs converts agent responses to speech
    |
WebSocket (bidirectional)
    |
FastAPI Server (your code)
    |  Receives transcribed text from caller
    |  Sends to LLM via OpenRouter
    |  Streams response tokens back to Twilio
    |
OpenRouter API
    LLM generates conversational response

Your server manages the conversation loop: it receives transcribed speech from Twilio via WebSocket, sends it to the LLM, and streams the response back. Twilio handles STT, and ElevenLabs handles TTS — your server never touches audio directly.

Tech Stack

  • Backend: Python 3.9+ / FastAPI / Uvicorn
  • Phone Integration: Twilio ConversationRelay (WebSocket-based real-time audio)
  • STT: Twilio (built-in with ConversationRelay)
  • TTS: ElevenLabs (via Twilio ConversationRelay)
  • LLM: OpenRouter (OpenAI-compatible API — supports GPT-4o-mini, Gemini, Claude, etc.)
  • Storage: SQLite (metadata) + JSON files (full transcripts)
  • Config: Pydantic Settings with .env file

Prerequisites

  1. Python 3.9+
  2. Twilio Accounttwilio.com with a voice-capable phone number
  3. OpenRouter API Keyopenrouter.ai
  4. ngrok (for development) — ngrok.com to expose your local server

Installation

# Clone the repository
git clone <your-repo-url>
cd voice-agents

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Configuration

Create a .env file in the project root:

# Twilio
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=+1234567890

# OpenRouter
OPENROUTER_API_KEY=your_openrouter_api_key
OPENROUTER_MODEL=gpt-4o-mini

# Application
BASE_URL=https://your-ngrok-url.ngrok.io

Agent-specific settings (ports, voice, LLM params) can be overridden with environment variable prefixes SALES_REP_ and BOOKING_. See agents/sales_rep/config.py and agents/booking_agent/config.py for all options.

Running

1. Start ngrok

ngrok http 8001  # or 8002 for booking agent

Update BASE_URL in .env with the ngrok HTTPS URL.

2. Configure Twilio Webhooks

In the Twilio Console under your phone number's Voice Configuration:

  • A Call Comes In: Webhook, https://<ngrok-url>/voice/incoming, HTTP POST
  • Call Status Changes: Webhook, https://<ngrok-url>/voice/status, HTTP POST

3. Start an Agent

# Sales Rep (port 8001)
python -m agents.sales_rep.main

# Booking Agent (port 8002)
python -m agents.booking_agent.main

4. Call Your Twilio Number

The agent will greet the caller and handle the conversation.

Project Structure

voice-agents/
├── agents/
│   ├── sales_rep/
│   │   ├── main.py              # FastAPI app, webhooks, WebSocket handler
│   │   ├── config.py            # Agent-specific settings (port 8001)
│   │   ├── persona.py           # GagePersona class (LLM conversation)
│   │   └── prompts/
│   │       └── system_prompt.txt # Gage's full character + service knowledge
│   └── booking_agent/
│       ├── main.py              # FastAPI app, webhooks, WebSocket handler
│       ├── config.py            # Agent-specific settings (port 8002)
│       ├── persona.py           # BookingPersona class (with tool calling)
│       ├── tools.py             # Tool definitions (check_availability, book_appointment)
│       └── prompts/
│           └── system_prompt.txt # Sarah's receptionist persona
├── shared/
│   ├── base_persona.py          # Abstract base class for all personas
│   ├── llm_client.py            # LLM factory (OpenRouter via OpenAI SDK)
│   ├── twilio_utils.py          # TwiML generation for ConversationRelay
│   ├── session.py               # CallSession management + cleanup
│   ├── storage.py               # Transcript storage (SQLite + JSON)
│   └── database.py              # SQLite database management
├── tests/
│   └── test_shared/             # Tests for shared utilities
├── docs/
│   └── VOICE-AGENT-ARCHITECTURE.md  # Reference architecture notes
├── data/                        # Runtime data (created automatically)
│   ├── transcripts.db           # SQLite metadata index
│   └── transcripts/             # JSON transcripts organized by date
├── config.py                    # Global settings (Twilio, OpenRouter, base URL)
└── requirements.txt             # Python dependencies

Call Flow

  1. Caller dials your Twilio number
  2. Twilio sends a webhook to /voice/incoming
  3. Server creates the persona, generates a greeting, and returns TwiML
  4. TwiML connects the call to ConversationRelay with the ElevenLabs voice
  5. Twilio opens a WebSocket to /voice/relay
  6. Caller speaks — Twilio transcribes and sends text to the server
  7. Server sends transcript to the LLM (OpenRouter) and streams the response back
  8. Twilio/ElevenLabs converts the response to speech for the caller
  9. Steps 6-8 repeat until the call ends
  10. Server saves the full transcript to SQLite + JSON

API Endpoints

Each agent exposes the same set of endpoints:

Endpoint Method Description
/ GET Health check
/voice/incoming POST Twilio webhook — returns ConversationRelay TwiML
/voice/relay WebSocket Real-time conversation handling
/voice/status POST Twilio call status updates, triggers transcript save
/transcripts GET List transcripts (supports filtering)
/transcripts/{call_sid} GET Get a specific transcript
/transcripts/stats/summary GET Storage statistics

Testing

pytest              # Run all tests
pytest -v           # Verbose output

Environment Variables Reference

Variable Required Default Description
TWILIO_ACCOUNT_SID Yes Twilio Account SID
TWILIO_AUTH_TOKEN Yes Twilio Auth Token
TWILIO_PHONE_NUMBER Yes Your Twilio phone number
OPENROUTER_API_KEY Yes OpenRouter API key
OPENROUTER_MODEL Yes LLM model (e.g., gpt-4o-mini)
BASE_URL Yes Public URL for Twilio webhooks (ngrok in dev)
OPENROUTER_HTTP_REFERER No Attribution header for OpenRouter
OPENROUTER_X_TITLE No App title for OpenRouter
SALES_REP_PORT No 8001 Sales rep server port
BOOKING_PORT No 8002 Booking agent server port

License

MIT

About

Multi-voice agent architecture with Elevenlabs and Twilio integration

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages