Real-time voice rooms where humans and AI agents collaborate across platforms
Open-source, self-hosted, model-agnostic. Zero voice API costs.
Quick Start • Architecture • Supported Platforms • Screenshots • Multi-Machine Guide
Agora is an open platform for real-time voice collaboration between humans and AI agents. Agents join voice rooms as full participants. They hear you, speak back, and work together. The ACP Event Bus gives every agent shared awareness across all connected platforms, from voice rooms to Telegram to Discord.
Zero voice API costs — Silero VAD, faster-whisper, and edge-tts all run locally. No cloud speech APIs, no per-minute billing. Free forever.
Any agent platform — Hermes Agent, OpenClaw, LangChain, Ollama, vLLM, or any OpenAI-compatible HTTP endpoint. Plug in and go.
Cross-session awareness — The ACP Event Bus connects voice rooms, Telegram, and Discord into one shared context. An agent on Telegram knows what happened in the voice room.
Self-hosted and private — Your voice data never leaves your infrastructure. Run everything on your own hardware.
Streaming responses — Agents speak the first sentence while still generating the rest. No awkward silence.
Multi-machine ready — Distribute agents across machines with WireGuard. Same protocol, zero changes.
- Screenshots
- What Is Agora
- Architecture
- Supported Agent Platforms
- Vision and Camera
- Deployment Models
- Scaling with WireGuard
- Getting Started
- Configuration Reference
- Project Structure
- Security
- Running Tests
- License
Screenshots: Pre-join, In-call, In-call Variations, and Controls.
Agora brings humans and AI agents into the same voice room. Everyone hears each other, speaks naturally, and collaborates in real time.
- Live voice rooms where AI agents are first-class participants, not bots watching from the side
- Works with any LLM backend including Hermes Agent, OpenClaw, LangChain, Ollama, vLLM, or any OpenAI-compatible endpoint
- Fully local voice processing using Silero VAD, faster-whisper, and edge-tts with zero cloud dependencies
- Cross-session awareness through the ACP Event Bus, connecting voice rooms, Telegram, and Discord into one shared context
- Progressive TTS where the agent speaks the first sentence while still generating the rest, eliminating dead air
Model agnostic by design. Agora works with any agent that exposes an OpenAI-compatible HTTP API. Tested with Anthropic Claude Opus 4.6 and OpenAI GPT-5.4 for production-grade performance.
```mermaid
graph TB
subgraph Browser
UI[Browser UI<br/>Pre-join / Voice Room / Chat]
end
subgraph LiveKit[LiveKit Media Server]
LK[Humans + Agents<br/>in the same room]
end
subgraph Agents
direction LR
Laira[Agent: Laira<br/>Hermes]
Loki[Agent: Loki<br/>OpenClaw]
end
subgraph Voice[Voice Pipeline]
VAD[Silero VAD]
STT[faster-whisper STT]
TTS[edge-tts Output]
end
subgraph Bus[ACP Event Bus]
PubSub[WebSocket Pub/Sub<br/>Cross-session context<br/>100-event ring buffer]
end
subgraph Gateways[Agent Gateways]
Hermes[Hermes Gateway<br/>HTTP + SSE streaming]
OpenClaw[OpenClaw Gateway<br/>API Shim + SSE]
end
subgraph Platforms[Connected Platforms]
TG[Telegram]
DC[Discord]
end
UI -->|WebRTC| LK
LK --> Laira
LK --> Loki
Laira --> VAD --> STT
STT -->|ACP Bridge| Hermes
Hermes -->|SSE stream| TTS
TTS -->|audio| LK
Loki -->|ACP Bridge| OpenClaw
Laira -->|publish events| PubSub
Loki -->|publish events| PubSub
PubSub <-->|acp_bus_query| TG
PubSub <-->|acp_bus_query| DC
style Bus fill:#1a1a2e,stroke:#e94560,color:#fff
style Agents fill:#16213e,stroke:#0f3460,color:#fff
style Gateways fill:#0f3460,stroke:#533483,color:#fff
style Platforms fill:#533483,stroke:#e94560,color:#fff
```
```mermaid
graph LR
Mic[Human Mic] -->|audio| VAD[Silero VAD<br/>Voice Activity Detection]
VAD -->|speech segments| STT[faster-whisper<br/>Speech-to-Text]
STT -->|transcript| Agent[Agent Process]
Agent -->|ACP Bridge<br/>HTTP streaming| Gateway[LLM Gateway<br/>Hermes / OpenClaw]
Gateway -->|SSE chunks<br/>sentence by sentence| Split[Sentence Splitter]
Split -->|progressive| TTS[edge-tts<br/>Text-to-Speech]
TTS -->|audio frames| Room[LiveKit Room<br/>Speakers]
style Agent fill:#e94560,stroke:#1a1a2e,color:#fff
style Gateway fill:#0f3460,stroke:#533483,color:#fff
style TTS fill:#16213e,stroke:#e94560,color:#fff
```
Every component in this pipeline is free and runs locally. No API keys, no per-minute billing, no cloud dependencies.
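The "Sentence Splitter" stage of the pipeline can be sketched as an incremental buffer that emits a complete sentence as soon as one appears in the stream, so TTS never waits for the full response. This is a minimal illustration, not the project's actual implementation:

```python
import re

def split_sentences(chunks):
    """Yield complete sentences as streamed text chunks arrive.

    Buffers incoming chunks and emits a sentence as soon as a
    terminator (. ! ?) followed by whitespace is seen, so speech
    synthesis can start before the full response is generated.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            match = re.search(r'[.!?]["\')\]]*\s', buffer)
            if not match:
                break
            end = match.end()
            sentence = buffer[:end].strip()
            buffer = buffer[end:]
            if sentence:
                yield sentence
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()

# Chunks as they might arrive from an SSE stream:
chunks = ["Hello there", ". I can ", "hear you. How", " can I help?"]
print(list(split_sentences(chunks)))
```

Each yielded sentence can be handed to TTS immediately while later chunks are still arriving.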
When someone speaks in a voice room, the event is published to the ACP Event Bus. Any agent on any platform can then query the bus to learn what happened.
```mermaid
sequenceDiagram
participant VR as Agora Voice Room
participant Bus as ACP Event Bus
participant TG as Telegram Session
VR->>Bus: publish event (user spoke in room)
Bus-->>Bus: Store in ring buffer
Note over TG: Later, on Telegram...
TG->>TG: User asks about voice room activity
TG->>Bus: acp_bus_query(topic="room:agora-comms")
Bus-->>TG: Returns recent room events
TG->>TG: Agent responds with voice room context
```
The bus is a lightweight WebSocket pub/sub broker that serves as the shared context layer across all sessions.
```mermaid
graph TB
subgraph Bus[ACP Event Bus<br/>ws://0.0.0.0:9090]
R1[room:agora-comms]
R2[agent:laira]
R3[agent:loki]
end
A1[Agora Agent: Laira] -->|publish + subscribe| R1
A2[Agora Agent: Loki] -->|publish + subscribe| R1
T1[Telegram: Laira] -->|acp_bus_query| R1
T2[Telegram: Loki] -->|acp_bus_query| R1
D1[Discord: Laira] -->|acp_bus_query| R1
style Bus fill:#1a1a2e,stroke:#e94560,color:#fff
style A1 fill:#0f3460,stroke:#e94560,color:#fff
style A2 fill:#0f3460,stroke:#e94560,color:#fff
style T1 fill:#533483,stroke:#e94560,color:#fff
style T2 fill:#533483,stroke:#e94560,color:#fff
style D1 fill:#533483,stroke:#e94560,color:#fff
```
Event format:

```json
{
  "type": "voice_input",
  "speaker": "User",
  "agent": "laira",
  "content": "Hey everyone, can you hear me?",
  "ts": 1712345678.123
}
```

Key properties:

- In-memory ring buffer of 100 events per topic, no disk, no database
- Topics follow the pattern `room:<name>` or `agent:<name>`
- Agents query on demand via the native `acp_bus_query` tool
- Sub-millisecond publish latency within the same host
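The ring-buffer semantics can be illustrated with a minimal in-memory sketch. The real bus adds WebSocket transport and live subscriptions; the class and method names below are illustrative only:

```python
import time
from collections import defaultdict, deque

class MiniEventBus:
    """Illustrative in-memory topic store with a bounded ring buffer."""

    def __init__(self, buffer_size=100):
        # Each topic keeps at most `buffer_size` events; the oldest
        # events fall off automatically, so no disk and no database.
        self.topics = defaultdict(lambda: deque(maxlen=buffer_size))

    def publish(self, topic, event):
        event.setdefault("ts", time.time())
        self.topics[topic].append(event)

    def query(self, topic, limit=10):
        # Return the most recent events, oldest first.
        return list(self.topics[topic])[-limit:]

bus = MiniEventBus()
bus.publish("room:agora-comms", {
    "type": "voice_input",
    "speaker": "User",
    "agent": "laira",
    "content": "Hey everyone, can you hear me?",
})
print(bus.query("room:agora-comms"))
```

A `deque(maxlen=100)` gives exactly the "100 events per topic" behavior: publishing the 101st event silently evicts the first.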
Agora does not care where your agent runs or what powers it. If it exposes an HTTP endpoint, it works. Self-hosted, cloud, bare metal, Docker, or Kubernetes.
Open-source agent framework by Nous Research. Fully self-hostable.
- Direct HTTP streaming via the Hermes API server
- SSE streaming for progressive TTS so the agent speaks while still generating
- Native `acp_bus_query` tool registered in the Hermes tool system
- Agora registered as a first-class platform in the gateway
- Session persistence, persistent memory, skills, and full tool access
- Source: github.com/NousResearch/hermes-agent
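SSE streams arrive as `data:`-prefixed lines separated by blank lines. A minimal parser for that framing (illustrative; the actual payload format the gateways emit may differ):

```python
def parse_sse(raw):
    """Extract data payloads from a raw Server-Sent Events stream.

    Events are separated by blank lines; each event carries one or
    more `data:` lines whose values are joined with newlines.
    """
    events = []
    data_lines = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            data_lines.append(line[5:].lstrip())
        elif line == "" and data_lines:
            events.append("\n".join(data_lines))
            data_lines = []
    if data_lines:  # flush a trailing event with no final blank line
        events.append("\n".join(data_lines))
    return events

raw = "data: Hello.\n\ndata: How can I help?\n\ndata: [DONE]\n\n"
print(parse_sse(raw))  # ['Hello.', 'How can I help?', '[DONE]']
```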
Open-source autonomous agent framework with WebSocket gateway, browser automation, and multi-channel delivery.
- OpenAI-compatible HTTP wrapper deployed inside the container
- SSE streaming with response split into sentences and streamed as chunks
- Cross-session bus query via workspace skill
- Session persistence via session ID routing
Any agent that exposes /v1/chat/completions works out of the box. This includes LangChain servers, LlamaIndex agents, FastAPI wrappers, vLLM endpoints, Ollama, and any other OpenAI-compatible API.
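For a rough sense of the contract, an OpenAI-compatible endpoint accepts a JSON body with a `messages` list and returns a response shaped like the one built below. This is a sketch of the standard chat-completions response schema; the model name and helper are illustrative:

```python
import time
import uuid

def build_chat_completion(text, model="my-local-model"):
    """Build a minimal OpenAI-style /v1/chat/completions response body."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",
        }],
    }

resp = build_chat_completion("Hi, Nova here!")
print(resp["choices"][0]["message"]["content"])  # Hi, Nova here!
```

Any server that answers `POST /v1/chat/completions` with this shape (or its streaming SSE variant) should be pluggable as an agent backend.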
```python
# agent/agent_registry.py
AgentConfig(
    name="Nova",
    container="my-nova-container",
    acp_url="http://127.0.0.1:8080",
    voice="en-US-JennyNeural",
    streaming=True,
    greeting="Hi, Nova here!",
    delay=2.0,
)
```

Start with:

```bash
AGENT_NAME=Nova ACP_ENABLED=true python agent.py dev
```
Agents can see what you see. Enable your webcam or share your screen, and ask naturally:
- "Can you see me?"
- "What do you see?"
- "Look at my screen"
- "How many fingers am I holding up?"
- "What am I wearing?"
How it works: On each vision request, the agent captures a single frame from your camera or screen share via LiveKit, encodes it as a 1024x1024 JPEG, and sends it directly to the Anthropic Claude Vision API. The response is spoken back naturally via TTS. Vision bypasses the agent gateway entirely for minimal latency.
Both agents can use vision. The camera is tried first; if unavailable, the agent falls back to screen share.
```
User enables camera/screenshare
  --> LiveKit video track
  --> Agent detects vision intent ("can you see me?")
  --> Captures single JPEG frame (1024x1024)
  --> Claude Vision API (direct, bypasses gateway)
  --> Agent speaks description via TTS
```
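The single-frame request corresponds roughly to an Anthropic Messages payload carrying one base64 image block. This sketch assumes the standard Messages schema; the helper name, prompt, and token limit are illustrative:

```python
import base64

def build_vision_request(jpeg_bytes, prompt="What do you see?",
                         model="claude-sonnet-4-6"):
    """Build a Messages-API-shaped payload with one JPEG frame."""
    return {
        "model": model,
        "max_tokens": 512,
        "messages": [{
            "role": "user",
            "content": [
                {
                    # The captured frame travels inline as base64.
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": base64.b64encode(jpeg_bytes).decode("ascii"),
                    },
                },
                {"type": "text", "text": prompt},
            ],
        }],
    }

payload = build_vision_request(b"\xff\xd8\xff")  # placeholder JPEG bytes
print(payload["messages"][0]["content"][1]["text"])  # What do you see?
```

Because the frame is sent in a single request and the reply is spoken once, there is no continuous video analysis and no gateway round-trip.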
Configuration:
| Variable | Default | Description |
|---|---|---|
| `ANTHROPIC_OAUTH_TOKEN` | | OAuth token for vision API (`sk-ant-oat...` from Claude subscription) |
| `ANTHROPIC_VISION_MODEL` | `claude-sonnet-4-6` | Model for vision requests (`claude-opus-4-6` for best quality) |
Vision is not continuous video. Each request captures one frame, describes it, and returns to normal voice mode.
| Model | Description | Example |
|---|---|---|
| Single Machine | Everything on one host | VPS with Docker containers, simplest setup |
| Self-Hosted + VPS | Agents on your PC, bus on VPS | Run agents at home, host the room remotely |
| Multi-VPS | Distributed across cloud instances | Scale agents across regions |
| Hybrid | Mix of local and cloud machines | Agents on different machines, all on the same bus |
The ACP Event Bus is the glue. An agent running on your home machine connects to the same bus as an agent on a cloud VPS. They share context, see the same events, and collaborate in the same voice room regardless of where they physically run.
```mermaid
graph LR
subgraph Local[Your Machine]
A1[Agent Container]
end
subgraph VPS[Cloud VPS]
Bus((ACP Bus))
LK[LiveKit + Agora]
A2[Agent Container]
end
subgraph Remote[Another Machine]
A3[Agent Container]
end
A1 -->|WebSocket over WireGuard| Bus
A2 --> Bus
A3 -->|WebSocket over WireGuard| Bus
Bus --> LK
style Bus fill:#e94560,stroke:#fff,color:#fff
style Local fill:#161b22,stroke:#533483,color:#c9d1d9
style VPS fill:#161b22,stroke:#0f3460,color:#c9d1d9
style Remote fill:#161b22,stroke:#533483,color:#c9d1d9
```
WireGuard creates a private encrypted mesh network between machines. Agents on any machine in the mesh can connect to the same ACP Event Bus, so they share context and collaborate in the same voice room even when running on different physical hosts.
This is just networking. No special protocols, no new dependencies. The same WebSocket bus, the same agent code, just reachable over a private network instead of localhost.
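As a rough sketch, a minimal `wg-quick` configuration on the VPS might look like this. The addresses and placeholder keys are illustrative; see docs/wireguard-mesh.md for the actual setup:

```ini
# /etc/wireguard/wg0.conf on the VPS (Machine A)
[Interface]
Address = 10.8.0.1/24
ListenPort = 51820
PrivateKey = <machine-a-private-key>

[Peer]
# Home PC (Machine B)
PublicKey = <machine-b-public-key>
AllowedIPs = 10.8.0.2/32
```

With the tunnel up, an agent on Machine B would reach the bus at `ws://10.8.0.1:9090` instead of `ws://127.0.0.1:9090`, and nothing else changes.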
```mermaid
graph TB
subgraph WG[WireGuard Mesh Network]
direction TB
subgraph A[Machine A: VPS]
Bus[ACP Event Bus]
LK[LiveKit Server]
FE[Agora Frontend]
L1[Agent 1]
L2[Agent 2]
end
subgraph B[Machine B: Home PC]
A3[Agent 3]
A4[Agent 4]
end
subgraph C[Machine C: Another Server]
A5[Agent 5]
end
end
L1 --> Bus
L2 --> Bus
A3 -->|WebSocket over WireGuard| Bus
A4 -->|WebSocket over WireGuard| Bus
A5 -->|WebSocket over WireGuard| Bus
A <-->|WireGuard encrypted| B
A <-->|WireGuard encrypted| C
B <-->|WireGuard encrypted| C
style WG fill:#0d1117,stroke:#e94560,color:#c9d1d9
style A fill:#161b22,stroke:#0f3460,color:#c9d1d9
style B fill:#161b22,stroke:#533483,color:#c9d1d9
style C fill:#161b22,stroke:#533483,color:#c9d1d9
style Bus fill:#e94560,stroke:#fff,color:#fff
```
What this enables:
- Distribute agents across locations while keeping them connected to the same bus
- Run agents at home, at work, and on cloud servers, all in the same voice room
- Scale by adding machines to the WireGuard mesh instead of putting everything on one host
- All traffic between machines is encrypted, no public internet exposure
- Already tested and proven between a VPS and a home PC
Full implementation guide: docs/wireguard-mesh.md
- Docker (for agent containers)
- Python 3.10 or newer
- Node.js 18 or newer
- A LiveKit server (or use the included docker-compose)
- At least one agent gateway with an OpenAI-compatible HTTP endpoint
Step 1: Clone the repository

```bash
git clone https://github.com/0xyg3n/Agora.git
cd Agora
```

Step 2: Configure your environment

```bash
cp .env.example .env
```

Edit `.env` with your settings:

```bash
# LiveKit server
LIVEKIT_URL=ws://127.0.0.1:7880
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret

# Agent 1
AGENT_MYAGENT1_URL=http://127.0.0.1:3133
EDGE_TTS_VOICE_MYAGENT1=en-US-AriaNeural
AGENT_MYAGENT1_GREETING=Hello, I am here!
AGENT_MYAGENT1_DELAY=0.5

# Agent 2
AGENT_MYAGENT2_URL=http://127.0.0.1:8642
EDGE_TTS_VOICE_MYAGENT2=en-US-GuyNeural
AGENT_MYAGENT2_GREETING=Hey there.
AGENT_MYAGENT2_DELAY=3.0

# ACP Event Bus
ACP_BUS_HOST=0.0.0.0
ACP_BUS_PORT=9090
ACP_STREAMING_AGENTS=myagent1,myagent2
```

Step 3: Start LiveKit

```bash
cd server
docker compose up -d
```

Step 4: Install agent dependencies

```bash
cd agent
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Step 5: Start the ACP Event Bus

```bash
python acp_bus.py &
```

Step 6: Start your agents

```bash
AGENT_NAME=MyAgent1 ACP_ENABLED=true python agent.py dev &
AGENT_NAME=MyAgent2 ACP_ENABLED=true python agent.py dev &
```

Or use the all-in-one script:

```bash
./scripts/start-multi-agents.sh
```

Step 7: Start the frontend

```bash
cd frontend
npm install
npm run build
npx tsx server.ts
```

Step 8: Open your browser

http://127.0.0.1:3210

For remote access via SSH tunnel:

```bash
ssh -L 3210:127.0.0.1:3210 -L 7880:127.0.0.1:7880 yourserver
```

| Variable | Default | Description |
|---|---|---|
| `LIVEKIT_URL` | `ws://localhost:7880` | LiveKit WebSocket URL |
| `LIVEKIT_API_KEY` | | LiveKit API key |
| `LIVEKIT_API_SECRET` | | LiveKit API secret |
| `AGENT_NAME` | | Agent name (set per process) |
| `ACP_ENABLED` | `true` | Enable the ACP bridge |
| `ACP_STREAMING_AGENTS` | | Comma-separated agents with SSE streaming |
| `ACP_BUS_HOST` | `0.0.0.0` | Event Bus bind address |
| `ACP_BUS_PORT` | `9090` | Event Bus port |
| `ACP_BUS_SECRET` | (empty) | Bus authentication secret (recommended for production) |
| `EDGE_TTS_VOICE_<NAME>` | per agent | edge-tts voice for a specific agent |
| `WHISPER_MODEL` | `small` | faster-whisper model size |
| `LLM_BACKEND` | `anthropic` | LLM backend: `anthropic`, `openai`, or `ollama` |
| `ANTHROPIC_OAUTH_TOKEN` | | OAuth token for vision (`sk-ant-oat...`) |
| `ANTHROPIC_VISION_MODEL` | `claude-sonnet-4-6` | Model for vision requests |
```
agora/
├── agent/
│   ├── agent.py               # Main voice agent
│   ├── acp_bridge.py          # HTTP streaming bridge to gateways
│   ├── acp_bus.py             # ACP Event Bus server
│   ├── acp_bus_client.py      # Bus client library
│   ├── acp_protocol.py        # Message types
│   ├── agent_registry.py      # Agent configuration registry
│   ├── openclaw_api_shim.py   # OpenClaw HTTP and SSE shim
│   ├── edge_tts_plugin.py     # Text-to-speech plugin
│   ├── whisper_stt_plugin.py  # Speech-to-text plugin
│   ├── vision.py              # Vision and camera module
│   ├── runtime_utils.py       # Helper utilities
│   └── tests/                 # Test suite (92 tests)
├── frontend/
│   ├── server.ts              # Token server and operations API
│   └── src/                   # React user interface
├── scripts/
│   └── start-multi-agents.sh  # All-in-one startup script
├── server/
│   ├── docker-compose.yml     # LiveKit server
│   └── livekit.yaml           # LiveKit configuration
├── docs/
│   └── wireguard-mesh.md      # Multi-machine scaling guide
└── README.md
```
- Authentication on agent shims and optional bus authentication secret
- Input sanitization for room names, session IDs, and participant names against injection attacks
- Request limits of 1 MB on all API shim endpoints
- Error scrubbing so internal errors and stack traces are never exposed to clients
- Session isolation with random suffixes to prevent hijacking
- TTS sanitization to strip code blocks, URLs, and terminal output before voice synthesis
25 security findings were identified and resolved during development.
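The TTS sanitization step above can be sketched as a small filter that removes code fences and URLs before synthesis. This is an illustration of the idea, not the project's actual sanitizer:

```python
import re

def sanitize_for_tts(text):
    """Strip content that reads badly aloud before voice synthesis."""
    # Remove fenced code blocks entirely.
    text = re.sub(r"`{3}.*?`{3}", " code omitted ", text, flags=re.DOTALL)
    # Drop leftover inline-code backticks.
    text = text.replace("`", "")
    # Replace URLs with a spoken placeholder.
    text = re.sub(r"https?://\S+", "a link", text)
    # Collapse runs of whitespace left by the removals.
    return re.sub(r"\s+", " ", text).strip()

fence = "`" * 3
msg = f"See {fence}py\nx = 1\n{fence} and https://example.com for details."
print(sanitize_for_tts(msg))  # See code omitted and a link for details.
```

Reading raw code blocks or long URLs aloud is both unpleasant and a mild information-leak risk, which is why filtering happens before the text ever reaches edge-tts.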
```bash
cd agent
source .venv/bin/activate
python -m pytest tests/ -v
```

83 passed
Coverage includes the ACP protocol, event bus, bus client, agent registry, TTS sanitizer, ACP bridge, sentence splitting, and runtime utilities.
Built by 0xyg3n