
Agora

Real-time voice rooms where humans and AI agents collaborate across platforms

Open-source, self-hosted, model-agnostic. Zero voice API costs.

MIT License · Tests · Zero Voice API Costs · Python 3.10+ · TypeScript · LiveKit

Quick Start · Architecture · Supported Platforms · Screenshots · Multi-Machine Guide


Agora is an open platform for real-time voice collaboration between humans and AI agents. Agents join voice rooms as full participants. They hear you, speak back, and work together. The ACP Event Bus gives every agent shared awareness across all connected platforms, from voice rooms to Telegram to Discord.

Why Agora

Zero voice API costs — Silero VAD, faster-whisper, and edge-tts all run locally. No cloud speech APIs, no per-minute billing. Free forever.

Any agent platform — Hermes Agent, OpenClaw, LangChain, Ollama, vLLM, or any OpenAI-compatible HTTP endpoint. Plug in and go.

Cross-session awareness — The ACP Event Bus connects voice rooms, Telegram, and Discord into one shared context. An agent on Telegram knows what happened in the voice room.

Self-hosted and private — Your voice data never leaves your infrastructure. Run everything on your own hardware.

Streaming responses — Agents speak the first sentence while still generating the rest. No awkward silence.

Multi-machine ready — Distribute agents across machines with WireGuard. Same protocol, zero changes.

Powered By

LiveKit Anthropic OpenAI Silero VAD faster-whisper edge-tts React


Screenshots

(Screenshots: pre-join screen, in-call view and variations, controls, Stage v3.)

What Is Agora

Agora brings humans and AI agents into the same voice room. Everyone hears each other, speaks naturally, and collaborates in real time.

  • Live voice rooms where AI agents are first-class participants, not bots watching from the side
  • Works with any LLM backend including Hermes Agent, OpenClaw, LangChain, Ollama, vLLM, or any OpenAI-compatible endpoint
  • Fully local voice processing using Silero VAD, faster-whisper, and edge-tts with zero cloud dependencies
  • Cross-session awareness through the ACP Event Bus, connecting voice rooms, Telegram, and Discord into one shared context
  • Progressive TTS where the agent speaks the first sentence while still generating the rest, eliminating dead air

Model agnostic by design. Agora works with any agent that exposes an OpenAI-compatible HTTP API. Tested with Anthropic Claude Opus 4.6 and OpenAI GPT-5.4 for production-grade performance.


Architecture

System Overview

graph TB
    subgraph Browser
        UI[Browser UI<br/>Pre-join / Voice Room / Chat]
    end

    subgraph LiveKit[LiveKit Media Server]
        LK[Humans + Agents<br/>in the same room]
    end

    subgraph Agents
        direction LR
        Laira[Agent: Laira<br/>Hermes]
        Loki[Agent: Loki<br/>OpenClaw]
    end

    subgraph Voice[Voice Pipeline]
        VAD[Silero VAD]
        STT[faster-whisper STT]
        TTS[edge-tts Output]
    end

    subgraph Bus[ACP Event Bus]
        PubSub[WebSocket Pub/Sub<br/>Cross-session context<br/>100-event ring buffer]
    end

    subgraph Gateways[Agent Gateways]
        Hermes[Hermes Gateway<br/>HTTP + SSE streaming]
        OpenClaw[OpenClaw Gateway<br/>API Shim + SSE]
    end

    subgraph Platforms[Connected Platforms]
        TG[Telegram]
        DC[Discord]
    end

    UI -->|WebRTC| LK
    LK --> Laira
    LK --> Loki
    Laira --> VAD --> STT
    STT -->|ACP Bridge| Hermes
    Hermes -->|SSE stream| TTS
    TTS -->|audio| LK
    Loki -->|ACP Bridge| OpenClaw
    Laira -->|publish events| PubSub
    Loki -->|publish events| PubSub
    PubSub <-->|acp_bus_query| TG
    PubSub <-->|acp_bus_query| DC

    style Bus fill:#1a1a2e,stroke:#e94560,color:#fff
    style Agents fill:#16213e,stroke:#0f3460,color:#fff
    style Gateways fill:#0f3460,stroke:#533483,color:#fff
    style Platforms fill:#533483,stroke:#e94560,color:#fff

Voice Pipeline

graph LR
    Mic[Human Mic] -->|audio| VAD[Silero VAD<br/>Voice Activity Detection]
    VAD -->|speech segments| STT[faster-whisper<br/>Speech-to-Text]
    STT -->|transcript| Agent[Agent Process]
    Agent -->|ACP Bridge<br/>HTTP streaming| Gateway[LLM Gateway<br/>Hermes / OpenClaw]
    Gateway -->|SSE chunks<br/>sentence by sentence| Split[Sentence Splitter]
    Split -->|progressive| TTS[edge-tts<br/>Text-to-Speech]
    TTS -->|audio frames| Room[LiveKit Room<br/>Speakers]

    style Agent fill:#e94560,stroke:#1a1a2e,color:#fff
    style Gateway fill:#0f3460,stroke:#533483,color:#fff
    style TTS fill:#16213e,stroke:#e94560,color:#fff

Every component in this pipeline is free and runs locally. No API keys, no per-minute billing, no cloud dependencies.
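The progressive hand-off between the gateway's SSE stream and edge-tts can be sketched as a sentence splitter that yields each sentence the moment its terminal punctuation arrives. This is an illustrative sketch under assumed semantics, not the project's actual splitter:

```python
import re
from typing import Iterable, Iterator

# Hypothetical splitter -- the real one in the agent code may differ.
# It buffers streamed text chunks and yields each complete sentence as
# soon as it ends, so TTS can start before generation finishes.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def split_sentences(chunks: Iterable[str]) -> Iterator[str]:
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # everything before the last fragment is a complete sentence
        *complete, buffer = _SENTENCE_END.split(buffer)
        for sentence in complete:
            if sentence.strip():
                yield sentence.strip()
    if buffer.strip():  # flush whatever remains when the stream ends
        yield buffer.strip()
```

Feeding this generator the SSE chunks as they arrive is what eliminates the dead air: the first `yield` fires as soon as the first sentence is complete, long before the full response exists.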

Cross-Session Awareness

When someone speaks in a voice room, the event is published to the ACP Event Bus. Any agent on any platform can then query the bus to learn what happened.

sequenceDiagram
    participant VR as Agora Voice Room
    participant Bus as ACP Event Bus
    participant TG as Telegram Session

    VR->>Bus: publish event (user spoke in room)
    Bus-->>Bus: Store in ring buffer

    Note over TG: Later, on Telegram...
    TG->>TG: User asks about voice room activity
    TG->>Bus: acp_bus_query(topic="room:agora-comms")
    Bus-->>TG: Returns recent room events
    TG->>TG: Agent responds with voice room context

ACP Event Bus

The bus is a lightweight WebSocket pub/sub broker that serves as the shared context layer across all sessions.

graph TB
    subgraph Bus[ACP Event Bus<br/>ws://0.0.0.0:9090]
        R1[room:agora-comms]
        R2[agent:laira]
        R3[agent:loki]
    end

    A1[Agora Agent: Laira] -->|publish + subscribe| R1
    A2[Agora Agent: Loki] -->|publish + subscribe| R1
    T1[Telegram: Laira] -->|acp_bus_query| R1
    T2[Telegram: Loki] -->|acp_bus_query| R1
    D1[Discord: Laira] -->|acp_bus_query| R1

    style Bus fill:#1a1a2e,stroke:#e94560,color:#fff
    style A1 fill:#0f3460,stroke:#e94560,color:#fff
    style A2 fill:#0f3460,stroke:#e94560,color:#fff
    style T1 fill:#533483,stroke:#e94560,color:#fff
    style T2 fill:#533483,stroke:#e94560,color:#fff
    style D1 fill:#533483,stroke:#e94560,color:#fff

Event format:

{
  "type": "voice_input",
  "speaker": "User",
  "agent": "laira",
  "content": "Hey everyone, can you hear me?",
  "ts": 1712345678.123
}

Key properties:

  • In-memory ring buffer of 100 events per topic, no disk, no database
  • Topics follow the pattern room:<name> or agent:<name>
  • Agents query on demand via the native acp_bus_query tool
  • Sub-millisecond publish latency within the same host
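The buffer-and-query behavior above can be modeled in a few lines. This is a toy in-memory sketch of the assumed semantics, not the actual `acp_bus.py` (which adds WebSocket transport and live subscribers):

```python
import time
from collections import defaultdict, deque

class RingBufferBus:
    """Toy model of the bus's per-topic ring buffer (assumed semantics)."""

    def __init__(self, capacity: int = 100):
        # deque with maxlen silently drops the oldest event at capacity
        self.topics = defaultdict(lambda: deque(maxlen=capacity))

    def publish(self, topic: str, event: dict) -> None:
        event.setdefault("ts", time.time())
        self.topics[topic].append(event)

    def query(self, topic: str, limit: int = 10) -> list[dict]:
        # what an acp_bus_query would return: most recent events, newest last
        return list(self.topics[topic])[-limit:]

bus = RingBufferBus()
bus.publish("room:agora-comms", {
    "type": "voice_input", "speaker": "User",
    "agent": "laira", "content": "Hey everyone, can you hear me?",
})
recent = bus.query("room:agora-comms")
```

The `maxlen` deque is what gives the bus its fixed memory footprint: once a topic holds 100 events, each new publish evicts the oldest one.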

Supported Agent Platforms

Agora does not care where your agent runs or what powers it. If it exposes an HTTP endpoint, it works. Self-hosted, cloud, bare metal, Docker, or Kubernetes.

Hermes Agent (native support)

Open-source agent framework by Nous Research. Fully self-hostable.

  • Direct HTTP streaming via the Hermes API server
  • SSE streaming for progressive TTS so the agent speaks while still generating
  • Native acp_bus_query tool registered in the Hermes tool system
  • Agora registered as a first-class platform in the gateway
  • Session persistence, persistent memory, skills, and full tool access
  • Source: github.com/NousResearch/hermes-agent

OpenClaw (supported via API shim)

Open-source autonomous agent framework with WebSocket gateway, browser automation, and multi-channel delivery.

  • OpenAI-compatible HTTP wrapper deployed inside the container
  • SSE streaming with response split into sentences and streamed as chunks
  • Cross-session bus query via workspace skill
  • Session persistence via session ID routing

Any OpenAI-Compatible Agent

Any agent that exposes /v1/chat/completions works out of the box. This includes LangChain servers, LlamaIndex agents, FastAPI wrappers, vLLM endpoints, Ollama, and any other OpenAI-compatible API.

# agent/agent_registry.py
AgentConfig(
    name="Nova",
    container="my-nova-container",
    acp_url="http://127.0.0.1:8080",
    voice="en-US-JennyNeural",
    streaming=True,
    greeting="Hi, Nova here!",
    delay=2.0,
)

Start with: AGENT_NAME=Nova ACP_ENABLED=true python agent.py dev
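Since the only hard requirement is an OpenAI-compatible `/v1/chat/completions` endpoint, the minimum an agent must expose looks roughly like the stdlib-only sketch below. This is a non-streaming illustration, not one of the project's shims (which also handle SSE streaming and auth):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class ChatHandler(BaseHTTPRequestHandler):
    """Bare-minimum /v1/chat/completions endpoint (non-streaming sketch)."""

    def do_POST(self):
        if self.path != "/v1/chat/completions":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        # echo the last user message back as the "model" reply
        user_text = request.get("messages", [{}])[-1].get("content", "")
        body = json.dumps({
            "object": "chat.completion",
            "choices": [{
                "index": 0,
                "message": {"role": "assistant",
                            "content": f"You said: {user_text}"},
                "finish_reason": "stop",
            }],
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the console quiet
        pass

def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    # blocking; run in a thread or as your agent's entry point
    HTTPServer((host, port), ChatHandler).serve_forever()
```

Point `acp_url` at this server's address and Agora can talk to it like any other backend; a real agent would replace the echo line with an LLM call.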


Vision and Camera

Agents can see what you see. Enable your webcam or share your screen, and ask naturally:

  • "Can you see me?"
  • "What do you see?"
  • "Look at my screen"
  • "How many fingers am I holding up?"
  • "What am I wearing?"

How it works: On each vision request, the agent captures a single frame from your camera or screen share via LiveKit, encodes it as a 1024x1024 JPEG, and sends it directly to the Anthropic Claude Vision API. The response is spoken back naturally via TTS. Vision bypasses the agent gateway entirely for minimal latency.

Both agents can use vision. The camera is tried first; if unavailable, the agent falls back to screen share.

User enables camera/screenshare
   --> LiveKit video track
       --> Agent detects vision intent ("can you see me?")
           --> Captures single JPEG frame (1024x1024)
               --> Claude Vision API (direct, bypasses gateway)
                   --> Agent speaks description via TTS
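The "detects vision intent" step above can be as simple as phrase matching on the transcript. The trigger list here is hypothetical; the actual detector in `vision.py` may use different phrases or logic:

```python
import re

# Hypothetical trigger phrases -- the real set in vision.py may differ.
_VISION_TRIGGERS = re.compile(
    r"\b(can you see|what do you see|look at my screen|"
    r"how many fingers|what am i wearing)\b",
    re.IGNORECASE,
)

def wants_vision(transcript: str) -> bool:
    """True when a transcript should trigger a single-frame capture."""
    return bool(_VISION_TRIGGERS.search(transcript))
```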

Configuration:

| Variable | Default | Description |
| --- | --- | --- |
| `ANTHROPIC_OAUTH_TOKEN` | | OAuth token for vision API (`sk-ant-oat...` from Claude subscription) |
| `ANTHROPIC_VISION_MODEL` | `claude-sonnet-4-6` | Model for vision requests (`claude-opus-4-6` for best quality) |

Vision is not continuous video. Each request captures one frame, describes it, and returns to normal voice mode.


Deployment Models

| Model | Description | Example |
| --- | --- | --- |
| Single Machine | Everything on one host | VPS with Docker containers, simplest setup |
| Self-Hosted + VPS | Agents on your PC, bus on VPS | Run agents at home, host the room remotely |
| Multi-VPS | Distributed across cloud instances | Scale agents across regions |
| Hybrid | Mix of local and cloud machines | Agents on different machines, all on the same bus |

The ACP Event Bus is the glue. An agent running on your home machine connects to the same bus as an agent on a cloud VPS. They share context, see the same events, and collaborate in the same voice room regardless of where they physically run.

graph LR
    subgraph Local[Your Machine]
        A1[Agent Container]
    end
    subgraph VPS[Cloud VPS]
        Bus((ACP Bus))
        LK[LiveKit + Agora]
        A2[Agent Container]
    end
    subgraph Remote[Another Machine]
        A3[Agent Container]
    end

    A1 -->|WebSocket over WireGuard| Bus
    A2 --> Bus
    A3 -->|WebSocket over WireGuard| Bus
    Bus --> LK

    style Bus fill:#e94560,stroke:#fff,color:#fff
    style Local fill:#161b22,stroke:#533483,color:#c9d1d9
    style VPS fill:#161b22,stroke:#0f3460,color:#c9d1d9
    style Remote fill:#161b22,stroke:#533483,color:#c9d1d9

Scaling with WireGuard

WireGuard creates a private encrypted mesh network between machines. Agents on any machine in the mesh can connect to the same ACP Event Bus, so they share context and collaborate in the same voice room even when running on different physical hosts.

This is just networking. No special protocols, no new dependencies. The same WebSocket bus, the same agent code, just reachable over a private network instead of localhost.
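Under those assumptions, a minimal peer config might look like the fragment below. Addresses, hostnames, and key placeholders are illustrative only; see docs/wireguard-mesh.md for the real setup:

```ini
# /etc/wireguard/wg0.conf on the home PC (Machine B) -- illustrative only
[Interface]
Address = 10.8.0.2/24           ; private mesh IP for this machine
PrivateKey = <machine-b-private-key>

[Peer]                          ; Machine A: the VPS running the ACP bus
PublicKey = <machine-a-public-key>
Endpoint = vps.example.com:51820
AllowedIPs = 10.8.0.0/24        ; route the whole mesh subnet via this peer
PersistentKeepalive = 25        ; keep NAT mappings alive behind a home router
```

With the tunnel up, an agent on Machine B would point at the bus via its mesh address (for example `ws://10.8.0.1:9090`) instead of `localhost`.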

graph TB
    subgraph WG[WireGuard Mesh Network]
        direction TB

        subgraph A[Machine A: VPS]
            Bus[ACP Event Bus]
            LK[LiveKit Server]
            FE[Agora Frontend]
            L1[Agent 1]
            L2[Agent 2]
        end

        subgraph B[Machine B: Home PC]
            A3[Agent 3]
            A4[Agent 4]
        end

        subgraph C[Machine C: Another Server]
            A5[Agent 5]
        end
    end

    L1 --> Bus
    L2 --> Bus
    A3 -->|WebSocket over WireGuard| Bus
    A4 -->|WebSocket over WireGuard| Bus
    A5 -->|WebSocket over WireGuard| Bus

    A <-->|WireGuard encrypted| B
    A <-->|WireGuard encrypted| C
    B <-->|WireGuard encrypted| C

    style WG fill:#0d1117,stroke:#e94560,color:#c9d1d9
    style A fill:#161b22,stroke:#0f3460,color:#c9d1d9
    style B fill:#161b22,stroke:#533483,color:#c9d1d9
    style C fill:#161b22,stroke:#533483,color:#c9d1d9
    style Bus fill:#e94560,stroke:#fff,color:#fff

What this enables:

  • Distribute agents across locations while keeping them connected to the same bus
  • Run agents at home, at work, and on cloud servers, all in the same voice room
  • Scale by adding machines to the WireGuard mesh instead of putting everything on one host
  • All traffic between machines is encrypted, no public internet exposure
  • Already tested and proven between a VPS and a home PC

Full implementation guide: docs/wireguard-mesh.md


Getting Started

Prerequisites

  • Docker (for agent containers)
  • Python 3.10 or newer
  • Node.js 18 or newer
  • A LiveKit server (or use the included docker-compose)
  • At least one agent gateway with an OpenAI-compatible HTTP endpoint

Installation

Step 1: Clone the repository

git clone https://github.com/0xyg3n/Agora.git
cd Agora

Step 2: Configure your environment

cp .env.example .env

Edit .env with your settings:

# LiveKit server
LIVEKIT_URL=ws://127.0.0.1:7880
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret

# Agent 1
AGENT_MYAGENT1_URL=http://127.0.0.1:3133
EDGE_TTS_VOICE_MYAGENT1=en-US-AriaNeural
AGENT_MYAGENT1_GREETING=Hello, I am here!
AGENT_MYAGENT1_DELAY=0.5

# Agent 2
AGENT_MYAGENT2_URL=http://127.0.0.1:8642
EDGE_TTS_VOICE_MYAGENT2=en-US-GuyNeural
AGENT_MYAGENT2_GREETING=Hey there.
AGENT_MYAGENT2_DELAY=3.0

# ACP Event Bus
ACP_BUS_HOST=0.0.0.0
ACP_BUS_PORT=9090
ACP_STREAMING_AGENTS=myagent1,myagent2

Step 3: Start LiveKit

cd server
docker compose up -d

Step 4: Install agent dependencies

cd agent
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Step 5: Start the ACP Event Bus

python acp_bus.py &

Step 6: Start your agents

AGENT_NAME=MyAgent1 ACP_ENABLED=true python agent.py dev &
AGENT_NAME=MyAgent2 ACP_ENABLED=true python agent.py dev &

Or use the all-in-one script:

./scripts/start-multi-agents.sh

Step 7: Start the frontend

cd frontend
npm install
npm run build
npx tsx server.ts

Step 8: Open your browser

http://127.0.0.1:3210

For remote access via SSH tunnel:

ssh -L 3210:127.0.0.1:3210 -L 7880:127.0.0.1:7880 yourserver

Configuration Reference

| Variable | Default | Description |
| --- | --- | --- |
| `LIVEKIT_URL` | `ws://localhost:7880` | LiveKit WebSocket URL |
| `LIVEKIT_API_KEY` | | LiveKit API key |
| `LIVEKIT_API_SECRET` | | LiveKit API secret |
| `AGENT_NAME` | | Agent name (set per process) |
| `ACP_ENABLED` | `true` | Enable the ACP bridge |
| `ACP_STREAMING_AGENTS` | | Comma-separated agents with SSE streaming |
| `ACP_BUS_HOST` | `0.0.0.0` | Event Bus bind address |
| `ACP_BUS_PORT` | `9090` | Event Bus port |
| `ACP_BUS_SECRET` | (empty) | Bus authentication secret (recommended for production) |
| `EDGE_TTS_VOICE_<NAME>` | (per agent) | edge-tts voice for a specific agent |
| `WHISPER_MODEL` | `small` | faster-whisper model size |
| `LLM_BACKEND` | `anthropic` | LLM backend: `anthropic`, `openai`, or `ollama` |
| `ANTHROPIC_OAUTH_TOKEN` | | OAuth token for vision (`sk-ant-oat...`) |
| `ANTHROPIC_VISION_MODEL` | `claude-sonnet-4-6` | Model for vision requests |

Project Structure

agora/
├── agent/
│   ├── agent.py              # Main voice agent
│   ├── acp_bridge.py         # HTTP streaming bridge to gateways
│   ├── acp_bus.py            # ACP Event Bus server
│   ├── acp_bus_client.py     # Bus client library
│   ├── acp_protocol.py       # Message types
│   ├── agent_registry.py     # Agent configuration registry
│   ├── openclaw_api_shim.py  # OpenClaw HTTP and SSE shim
│   ├── edge_tts_plugin.py    # Text-to-speech plugin
│   ├── whisper_stt_plugin.py # Speech-to-text plugin
│   ├── vision.py             # Vision and camera module
│   ├── runtime_utils.py      # Helper utilities
│   └── tests/                # Test suite (92 tests)
├── frontend/
│   ├── server.ts             # Token server and operations API
│   └── src/                  # React user interface
├── scripts/
│   └── start-multi-agents.sh # All-in-one startup script
├── server/
│   ├── docker-compose.yml    # LiveKit server
│   └── livekit.yaml          # LiveKit configuration
├── docs/
│   └── wireguard-mesh.md     # Multi-machine scaling guide
└── README.md

Security

  • Authentication on agent shims and optional bus authentication secret
  • Input sanitization for room names, session IDs, and participant names against injection attacks
  • Request limits of 1 MB on all API shim endpoints
  • Error scrubbing so internal errors and stack traces are never exposed to clients
  • Session isolation with random suffixes to prevent hijacking
  • TTS sanitization to strip code blocks, URLs, and terminal output before voice synthesis
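The TTS sanitization step can be sketched as a few regex passes. This is an illustrative sketch, not the project's actual sanitizer, which covers more cases:

```python
import re

def sanitize_for_tts(text: str) -> str:
    """Strip content that sounds terrible when spoken aloud.
    Illustrative sketch -- the real sanitizer handles more cases."""
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"`[^`]*`", " ", text)                     # inline code spans
    text = re.sub(r"https?://\S+", " a link ", text)         # URLs -> spoken placeholder
    text = re.sub(r"\s+", " ", text)                         # collapse whitespace
    return text.strip()
```

Running agent replies through a pass like this before edge-tts keeps the voice output natural instead of reading out backticks and URLs character by character.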

25 security findings were identified and resolved during development.


Running Tests

cd agent
source .venv/bin/activate
python -m pytest tests/ -v
83 passed

Coverage includes the ACP protocol, event bus, bus client, agent registry, TTS sanitizer, ACP bridge, sentence splitting, and runtime utilities.


License

MIT License


Built by 0xyg3n
