# deepslate-livekit

LiveKit Agents plugin for Deepslate's realtime voice AI API.

deepslate-livekit provides a `RealtimeModel` implementation for the LiveKit Agents framework, enabling seamless integration with Deepslate's unified voice AI infrastructure.
## Installation

```bash
pip install git+https://github.com/rooms-solutions/deepslate-livekit.git
```

## Configuration

You'll need a Deepslate account with access to the realtime API. Set the following environment variables:
```bash
DEEPSLATE_VENDOR_ID=your_vendor_id
DEEPSLATE_ORGANIZATION_ID=your_organization_id
DEEPSLATE_API_KEY=your_api_key
```

For server-side text-to-speech with ElevenLabs:

```bash
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=your_voice_id
ELEVENLABS_MODEL_ID=eleven_turbo_v2  # optional
```

**Note:** You can alternatively use your own TTS through LiveKit's standard TTS integration. However, context truncation (interruption handling) will not work without server-side TTS.
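A minimal sketch of that alternative, assuming the separately installed `livekit-plugins-elevenlabs` package (any LiveKit TTS plugin would slot in the same way):

```python
# Bring-your-own TTS via LiveKit's standard TTS integration.
# Caveat from the note above: interrupted responses will not be
# truncated server-side in this mode.
from livekit.agents import AgentSession
from livekit.plugins import elevenlabs

import deepslate.livekit.realtime

session = AgentSession(
    # No tts_config: the RealtimeModel emits text only.
    llm=deepslate.livekit.realtime.RealtimeModel(),
    # LiveKit synthesizes audio in the agent process instead.
    tts=elevenlabs.TTS(),
)
```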
## Quick Start

```python
from livekit import agents
from livekit.agents import AgentServer, AgentSession, Agent, room_io

import deepslate.livekit.realtime
from deepslate.livekit.realtime import ElevenLabsTtsConfig


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")


server = AgentServer()


@server.rtc_session()
async def my_agent(ctx: agents.JobContext):
    session = AgentSession(
        llm=deepslate.livekit.realtime.RealtimeModel(
            tts_config=ElevenLabsTtsConfig.from_env()
        ),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(server)
```

## RealtimeModel Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `vendor_id` | `str` | env: `DEEPSLATE_VENDOR_ID` | Deepslate vendor ID |
| `organization_id` | `str` | env: `DEEPSLATE_ORGANIZATION_ID` | Deepslate organization ID |
| `api_key` | `str` | env: `DEEPSLATE_API_KEY` | Deepslate API key |
| `base_url` | `str` | `https://app.deepslate.eu` | Base URL for the Deepslate API |
| `system_prompt` | `str` | `"You are a helpful assistant."` | System prompt for the model |
| `generate_reply_timeout` | `float` | `30.0` | Timeout in seconds for `generate_reply` (`0` = no timeout) |
| `tts_config` | `ElevenLabsTtsConfig` | `None` | TTS configuration (enables audio output) |
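A sketch of explicit construction, passing values instead of relying on the environment-variable defaults (parameter names come from the table above; all values are placeholders):

```python
from deepslate.livekit.realtime import RealtimeModel, ElevenLabsTtsConfig

model = RealtimeModel(
    vendor_id="your_vendor_id",
    organization_id="your_organization_id",
    api_key="your_api_key",
    base_url="https://app.deepslate.eu",        # the default
    system_prompt="You are a concise booking assistant.",
    generate_reply_timeout=15.0,                # 0 would disable the timeout
    tts_config=ElevenLabsTtsConfig.from_env(),  # omit for text-only output
)
```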
### VAD Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `vad_confidence_threshold` | `float` | `0.5` | Minimum confidence to consider audio as speech (0.0-1.0) |
| `vad_min_volume` | `float` | `0.01` | Minimum volume threshold (0.0-1.0) |
| `vad_start_duration_ms` | `int` | `200` | Duration of speech to detect start (ms) |
| `vad_stop_duration_ms` | `int` | `500` | Duration of silence to detect end (ms) |
| `vad_backbuffer_duration_ms` | `int` | `1000` | Audio buffer kept from before speech detection (ms) |
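For illustration, a hedged sketch of tuning these options, assuming (from the shared `vad_` prefix) that they are keyword arguments on `RealtimeModel` alongside the parameters above:

```python
import deepslate.livekit.realtime

model = deepslate.livekit.realtime.RealtimeModel(
    vad_confidence_threshold=0.6,     # demand higher confidence before treating audio as speech
    vad_min_volume=0.02,              # ignore very quiet background audio
    vad_start_duration_ms=150,        # react slightly faster to speech onset
    vad_stop_duration_ms=700,         # tolerate longer pauses before ending the turn
    vad_backbuffer_duration_ms=1000,  # keep 1 s of audio from before detection
)
```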
## ElevenLabsTtsConfig

| Parameter | Type | Description |
|---|---|---|
| `api_key` | `str` | ElevenLabs API key (env: `ELEVENLABS_API_KEY`) |
| `voice_id` | `str` | Voice ID (env: `ELEVENLABS_VOICE_ID`) |
| `model_id` | `str \| None` | Model ID, e.g., `eleven_turbo_v2` (env: `ELEVENLABS_MODEL_ID`) |
Use `ElevenLabsTtsConfig.from_env()` to create a config from the environment variables above.
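Or construct one explicitly, a sketch assuming the fields above are accepted as constructor keyword arguments (values are placeholders):

```python
from deepslate.livekit.realtime import ElevenLabsTtsConfig

tts_config = ElevenLabsTtsConfig(
    api_key="your_elevenlabs_api_key",
    voice_id="your_voice_id",
    model_id="eleven_turbo_v2",  # optional; None falls back to the service default
)
```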
## Features

- **Realtime Voice AI Streaming** - Low-latency bidirectional audio streaming
- **Server-side VAD** - Voice activity detection handled server-side
- **Function Tools** - Define and use function tools with the `@function_tool()` decorator (see the sketch after this list)
- **ElevenLabs TTS Integration** - Server-side text-to-speech with context truncation support
- **Automatic Interruption Handling** - When using server-side TTS, interrupted responses are automatically truncated
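A minimal sketch of a function tool using LiveKit's `@function_tool()` decorator; the tool name and body are hypothetical:

```python
from livekit.agents import Agent, function_tool


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")

    @function_tool()
    async def get_weather(self, location: str) -> str:
        """Look up the current weather for a location."""
        # Hypothetical stub: call a real weather service here.
        return f"It is sunny in {location} today."
```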
## Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

## License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.