A voice AI agent system that connects PipeCat's real-time voice pipeline to LangChain's AI orchestration and LangGraph's multi-agent routing.
A supervisor agent bridges PipeCat's voice infrastructure and multiple specialized LangGraph agents, routing each query to the agent best suited to answer it.
graph TD
%% Top Layer - Voice and Routing
User[User] <-->|Voice| PipeCat[PipeCat Voice Interface]
PipeCat <-->|Text Query/Response| Supervisor[🧠 Supervisor Agent]
%% Bottom Layer - Multiple LangGraphs
Supervisor -->|Routes to| MedicalGraph[🏥 Medical LangGraph]
Supervisor -->|Routes to| LegalGraph[⚖️ Legal LangGraph]
Supervisor -->|Routes to| MoreGraphs[➕ Add More LangGraphs...]
%% Each Graph can have multiple nodes
subgraph MedicalGraph[🏥 Medical LangGraph]
MedNodes[Multiple Nodes: Diagnosis → Treatment → Prescription]
end
subgraph LegalGraph[⚖️ Legal LangGraph]
LegalNodes[Multiple Nodes: Research → Analysis → Advice]
end
end
%% Styling
classDef topLayer fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
classDef graphLayer fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
classDef nodeLayer fill:#f1f8e9
class User,PipeCat,Supervisor topLayer
class MedicalGraph,LegalGraph,MoreGraphs graphLayer
class MedNodes,LegalNodes nodeLayer
- Voice Interface: Real-time speech-to-text and text-to-speech using PipeCat
- Supervisor Agent: Intelligent routing system that directs queries to appropriate specialized agents
- Specialized Graphs: Domain-specific LangGraph agents (e.g., medical assistant)
- WebRTC Transport: Low-latency voice communication over web protocols
- Real-time voice conversation with AI agents
- Intelligent query routing based on content analysis
- Specialized domain expertise (medical advice, general assistance)
- Interruptible conversations with natural flow
- WebRTC-based low-latency communication
- Extensible architecture for adding new specialized agents
- Python 3.11 or higher
- uv package manager
- OpenAI API key
- Deepgram API key (for speech-to-text)
- Cartesia API key (for text-to-speech)
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

git clone <your-repo-url>
cd pipecat-langchain-voice-ai

# Install all dependencies
uv sync
# Or install in development mode
uv sync --dev

Create a .env file in the project root:

OPENAI_API_KEY=your_openai_api_key_here
DEEPGRAM_API_KEY=your_deepgram_api_key_here
CARTESIA_API_KEY=your_cartesia_api_key_here

# Using uv
uv run python 07b-interruptible-langchain.py
# Or activate the virtual environment and run
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python 07b-interruptible-langchain.py

The application will start a web server (default: http://localhost:7860) where you can interact with the voice AI agent through your browser.
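Because the application depends on the three API keys from the .env file, a quick pre-flight check can catch a missing key before the server starts. The sketch below is only an illustration: `missing_keys` is a hypothetical helper, not part of this repository, and it inspects whatever mapping it is given (by default `os.environ`, so any .env loading must already have happened).

```python
import os
from typing import Mapping

# The three keys listed in the .env example.
REQUIRED_KEYS = ("OPENAI_API_KEY", "DEEPGRAM_API_KEY", "CARTESIA_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required keys that are unset or blank in `env` (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


# Example with an explicit mapping instead of the real environment.
print(missing_keys({"OPENAI_API_KEY": "sk-example"}))
```

Calling `missing_keys()` with no argument checks the real environment; an empty list means all three keys are present.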
python run.py [bot_file] [--host HOST] [--port PORT] [--verbose]

- bot_file: Path to the bot file (optional, auto-detected)
- --host: Host for HTTP server (default: localhost)
- --port: Port for HTTP server (default: 7860)
- --verbose: Enable verbose logging
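These options map naturally onto an `argparse` parser. The actual implementation of run.py is not shown here, so the following is only a sketch of one way to accept the same arguments; `build_parser` is a hypothetical name, and the defaults are the ones documented above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented run.py options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="run.py")
    # Optional positional argument: None stands for "auto-detect the bot file".
    parser.add_argument("bot_file", nargs="?", default=None, help="Path to the bot file")
    parser.add_argument("--host", default="localhost", help="Host for HTTP server")
    parser.add_argument("--port", type=int, default=7860, help="Port for HTTP server")
    parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
    return parser


# Parse an explicit argument list rather than sys.argv so the example is repeatable.
args = build_parser().parse_args(["07b-interruptible-langchain.py", "--port", "8080"])
print(args.bot_file, args.host, args.port, args.verbose)
```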
- Create a new LangGraph in graph.py or a separate file
- Update the supervisor routing logic in supervisor.py
- Add the new route condition in the supervisor_node function
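To make these steps concrete, here is a minimal, dependency-free sketch of a routing function with a newly added route. It is an assumption about the general shape, not the code in supervisor.py: the keyword sets, the route names, and the state layout are all hypothetical, and a real supervisor would more likely ask an LLM to classify the query.

```python
from typing import TypedDict


class State(TypedDict):
    """Hypothetical state passed to the supervisor: just the user's query."""
    input: str


# Hypothetical keyword-based route conditions; a new domain is one more entry.
ROUTE_KEYWORDS: dict[str, set[str]] = {
    "medical": {"symptom", "pain", "medication", "doctor"},
    "legal": {"contract", "lawsuit", "lease", "liability"},  # newly added route
}
DEFAULT_ROUTE = "general"


def supervisor_node(state: State) -> str:
    """Return the name of the graph that should handle the query."""
    # Normalize words: lowercase and strip surrounding punctuation.
    words = {word.strip(".,?!").lower() for word in state["input"].split()}
    for route, keywords in ROUTE_KEYWORDS.items():
        if words & keywords:
            return route
    return DEFAULT_ROUTE


print(supervisor_node({"input": "Can my landlord break the lease?"}))
print(supervisor_node({"input": "What's the weather like?"}))
```

Keeping the conditions in one table means adding a domain touches a single place; the matching specialized graph still has to be registered separately.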
Edit the system prompt in graph.py:
prompt = ChatPromptTemplate.from_messages([
    ("system", "Your custom medical assistant prompt here"),
    ("human", "{input}")
])

Modify voice settings in 07b-interruptible-langchain.py:
tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    voice_id="your_preferred_voice_id",  # Change voice
)

├── supervisor.py                    # Main supervisor routing logic
├── graph.py                         # Specialized LangGraph definitions
├── 07b-interruptible-langchain.py   # Main voice AI application
├── run.py                           # Application runner and server setup
├── pyproject.toml                   # Project dependencies and metadata
├── .env                             # Environment variables (create this)
└── README.md                        # This file
- Voice Input: User speaks into the web interface
- Speech-to-Text: Deepgram converts speech to text
- Supervisor Routing: The supervisor agent analyzes the query and routes it to the appropriate specialized agent
- Processing: The selected LangGraph processes the query using OpenAI's GPT model
- Response Generation: The specialized agent generates a contextual response
- Text-to-Speech: Cartesia converts the response to natural speech
- Voice Output: The response is played back to the user
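The flow above is essentially a chain in which each stage consumes the previous stage's output. The sketch below shows only that shape using stand-in functions; none of them call Deepgram, OpenAI, or Cartesia, and every name in it is hypothetical.

```python
from typing import Callable


def speech_to_text(audio: bytes) -> str:
    """Stand-in for speech-to-text: treat the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")


def route(query: str) -> str:
    """Stand-in for the supervisor: pick an agent name from the query."""
    return "medical" if "headache" in query.lower() else "general"


# Stand-ins for the specialized graphs, keyed by route name.
AGENTS: dict[str, Callable[[str], str]] = {
    "medical": lambda query: f"[medical agent] You asked: {query}",
    "general": lambda query: f"[general agent] You asked: {query}",
}


def text_to_speech(text: str) -> bytes:
    """Stand-in for text-to-speech: encode the reply back to bytes."""
    return text.encode("utf-8")


def handle_turn(audio: bytes) -> bytes:
    """Run one conversational turn through all stages in order."""
    query = speech_to_text(audio)
    reply = AGENTS[route(query)](query)
    return text_to_speech(reply)


print(handle_turn(b"I have a headache").decode("utf-8"))
```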
# Run with verbose logging for debugging
uv run python 07b-interruptible-langchain.py --verbose
# Test the web interface
# Navigate to http://localhost:7860 in your browser

- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests if applicable
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the BSD 2-Clause License - see the LICENSE file for details.
- PipeCat AI - Real-time voice AI infrastructure
- LangChain - AI application framework
- LangGraph - Multi-agent orchestration
For questions or issues:
- Check existing Issues
- Create a new issue with detailed information
- Include logs and error messages when reporting bugs
Built with ❤️ using PipeCat, LangChain, and LangGraph