PipeCat-LangChain Voice AI Agent

A voice AI agent system that integrates PipeCat's real-time voice capabilities with LangChain's AI orchestration and LangGraph's multi-agent routing.

🏗️ Architecture

This system provides a bridge between PipeCat's voice infrastructure and multiple specialized LangGraph agents through an intelligent supervisor routing system.

graph TD
    %% Top Layer - Voice and Routing
    User[👤 User] <-->|Voice| PipeCat[📞 PipeCat Voice Interface]
    PipeCat <-->|Text Query/Response| Supervisor[🧠 Supervisor Agent]
    
    %% Bottom Layer - Multiple LangGraphs
    Supervisor -->|Routes to| MedicalGraph[🏥 Medical LangGraph]
    Supervisor -->|Routes to| LegalGraph[⚖️ Legal LangGraph]
    Supervisor -->|Routes to| MoreGraphs[➕ Add More LangGraphs...]
    
    %% Each Graph can have multiple nodes
    subgraph MedicalGraph[🏥 Medical LangGraph]
        MedNodes[Multiple Nodes: Diagnosis → Treatment → Prescription]
    end
    
    subgraph LegalGraph[⚖️ Legal LangGraph]
        LegalNodes[Multiple Nodes: Research → Analysis → Advice]
    end
    
    %% Styling
    classDef topLayer fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
    classDef graphLayer fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
    classDef nodeLayer fill:#f1f8e9
    
    class User,PipeCat,Supervisor topLayer
    class MedicalGraph,LegalGraph,MoreGraphs graphLayer
    class MedNodes,LegalNodes nodeLayer

Key Components

Voice Interface: Real-time speech-to-text and text-to-speech using PipeCat
Supervisor Agent: Intelligent routing system that directs queries to appropriate specialized agents
Specialized Graphs: Domain-specific LangGraph agents (e.g., medical assistant)
WebRTC Transport: Low-latency voice communication over web protocols

🚀 Features

Real-time voice conversation with AI agents
Intelligent query routing based on content analysis
Specialized domain expertise (medical advice, general assistance)
Interruptible conversations with natural flow
WebRTC-based low-latency communication
Extensible architecture for adding new specialized agents

📋 Prerequisites

Python 3.11 or higher
uv package manager
OpenAI API key
Deepgram API key (for speech-to-text)
Cartesia API key (for text-to-speech)

🛠️ Installation

1. Install uv (if not already installed)

# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

2. Clone the repository

git clone <your-repo-url>
cd pipecat-langchain-voice-ai

3. Install dependencies with uv

# Install all dependencies
uv sync

# Or install in development mode
uv sync --dev

4. Set up environment variables

Create a .env file in the project root:

OPENAI_API_KEY=your_openai_api_key_here
DEEPGRAM_API_KEY=your_deepgram_api_key_here
CARTESIA_API_KEY=your_cartesia_api_key_here
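Before starting the agent, it can help to confirm that all three keys are actually set. The following is a minimal sketch using only the standard library; the helper name missing_keys is hypothetical and not part of this repository, and it assumes the .env file has already been loaded into the process environment (for example by a dotenv loader or by exporting the variables in your shell).

```python
import os
from typing import List, Mapping, Optional

# The three keys listed in the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "DEEPGRAM_API_KEY", "CARTESIA_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required keys that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required API keys are set.")
```

Passing a plain dict instead of os.environ makes the check easy to exercise without touching real credentials.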

🎯 Usage

Running the Voice AI Agent

# Using uv
uv run python 07b-interruptible-langchain.py

# Or activate the virtual environment and run
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python 07b-interruptible-langchain.py

The application will start a web server (default: http://localhost:7860) where you can interact with the voice AI agent through your browser.

Command Line Options

python run.py [bot_file] [--host HOST] [--port PORT] [--verbose]

bot_file: Path to the bot file (optional, auto-detected)
--host: Host for HTTP server (default: localhost)
--port: Port for HTTP server (default: 7860)
--verbose: Enable verbose logging
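For reference, the options above map naturally onto an argparse definition. This is only a sketch of how such a parser could look, built from the documented flags and defaults; it is not taken from run.py, and the function name build_parser is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="run.py")
    # Optional positional argument: the README says it is auto-detected if omitted.
    parser.add_argument("bot_file", nargs="?", default=None,
                        help="Path to the bot file")
    parser.add_argument("--host", default="localhost",
                        help="Host for HTTP server")
    parser.add_argument("--port", type=int, default=7860,
                        help="Port for HTTP server")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose logging")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Serving {args.bot_file or '(auto-detected bot)'} "
          f"on http://{args.host}:{args.port}")
```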

🔧 Configuration

Adding New Specialized Agents

Create a new LangGraph in graph.py or a separate file
Update the supervisor routing logic in supervisor.py
Add the new route condition in the supervisor_node function
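The steps above can be sketched as a small routing table. This example is a simplified stand-in, not the code in supervisor.py: the real supervisor may well use an LLM to classify queries, whereas this sketch uses plain keyword matching, and every name in it (ROUTES, KEYWORDS, register_route, choose_route, route_query, and the agent functions) is hypothetical. Each agent here is a plain function standing in for a compiled LangGraph.

```python
from typing import Callable, Dict, Tuple

Agent = Callable[[str], str]


# Hypothetical stand-ins for specialized LangGraph agents.
def medical_agent(query: str) -> str:
    return f"[medical] {query}"


def general_agent(query: str) -> str:
    return f"[general] {query}"


ROUTES: Dict[str, Agent] = {"medical": medical_agent, "general": general_agent}

# Assumed keyword rules; substring matching is crude and only for illustration.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "medical": ("symptom", "fever", "headache", "medication"),
}


def register_route(name: str, agent: Agent, keywords: Tuple[str, ...]) -> None:
    """Add a new specialized agent and the keywords that select it."""
    ROUTES[name] = agent
    KEYWORDS[name] = keywords


def choose_route(query: str) -> str:
    """Pick the first route whose keywords appear in the query, else 'general'."""
    text = query.lower()
    for name, words in KEYWORDS.items():
        if any(word in text for word in words):
            return name
    return "general"


def route_query(query: str) -> str:
    """Dispatch the query to the selected agent and return its reply."""
    return ROUTES[choose_route(query)](query)


# Adding a new agent is a single registration call.
register_route("legal", lambda q: f"[legal] {q}", ("contract", "lawsuit"))
print(route_query("Can you review this contract?"))
```

Keeping the route table separate from the dispatch function means a new agent needs one registration rather than another branch in the routing logic.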

Customizing the Medical Agent

Edit the system prompt in graph.py:

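from langchain_core.prompts import ChatPromptTemplate

# The system message sets the agent's role; {input} is filled with the user's query.
prompt = ChatPromptTemplate.from_messages([
    ("system", "Your custom medical assistant prompt here"),
    ("human", "{input}")
])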

Voice Configuration

Modify voice settings in 07b-interruptible-langchain.py:

import os

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    voice_id="your_preferred_voice_id",  # Change voice
)

📁 Project Structure

├── supervisor.py              # Main supervisor routing logic
├── graph.py                   # Specialized LangGraph definitions
├── 07b-interruptible-langchain.py  # Main voice AI application
├── run.py                     # Application runner and server setup
├── pyproject.toml            # Project dependencies and metadata
├── .env                      # Environment variables (create this)
└── README.md                 # This file

🔄 How It Works

Voice Input: User speaks into the web interface
Speech-to-Text: Deepgram converts speech to text
Supervisor Routing: The supervisor agent analyzes the query and routes it to the appropriate specialized agent
Processing: The selected LangGraph processes the query using OpenAI's GPT model
Response Generation: The specialized agent generates a contextual response
Text-to-Speech: Cartesia converts the response to natural speech
Voice Output: The response is played back to the user
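The flow above can be condensed into a single turn-handling function. This is a deliberately simplified, synchronous sketch: PipeCat itself processes audio as a stream, and the stub functions below (fake_stt, fake_agent, fake_tts) are hypothetical placeholders for Deepgram, the supervisor-selected LangGraph agent, and Cartesia. None of the names come from this repository.

```python
from typing import Callable


def handle_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    agent: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Run one conversational turn: speech -> text -> agent reply -> speech."""
    text = stt(audio)    # Speech-to-Text
    reply = agent(text)  # Supervisor routing and agent processing
    return tts(reply)    # Text-to-Speech


# Stub services so the sketch runs without any API keys.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    print(handle_turn(b"hello", fake_stt, fake_agent, fake_tts))
```

Passing the three services in as callables keeps each stage replaceable, which mirrors how the speech and agent components are swapped independently in the architecture described earlier.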

🧪 Testing

# Run with verbose logging for debugging
uv run python 07b-interruptible-langchain.py --verbose

# Test the web interface
# Navigate to http://localhost:7860 in your browser

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Add tests if applicable
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the BSD 2-Clause License - see the LICENSE file for details.

🙏 Acknowledgments

PipeCat AI - Real-time voice AI infrastructure
LangChain - AI application framework
LangGraph - Multi-agent orchestration

📞 Support

For questions or issues:

Check existing Issues
Create a new issue with detailed information
Include logs and error messages when reporting bugs

Built with ❤️ using PipeCat, LangChain, and LangGraph
