This project integrates multiple voice interaction components into a complete, fully local voice AI solution. Using a microservices architecture, it combines speech recognition, natural language processing, and voice synthesis to deliver an intelligent voice assistant that runs entirely in a local environment. The main components are:
- Speech Recognition Service - Uses OpenAI Whisper model for speech-to-text conversion
- Intelligent Conversation Engine - Provides intelligent responses based on Ollama models
- Web Frontend Interface - Next.js application for browser voice interaction
- Containerized Deployment - Supports Docker and Kubernetes deployment
User Voice Interaction Flow

```
┌─────────┐   ①Record Voice   ┌──────────────┐
│  User   │──────────────────►│ Web Frontend │
│(Browser)│◄──────────────────│  (Next.js)   │
└─────────┘  ⑦Play Response   └──────┬───────┘
                                     │②Upload Audio File
                                     ▼
                              ┌──────────────┐  ③Speech-to-Text  ┌─────────────────┐
                              │  API Route   │──────────────────►│ Whisper Service │
                              │   (/api/*)   │◄──────────────────│   (Flask API)   │
                              └──────┬───────┘   ④Return Text    └─────────────────┘
                                     │⑤Send Text
                                     ▼
                              ┌─────────────────┐
                              │ Ollama Service  │
                              │    (LLaMA3)     │
                              └─────────────────┘
                                     │⑥Return AI Response
                                     ▼
                      (back to the Web Frontend, which plays it to the user)
```
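The numbered flow above boils down to two HTTP calls made by the API route. The sketch below illustrates them; the `/transcribe` path on the Flask service and the localhost URLs are assumptions (only `/api/generate` is Ollama's documented endpoint):

```python
import json
import urllib.request

# Assumed service URLs; the /transcribe path on the Flask service is
# hypothetical, while /api/generate is Ollama's standard endpoint.
WHISPER_URL = "http://localhost:5000/transcribe"
OLLAMA_URL = "http://localhost:11434/api/generate"


def transcribe(audio_wav: bytes) -> str:
    """③ Send recorded audio to the Whisper service, ④ receive the text."""
    req = urllib.request.Request(
        WHISPER_URL, data=audio_wav, headers={"Content-Type": "audio/wav"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]


def generate_reply(text: str) -> str:
    """⑤ Forward the transcript to Ollama, ⑥ receive the AI response."""
    body = json.dumps({"model": "llama3", "prompt": text, "stream": False})
    req = urllib.request.Request(
        OLLAMA_URL, data=body.encode(), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    with open("sample.wav", "rb") as f:
        transcript = transcribe(f.read())
    print(generate_reply(transcript))
```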
Kubernetes Cluster Environment

```
┌─────────────────────────────────────────────────────┐
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │     Pod     │  │     Pod     │  │     Pod     │  │
│  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │
│  │ │ Next.js │ │  │ │ Whisper │ │  │ │ Ollama  │ │  │
│  │ │   App   │ │  │ │ Service │ │  │ │ Service │ │  │
│  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  │
│         └────────────────┼────────────────┘         │
│                          │                          │
│  ┌───────────────────────┴──────────────────────┐   │
│  │               Service Network                │   │
│  │     (Load Balancing & Service Discovery)     │   │
│  └──────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────┘
```
- 🎤 Browser Voice Recording - Supports real-time voice input and recording
- 🔊 High-Precision Speech Recognition - Uses Whisper model for speech-to-text conversion
- 🤖 Intelligent Conversation Generation - Natural language processing based on Ollama (LLaMA3) models
- 💬 Real-time Conversation Interface - Smooth user interaction experience
- 🐳 Containerized Deployment - Supports Docker and Kubernetes environments
- 🔧 Scalable Architecture - Microservices design for easy maintenance and expansion
Frontend
- Next.js - React framework
- TypeScript - Type safety
- MediaRecorder API - Browser audio recording

Backend
- Flask - Whisper speech recognition service
- Ollama - LLaMA3 conversation model service
- Python - Backend logic processing

Deployment
- Docker - Containerization technology
- Kubernetes - Container orchestration
- Google Kubernetes Engine (GKE) - Cloud deployment
Deploy the complete system on a Kubernetes cluster:

```bash
export $(grep -v '^#' ./.env | xargs)
./k8s_ops_script/gcp_deploy_init.sh
./k8s_ops_script/gcp_deploy_build.sh
./k8s_ops_script/gcp_deploy_deploy.sh
```

Start Whisper Service

```bash
cd whisper-service
pip install -r requirements.txt
python app.py
```

Start Ollama Service

```bash
# Install and start Ollama
ollama serve
ollama pull llama3
```

Start Frontend Application

```bash
cd app
npm install
npm run dev
```

```
local-voice-ai/
├── app/               # Next.js frontend application
├── whisper-service/   # Whisper speech recognition service
├── k8s/               # Kubernetes deployment configuration
├── k8s_ops_script/    # Operations scripts
└── README.md          # Project documentation
```
- WHISPER_MODEL: Whisper model version (tiny/base/small/medium/large)
- OLLAMA_HOST: Ollama service address
- PORT: Service port number
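A service might read these variables as in the sketch below; the default values are illustrative assumptions, not taken from this repository:

```python
import os

# Environment variables from this README; the fallback defaults here
# are assumptions for illustration only.
WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "base")  # tiny/base/small/medium/large
OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
PORT = int(os.environ.get("PORT", "5000"))  # assumed default port

print(f"model={WHISPER_MODEL} ollama={OLLAMA_HOST} port={PORT}")
```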
- Deployments: Application service deployment
- Services: Service exposure and load balancing
- ConfigMaps: Configuration management
- PersistentVolumes: Data persistence
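As an illustration of how these resources fit together, a Deployment and Service for the Whisper pod might look like the following sketch; the names, image tag, port, and ConfigMap reference are assumptions, and the actual manifests live in `k8s/`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whisper-service          # assumed name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whisper-service
  template:
    metadata:
      labels:
        app: whisper-service
    spec:
      containers:
        - name: whisper-service
          image: whisper-service:latest   # assumed image name/tag
          ports:
            - containerPort: 5000         # assumed service port
          envFrom:
            - configMapRef:
                name: voice-ai-config     # assumed ConfigMap name
---
apiVersion: v1
kind: Service
metadata:
  name: whisper-service
spec:
  selector:
    app: whisper-service
  ports:
    - port: 5000
      targetPort: 5000
```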
- Fully Local - No dependence on external APIs, ensuring data privacy
- Highly Integrated - Seamless integration of voice interaction components
- Containerized Deployment - Easy to deploy and scale
- Microservices Architecture - Modular design for easy maintenance
- Multi-Environment Support - Supports both local development and production deployment
This project is licensed under the MIT License.