Local Voice AI Solution

This project integrates voice interaction components to create a comprehensive local voice AI solution, providing a complete voice conversation experience. The system uses a microservices architecture that combines speech recognition, natural language processing, and voice synthesis technologies to create an intelligent voice assistant that runs in a local environment.

📋 Project Overview

This project integrates multiple voice interaction components, including:

Speech Recognition Service - Uses OpenAI Whisper model for speech-to-text conversion
Intelligent Conversation Engine - Provides intelligent responses based on Ollama models
Web Frontend Interface - Next.js application for browser voice interaction
Containerized Deployment - Supports Docker and Kubernetes deployment

🏗️ System Architecture

                          User Voice Interaction Flow
    ┌─────────────────────────────────────────────────────────────────┐
    │                                                                 │
    ▼                  ①Record Voice                                   │
┌─────────┐         ┌──────────────┐                                   │
│  User   │────────►│ Web Frontend │                                   │
│(Browser)│         │  (Next.js)   │                                   │
└─────────┘         └──────┬───────┘                                   │
    ▲                      │                                           │
    │                      │②Upload Audio File                         │
    │⑤Play Response         ▼                                           │
    │              ┌──────────────┐   ③Speech-to-Text   ┌─────────────────┐ │
    │              │   API Route  │──────────────►│ Whisper Service │ │
    │              │  (/api/*)    │◄──────────────│   (Flask API)   │ │
    │              └──────┬───────┘   ④Return Text    └─────────────────┘ │
    │                     │                                           │
    │                     │⑤Send Text                                  │
    │                     ▼                                           │
    │              ┌─────────────────┐                                 │
    │              │ Ollama Service  │                                 │
    │              │   (LLaMA3)      │                                 │
    │              └─────────────────┘                                 │
    │                     │                                           │
    └─────────────────────┘⑥Return AI Response                        │
                                                                      │


                        Kubernetes Cluster Environment                           
    ┌─────────────────────────────────────────────────────────────────┘
    │
    │  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
    │  │    Pod      │   │    Pod      │   │    Pod      │
    │  │ ┌─────────┐ │   │ ┌─────────┐ │   │ ┌─────────┐ │
    │  │ │Next.js  │ │   │ │ Whisper │ │   │ │ Ollama  │ │
    │  │ │   App   │ │   │ │ Service │ │   │ │ Service │ │
    │  │ └─────────┘ │   │ └─────────┘ │   │ └─────────┘ │
    │  └─────────────┘   └─────────────┘   └─────────────┘
    │        │                  │                  │
    │        └──────────────────┼──────────────────┘
    │                           │
    │  ┌─────────────────────────┼─────────────────────────┐
    │  │                 Service Network                  │
    │  │          (Load Balancing & Service Discovery)    │
    │  └─────────────────────────────────────────────────┘
    └─────────────────────────────────────────────────────

🚀 Key Features

🎤 Browser Voice Recording - Supports real-time voice input and recording
🔊 High-Precision Speech Recognition - Uses Whisper model for speech-to-text conversion
🤖 Intelligent Conversation Generation - Natural language processing based on oLLaMA models
💬 Real-time Conversation Interface - Smooth user interaction experience
🐳 Containerized Deployment - Supports Docker and Kubernetes environments
🔧 Scalable Architecture - Microservices design for easy maintenance and expansion

🛠️ Technology Stack

Frontend

Next.js - React framework
TypeScript - Type safety
MediaRecorder API - Browser audio recording

Backend Services

Flask - Whisper speech recognition service
Ollama - LLaMA3 conversation model service
Python - Backend logic processing

Deployment & Operations

Docker - Containerization technology
Kubernetes - Container orchestration
Google Kubernetes Engine (GKE) - Cloud deployment

🚀 Quick Start

Kubernetes Deployment

Deploy the complete system on a Kubernetes cluster:

1. export $(grep -v '^#' ./.env | xargs)
2. ./k8s_ops_script/gcp_deploy_init.sh
3. ./k8s_ops_script/gcp_deploy_build.sh
4. ./k8s_ops_script/gcp_deploy_deploy.sh

Local Development Environment

Start Whisper Service

cd whisper-service
pip install -r requirements.txt
python app.py

Start Ollama Service

# Install and start Ollama
ollama serve
ollama pull llama3

Start Frontend Application

cd app
npm install
npm run dev

📁 Project Structure

local-voice-ai/
├── app/                    # Next.js frontend application
├── whisper-service/        # Whisper speech recognition service
├── k8s/                   # Kubernetes deployment configuration
├── k8s_ops_script/        # Operations scripts
└── README.md              # Project documentation

🔧 Configuration

Environment Variables

WHISPER_MODEL: Whisper model version (tiny/base/small/medium/large)
OLLAMA_HOST: Ollama service address
PORT: Service port number

Kubernetes Resources

Deployments: Application service deployment
Services: Service exposure and load balancing
ConfigMaps: Configuration management
PersistentVolumes: Data persistence

🌟 Key Advantages

Fully Local - No dependence on external APIs, ensuring data privacy
Highly Integrated - Seamless integration of voice interaction components
Containerized Deployment - Easy to deploy and scale
Microservices Architecture - Modular design for easy maintenance
Multi-Environment Support - Supports both local development and production deployment

📝 License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 94 Commits
BreezyVoice		BreezyVoice
app		app
whisper-service		whisper-service
.gitignore		.gitignore
AGENT_CONFIG_GUIDE.md		AGENT_CONFIG_GUIDE.md
README.md		README.md
STREAMING_TTS_GUIDE.md		STREAMING_TTS_GUIDE.md
docker-compose.model-svr.amd.yml		docker-compose.model-svr.amd.yml
docker-compose.model-svr.yml		docker-compose.model-svr.yml
docker-compose.yml		docker-compose.yml
makefile		makefile
myscript.desktop		myscript.desktop
nginx.conf		nginx.conf
pull_model-rq.sh		pull_model-rq.sh
quick_test.sh		quick_test.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Local Voice AI Solution

📋 Project Overview

🏗️ System Architecture

🚀 Key Features

🛠️ Technology Stack

Frontend

Backend Services

Deployment & Operations

🚀 Quick Start

Kubernetes Deployment

Local Development Environment

📁 Project Structure

🔧 Configuration

Environment Variables

Kubernetes Resources

🌟 Key Advantages

📝 License

About

Uh oh!

Releases

Packages

Languages

Chunshan-Theta/local-voice-ai

Folders and files

Latest commit

History

Repository files navigation

Local Voice AI Solution

📋 Project Overview

🏗️ System Architecture

🚀 Key Features

🛠️ Technology Stack

Frontend

Backend Services

Deployment & Operations

🚀 Quick Start

Kubernetes Deployment

Local Development Environment

📁 Project Structure

🔧 Configuration

Environment Variables

Kubernetes Resources

🌟 Key Advantages

📝 License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages