DevloperHS/cto.new

Real-Time Voice Chat with Google Gemini Multimodal Live API

A Next.js application enabling real-time voice conversations with Google's Gemini Multimodal Live API using WebRTC for low-latency audio streaming.

🏗️ Architecture Overview

This application implements a real-time voice chat interface that:

  1. Captures audio from the user's microphone using the Web Audio API
  2. Streams audio to Google's Gemini Multimodal Live API via WebRTC
  3. Receives responses in real-time as audio and/or text
  4. Plays back AI-generated audio through the browser
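The capture step usually has to convert Web Audio's Float32 samples before sending them anywhere. The following is a minimal sketch that assumes the target format is 16-bit signed PCM. That format and the helper name `floatTo16BitPCM` are assumptions for illustration, not taken from this repository.

```typescript
// Hypothetical helper: converts Web Audio samples (Float32, nominal range [-1, 1])
// to 16-bit signed PCM. The 16-bit target format is an assumption.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // The int16 range is asymmetric: negatives scale by 32768, positives by 32767.
    pcm[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return pcm;
}
```

In a browser, the input would typically be one channel of an audio processing callback. Clamping matters because microphone gain can push samples slightly outside the nominal range.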

Technology Stack

  • Next.js 14+ - React framework with App Router
  • TypeScript - Type-safe development
  • WebRTC - Real-time audio streaming
  • Google Gemini Multimodal Live API - Multimodal AI conversation
  • Web Audio API - Audio capture and playback

Data Flow

User Microphone
      ↓
Web Audio API (capture)
      ↓
WebRTC (peer connection)
      ↓
Google Gemini Multimodal Live API
      ↓
WebRTC (receive audio/data)
      ↓
Web Audio API (playback)
      ↓
User Speakers
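One way to keep the stages of this flow explicit in client code is a small state machine. The sketch below is illustrative only: the state names, event names, and `nextState` function are hypothetical and do not come from this repository.

```typescript
// Hypothetical session states mirroring the data flow above.
type SessionState = "idle" | "capturing" | "connecting" | "streaming" | "closed";
type SessionEvent = "micGranted" | "offerSent" | "connected" | "hangUp" | "error";

// Allowed transitions; any event not listed for a state is ignored.
const transitions: Record<SessionState, Partial<Record<SessionEvent, SessionState>>> = {
  idle: { micGranted: "capturing" },
  capturing: { offerSent: "connecting", hangUp: "closed", error: "closed" },
  connecting: { connected: "streaming", hangUp: "closed", error: "closed" },
  streaming: { hangUp: "closed", error: "closed" },
  closed: {},
};

// Returns the next state, or the current state if the event does not apply.
function nextState(state: SessionState, event: SessionEvent): SessionState {
  return transitions[state][event] ?? state;
}
```

Keeping the transitions in a table makes it easy to see that audio cannot be streamed before the microphone is granted and the connection is established.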

🚀 Getting Started

Prerequisites

  • Node.js 18.x or higher
  • npm, yarn, or pnpm
  • Google Cloud Account with Gemini API access
  • Modern browser with WebRTC support (Chrome, Firefox, Edge, Safari)

Installation

  1. Clone the repository
git clone <repository-url>
cd <repository-name>
  2. Install dependencies
npm install
# or
yarn install
# or
pnpm install
  3. Set up environment variables

Create a .env.local file in the project root:

# Google Gemini API Configuration
GOOGLE_GEMINI_API_KEY=your_api_key_here

# Optional: API endpoint override
NEXT_PUBLIC_GEMINI_API_ENDPOINT=https://generativelanguage.googleapis.com

# Optional: Development mode settings
NEXT_PUBLIC_DEBUG_MODE=false

See docs/environment-variables.md for detailed configuration options.

  4. Run the development server
npm run dev
# or
yarn dev
# or
pnpm dev
  5. Open your browser

Navigate to http://localhost:3000
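
The environment variables from the setup steps above can be validated once at startup so that a missing key fails fast. This is a minimal sketch: the `AppConfig` shape and `loadConfig` function are hypothetical, and the fallback endpoint simply reuses the value shown in the sample .env.local.

```typescript
// Hypothetical config shape; not part of this repository.
interface AppConfig {
  apiKey: string;
  apiEndpoint: string;
  debug: boolean;
}

const DEFAULT_ENDPOINT = "https://generativelanguage.googleapis.com";

// Reads the variables named in .env.local and throws if the required key is missing.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const apiKey = (env["GOOGLE_GEMINI_API_KEY"] ?? "").trim();
  if (apiKey === "") {
    throw new Error("GOOGLE_GEMINI_API_KEY is not set; add it to .env.local");
  }
  const endpoint = (env["NEXT_PUBLIC_GEMINI_API_ENDPOINT"] ?? "").trim();
  return {
    apiKey,
    apiEndpoint: endpoint === "" ? DEFAULT_ENDPOINT : endpoint,
    debug: env["NEXT_PUBLIC_DEBUG_MODE"] === "true",
  };
}
```

On the server this would be called as `loadConfig(process.env)`. Only variables prefixed with NEXT_PUBLIC_ are inlined into the browser bundle by Next.js, so the API key stays server-side as long as it keeps its unprefixed name.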

🔧 Local Development Workflow

Development Commands

# Start development server
npm run dev

# Build for production
npm run build

# Start production server
npm start

# Run linting
npm run lint

# Run type checking
npm run type-check

# Run tests
npm test

# Run tests in watch mode
npm run test:watch

Project Structure

├── app/                    # Next.js App Router pages and layouts
│   ├── api/               # API routes (server-side endpoints)
│   ├── components/        # React components
│   └── page.tsx           # Home page
├── docs/                   # Documentation
│   ├── api-references.md
│   ├── audio-codecs.md
│   ├── environment-variables.md
│   └── troubleshooting.md
├── lib/                    # Utility functions and configurations
│   ├── gemini/            # Gemini API integration
│   ├── webrtc/            # WebRTC utilities
│   └── audio/             # Audio processing utilities
├── public/                 # Static assets
├── types/                  # TypeScript type definitions
├── .env.local             # Environment variables (not in git)
├── next.config.js         # Next.js configuration
├── package.json           # Dependencies and scripts
├── tsconfig.json          # TypeScript configuration
├── CONTRIBUTING.md        # Contribution guidelines
└── README.md              # This file

Hot Reloading

The development server supports Fast Refresh, Next.js's hot module replacement (HMR). Edits to React components and styles appear in the browser without a full page reload, and API routes are recompiled on the next request.

Browser Developer Tools

For debugging WebRTC connections:

  1. Open chrome://webrtc-internals/ in a separate Chrome tab (it is a standalone page, not a DevTools panel)
  2. Monitor peer connections, ICE candidates, and media streams
  3. Check audio levels and codec information
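
The same numbers shown in chrome://webrtc-internals/ can be read in code from the peer connection's stats. The sketch below summarizes packet loss and jitter for inbound audio. The `InboundAudioStats` interface is a trimmed-down, hypothetical view of the standard inbound-rtp stats fields, and `summarizeInboundAudio` is an illustrative helper, not part of this repository.

```typescript
// Minimal subset of the standard "inbound-rtp" stats fields used here.
interface InboundAudioStats {
  type: string;
  kind?: string;
  packetsReceived?: number;
  packetsLost?: number;
  jitter?: number; // seconds, as reported by the browser
}

interface AudioQuality {
  lossRatio: number; // 0..1
  jitterMs: number;
}

// Returns quality figures for the first inbound audio stream, or null if none is found.
function summarizeInboundAudio(stats: readonly InboundAudioStats[]): AudioQuality | null {
  for (let i = 0; i < stats.length; i++) {
    const s = stats[i];
    if (s.type !== "inbound-rtp" || s.kind !== "audio") continue;
    const received = s.packetsReceived ?? 0;
    const lost = Math.max(0, s.packetsLost ?? 0);
    const total = received + lost;
    return {
      lossRatio: total === 0 ? 0 : lost / total,
      jitterMs: (s.jitter ?? 0) * 1000,
    };
  }
  return null;
}
```

In the browser, the input could come from `Array.from((await peerConnection.getStats()).values())`, where `peerConnection` is the app's RTCPeerConnection.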

🌐 Deployment Considerations

Environment Variables

Ensure all required environment variables are configured in your deployment environment. Never commit sensitive keys to version control.

Platform-Specific Guides

Vercel

  1. Connect your repository to Vercel
  2. Configure environment variables in the Vercel dashboard
  3. Deploy automatically on every push to the main branch
# Or deploy manually
npm run build
vercel deploy --prod

Other Platforms

  • Netlify: Set the build command to npm run build and the publish directory to .next
  • Docker: See Dockerfile for containerization
  • Self-hosted: Build the app and run with npm start behind a reverse proxy (nginx/Apache)

HTTPS Requirements

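Browsers expose microphone capture (getUserMedia) only in secure contexts, so the app must be served over HTTPS in production; http://localhost is treated as secure during development. Ensure your deployment platform provides SSL certificates. Most modern platforms (Vercel, Netlify, etc.) handle this automatically.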
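
Microphone access is gated on the page being a secure context, which is why HTTPS matters here. The following sketch approximates that rule from a protocol and hostname so it can be checked before starting a session. The function name `isLikelySecureContext` is hypothetical, and the loopback check is a simplification of what browsers actually do.

```typescript
// Approximates the browser's secure-context rule for a page origin.
// Simplified: real browsers also treat the whole 127.0.0.0/8 range as loopback.
function isLikelySecureContext(protocol: string, hostname: string): boolean {
  if (protocol === "https:") return true;
  const isLoopback =
    hostname === "localhost" ||
    hostname === "127.0.0.1" ||
    hostname === "[::1]" ||
    /\.localhost$/.test(hostname);
  return protocol === "http:" && isLoopback;
}
```

In the browser itself, `window.isSecureContext` gives the authoritative answer; a helper like this is mainly useful in server-side checks or tests, called with values such as `location.protocol` and `location.hostname`.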

Performance Optimization

  • Enable Next.js Image Optimization for static images
  • Configure CDN for faster content delivery
  • Use edge functions for API routes when possible
  • Implement connection pooling for API requests
  • Monitor WebRTC connection quality and implement fallback mechanisms
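
One common fallback when a connection drops is to retry with exponential backoff and jitter. The sketch below uses the "full jitter" variant. The function name `reconnectDelayMs` and the default timings are assumptions for illustration, not values from this project.

```typescript
// Exponential backoff with full jitter: the delay is drawn uniformly from
// [0, min(maxMs, baseMs * 2^attempt)). The random source is injectable for testing.
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 500,
  maxMs: number = 15000,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, Math.max(0, attempt)));
  return Math.floor(random() * ceiling);
}
```

A caller would wait `reconnectDelayMs(n)` milliseconds before the n-th reconnection attempt. The jitter spreads retries out so that many clients disconnected at once do not all reconnect at the same instant.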

CORS Considerations

If deploying frontend and backend separately, configure CORS headers appropriately for API routes.
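
A simple way to do this is to echo the request's Origin header back only when it appears on an allow-list. This is a sketch under assumptions: the `corsHeaders` helper, the allowed methods, and the allowed headers are hypothetical and would need to match the app's actual API routes.

```typescript
// Builds CORS response headers for an allow-listed origin.
// Returns an empty object when the origin is missing or not allowed,
// so the browser blocks the cross-origin response.
function corsHeaders(
  requestOrigin: string | null,
  allowedOrigins: readonly string[]
): Record<string, string> {
  if (requestOrigin === null || allowedOrigins.indexOf(requestOrigin) === -1) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": requestOrigin,
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
    // Tells caches that the response varies by requesting origin.
    "Vary": "Origin",
  };
}
```

In a route handler, the result could be spread into the response headers, using the value of the incoming request's Origin header as the first argument.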

📊 Monitoring and Logging

  • Monitor WebRTC connection quality metrics
  • Log API errors and rate limiting issues
  • Track audio latency and quality degradation
  • Implement error boundaries for graceful failure handling
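
Tracking audio latency can be as simple as keeping a sliding window of recent measurements and reporting percentiles. The class below is a minimal sketch: `LatencyWindow` is a hypothetical name, and it uses the nearest-rank percentile method over the most recent samples.

```typescript
// Keeps the most recent latency samples and reports nearest-rank percentiles.
class LatencyWindow {
  private samples: number[] = [];
  private readonly capacity: number;

  constructor(capacity: number = 100) {
    this.capacity = capacity;
  }

  // Adds a sample in milliseconds, evicting the oldest one when full.
  record(ms: number): void {
    this.samples.push(ms);
    if (this.samples.length > this.capacity) this.samples.shift();
  }

  // Nearest-rank percentile for p in [0, 100]; null when no samples exist.
  percentile(p: number): number | null {
    if (this.samples.length === 0) return null;
    const sorted = this.samples.slice().sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length) - 1;
    return sorted[Math.min(sorted.length - 1, Math.max(0, rank))];
  }
}
```

Reporting a high percentile such as the 95th alongside the median tends to surface intermittent degradation that an average would hide.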

🔒 Security

  • API keys should never be exposed to the client
  • Use environment variables for sensitive configuration
  • Implement rate limiting on API routes
  • Validate and sanitize all user inputs
  • Use Content Security Policy (CSP) headers
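
Rate limiting on API routes can start with a small in-memory limiter keyed by client identifier. The sketch below uses a fixed-window counter. The class name `FixedWindowRateLimiter` is hypothetical, and because the state lives in one process it is only an approximation on serverless or multi-instance deployments, where a shared store would be needed.

```typescript
// Fixed-window rate limiter: at most `limit` calls per key in each window.
// The clock is injectable so the behavior can be tested deterministically.
class FixedWindowRateLimiter {
  private readonly hits = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if the call is allowed and counts it; false if the key is over its limit.
  allow(key: string): boolean {
    const t = this.now();
    let entry = this.hits.get(key);
    if (entry === undefined || t - entry.windowStart >= this.windowMs) {
      // Start a fresh window for this key.
      entry = { windowStart: t, count: 0 };
      this.hits.set(key, entry);
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

An API route could call `allow()` with the client's IP address or session identifier and respond with HTTP 429 when it returns false.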

📚 Additional Documentation

  • docs/api-references.md
  • docs/audio-codecs.md
  • docs/environment-variables.md
  • docs/troubleshooting.md

🀝 Contributing

Please read CONTRIBUTING.md for details on our code of conduct, coding standards, and the process for submitting pull requests.

📄 License

[Add your license here]

🙏 Acknowledgments

  • Google Gemini AI Team for the Multimodal Live API
  • Next.js team for the excellent framework
  • WebRTC community for real-time communication standards
