A Next.js application enabling real-time voice conversations with Google's Gemini Multimodal Live API using WebRTC for low-latency audio streaming.
This application implements a real-time voice chat interface that:
- Captures audio from the user's microphone using the Web Audio API
- Streams audio to Google's Gemini Multimodal Live API via WebRTC
- Receives responses in real-time as audio and/or text
- Plays back AI-generated audio through the browser
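The capture step can be sketched roughly as follows. Function names are illustrative, and the 16-bit little-endian PCM target format is an assumption about what the Live API expects for audio input — verify against the Gemini docs:

```typescript
// Illustrative sketch of the capture path; names are hypothetical.

// Web Audio delivers Float32 samples in [-1, 1]; convert them to
// 16-bit signed PCM, the format the Live API is assumed to expect.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Request a mono microphone stream from the browser.
async function captureMicrophone(): Promise<MediaStream> {
  return navigator.mediaDevices.getUserMedia({
    audio: { channelCount: 1, echoCancellation: true, noiseSuppression: true },
  });
}
```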
- Next.js 14+ - React framework with App Router
- TypeScript - Type-safe development
- WebRTC - Real-time audio streaming
- Google Gemini Multimodal Live API - Multimodal AI conversation
- Web Audio API - Audio capture and playback
User Microphone
    ↓
Web Audio API (capture)
    ↓
WebRTC (peer connection)
    ↓
Google Gemini Multimodal Live API
    ↓
WebRTC (receive audio/data)
    ↓
Web Audio API (playback)
    ↓
User Speakers
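The middle of this pipeline is a standard `RTCPeerConnection` setup. A minimal sketch — the signaling exchange with the Gemini endpoint is elided, and helper names are illustrative:

```typescript
// Illustrative sketch: wire a microphone stream into a peer connection
// and play whatever audio track the remote side sends back. How the
// offer/answer exchange reaches the Gemini endpoint is elided here.

// Quick sanity check that an SDP blob actually negotiates an audio section.
function sdpHasAudio(sdp: string): boolean {
  return sdp.split('\n').some((line) => line.startsWith('m=audio'));
}

async function connectVoice(mic: MediaStream): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();
  for (const track of mic.getAudioTracks()) pc.addTrack(track, mic);

  // Play remote audio as soon as a track arrives.
  pc.ontrack = (event) => {
    const audio = new Audio();
    audio.srcObject = event.streams[0];
    audio.play();
  };

  const offer = await pc.createOffer();
  if (!sdpHasAudio(offer.sdp ?? '')) throw new Error('no audio in offer');
  await pc.setLocalDescription(offer);
  // ...send the offer to the signaling endpoint, apply the answer...
  return pc;
}
```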
- Node.js 18.x or higher
- npm, yarn, or pnpm
- Google Cloud Account with Gemini API access
- Modern browser with WebRTC support (Chrome, Firefox, Edge, Safari)
- Clone the repository
git clone <repository-url>
cd <repository-name>

- Install dependencies
npm install
# or
yarn install
# or
pnpm install

- Set up environment variables
Create a .env.local file in the project root:
# Google Gemini API Configuration
GOOGLE_GEMINI_API_KEY=your_api_key_here
# Optional: API endpoint override
NEXT_PUBLIC_GEMINI_API_ENDPOINT=https://generativelanguage.googleapis.com
# Optional: Development mode settings
NEXT_PUBLIC_DEBUG_MODE=false

See docs/environment-variables.md for detailed configuration options.
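Only variables prefixed with `NEXT_PUBLIC_` are inlined into client bundles by Next.js; everything else, including the API key, is readable only server-side. A small sketch of reading it safely (the helper name is illustrative):

```typescript
// Illustrative sketch: read required configuration server-side so the
// Gemini key never reaches the browser.

function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

// e.g. inside an App Router route handler (app/api/.../route.ts):
// const apiKey = requireEnv('GOOGLE_GEMINI_API_KEY');
```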
- Run the development server
npm run dev
# or
yarn dev
# or
pnpm dev

- Open your browser
Navigate to http://localhost:3000
# Start development server
npm run dev
# Build for production
npm run build
# Start production server
npm start
# Run linting
npm run lint
# Run type checking
npm run type-check
# Run tests
npm test
# Run tests in watch mode
npm run test:watch

├── app/                  # Next.js App Router pages and layouts
│   ├── api/              # API routes (server-side endpoints)
│   ├── components/       # React components
│   └── page.tsx          # Home page
├── docs/                 # Documentation
│   ├── api-references.md
│   ├── audio-codecs.md
│   ├── environment-variables.md
│   └── troubleshooting.md
├── lib/                  # Utility functions and configurations
│   ├── gemini/           # Gemini API integration
│   ├── webrtc/           # WebRTC utilities
│   └── audio/            # Audio processing utilities
├── public/               # Static assets
├── types/                # TypeScript type definitions
├── .env.local            # Environment variables (not in git)
├── next.config.js        # Next.js configuration
├── package.json          # Dependencies and scripts
├── tsconfig.json         # TypeScript configuration
├── CONTRIBUTING.md       # Contribution guidelines
└── README.md             # This file
The development server supports hot module replacement (HMR). Changes to React components, API routes, and styles will automatically reload in the browser.
For debugging WebRTC connections:
- In Chrome, open chrome://webrtc-internals/ to monitor peer connections, ICE candidates, and media streams
- Check audio levels and codec information
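The same information is available programmatically via `RTCPeerConnection.getStats()`, which is useful for logging in environments where webrtc-internals isn't available. A sketch (the stats shape here is simplified; field names follow the WebRTC stats spec):

```typescript
// Illustrative sketch: poll getStats() and pull out the inbound audio
// report to watch packet loss and jitter programmatically.

interface StatsLike {
  type: string;
  kind?: string;
  packetsLost?: number;
  jitter?: number;
}

// Pure helper so the selection logic is easy to unit-test.
function pickInboundAudio(reports: StatsLike[]): StatsLike | undefined {
  return reports.find((r) => r.type === 'inbound-rtp' && r.kind === 'audio');
}

async function logAudioStats(pc: RTCPeerConnection): Promise<void> {
  const stats = await pc.getStats();
  const reports: StatsLike[] = [];
  stats.forEach((report) => reports.push(report as StatsLike));
  const inbound = pickInboundAudio(reports);
  if (inbound) {
    console.log(`packetsLost=${inbound.packetsLost} jitter=${inbound.jitter}`);
  }
}
```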
Ensure all required environment variables are configured in your deployment environment. Never commit sensitive keys to version control.
- Connect your repository to Vercel
- Configure environment variables in the Vercel dashboard
- Deploy automatically on push to main branch
# Or deploy manually
npm run build
vercel deploy --prod

- Netlify: Configure build command `npm run build` and publish directory `.next`
- Docker: See `Dockerfile` for containerization
- Self-hosted: Build the app and run with `npm start` behind a reverse proxy (nginx/Apache)
WebRTC requires HTTPS in production. Ensure your deployment platform provides SSL certificates. Most modern platforms (Vercel, Netlify, etc.) handle this automatically.
- Enable Next.js Image Optimization for static assets
- Configure CDN for faster content delivery
- Use edge functions for API routes when possible
- Implement connection pooling for API requests
- Monitor WebRTC connection quality and implement fallback mechanisms
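One common fallback mechanism is reconnecting with jittered exponential backoff. A minimal sketch — the base delay, cap, and attempt count are arbitrary placeholders, not tuned recommendations:

```typescript
// Illustrative sketch: jittered exponential backoff for WebRTC reconnects.

function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  const exp = Math.min(maxMs, baseMs * 2 ** attempt);
  // Full jitter: pick uniformly in [0, exp] to avoid reconnect stampedes.
  return Math.random() * exp;
}

async function reconnectWithBackoff(
  connect: () => Promise<void>,
  maxAttempts = 5,
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch {
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
  throw new Error('reconnect failed after max attempts');
}
```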
If deploying frontend and backend separately, configure CORS headers appropriately for API routes.
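As a sketch of what that might look like in an App Router route handler — the allowed origin below is a placeholder; scope it to your real frontend domain rather than `*`:

```typescript
// Illustrative sketch: CORS headers for an API route when the frontend
// is served from a different origin.

function corsHeaders(allowedOrigin: string): Record<string, string> {
  return {
    'Access-Control-Allow-Origin': allowedOrigin,
    'Access-Control-Allow-Methods': 'GET, POST, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type, Authorization',
  };
}

// e.g. in app/api/session/route.ts (hypothetical path):
// export async function OPTIONS() {
//   return new Response(null, {
//     status: 204,
//     headers: corsHeaders('https://app.example.com'),
//   });
// }
```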
- Monitor WebRTC connection quality metrics
- Log API errors and rate limiting issues
- Track audio latency and quality degradation
- Implement error boundaries for graceful failure handling
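For latency tracking, one simple approach is an exponential moving average with a degradation threshold. A sketch — the smoothing factor and 800 ms threshold are arbitrary placeholders:

```typescript
// Illustrative sketch: smooth per-response latency with an exponential
// moving average and flag degradation past a threshold.

class LatencyTracker {
  private ema: number | null = null;

  constructor(
    private readonly alpha = 0.2,
    private readonly thresholdMs = 800,
  ) {}

  record(latencyMs: number): void {
    this.ema =
      this.ema === null
        ? latencyMs
        : this.alpha * latencyMs + (1 - this.alpha) * this.ema;
  }

  get averageMs(): number | null {
    return this.ema;
  }

  get degraded(): boolean {
    return this.ema !== null && this.ema > this.thresholdMs;
  }
}
```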
- API keys should never be exposed to the client
- Use environment variables for sensitive configuration
- Implement rate limiting on API routes
- Validate and sanitize all user inputs
- Use Content Security Policy (CSP) headers
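Rate limiting on an API route can start as a fixed-window in-memory counter. This sketch works for a single server process only; multi-instance deployments need a shared store (e.g. Redis). The limits are placeholders:

```typescript
// Illustrative sketch: fixed-window in-memory rate limiter for API routes.

interface Bucket {
  count: number;
  resetAt: number;
}

class RateLimiter {
  private windows = new Map<string, Bucket>();

  constructor(
    private readonly maxRequests = 30,
    private readonly windowMs = 60_000,
  ) {}

  // Returns true if the request identified by `key` (e.g. client IP) is allowed.
  allow(key: string, now = Date.now()): boolean {
    const w = this.windows.get(key);
    if (!w || now >= w.resetAt) {
      this.windows.set(key, { count: 1, resetAt: now + this.windowMs });
      return true;
    }
    if (w.count >= this.maxRequests) return false;
    w.count++;
    return true;
  }
}
```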
- API References - Links to Next.js, WebRTC, and Gemini API docs
- Environment Variables - Complete environment configuration guide
- Audio Codecs - Supported codecs and audio format details
- Troubleshooting - Common issues and solutions
Please read CONTRIBUTING.md for details on our code of conduct, coding standards, and the process for submitting pull requests.
[Add your license here]
- Google Gemini AI Team for the Multimodal Live API
- Next.js team for the excellent framework
- WebRTC community for real-time communication standards