A comprehensive test helper for the vowel.to conversational voice assistant, providing both Talker (TTS) and Listener (STT) capabilities with multi-language support.
NOTE: This app is best run from a mobile device when testing vowel-enabled applications.
- Project Management: Organize phrases in folder-based projects
- Markdown Support: Load and navigate through markdown files with phrases
- Multi-language TTS: 7 languages with 90+ voices (see Language Support)
- Auto-translation: Translate text from source language to target language using Groq GPT-OSS-20B
- Keyboard Navigation: Navigate phrases with arrow keys, play with spacebar
- Real-time Playback: High-quality text-to-speech using Deepgram TTS
- Smart Caching: Automatic disk-based caching of TTS responses to save API credits and improve performance
- Voice Selection: Choose from available voices for each language
- Speed Control: Adjust playback speed from 0.25x to 2.0x with granular options
- File Upload: Upload markdown files directly through the UI
- Voice Activity Detection (VAD): Automatic speech detection and recording
- Multi-language STT: 7 languages matching TTS availability (see Language Support)
- Real-time Transcription: Live transcription with confidence scores
- Transcription Management: Sort, filter, search, and export transcriptions
- Adjustable Sensitivity: Customize VAD sensitivity for different environments
- Silence Detection: Automatic recording stop after silence period
- Max Recording Duration: Configurable maximum recording time
- State Management: Zustand stores for feature state
- API Integration: Deepgram (TTS & STT), Groq (Translation)
- Performance: TTS caching, optimized rendering
- TypeScript: Full TypeScript support throughout
- Hot Module Replacement: Fast development with HMR
This application exposes only the intersection of Text-to-Speech (TTS) and Speech-to-Text (STT) language support: a language is selectable only if it is available in both capabilities. This ensures you can both speak and listen in the same language.
| Language | Variants | TTS Voices | STT Support |
|---|---|---|---|
| English | US, UK, Australian, Irish, Filipino | 42 voices | ✅ |
| Spanish | Mexican, Peninsular, Colombian, Latin American | 16 voices | ✅ |
| German | Standard | 7 voices | ✅ |
| French | Standard | 2 voices | ✅ |
| Italian | Standard | 10 voices | ✅ |
| Japanese | Standard | 5 voices | ✅ |
| Dutch | Standard | 9 voices | ✅ |
- TTS (Text-to-Speech): Deepgram Aura - High-quality neural voices
  - 90+ voices across 7 languages
  - Multiple regional variants for English and Spanish (e.g., US/UK/Australian English)
  - See Deepgram TTS Documentation for full voice listings
- STT (Speech-to-Text): Deepgram Nova - Industry-leading transcription
  - Supports all 7 TTS-matching languages
  - See Deepgram STT Documentation for language support details
Note: While Deepgram STT supports 100+ languages, this application intentionally limits STT selection to the 7 languages that also have TTS support, ensuring users can both speak and listen in their chosen language.
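The intersection rule described in this section can be sketched as a small filter. This is a minimal illustration, not the project's actual code: the two arrays and the `selectableLanguages` helper are hypothetical, and the real constants in shared/constants.ts may be shaped differently. The extra STT codes are placeholders standing in for the larger STT language list.

```typescript
// Hypothetical language lists for illustration only.
const TTS_LANGUAGES: readonly string[] = ["en", "es", "de", "fr", "it", "ja", "nl"];
const STT_LANGUAGES: readonly string[] = [
  "en", "es", "de", "fr", "it", "ja", "nl", "pt", "ko", "hi",
];

// Keep only the codes present in both lists, preserving the TTS order.
function selectableLanguages(
  tts: readonly string[],
  stt: readonly string[],
): string[] {
  const sttSet = new Set(stt);
  return tts.filter((code) => sttSet.has(code));
}

const selectable = selectableLanguages(TTS_LANGUAGES, STT_LANGUAGES);
```

With these inputs, `selectable` contains the seven TTS codes and none of the STT-only ones, which mirrors the table above.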
- Frontend: React 18 + TypeScript + Vite + TanStack Router
- Backend: Cloudflare Workers + TypeScript
- Styling: Tailwind CSS + shadcn/ui
- State Management: Zustand
- AI Services:
  - Deepgram - TTS & STT (Nova-2, Aura voices)
  - Groq - Translation (GPT-OSS-20B)
  - Groq - Alternative STT (Whisper)
- Node.js 18+ or Bun
- API Keys:
  - DEEPGRAM_API_KEY - Deepgram API key (for both TTS and STT)
  - GROQ_API_KEY - Groq API key (for translation and alternative STT)
- Clone and install dependencies:

      bun install
      cd client && bun install
      cd ../server && bun install
- Set up environment variables: Create a .env file in the root directory:

      DEEPGRAM_API_KEY=your_deepgram_api_key_here
      GROQ_API_KEY=your_groq_api_key_here
      CLIENT_URL=http://localhost:8080
      PORT=9090
      NODE_ENV=development

  Note:
  - Get your Deepgram API key from https://deepgram.com/
  - Get your Groq API key from https://groq.com/
  - Deepgram provides both TTS (Aura voices) and STT (Nova-2) services
  - Groq provides translation via GPT-OSS-20B and alternative STT via Whisper
- Start the development servers:

      # Start both client and server
      bun run dev

      # Or start individually:
      bun run dev:server  # Server on port 9090
      bun run dev:client  # Client on port 8080
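A missing API key is the most common setup mistake, so a startup check can fail fast before any request reaches Deepgram or Groq. The sketch below is an assumption about how such a check could look: the key names and the default port come from the .env example above, while `missingEnvKeys` and `resolvePort` are hypothetical helpers, not functions from this project.

```typescript
// Key names taken from the .env example; the helpers are hypothetical.
const REQUIRED_KEYS: readonly string[] = ["DEEPGRAM_API_KEY", "GROQ_API_KEY"];

type Env = Record<string, string | undefined>;

// Return the required keys that are missing or blank.
function missingEnvKeys(env: Env): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Read the optional PORT, falling back to the documented default of 9090.
function resolvePort(env: Env): number {
  const parsed = Number.parseInt(env["PORT"] ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 9090;
}
```

A server entry point could call `missingEnvKeys` with its environment object and refuse to start when the returned list is non-empty.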
- Select a Project: Choose from existing projects or create a new one
- Choose a File: Select a markdown file containing phrases
- Set Languages: Configure source and target languages
- Navigate & Play: Use arrow keys to navigate, spacebar to play phrases
Keyboard Shortcuts:
- ↑/↓ - Navigate between phrases
- Space - Play/Stop current phrase
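The shortcuts above can be modeled as a pure function from the current state and a key to the next state. This is a sketch under assumptions: `PlayerState` and `handleKey` are hypothetical names, and the choices to clamp at the first and last phrase and to stop playback on navigation are illustrative rather than confirmed behavior.

```typescript
interface PlayerState {
  index: number;    // currently selected phrase
  playing: boolean; // whether audio is playing
}

// Map a KeyboardEvent.key value to the next state.
function handleKey(
  state: PlayerState,
  key: string,
  phraseCount: number,
): PlayerState {
  if (phraseCount === 0) return state;
  switch (key) {
    case "ArrowUp":
      // Assumption: navigation clamps at the first phrase and stops playback.
      return { index: Math.max(0, state.index - 1), playing: false };
    case "ArrowDown":
      return { index: Math.min(phraseCount - 1, state.index + 1), playing: false };
    case " ": // KeyboardEvent.key for the spacebar is a single space
      return { index: state.index, playing: !state.playing };
    default:
      return state;
  }
}
```

Keeping the mapping free of DOM access makes it easy to unit test and to call from a `keydown` listener or a store action.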
- Select Language: Choose target language for transcription
- Adjust Sensitivity: Set VAD sensitivity based on your environment
- Start Listening: Click "Start Listening" to enable voice detection
- Speak: Talk naturally - recording starts/stops automatically
- Manage Transcriptions: View, filter, sort, and export your transcriptions
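Filtering, sorting, and exporting transcriptions can be sketched with two small functions. The `Transcription` shape, the newest-first ordering, and the CSV column layout are all assumptions made for illustration; the project's actual record fields and export format may differ.

```typescript
// Hypothetical record shape; the real type may differ.
interface Transcription {
  text: string;
  language: string;
  confidence: number; // 0..1
  createdAt: number;  // epoch milliseconds
}

// Keep entries that match the search text and confidence floor, newest first.
function filterAndSort(
  items: readonly Transcription[],
  query: string,
  minConfidence: number,
): Transcription[] {
  const needle = query.trim().toLowerCase();
  return items
    .filter(
      (t) =>
        t.confidence >= minConfidence &&
        t.text.toLowerCase().indexOf(needle) !== -1,
    )
    .sort((a, b) => b.createdAt - a.createdAt);
}

// Serialize to CSV, quoting the text column and doubling embedded quotes.
function toCsv(items: readonly Transcription[]): string {
  const quote = (s: string): string => `"${s.replace(/"/g, '""')}"`;
  const rows = items.map((t) =>
    [
      quote(t.text),
      t.language,
      t.confidence.toFixed(2),
      new Date(t.createdAt).toISOString(),
    ].join(","),
  );
  return ["text,language,confidence,createdAt"].concat(rows).join("\n");
}
```

Because `filter` returns a new array, the subsequent `sort` does not mutate the caller's list.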
/
├── client/ # React frontend
│ ├── src/
│ │ ├── components/ # UI components
│ │ ├── hooks/ # Custom React hooks
│ │ ├── lib/ # Utilities and API client
│ │ ├── stores/ # Zustand state management
│ │ └── routes/ # TanStack Router routes
├── server/ # Cloudflare Workers backend
│ ├── src/
│ │ ├── handlers/ # API handlers
│ │ ├── services/ # Business logic & caching
│ │ └── routes/ # API routes
├── shared/ # Shared types and constants
└── sample-project/ # Example project with phrases
- GET /api/projects - List all projects
- GET /api/projects/:id - Get specific project
- POST /api/projects - Create new project
- POST /api/translate - Translate text
- POST /api/tts - Text-to-speech synthesis
- POST /api/stt - Speech-to-text transcription
- GET /api/transcriptions - List transcriptions
- POST /api/transcriptions - Add transcription
- GET /api/tts/cache/stats - Get cache statistics (hits, misses, size, hit rate)
- DELETE /api/tts/cache - Clear TTS cache
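As an illustration of calling one of these endpoints, the sketch below builds a request for POST /api/tts. The request body fields (`text`, `language`, `voice`, `speed`) are assumptions inferred from the features described in this document, not a documented schema, and `buildTtsRequest` and `clampSpeed` are hypothetical helpers. The speed bounds come from the 0.25x to 2.0x range stated earlier.

```typescript
// Assumed request body; the real /api/tts schema is not documented here.
interface TtsRequest {
  text: string;
  language: string;
  voice: string;
  speed: number;
}

interface HttpRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Keep speed inside the documented 0.25x to 2.0x range.
function clampSpeed(speed: number): number {
  return Math.min(2.0, Math.max(0.25, speed));
}

// Build the URL and fetch options without performing the request.
function buildTtsRequest(baseUrl: string, req: TtsRequest): HttpRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/tts`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        text: req.text,
        language: req.language,
        voice: req.voice,
        speed: clampSpeed(req.speed),
      }),
    },
  };
}
```

The returned `url` and `init` can be passed straight to `fetch(url, init)` against the development server on port 9090.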
- Update language constants in shared/constants.ts
- Add language mappings for display names
- Test with your API providers
Projects are simply folders containing markdown files. Each markdown file is parsed for phrases (headings and text lines).
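The phrase extraction described above might look roughly like the following. This is a minimal sketch, not the project's parser: `extractPhrases` is a hypothetical name, and stripping list markers as well as heading markers is an assumption about how "text lines" are treated.

```typescript
// Turn a markdown document into a flat list of phrases:
// one phrase per heading or non-empty text line.
function extractPhrases(markdown: string): string[] {
  return markdown
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) =>
      line
        .replace(/^#{1,6}\s+/, "") // drop heading markers such as "## "
        .replace(/^[-*+]\s+/, ""), // assumption: drop list bullets too
    )
    .filter((line) => line.length > 0);
}
```

For example, a file containing a `# Greetings` heading, a blank line, and the line `Hello there` would yield the two phrases `Greetings` and `Hello there`.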
Adjust sensitivity, silence duration, and max recording time in the Listener controls or modify defaults in shared/constants.ts.
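To show how the three settings interact, here is a simplified energy-threshold detector. It is a sketch under stated assumptions and not the VAD used by this app: `SimpleVad`, `VadConfig`, and the RMS threshold approach are illustrative, with `threshold` standing in for sensitivity, `silenceMs` for silence duration, and `maxRecordingMs` for max recording time.

```typescript
interface VadConfig {
  threshold: number;      // RMS energy at or above which a frame counts as speech
  silenceMs: number;      // continuous silence that ends a recording
  maxRecordingMs: number; // hard cap on recording length
}

type VadEvent = "start" | "stop" | null;

// Root-mean-square energy of one audio frame.
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  frame.forEach((sample) => {
    sum += sample * sample;
  });
  return Math.sqrt(sum / frame.length);
}

class SimpleVad {
  private readonly config: VadConfig;
  private recording = false;
  private recordedMs = 0;
  private silentMs = 0;

  constructor(config: VadConfig) {
    this.config = config;
  }

  // Feed one frame; returns "start" or "stop" on a transition, otherwise null.
  process(frame: Float32Array, frameMs: number): VadEvent {
    const speech = rms(frame) >= this.config.threshold;
    if (!this.recording) {
      if (!speech) return null;
      this.recording = true;
      this.recordedMs = frameMs;
      this.silentMs = 0;
      return "start";
    }
    this.recordedMs += frameMs;
    this.silentMs = speech ? 0 : this.silentMs + frameMs;
    const tooQuiet = this.silentMs >= this.config.silenceMs;
    const tooLong = this.recordedMs >= this.config.maxRecordingMs;
    if (tooQuiet || tooLong) {
      this.recording = false;
      return "stop";
    }
    return null;
  }
}
```

In this model, lowering `threshold` makes the detector more sensitive in quiet rooms, while raising it helps in noisy environments.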
- Microphone Access Denied
  - Ensure HTTPS or localhost
  - Check browser permissions
  - Try refreshing the page
- TTS/STT Not Working
  - Verify DEEPGRAM_API_KEY is set correctly in .env
  - For translation issues, verify GROQ_API_KEY is set correctly
  - Check network connectivity
  - Ensure Deepgram service is available at https://deepgram.com/
  - Ensure Groq service is available at https://groq.com/
  - Verify API keys have sufficient credits
  - Check server logs for detailed error messages
- TTS Cache Management
  - Cache automatically stores TTS responses to disk in the server/tts-cache/ directory
  - Cache persists across server restarts (survives deployments)
  - Cache uses SHA-256 hashing based on text, language, voice, and speed parameters
  - Monitor cache performance: GET /api/tts/cache/stats
  - Clear cache if needed: DELETE /api/tts/cache
  - Cache statistics include: hits, misses, size, and hit rate percentage
  - Cache directory is automatically excluded from git via .gitignore
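The cache key derivation and hit rate statistic can be sketched as follows. The four hashed parameters come from the description above; the exact serialization (a JSON array in fixed field order), the `ttsCacheKey` and `hitRatePercent` names, and the `TtsParams` shape are assumptions, so keys produced here will not necessarily match those in server/tts-cache/.

```typescript
import { createHash } from "node:crypto";

// Assumed parameter shape; only the four hashed fields are from the document.
interface TtsParams {
  text: string;
  language: string;
  voice: string;
  speed: number;
}

// Derive a stable SHA-256 hex key from the request parameters.
function ttsCacheKey(params: TtsParams): string {
  // Serialize in a fixed field order so identical requests hash identically.
  const canonical = JSON.stringify([
    params.text,
    params.language,
    params.voice,
    params.speed,
  ]);
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

// Hit rate as a percentage; zero when there have been no lookups.
function hitRatePercent(hits: number, misses: number): number {
  const total = hits + misses;
  return total === 0 ? 0 : (hits / total) * 100;
}
```

Because every parameter feeds the hash, changing only the playback speed produces a different key and therefore a separate cache entry.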
- Chrome/Edge: Full support
- Firefox: Full support
- Safari: Limited WebRTC support
For issues and questions:
- Create an issue on GitHub
- Check the troubleshooting section
- Review API provider documentation