A local-first, privacy-focused speech documentation system for multilingual teams. Users speak in their native language, and VoxDocs automatically translates the result into the team's language, tracks tasks from voice reports, and securely attaches photos. It is designed for teams whose members speak different languages but need to collaborate seamlessly, and sensitive data is never uploaded to the cloud.
- Audio Recording PWA: Progressive Web App with real-time waveform visualization
- Speech-to-Text: OpenAI Whisper Small (multilingual) for general-purpose dictation
- Automatic Classification: AI-based categorization of voice notes into tasks, updates, notes, and issues, with support for custom categories
- Multilingual Support: Speak in any language Whisper supports and get documentation in your configured language
- GDPR Compliant: Complete on-premise deployment with AES-256 encryption
- Multi-Room Support: Handle multiple treatment rooms per practice
- Export Integration: ZIP export (CSV plus photos) for building reports such as billing
```text
┌─────────────────────────────────────────────────────────────┐
│ Frontend (PWA)                                              │
│ React + TypeScript + TailwindCSS + Vite                     │
│ - Audio recording with MediaRecorder API                    │
│ - Real-time waveform visualization                          │
│ - Offline-capable via Service Worker                        │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ Backend (API)                                               │
│ FastAPI + SQLAlchemy + Whisper                              │
│ - Audio upload and encryption                               │
│ - Batch processing queue                                    │
│ - Transcription with vocabulary boost                       │
│ - Rule-based classification                                 │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ Storage Layer                                               │
│ SQLite + Encrypted File Storage                             │
│ - AES-256-GCM file encryption                               │
│ - 90-day retention policy                                   │
│ - Secure deletion                                           │
└─────────────────────────────────────────────────────────────┘
```
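The diagram mentions a "vocabulary boost" for transcription without saying how it works. One common way to bias Whisper toward domain terms is to seed the decoder with a prompt that lists them; the openai-whisper package's `transcribe()` accepts such a prompt via its `initial_prompt` argument. The sketch below assumes that approach, which this README does not confirm; `build_vocabulary_prompt` is a hypothetical helper, and the 400-character cap is a rough stand-in for Whisper's limited prompt-token budget.

```python
"""Hypothetical vocabulary-boost helper (assumes a prompt-based approach)."""

from typing import Iterable, List, Set


def build_vocabulary_prompt(terms: Iterable[str], max_chars: int = 400) -> str:
    """Join unique, non-empty terms into a comma-separated prompt.

    Whisper keeps only a limited number of prompt tokens, so the prompt is
    capped at `max_chars` characters as a crude, tokenizer-free proxy.
    Terms are kept in order until the first one that would exceed the cap.
    """
    seen: Set[str] = set()
    kept: List[str] = []
    length = 0
    for term in terms:
        term = term.strip()
        key = term.lower()  # de-duplicate case-insensitively
        if not term or key in seen:
            continue
        extra = len(term) + (2 if kept else 0)  # +2 for the ", " separator
        if length + extra > max_chars:
            break
        seen.add(key)
        kept.append(term)
        length += extra
    return ", ".join(kept)


if __name__ == "__main__":
    prompt = build_vocabulary_prompt(["VoxDocs", "invoice", "Invoice", " ", "PWA"])
    print(prompt)  # VoxDocs, invoice, PWA
    # With openai-whisper installed, the prompt could be used like this:
    #   model = whisper.load_model("small")
    #   model.transcribe("note.wav", initial_prompt=prompt)
```

Keeping the prompt short matters because Whisper treats it as preceding transcript context; an overly long list is truncated and can also nudge the model toward hallucinating listed terms.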
- Docker and Docker Compose
- Node.js 20+ (for development)
- Python 3.11+ (for development)
1. Clone the repository:
   ```bash
   git clone <repository-url>
   cd VoxDocs
   ```
2. Configure the environment:
   ```bash
   cp .env.example .env
   # Edit .env and set secure values for SECRET_KEY and MASTER_KEY
   ```
3. Start the services:
   ```bash
   docker-compose up -d
   ```
4. Open the application at http://localhost:3000.
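Step 2 asks for secure keys in `.env`. A minimal sketch for generating them with Python's standard `secrets` module, assuming VoxDocs accepts any long random URL-safe string for both `SECRET_KEY` and `MASTER_KEY` (check `.env.example` for the exact format it expects, such as hex or base64); `generate_env_keys` is a hypothetical helper.

```python
"""Generate random values for SECRET_KEY and MASTER_KEY (format is an assumption)."""

import secrets
from typing import Dict


def generate_env_keys(num_bytes: int = 32) -> Dict[str, str]:
    # 32 random bytes = 256 bits of entropy, the same size as an AES-256 key.
    # token_urlsafe() base64url-encodes the bytes without "=" padding.
    return {
        "SECRET_KEY": secrets.token_urlsafe(num_bytes),
        "MASTER_KEY": secrets.token_urlsafe(num_bytes),
    }


if __name__ == "__main__":
    for name, value in generate_env_keys().items():
        print(f"{name}={value}")
```

Paste the printed lines into `.env`, replacing the placeholders from `.env.example`. Generate the values once and back up `MASTER_KEY`: if it is the file-encryption key, losing it presumably makes existing recordings unreadable.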
For development, start the development services:
```bash
docker-compose -f docker-compose.dev.yml up
```
Or run the services locally, each in its own terminal.

Backend:
```bash
cd backend
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows
pip install -r requirements.txt
uvicorn app.main:app --reload
```

Frontend:
```bash
cd frontend
npm install
npm run dev
```
```text
VoxDocs/
├── backend/                # FastAPI backend
│   ├── app/
│   │   ├── api/            # API endpoints
│   │   ├── core/           # Configuration, database
│   │   ├── models/         # SQLAlchemy models
│   │   ├── services/       # Business logic
│   │   └── main.py         # Application entry point
│   └── requirements.txt
├── frontend/               # React PWA
│   ├── src/
│   │   ├── components/     # React components
│   │   ├── hooks/          # Custom hooks
│   │   ├── pages/          # Page components
│   │   ├── services/       # API client
│   │   └── stores/         # State management
│   └── package.json
├── docker/                 # Docker configuration
│   ├── Dockerfile.backend
│   ├── Dockerfile.frontend
│   └── nginx.conf
├── ml/                     # ML training scripts
├── docs/                   # Documentation
└── docker-compose.yml
```
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/audio/upload` | Upload an audio recording |
| `GET` | `/api/audio/{uuid}` | Get recording details |
| `GET` | `/api/audio/` | List recordings |
| `DELETE` | `/api/audio/{uuid}` | Delete a recording |
| `GET` | `/api/transcription/{uuid}` | Get the transcription |
| `POST` | `/api/transcription/{uuid}/process` | Process a recording |
| `POST` | `/api/transcription/{uuid}/correct` | Submit a correction |
| `GET` | `/api/classification/{uuid}` | Get classifications |
| `POST` | `/api/classification/classify-text` | Classify text |
| `POST` | `/api/export/generate` | Generate an export file |
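A hedged client sketch for two of these endpoints using only the standard library. The paths and HTTP methods come from the endpoint table; the base URL (uvicorn's default port 8000), the multipart field name `file`, and the helper names are assumptions. Authentication is not shown; check the running backend's FastAPI docs page for the actual request schemas.

```python
"""Hypothetical request builders for the VoxDocs REST API (stdlib only)."""

import urllib.request
import uuid as uuidlib

BASE_URL = "http://localhost:8000"  # assumption: backend address in development


def build_upload_request(audio: bytes, filename: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # Hand-built multipart/form-data body with a single part named "file" (assumed field name).
    boundary = uuidlib.uuid4().hex
    body = (
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio
        + f"\r\n--{boundary}--\r\n".encode()
    )
    return urllib.request.Request(
        f"{base_url}/api/audio/upload",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def build_process_request(recording_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # Presumably the on-demand path for urgent recordings (assumption).
    return urllib.request.Request(
        f"{base_url}/api/transcription/{recording_id}/process", method="POST"
    )
```

Either request can be sent with `urllib.request.urlopen(req)`. Building the `Request` objects separately from sending them keeps the URL and body logic testable without a running server.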
The system automatically classifies your voice notes into categories. These can be customized, but default to:
| Category | Description | Example |
|---|---|---|
| Task | Action items | "Order printer paper", "Call client X" |
| Update | Status reports | "Project A is 50% complete" |
| Note | General observations | "Meeting room needs cleaning" |
| Issue | Problems encountered | "Server X is down" |
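The architecture overview describes the backend classifier as rule-based. A minimal keyword-matching sketch of how such rules could map the example notes above to the default categories, with custom categories passed as extra rules. The keyword lists, rule order, and the `classify` helper are illustrative assumptions, not VoxDocs' actual rules.

```python
"""Illustrative keyword-rule classifier for the default categories."""

import re
from typing import List, Set, Tuple

Rules = List[Tuple[str, Set[str]]]

# Checked in order: the first category with a matching keyword wins.
DEFAULT_RULES: Rules = [
    ("Issue", {"down", "broken", "error", "failed", "problem", "outage"}),
    ("Task", {"order", "call", "schedule", "buy", "send", "book"}),
    ("Update", {"complete", "completed", "done", "finished", "progress", "status"}),
]


def classify(text: str, rules: Rules = DEFAULT_RULES, fallback: str = "Note") -> str:
    # Lowercase and split into word tokens so "Order" matches "order"
    # but "border" does not.
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    for category, keywords in rules:
        if words & keywords:
            return category
    return fallback  # general observations default to Note


if __name__ == "__main__":
    for note in ["Order printer paper", "Project A is 50% complete",
                 "Meeting room needs cleaning", "Server X is down"]:
        print(note, "->", classify(note))
```

Checking Issue before Task means a note like "Server down, call the vendor" is flagged as a problem first; whether that priority is right depends on the team's workflow.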
- Encryption: AES-256-GCM for all audio files at rest
- On-Premise: No cloud dependencies, complete local deployment
- Data Retention: Automatic deletion after 90 days
- Secure Delete: Multi-pass overwrite for file deletion
- Audit Logging: All data access is logged
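The retention and secure-delete bullets can be sketched as follows. Assumptions: each encrypted recording is a file in one directory, its modification time stands in for the recording date, and "multi-pass" means three passes of random bytes. `secure_delete` and `purge_expired` are hypothetical names.

```python
"""Sketch of a retention sweep with multi-pass overwrite before deletion."""

import os
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400
CHUNK = 1 << 20  # overwrite in 1 MiB chunks to bound memory use


def secure_delete(path: Path, passes: int = 3) -> None:
    """Overwrite a file with random bytes `passes` times, then unlink it."""
    size = path.stat().st_size
    with open(path, "r+b") as fh:
        for _ in range(passes):
            fh.seek(0)
            remaining = size
            while remaining > 0:
                n = min(remaining, CHUNK)
                fh.write(os.urandom(n))
                remaining -= n
            fh.flush()
            os.fsync(fh.fileno())  # force each pass out to the device
    path.unlink()


def purge_expired(
    directory: Path, retention_days: int = 90, now: Optional[float] = None
) -> List[Path]:
    """Securely delete files whose mtime is older than the retention period."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    removed: List[Path] = []
    for path in sorted(directory.iterdir()):
        if path.is_file() and path.stat().st_mtime < cutoff:
            secure_delete(path)
            removed.append(path)
    return removed
```

On SSDs and on journaling or copy-on-write filesystems, overwriting a file in place does not guarantee that the old physical blocks are erased, so the overwrite is best treated as defense in depth on top of encryption at rest.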
Key environment variables:
| Variable | Description | Default |
|---|---|---|
| `SECRET_KEY` | JWT/session encryption key | (required) |
| `MASTER_KEY` | File encryption master key | (required) |
| `WHISPER_MODEL` | Whisper model size | `small` |
| `WHISPER_DEVICE` | Processing device | `cpu` |
| `BATCH_PROCESSING_HOUR` | Nightly processing hour | `2` |
| `RETENTION_DAYS` | Data retention period in days | `90` |
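A minimal sketch of loading these variables into a typed settings object. The variable names and defaults come from the table; the `Settings` class, `load_settings`, and the 0-23 range check (treating the value as an hour of the day) are illustrative assumptions, not VoxDocs' actual configuration code.

```python
"""Hypothetical typed loader for the environment variables in the table."""

import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    secret_key: str
    master_key: str
    whisper_model: str = "small"
    whisper_device: str = "cpu"
    batch_processing_hour: int = 2
    retention_days: int = 90


def load_settings(env: Optional[Mapping] = None) -> Settings:
    env = os.environ if env is None else env
    # Fail fast at startup if a required key is missing or empty.
    missing = [name for name in ("SECRET_KEY", "MASTER_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    hour = int(env.get("BATCH_PROCESSING_HOUR", "2"))
    if not 0 <= hour <= 23:  # assumption: hour of the day
        raise ValueError("BATCH_PROCESSING_HOUR must be between 0 and 23")
    return Settings(
        secret_key=env["SECRET_KEY"],
        master_key=env["MASTER_KEY"],
        whisper_model=env.get("WHISPER_MODEL", "small"),
        whisper_device=env.get("WHISPER_DEVICE", "cpu"),
        batch_processing_hour=hour,
        retention_days=int(env.get("RETENTION_DAYS", "90")),
    )
```

Accepting an optional mapping instead of reading `os.environ` directly lets tests pass a plain dict without touching the process environment.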
- Audio processing: 30-60 seconds per minute of audio on CPU
- Batch processing: Non-urgent recordings are processed overnight, at the hour set by `BATCH_PROCESSING_HOUR`
- Immediate processing: Available for urgent recordings
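The performance figures support some quick capacity arithmetic. At 30-60 seconds of CPU time per minute of audio, a 10-minute report needs about 5-10 minutes to process. The sketch below turns this into helpers and also computes when the next nightly batch starts; treating `BATCH_PROCESSING_HOUR` as local wall-clock time is an assumption, and the function names are hypothetical.

```python
"""Capacity math for the CPU estimate and the nightly batch window."""

from datetime import datetime, timedelta
from typing import Tuple


def estimated_cpu_seconds(audio_minutes: float) -> Tuple[float, float]:
    # 30-60 seconds of CPU time per minute of audio (README estimate).
    return 30 * audio_minutes, 60 * audio_minutes


def next_batch_run(now: datetime, batch_hour: int = 2) -> datetime:
    # Next occurrence of batch_hour:00 strictly after `now`.
    run = now.replace(hour=batch_hour, minute=0, second=0, microsecond=0)
    if run <= now:
        run += timedelta(days=1)
    return run


if __name__ == "__main__":
    low, high = estimated_cpu_seconds(120)  # two hours of recordings
    print(f"{low / 3600:.1f}-{high / 3600:.1f} h of CPU time")  # 1.0-2.0 h of CPU time
    print(next_batch_run(datetime(2024, 5, 1, 17, 30)))  # 2024-05-02 02:00:00
```

By this estimate, a day with two hours of non-urgent recordings finishes within one to two hours after a 02:00 start, well before the next working day.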
MIT License - see LICENSE for details
For support inquiries, please contact the development team.