VoxDocs - Speech-to-Text Documentation System

VoxDocs is a local-first, privacy-focused speech documentation system for multilingual teams whose members speak different languages but need to collaborate. Users dictate in their native language; the system transcribes the recording, translates it into the team's configured language, tracks tasks from voice reports, and securely attaches photos. Everything runs on-premise, so no sensitive data is uploaded to the cloud.

Features

Audio Recording PWA: Progressive Web App with real-time waveform visualization
Speech-to-Text: OpenAI Whisper Small (multilingual) for general-purpose dictation
Automatic Classification: AI-based categorization of tasks, notes, and updates, including custom categories
Multilingual Support: Speak in ANY language, get documentation in your configured language
GDPR Compliant: Complete on-premise deployment with AES-256 encryption
Multi-Room Support: Handle multiple treatment rooms per practice
Export Integration: ZIP export (CSV + photos) for building downstream reports such as billing

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Frontend (PWA)                           │
│  React + TypeScript + TailwindCSS + Vite                   │
│  - Audio recording with MediaRecorder API                   │
│  - Real-time waveform visualization                         │
│  - Offline-capable via Service Worker                       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Backend (API)                            │
│  FastAPI + SQLAlchemy + Whisper                            │
│  - Audio upload and encryption                              │
│  - Batch processing queue                                   │
│  - Transcription with vocabulary boost                      │
│  - Rule-based classification                                │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Storage Layer                            │
│  SQLite + Encrypted File Storage                           │
│  - AES-256-GCM file encryption                             │
│  - 90-day retention policy                                  │
│  - Secure deletion                                          │
└─────────────────────────────────────────────────────────────┘

Quick Start

Prerequisites

Docker and Docker Compose
Node.js 20+ (for development)
Python 3.11+ (for development)

Production Deployment

Clone the repository:
```
git clone <repository-url>
cd VoxDocs
```

Configure environment:

```
cp .env.example .env
# Edit .env with secure keys
```

Start services:
```
docker-compose up -d
```
Access the application at http://localhost:3000

Development Setup

Start development services:

```
docker-compose -f docker-compose.dev.yml up
```

Or run services locally:

Backend:

```
cd backend
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows
pip install -r requirements.txt
uvicorn app.main:app --reload
```

Frontend:

```
cd frontend
npm install
npm run dev
```

Project Structure

VoxDocs/
├── backend/                 # FastAPI Backend
│   ├── app/
│   │   ├── api/            # API endpoints
│   │   ├── core/           # Configuration, database
│   │   ├── models/         # SQLAlchemy models
│   │   ├── services/       # Business logic
│   │   └── main.py         # Application entry
│   └── requirements.txt
├── frontend/                # React PWA
│   ├── src/
│   │   ├── components/     # React components
│   │   ├── hooks/          # Custom hooks
│   │   ├── pages/          # Page components
│   │   ├── services/       # API client
│   │   └── stores/         # State management
│   └── package.json
├── docker/                  # Docker configuration
│   ├── Dockerfile.backend
│   ├── Dockerfile.frontend
│   └── nginx.conf
├── ml/                      # ML training scripts
├── docs/                    # Documentation
└── docker-compose.yml

API Endpoints

Audio

POST /api/audio/upload - Upload audio recording
GET /api/audio/{uuid} - Get recording details
GET /api/audio/ - List recordings
DELETE /api/audio/{uuid} - Delete recording

Transcription

GET /api/transcription/{uuid} - Get transcription
POST /api/transcription/{uuid}/process - Process recording
POST /api/transcription/{uuid}/correct - Submit correction
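
As a rough illustration of how a client might address the transcription endpoints, the sketch below only builds the HTTP requests and does not send them. The base URL is an assumption (uvicorn's default port when the backend is run locally), and the helper names are hypothetical. Request bodies, response formats, and authentication headers are not described in this README, so they are left out.

```python
import urllib.request
from urllib.parse import quote

# Assumption: backend started locally with uvicorn, which defaults to port 8000.
BASE_URL = "http://localhost:8000"


def process_request(recording_uuid: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) POST /api/transcription/{uuid}/process."""
    url = f"{base_url}/api/transcription/{quote(recording_uuid, safe='')}/process"
    # An empty body is an assumption; the README does not document a payload.
    return urllib.request.Request(url, data=b"", method="POST")


def transcription_request(recording_uuid: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) GET /api/transcription/{uuid}."""
    url = f"{base_url}/api/transcription/{quote(recording_uuid, safe='')}"
    return urllib.request.Request(url, method="GET")
```

Passing either request to `urllib.request.urlopen` would send it to a running backend.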

Classification

GET /api/classification/{uuid} - Get classifications
POST /api/classification/classify-text - Classify text

Export

POST /api/export/generate - Generate export file
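
To make the "ZIP (CSV + photos)" export format more concrete, here is a minimal sketch of how such an archive could be assembled in memory. The CSV column names, the `entries.csv` file name, and the `photos/` folder are illustrative assumptions, not the layout the backend actually produces.

```python
import csv
import io
import zipfile

# Hypothetical column set; the real export schema is not documented here.
CSV_FIELDS = ["uuid", "category", "text"]


def build_export(rows: list[dict], photos: dict[str, bytes]) -> bytes:
    """Return the bytes of a ZIP archive holding one CSV file plus photos."""
    csv_buffer = io.StringIO()
    writer = csv.DictWriter(csv_buffer, fieldnames=CSV_FIELDS)
    writer.writeheader()
    writer.writerows(rows)

    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("entries.csv", csv_buffer.getvalue())
        for filename, data in photos.items():
            # Photos are grouped under one folder to keep the CSV easy to find.
            archive.writestr(f"photos/{filename}", data)
    return zip_buffer.getvalue()
```

Building the archive in memory keeps the sketch self-contained; a real export of many photos would more likely stream to disk.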

Security & Privacy

Encryption: AES-256-GCM for all audio files at rest
On-Premise: No cloud dependencies, complete local deployment
Data Retention: Automatic deletion after 90 days
Secure Delete: Multi-pass overwrite for file deletion
Audit Logging: All data access is logged
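
The retention and secure-delete points above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the function names, the three-pass default, and the chunk size are all hypothetical. Note also that overwriting in place gives no hard guarantee on SSDs or copy-on-write filesystems, where old blocks may survive.

```python
import os
from datetime import datetime, timedelta, timezone

CHUNK_SIZE = 1024 * 1024  # overwrite in 1 MiB chunks to bound memory use


def is_expired(created_at: datetime, now: datetime, retention_days: int = 90) -> bool:
    """True once a recording is older than the retention period."""
    return now - created_at > timedelta(days=retention_days)


def secure_delete(path: str, passes: int = 3) -> None:
    """Overwrite a file with random bytes `passes` times, then remove it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, CHUNK_SIZE)
                handle.write(os.urandom(chunk))
                remaining -= chunk
            # Push this pass to disk before starting the next one.
            handle.flush()
            os.fsync(handle.fileno())
    os.remove(path)
```

A nightly job could call `is_expired` for each stored recording and pass the expired files to `secure_delete`.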

Configuration

Key environment variables:

| Variable | Description | Default |
|---|---|---|
| `SECRET_KEY` | JWT/session encryption key | (required) |
| `MASTER_KEY` | File encryption master key | (required) |
| `WHISPER_MODEL` | Whisper model size | `small` |
| `WHISPER_DEVICE` | Processing device | `cpu` |
| `BATCH_PROCESSING_HOUR` | Nightly processing hour | `2` |
| `RETENTION_DAYS` | Data retention period | `90` |
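
For readers who want to see how the variables and defaults above fit together, here is a small sketch that loads them from the environment. The `Settings` class and `load_settings` function are hypothetical names, and treating `BATCH_PROCESSING_HOUR` as an hour between 0 and 23 is an assumption; the backend's real configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    secret_key: str
    master_key: str
    whisper_model: str = "small"
    whisper_device: str = "cpu"
    batch_processing_hour: int = 2
    retention_days: int = 90


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    missing = [name for name in ("SECRET_KEY", "MASTER_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required variables: {', '.join(missing)}")

    hour = int(env.get("BATCH_PROCESSING_HOUR", "2"))
    if not 0 <= hour <= 23:  # assumption: the value is an hour of the day
        raise ValueError("BATCH_PROCESSING_HOUR must be between 0 and 23")

    return Settings(
        secret_key=env["SECRET_KEY"],
        master_key=env["MASTER_KEY"],
        whisper_model=env.get("WHISPER_MODEL", "small"),
        whisper_device=env.get("WHISPER_DEVICE", "cpu"),
        batch_processing_hour=hour,
        retention_days=int(env.get("RETENTION_DAYS", "90")),
    )
```

Failing fast on the two required keys avoids starting the service with no encryption key configured.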

Performance

Audio processing: 30-60 seconds per minute of audio (CPU)
Batch processing: Overnight for non-urgent recordings
Immediate processing: Available for urgent cases
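
The figures above translate into simple arithmetic. The sketch below estimates the CPU processing window for a recording and computes when the next nightly batch would start. Both helper names are hypothetical, and the batch time is assumed to be a whole hour in the server's local time.

```python
from datetime import datetime, timedelta


def estimate_processing_seconds(audio_seconds: float) -> tuple[float, float]:
    """Return (best, worst) CPU processing time at 30-60 s per audio minute."""
    audio_minutes = audio_seconds / 60
    return (30 * audio_minutes, 60 * audio_minutes)


def next_batch_run(now: datetime, hour: int = 2) -> datetime:
    """Return the next occurrence of the nightly batch hour after `now`."""
    run = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if run <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        run += timedelta(days=1)
    return run
```

For example, a 10-minute recording would take roughly 5 to 10 minutes to transcribe on CPU under these figures.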

License

MIT License - see LICENSE for details

Support

For support inquiries, please contact the development team.


Categories

| Category | Description | Example |
|---|---|---|
| Task | Action items | "Order printer paper", "Call client X" |
| Update | Status reports | "Project A is 50% complete" |
| Note | General observations | "Meeting room needs cleaning" |
| Issue | Problems encountered | "Server X is down" |
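
To illustrate how a rule-based pass over these categories might work, here is a minimal keyword sketch. The patterns are invented for this example and are not the rules VoxDocs ships with; anything that matches no rule falls back to Note.

```python
import re

# Hypothetical rules, checked in order; the first match wins.
RULES = [
    ("Issue", re.compile(r"\b(down|broken|error|failed|not working)\b", re.IGNORECASE)),
    ("Update", re.compile(r"\d+\s?%|\b(complete|completed|finished|in progress)\b", re.IGNORECASE)),
    ("Task", re.compile(r"^\s*(order|call|send|schedule|buy|fix)\b", re.IGNORECASE)),
]


def classify(text: str) -> str:
    """Return the first matching category, or "Note" when nothing matches."""
    for category, pattern in RULES:
        if pattern.search(text):
            return category
    return "Note"
```

With these rules, the example sentences in the table above land in their listed categories. Real dictation would need a much richer rule set, or a model-based classifier.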
