This project is dual-licensed. See the License section for details.
Speakr is your intelligent note-taking companion - a personal, self-hosted web application that transforms audio recordings into organized, searchable notes. Whether you're capturing meetings on your phone, recording lectures, or documenting interviews, Speakr automatically transcribes your audio with speaker identification, generates concise summaries, and lets you interact with your content through an AI chat interface. Keep all your notes and insights securely on your own server, accessible from any device.
Perfect for mobile note-taking: Start the app on your phone, hit record, and get back diarized transcriptions that identify who said what - turning any conversation into structured, searchable notes.
Record Meetings & Take Notes
- Secure Sharing System: Share your transcriptions publicly with customizable permissions - control whether to include summaries and notes, manage shared links, and revoke access anytime.
- Enhanced Recording & Note-Taking: Completely redesigned recording interface with real-time notepad during recording. Perfect for mobile note-taking with improved system audio capture and dual visualizers.
- Advanced Speaker Diarization: Automatically identify and label different speakers in your recordings with AI-powered speaker detection and saved speaker profiles for future sessions.
- Mobile-Optimized Experience: Seamless recording and note-taking on phones and tablets - start recording on your phone and get back diarized transcriptions with speaker identification.
- Automated File Processing: "Black hole" directory feature for automatic audio file processing - drag and drop recordings from your computer's audio devices for instant transcription.
- AMR File Support: Added support for AMR audio files commonly used by mobile devices and voice recorders.
- Transcript Editing: Edit and update transcriptions in both simple and ASR modes to perfect your notes.
- Enhanced Markdown Editor: Improved summary and notes editor with markdown support for better formatting and organization.
- Configurable Logging: Dynamic logging control for better debugging and troubleshooting.
- Clickable Timestamps: Jump to specific points in recordings by clicking timestamps in transcriptions for easy verification and speaker identification.
- Streaming Chat Responses: Interactive AI chat with real-time streaming responses for better user experience.
- Audio Upload: Upload audio files (MP3, WAV, M4A, AMR, and many other formats) via drag-and-drop or file selection.
- Automated File Processing: "Black hole" directory monitoring for automatic batch processing of audio files without manual uploads.
- Advanced Browser Recording: Record audio directly in your browser with multiple options:
- Microphone: Record your voice.
- System Audio: Capture audio from other applications (e.g., online meetings). Note: To record system audio, you must select to share a "tab" or "screen" in your browser's screen sharing prompt and ensure the "share audio" checkbox is enabled.
- Both: Reliably record both your microphone and system audio simultaneously.
- Features dual real-time audio visualizers.
- (Note: System audio capture requires a secure context (HTTPS) or specific browser configuration. See the Deployment Guide for details.)
- Transcription: Choose between a standard OpenAI-compatible API or a more advanced ASR Webservice.
- Speaker Diarization: (ASR/WhisperX only) Automatically detect and separate different speakers in the transcript.
- Intuitive Speaker Labeling: A new, more intuitive interface for identifying and labeling speakers.
- AI Summarization & Titling: Generates concise titles and summaries using configurable LLMs.
- Interactive Chat: Ask questions and interact with the transcription content using an AI model.
- Metadata Editing: Edit titles, participants, meeting dates, summaries, and notes.
- User Management: Secure user registration and login system with an admin dashboard for managing users.
- Customization: Users can set their own language preferences, custom summarization prompts, and professional context to improve AI results.
Speakr integrates with external APIs for transcription and text generation. Here are the exact endpoints each service must implement:
Standard Whisper API Method:
- Endpoint:
/audio/transcriptions - Method: POST
- Format: OpenAI Whisper API compatible
- Used for: Audio transcription
- When: When
USE_ASR_ENDPOINT=false(default)
Common providers using this method:
- OpenAI Whisper API (
https://api.openai.com/v1) - The original OpenAI service - OpenRouter (
https://openrouter.ai/api/v1) - Multi-provider API gateway - Local APIs (
http://localhost:1234/v1) - Self-hosted solutions like LM Studio, Ollama, or custom deployments - Other API providers - Any service implementing the OpenAI Whisper API format
ASR Webservice Method:
- Endpoint:
/asr - Method: POST
- Format: Custom ASR webservice format
- Used for: Audio transcription with speaker diarization support
- When: When
USE_ASR_ENDPOINT=true
Recommended provider:
- onerahmet/openai-whisper-asr-webservice - Docker image that provides the
/asrendpoint with WhisperX engine support for speaker diarization
Chat Completions API:
- Endpoint:
/chat/completions - Method: POST
- Format: OpenAI Chat Completions API compatible
- Used for:
- AI-generated titles and summaries
- Interactive chat with transcriptions
- Automatic speaker identification
- Summary reprocessing
- When: Always (for all text generation features)
Common providers using this method:
- OpenAI (
https://api.openai.com/v1) - GPT models - OpenRouter (
https://openrouter.ai/api/v1) - Access to multiple LLM providers - Local APIs (
http://localhost:1234/v1) - Self-hosted solutions like LM Studio, Ollama, vLLM, or custom deployments - Other API providers - Anthropic, Google, Azure OpenAI, or any service implementing the OpenAI Chat Completions format
Example API Base URLs:
- OpenAI:
https://api.openai.com/v1 - OpenRouter:
https://openrouter.ai/api/v1 - Local APIs:
http://localhost:1234/v1
For detailed deployment instructions, see the Deployment Guide
The recommended method is to use the pre-built Docker image, which is fast and simple.
You do not need to clone this repository for this method. You only need Docker installed.
-
Create the
docker-compose.ymlfile: Create a new file nameddocker-compose.ymland paste the following content into it:services: app: image: learnedmachine/speakr:latest container_name: speakr restart: unless-stopped ports: - "8899:8899" env_file: - .env volumes: - ./uploads:/data/uploads - ./instance:/data/instance
-
Create a Configuration (
.env) File: Your choice here depends on which transcription method you want to use. See the API Endpoint Requirements section above for details on what endpoints each service must implement. Create a new file named.envand paste one of the following templates into it.-
Option A: Standard Whisper API Method Uses the
/audio/transcriptionsendpoint. This is the simplest method and works with OpenAI, OpenRouter, local APIs, and other providers implementing the OpenAI Whisper API format.# --- Text Generation Model (uses /chat/completions endpoint) --- TEXT_MODEL_BASE_URL=https://openrouter.ai/api/v1 TEXT_MODEL_API_KEY=your_openrouter_api_key TEXT_MODEL_NAME=openai/gpt-4o-mini # --- Transcription Service (uses /audio/transcriptions endpoint) --- TRANSCRIPTION_BASE_URL=https://api.openai.com/v1 TRANSCRIPTION_API_KEY=your_openai_api_key WHISPER_MODEL=whisper-1 # --- Application Settings --- ALLOW_REGISTRATION=false SUMMARY_MAX_TOKENS=8000 CHAT_MAX_TOKENS=5000 # --- Admin User (created on first run) --- ADMIN_USERNAME=admin ADMIN_EMAIL=admin@example.com ADMIN_PASSWORD=changeme # --- Docker Settings (rarely need to be changed) --- SQLALCHEMY_DATABASE_URI=sqlite:////data/instance/transcriptions.db UPLOAD_FOLDER=/data/uploads
Now, edit the
.envfile with your API keys and settings. -
Option B: ASR Webservice Method (for Speaker Diarization) Uses the
/asrendpoint. This method requires a separate ASR webservice container but enables speaker identification. This has been tested with theonerahmet/openai-whisper-asr-webserviceimage. See the Deployment Guide for instructions on how to run the ASR service.# --- Text Generation Model (uses /chat/completions endpoint) --- TEXT_MODEL_BASE_URL=https://openrouter.ai/api/v1 TEXT_MODEL_API_KEY=your_openrouter_api_key TEXT_MODEL_NAME=openai/gpt-4o-mini # --- Transcription Service (uses /asr endpoint) --- USE_ASR_ENDPOINT=true ASR_BASE_URL=http://your_asr_host:9000 # URL of your running ASR webservice ASR_DIARIZE=true ASR_MIN_SPEAKERS=1 ASR_MAX_SPEAKERS=5 # --- Application Settings --- ALLOW_REGISTRATION=false SUMMARY_MAX_TOKENS=8000 CHAT_MAX_TOKENS=5000 # --- Admin User (created on first run) --- ADMIN_USERNAME=admin ADMIN_EMAIL=admin@example.com ADMIN_PASSWORD=changeme # --- Docker Settings (rarely need to be changed) --- SQLALCHEMY_DATABASE_URI=sqlite:////data/instance/transcriptions.db UPLOAD_FOLDER=/data/uploads
Now, edit the
.envfile with your ASR service URL and other settings.
-
-
Start the Application: In your terminal, in the same directory as your
docker-compose.ymland.envfiles, run:docker compose up -d
If you want to modify the code or build the Docker image yourself, clone the repository first.
- Clone the Repository:
git clone https://github.com/murtaza-nasir/speakr.git cd speakr - Create
docker-compose.ymland.envfiles: Copy the example files from the repository.Edit thecp docker-compose.example.yml docker-compose.yml # For standard API cp env.whisper.example .env # OR for ASR Webservice cp env.asr.example .env
.envfile with your settings. - Build and Start:
docker compose up -d --build
Speaker Diarization only works with the ASR method and requires the whisperx engine. Here is an example docker-compose.yml for running the ASR service itself. You would run this in addition to the Speakr app container.
services:
whisper-asr-webservice:
image: onerahmet/openai-whisper-asr-webservice:latest-gpu
container_name: whisper-asr-webservice
ports:
- "9000:9000"
environment:
- ASR_MODEL=distil-large-v3 # or large-v3, medium
- ASR_COMPUTE_TYPE=float16 # or int8, float32
- ASR_ENGINE=whisperx # REQUIRED for diarization
- HF_TOKEN=your_hugging_face_token # Optional
deploy:
resources:
reservations:
devices:
- driver: nvidia
capabilities: [gpu]
device_ids: ["0"]
restart: unless-stoppedA Note on Diarization Accuracy: For best results, it is often better to set the number of speakers slightly higher (e.g., by 1-2) than the actual number present. You can easily merge speakers later in the speaker identification modal.
Speakr has been tested with the recommended onerahmet/openai-whisper-asr-webservice image. Other ASR webservices might work but have not been tested.
Important for ASR Setup: Speaker diarization requires a Hugging Face token and accepting terms for gated models (pyannote). If you encounter issues, check the ASR container logs with docker logs whisper-asr-webservice for troubleshooting.
For complete ASR setup instructions, model configurations, troubleshooting, and logs analysis, see the Deployment Guide
- Register/Login: Access the application at
http://localhost:8899. The admin user is created from theADMIN_*variables in your.envfile on first launch. - Set Preferences (Recommended): Go to your Account page. Here you can:
- Set your preferred transcription and output languages.
- Define a custom summarization prompt to tailor summaries to your needs.
- Add your name, job title, and company to provide more context for AI chat interactions.
- View and manage your saved speakers.
- Upload or Record:
- Upload: Go to New Recording or drag-and-drop an audio file onto the page
- Record: Use the browser recording feature in the "New Recording" screen to record directly from your microphone (requires HTTPS or browser configuration - see Browser Recording Setup).
- System Audio Recording: To capture system audio (e.g., from a video conference), you must select to share a specific browser tab or your entire screen in the browser's sharing dialog. Make sure to also check the box to "Share tab audio" or "Share system audio". Recording audio from a single application window is often not supported.
- Automated Processing: Drop files into the monitored directory for automatic batch processing (see Automated File Processing below)
- The upload and processing status will appear in a popup.
- View and Interact:
- The main Gallery lists your recordings. Click one to view its details.
- Listen to the audio with the built-in player.
- Read the transcription and the AI-generated summary.
- Edit the title, participants, and other metadata.
- Chat with Transcript: Use the chat panel to ask questions about the recording.
- Speaker Diarization Workflow:
- To enable speaker diarization, you must use the ASR endpoint method and set
ASR_DIARIZE=truein your.envfile. - When a recording is processed with this option, speakers will be automatically detected and assigned generic labels (e.g.,
SPEAKER 00,SPEAKER 01). - After processing, click the Identify Speakers button on the transcription page.
- In the speaker identification modal, you can manually assign names to each speaker.
- Alternatively, use the Auto Identify button to let an AI model attempt to identify and name the speakers based on the conversation context.
- Saved speakers will be suggested for auto-completion in future sessions.
- You can also use the Reprocess button to re-transcribe the audio with different diarization settings if needed.
- To enable speaker diarization, you must use the ASR endpoint method and set
Speakr includes a powerful "black hole" directory monitoring feature that automatically processes audio files without manual uploads. This is perfect for batch processing scenarios where you want to drop files into a directory and have them automatically transcribed.
- File Monitoring: Speakr monitors a designated directory for new audio files
- Automatic Detection: When new audio files are detected, they are automatically queued for processing
- File Stability Check: Files are checked for stability (not being written to) before processing
- Automatic Processing: Files are moved to the uploads directory and processed using your configured transcription settings
- Database Integration: Processed recordings appear in your gallery with the title "Auto-processed - [filename]"
Add these environment variables to your .env file:
# Enable automated file processing
ENABLE_AUTO_PROCESSING=true
# Directory to monitor (inside container)
AUTO_PROCESS_WATCH_DIR=/data/auto-process
# How often to check for new files (seconds)
AUTO_PROCESS_CHECK_INTERVAL=30
# Processing mode (see modes below)
AUTO_PROCESS_MODE=admin_onlyUpdate your docker-compose.yml to mount the auto-process directory:
services:
app:
image: learnedmachine/speakr:latest
container_name: speakr
restart: unless-stopped
ports:
- "8899:8899"
env_file:
- .env
volumes:
- ./uploads:/data/uploads
- ./instance:/data/instance
- ./auto-process:/data/auto-process # Add this lineCreate the auto-process directory on your host system:
mkdir auto-processdocker compose down
docker compose up -dSpeakr supports three different auto-processing modes:
AUTO_PROCESS_MODE=admin_only- All files in the main auto-process directory are assigned to the admin user
- Simple setup, good for single-user scenarios
AUTO_PROCESS_MODE=user_directories- Create subdirectories for each user:
auto-process/user123/orauto-process/123/ - Files are automatically assigned to the corresponding user
- Example structure:
auto-process/ ├── user1/ # Files for user ID 1 ├── user5/ # Files for user ID 5 └── 123/ # Files for user ID 123
AUTO_PROCESS_MODE=single_user
AUTO_PROCESS_DEFAULT_USERNAME=john_doe- All files are assigned to a specific username
- Useful when you want all auto-processed files to go to a specific non-admin user
The auto-processor supports the same audio formats as manual uploads:
- Common formats: MP3, WAV, M4A, FLAC, AAC, OGG
- Mobile formats: AMR, 3GP, 3GPP
- Video formats: MP4, MOV, WEBM, WMA
Files in unsupported formats are automatically converted to WAV using ffmpeg.
# Copy files to the auto-process directory
cp /path/to/meeting1.mp3 auto-process/
cp /path/to/meeting2.wav auto-process/
# Files will be automatically processed within 30 seconds (default interval)# Create user directories
mkdir -p auto-process/user1
mkdir -p auto-process/user5
# Drop files for specific users
cp meeting_with_john.mp3 auto-process/user1/
cp team_standup.wav auto-process/user5/# Process multiple files at once
cp /recordings/*.mp3 auto-process/
cp /meetings/2024-01/*.wav auto-process/Check the container logs to monitor auto-processing activity:
docker logs speakr -fYou'll see log entries like:
File monitor started in 'admin_only' mode, watching: /data/auto-process
Found new audio file for user 1: /data/auto-process/meeting.mp3
Copied /data/auto-process/meeting.mp3 to /data/uploads/auto_20250119021530_meeting.mp3
Created recording record with ID: 123 for user: admin
Started background processing for recording ID: 123
- File Naming: Use descriptive filenames as they become part of the auto-generated title
- Batch Processing: You can drop multiple files at once; they'll be processed sequentially
- Network Shares: Mount network drives or NAS shares to the auto-process directory for remote file drops
- Monitoring: Set up log monitoring to track processing status and catch any errors
- Cleanup: Processed files are automatically moved and deleted from the auto-process directory
Files not being processed:
- Check that
ENABLE_AUTO_PROCESSING=truein your.envfile - Verify the auto-process directory is properly mounted
- Check container logs for error messages
Permission issues:
- Ensure the auto-process directory has proper read/write permissions
- The container runs as the user specified in the Docker configuration
Processing errors:
- Check that your transcription API is properly configured
- Verify ffmpeg is available in the container for file conversion
- Monitor container logs for specific error messages
This project is dual-licensed:
-
GNU Affero General Public License v3.0 (AGPLv3)
Speakr is offered under the AGPLv3 as its open-source license. You are free to use, modify, and distribute this software under the terms of the AGPLv3. A key condition of the AGPLv3 is that if you run a modified version on a network server and provide access to it for others, you must also make the source code of your modified version available to those users under the AGPLv3.
- You must create a file named
LICENSE(orCOPYING) in the root of your repository and paste the full text of the GNU AGPLv3 license into it. - Read the full license text carefully to understand your rights and obligations.
- You must create a file named
-
Commercial License
For users or organizations who cannot or do not wish to comply with the terms of the AGPLv3 (for example, if you want to integrate Speakr into a proprietary commercial product or service without being obligated to share your modifications under AGPLv3), a separate commercial license is available.
Please contact speakr maintainers for details on obtaining a commercial license.
You must choose one of these licenses under which to use, modify, or distribute this software. If you are using or distributing the software without a commercial license agreement, you must adhere to the terms of the AGPLv3.
Speakr is actively being developed. Future planned features include:
- Quick Language Switching: A faster way to change transcription or output languages on the fly.
- Large File Chunking: Automatic splitting of large audio files to support transcription endpoints with file size limits (like OpenAI's 25MB limit).
While direct code contributions are not the primary focus at this stage, feedback, bug reports, and feature suggestions are highly valuable! Please feel free to open an Issue on the GitHub repository.
Note on Future Contributions and CLAs: Should this project begin accepting code contributions from external developers in the future, signing a Contributor License Agreement (CLA) will be required before any pull requests can be merged. This policy ensures that the project maintainer receives the necessary rights to distribute all contributions under both the AGPLv3 and the commercial license options offered. Details on the CLA process will be provided if and when the project formally opens up to external code contributions.