A feature-rich WhatsApp bot that automatically transcribes voice messages using GPT-4o-transcribe and sends personalized AI-generated birthday wishes. Built with a plugin architecture and runs in Docker.
- Automatic transcription of received and sent voice messages
- GPT-4o-transcribe for high accuracy (better than Whisper)
- Native OGG/Opus support (no conversion needed)
- Race condition handling - multiple transcriptions processed correctly
- Per-chat controls - enable/disable globally or per chat
- Manual transcription with the `!t` command
- AI-generated messages using GPT-4o with high creativity (temperature 1.3)
- Maximum variety - every message is unique and creative
- Age-aware with correct ordinals (21st, 22nd, 31st, etc.)
- Smart scheduling - random delivery in a configurable 7-9 AM (timezone-aware) window with persistence across restarts
- Smart skipping - won't send if you already messaged
- Formal/informal modes - first name vs last name with titles
- Multi-language - English, German, and Italian support
- CSV-based - easy to manage birthdays
- No signatures - clean, natural messages
- See BIRTHDAY_PLUGIN.md for full documentation
- Automatic profile picture rotation driven by a movie file (MP4/MKV)
- Key-frame aware seeking so the bot always jumps to the next natural frame even when behind
- Interval based: advance one key frame every `PROFILE_MESSAGE_INTERVAL` received messages (see details below)
- Optimized uploads: downscales/compresses frames to stay below 100 KB before sending to WhatsApp
- Progress tracking: `!cinema-progress` (authorized users only) shows frame/index/timestamp without revealing the movie title
- Persistent state: remembers progress and movie fingerprint across restarts and when the source file changes
- Authorization system - only specified numbers can use commands
- Friendly auto-reply for unauthorized users
- No public access - completely private
- Bot messages in English, German, and Italian
- Configurable via the `BOT_LANGUAGE` environment variable
- Transcription auto-detects spoken language
- Plugin architecture - easy to add new features
- Clean separation of concerns
- Well-documented code
- Docker - runs in container
- Persistent storage - WhatsApp session and config
- QR code authentication - easy setup
- Auto-restart on crashes
The bot handles the critical race condition where:
- Long voice message starts transcribing
- Short voice message arrives 20 seconds later
- Short transcription finishes first
Solution: Each transcription replies to the original voice message (quoted), so even if transcriptions finish out of order, they're always correctly associated with their source message.
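The effect of quoting can be illustrated with a small simulation. This is only a sketch under stated assumptions: `fakeTranscribe`, `handleVoiceMessage`, and the message objects are hypothetical stand-ins, not the bot's actual code or the WhatsApp client API.

```javascript
// Hypothetical stand-in for a transcription call: longer audio takes longer.
function fakeTranscribe(message) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`transcript of ${message.id}`), message.durationMs);
  });
}

// Each handler replies to the message it was started for, so the order in
// which transcriptions finish does not affect the pairing.
async function handleVoiceMessage(message, replies) {
  const text = await fakeTranscribe(message);
  replies.push({ quotedId: message.id, text });
}

async function demo() {
  const replies = [];
  const longMsg = { id: 'long', durationMs: 40 };
  const shortMsg = { id: 'short', durationMs: 5 };
  // Start both handlers without awaiting in between: they run concurrently.
  await Promise.all([
    handleVoiceMessage(longMsg, replies),
    handleVoiceMessage(shortMsg, replies),
  ]);
  return replies;
}
```

In this simulation the short message's reply is pushed first, yet each reply still carries the ID of the message it belongs to.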
- Easy to add new plugins
- Each plugin can handle messages and commands
- Plugins register their commands dynamically for !help
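One way such dynamic registration could work is sketched below. `SketchPluginManager` and its methods are hypothetical names chosen for illustration; the project's real `plugin-manager.js` may be structured differently. The plugin shape (`name`, `commands`, `onCommand`) follows the plugin interface shown later in this README.

```javascript
// Hypothetical minimal plugin manager, for illustration only.
class SketchPluginManager {
  constructor() {
    this.plugins = [];
  }

  register(plugin) {
    this.plugins.push(plugin);
  }

  // Collect the commands of every registered plugin for a !help listing.
  helpLines() {
    return this.plugins.flatMap((plugin) =>
      plugin.commands.map((c) => `!${c.command} - ${c.description}`)
    );
  }

  // Route a command to the first plugin that declares it.
  async dispatch(command, args, message) {
    const plugin = this.plugins.find((p) =>
      p.commands.some((c) => c.command === command)
    );
    if (!plugin) return false;
    await plugin.onCommand(command, args, message);
    return true;
  }
}
```

Because the help text is derived from each plugin's `commands` array, a newly registered plugin shows up in the listing without any change to the help code.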
- Docker and Docker Compose
- OpenAI API key
- WhatsApp account
- Clone and navigate to project: `cd /path/to/whatsapp-transcription-bot`
- Create environment file: `cp .env.example .env`
- Edit `.env` and add your credentials: `OPENAI_API_KEY=sk-your-actual-api-key` and `AUTHORIZED_NUMBERS=+1234567890,+0987654321`
- Build and start the bot: `docker-compose up --build`
- Scan the QR code displayed in the console with WhatsApp
- Done! The bot is now running.
All commands start with ! and are only available to authorized numbers.
- `!help` - Show all available commands and plugins
- `!chatid` - Get the current chat ID
- `!plugins` - Open the Plugin Manager to enable/disable plugins
- `!t` - Manually transcribe a quoted voice message (reply to a voice message with `!t`)
- `!transcription` - Show current transcription settings
- `!transcription on` - Enable automatic transcription for this chat
- `!transcription off` - Disable automatic transcription for this chat
- `!transcription global on` - Enable transcription globally for all chats
- `!transcription global off` - Disable transcription globally for all chats
- `!birthdays` - Show numbered list of all birthdays
- `!birthdays-reload` - Reload the birthday CSV file
- `!birthdays-test <number>` - Generate and preview a test birthday message
- `!plugins` - Show list of all plugins and their status
- `!plugins <number>` - Toggle a plugin by its number (e.g. `!plugins 1`)
- `!plugins <numbers>` - Toggle multiple plugins (e.g. `!plugins 1, 3`)
- `!plugins on` - Enable all plugins
- `!plugins off` - Disable all plugins
- `!cinema-progress` - Show the current frame/timestamp/percentage (authorized senders only)
Profile Cinema Interval Basics:
- Every inbound (non-bot) message increments an internal counter.
- When the counter reaches `PROFILE_MESSAGE_INTERVAL`, it resets and the bot jumps to the next key frame extracted via `ffprobe`.
- Only one key frame is advanced per interval; if the bot was offline, it still resumes from the next queued frame without skipping.
- Example: with `PROFILE_MESSAGE_INTERVAL=10`, the 10th, 20th, 30th, ... message will trigger a new frame.
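The counting rule can be sketched in a few lines. `IntervalCounter` is a hypothetical name used for illustration, not the plugin's actual class; it only models the counter and leaves out frame extraction and persistence.

```javascript
// Hypothetical counter: returns true when a new key frame should be shown.
class IntervalCounter {
  constructor(interval) {
    this.interval = interval;
    this.count = 0;
  }

  // Call once per inbound (non-bot) message.
  onInboundMessage() {
    this.count += 1;
    if (this.count >= this.interval) {
      this.count = 0; // reset, then advance exactly one key frame
      return true;
    }
    return false;
  }
}

const counter = new IntervalCounter(10);
const triggers = [];
for (let n = 1; n <= 30; n += 1) {
  if (counter.onInboundMessage()) triggers.push(n);
}
console.log(triggers); // [ 10, 20, 30 ]
```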
See BIRTHDAY_PLUGIN.md for complete birthday wishes documentation.
When automatic transcription is enabled (globally or per-chat):
- ✅ Received voice messages are automatically transcribed
- ✅ Sent voice messages are automatically transcribed
- ✅ Transcription is posted as a reply to the original voice message
- ✅ Multiple transcriptions in parallel are handled correctly
Non-authorized users who send commands will receive a friendly message:
Hello! Thanks for reaching out.
This is an automated bot that assists with various tasks.
However, access is currently restricted to authorized users only.
Have a great day!
whatsapp-bot/
├── src/
│   ├── index.js  # Entry point
│   ├── bot.js  # Bot initialization
│   ├── config.js  # Configuration
│   ├── middleware/
│   │   └── auth.js  # Authorization
│   ├── commands/
│   │   └── command-handler.js  # Command routing & !help & !chatid
│   ├── plugins/
│   │   ├── plugin-manager.js  # Plugin system
│   │   ├── transcription/
│   │   │   ├── index.js  # Transcription plugin
│   │   │   ├── transcription-service.js  # GPT-4o integration
│   │   │   └── message-tracker.js  # Race condition handler
│   │   └── birthday/
│   │       ├── index.js  # Birthday plugin
│   │       ├── birthday-scheduler.js  # Scheduling & sending
│   │       ├── message-generator.js  # GPT-4o message generation
│   │       ├── message-tracker.js  # Daily message tracking
│   │       └── csv-loader.js  # CSV validation & loading
│   └── utils/
│       ├── storage.js  # Settings persistence
│       └── i18n.js  # Internationalization (EN/DE)
├── tests/  # Test suite
│   ├── utils/  # Utility tests
│   ├── middleware/  # Middleware tests
│   ├── commands/  # Command tests
│   ├── plugins/  # Plugin tests
│   │   ├── birthday/  # Birthday plugin tests
│   │   └── transcription/  # Transcription plugin tests
│   └── fixtures/  # Test data
├── data/  # Persisted data (auto-created)
│   ├── .wwebjs_auth/  # WhatsApp session
│   ├── .wwebjs_cache/  # WhatsApp cache
│   ├── config.json  # Bot settings
│   ├── birthdays.csv  # Birthday data (create from example)
│   ├── birthdays.csv.example  # Example birthday CSV
│   ├── birthday-message-tracker.json  # Tracks chats already messaged today
│   ├── profile-movie-progress.json  # State for Profile Cinema plugin
│   ├── reminders.json  # Persisted reminder list + schedule metadata
│   └── temp/  # Temporary audio files
├── Dockerfile
├── docker-compose.yml
├── package.json
├── .env.example  # Environment variables template
├── .gitignore
├── .dockerignore
├── LICENSE  # MIT License
├── README.md  # Main documentation
├── BIRTHDAY_PLUGIN.md  # Birthday plugin documentation
└── TESTING.md  # Testing documentation
| Variable | Required | Description | Example |
|---|---|---|---|
| `OPENAI_API_KEY` | Yes | OpenAI API key | `sk-proj-...` |
| `AUTHORIZED_NUMBERS` | Yes | Comma-separated phone numbers with country code | `+1234567890,+0987654321` |
| `BOT_LANGUAGE` | No | Bot message language (`en`, `de`, or `it`) | `en` (default) |
| `BOT_TIMEZONE` | No | IANA timezone identifier for all scheduling. If not set, uses system timezone | `Europe/Zurich`, `America/New_York` |
| `BIRTHDAY_CHECK_HOUR` | No | Hour (0-23) when the birthday CSV reload + scheduling runs | `6` |
| `BIRTHDAY_WINDOW_START` | No | Start hour for the birthday send window | `7` |
| `BIRTHDAY_WINDOW_END` | No | End hour for the birthday send window | `9` |
| `REMINDER_CHECK_HOUR` | No | Hour (0-23) when reminders are reloaded/scheduled | `6` |
| `REMINDER_SEND_HOUR` | No | Hour reminders fire (timezone-aware) | `7` |
| `REMINDER_SEND_MINUTE` | No | Minute reminders fire | `7` |
| `PROFILE_MOVIE_PATH` | No | Absolute/relative path to the MP4/MKV file used for Profile Cinema | `./data/profile.mp4` |
| `PROFILE_MESSAGE_INTERVAL` | No | Advance one key frame after this many received messages | `25` |
| `PROFILE_FRAME_MAX_KB` | No | Maximum allowed size for the generated JPEG (defaults to 100 KB) | `90` |
Both schedulers run with node-cron in the configured timezone, persist their queue state to data/, and reschedule anything missed during restarts so reminders and birthday wishes still go out even if the container was offline earlier.
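As an illustration of how a random delivery time inside the `BIRTHDAY_WINDOW_START` to `BIRTHDAY_WINDOW_END` window might be chosen, here is a minimal sketch. `pickSendTime` is a hypothetical helper, not the scheduler's actual code, and it leaves out timezone handling and persistence.

```javascript
// Pick a random minute inside [startHour:00, endHour:00).
// `random` is injectable so the function can be tested deterministically.
function pickSendTime(startHour, endHour, random = Math.random) {
  const windowMinutes = (endHour - startHour) * 60;
  if (windowMinutes <= 0) {
    throw new RangeError('endHour must be greater than startHour');
  }
  const offset = Math.floor(random() * windowMinutes);
  return {
    hour: startHour + Math.floor(offset / 60),
    minute: offset % 60,
  };
}

// Example: some time between 07:00 and 08:59.
const sendTime = pickSendTime(7, 9);
console.log(`${sendTime.hour}:${String(sendTime.minute).padStart(2, '0')}`);
```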
The bot uses a single timezone setting (BOT_TIMEZONE) for all scheduling operations (birthdays and reminders). This ensures consistency across all time-based features.
Behavior:
- If `BOT_TIMEZONE` is set in the environment, that timezone is used
- If not set, the bot auto-detects the system timezone using `Intl.DateTimeFormat()`
- Falls back to `Europe/Berlin` if timezone detection fails
- The selected timezone is printed at startup along with its source
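The lookup order above could be expressed roughly as follows. `resolveTimezone` and the `source` labels are hypothetical names for illustration; only `Intl.DateTimeFormat().resolvedOptions().timeZone` is a standard API.

```javascript
// Sketch of the lookup order: environment variable, system detection, fallback.
function resolveTimezone(env = process.env) {
  if (env.BOT_TIMEZONE) {
    return { timezone: env.BOT_TIMEZONE, source: 'BOT_TIMEZONE' };
  }
  try {
    const detected = Intl.DateTimeFormat().resolvedOptions().timeZone;
    if (detected) {
      return { timezone: detected, source: 'system' };
    }
  } catch (err) {
    // Detection failed; fall through to the documented default.
  }
  return { timezone: 'Europe/Berlin', source: 'fallback' };
}
```

Returning the source alongside the timezone makes it straightforward to print where the value came from at startup.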
Example startup output:
Timezone: Europe/Zurich
(configured via BOT_TIMEZONE environment variable)
or:
Timezone: America/New_York
(auto-detected from system)
Why this matters: All date-based operations (birthday tracking, reminder scheduling, daily message tracking) use this timezone to determine "today" and when to trigger events. If you're in Europe/Zurich but the bot uses America/New_York, birthday wishes might arrive at unexpected times.
Phone numbers must include the country code with a + prefix:
- ✅ Correct: `+1234567890`
- ❌ Wrong: `1234567890` or `+1-234-567-890`
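A format check along these lines could catch malformed entries early. The regular expression and `parseAuthorizedNumbers` are assumptions made for illustration (a `+` followed by digits only); the bot's own validation may differ.

```javascript
// Assumed rule: "+" followed by digits only, no spaces, dashes, or parentheses.
const PHONE_PATTERN = /^\+\d+$/;

// Split a comma-separated AUTHORIZED_NUMBERS value and flag each entry.
function parseAuthorizedNumbers(raw) {
  return raw
    .split(',')
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => ({ number: entry, valid: PHONE_PATTERN.test(entry) }));
}

const parsed = parseAuthorizedNumbers('+1234567890,1234567890,+1-234-567-890');
console.log(parsed.map((p) => p.valid)); // [ true, false, false ]
```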
The bot supports multiple languages for its messages and responses. Set the BOT_LANGUAGE environment variable:
- `en` - English (default)
- `de` - German (Deutsch)
- `it` - Italian (Italiano)
All bot messages, error messages, help text, and command responses will be in the selected language. The transcription itself will auto-detect the spoken language in the audio.
Example:
BOT_LANGUAGE=de
The ./data directory is mounted as a volume and contains:
- WhatsApp authentication session (so you don't need to scan QR every time)
- Bot configuration and settings
- Temporary audio files during processing
The repository ships with .github/workflows/docker-publish.yml, which builds, tests, and pushes the container to Docker Hub whenever main changes or when triggered manually. To enable it:
- Create repository secrets `DOCKERHUB_USERNAME` and `DOCKERHUB_TOKEN` (a Docker access token).
- Optionally adjust the workflow triggers or image name (`${{ secrets.DOCKERHUB_USERNAME }}/whatsapp-transcription-bot`).
- Pushing to `main` (or running the workflow manually) will run `npm test`, build the image with Buildx, and publish both `latest` and `sha` tags.
Follow the logs:
docker-compose logs -f
Restart the bot:
docker-compose restart
Stop the bot:
docker-compose down
If you need to re-authenticate:
rm -rf data/.wwebjs_auth data/.wwebjs_cache
docker-compose restart
To run without Docker, install dependencies, set the required environment variables, and start the bot:
npm install
export OPENAI_API_KEY="sk-your-key"
export AUTHORIZED_NUMBERS="+1234567890"
npm start
The project includes comprehensive tests using the Node.js built-in test runner.
Run all tests:
npm test
Run tests in watch mode:
npm run test:watch
Run tests with coverage:
npm run test:coverage
See TESTING.md for complete testing documentation.
- Create a new plugin file in `src/plugins/your-plugin/index.js`
- Implement the plugin interface:

      export class YourPlugin {
        constructor() {
          this.name = 'Your Plugin Name';
          this.description = 'Plugin description';
          this.commands = [
            { command: 'yourcommand', description: 'Command description' }
          ];
        }

        shouldHandle(message) {
          // Return true if this plugin should handle the message
          return false;
        }

        async onMessage(message) {
          // Handle automatic message processing
        }

        async onCommand(command, args, message) {
          // Handle command execution
        }
      }

- Register in `src/bot.js`:

      import { YourPlugin } from './plugins/your-plugin/index.js';

      const yourPlugin = new YourPlugin();
      pluginManager.register(yourPlugin);
- Model: `gpt-4o-transcribe` (latest, best accuracy)
- Native support for OGG/Opus format (WhatsApp's audio format)
- No audio conversion needed
- 25MB file size limit
- 5-minute timeout per transcription
The MessageTracker class ensures:
- Each transcription is tracked with a unique message ID
- Duplicate transcriptions are prevented
- Transcriptions are processed in parallel (no blocking)
- Each result is replied to its original message (quoted)
- Failed transcriptions are cleaned up properly
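A tracker with these properties might look roughly like the sketch below. `SketchMessageTracker` is a hypothetical simplification, not the project's `MessageTracker`; in particular it only de-duplicates work that is still in flight, which is an assumption.

```javascript
// Hypothetical simplification of a per-message tracker.
class SketchMessageTracker {
  constructor() {
    this.inFlight = new Set(); // IDs of messages currently being transcribed
  }

  // Runs `work` once per message ID; concurrent duplicates are skipped.
  // Nothing here blocks other IDs, so different messages run in parallel.
  async track(messageId, work) {
    if (this.inFlight.has(messageId)) {
      return { skipped: true };
    }
    this.inFlight.add(messageId);
    try {
      return { skipped: false, result: await work() };
    } finally {
      // Runs on success and on failure, so failed jobs never stay tracked.
      this.inFlight.delete(messageId);
    }
  }
}
```

The `finally` block is what keeps a failed transcription from leaving a stale entry that would block a later retry of the same message.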
- Authorization middleware checks every command
- Non-privileged operations (transcription) work for authorized users only
- Container runs as root when volume is owned by root on host (for permission compatibility)
- No secrets in logs
Volume Permissions:
The container is designed to work with host-mounted volumes. If your host directories (e.g., /opt/data/whatsapp/data) are owned by root, the container will run as root to ensure proper file access. This is the recommended configuration for production deployments where the host filesystem is managed by root.
Directory Structure: The container automatically creates necessary subdirectories on startup via the entrypoint script:
- `/app/data/temp` - Temporary audio files
- `/app/data/.wwebjs_auth` - WhatsApp session data
- `/app/data/.wwebjs_cache` - WhatsApp cache
Environment Variables:
Pass environment variables through docker-compose.yml or directly with docker run -e:
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - AUTHORIZED_NUMBERS=${AUTHORIZED_NUMBERS}
      - BOT_LANGUAGE=${BOT_LANGUAGE:-en}
      - BOT_TIMEZONE=${BOT_TIMEZONE:-}

If the QR code is not displayed:
- Ensure `stdin_open: true` and `tty: true` are set in docker-compose.yml
- Check logs: `docker-compose logs -f`
If transcription fails:
- Verify `OPENAI_API_KEY` is valid
- Check API quota/billing
- View detailed logs for error messages

If you are treated as unauthorized:
- Ensure `AUTHORIZED_NUMBERS` is set correctly with country code
- Format: `+1234567890` (no spaces, dashes, or parentheses)

If the bot does not respond to commands:
- Check if bot is authenticated (look for "WhatsApp Bot is ready!" in the logs)
- Verify authorized number format
- Check command syntax (must start with `!`)
If you see EACCES: permission denied, mkdir '/app/data/...' errors:
Cause: Mismatch between container user and host volume ownership.
Solution: The container runs as root by default to match root-owned host directories. If your host directories are owned by a different user:
- Option 1 (Recommended): Run container as your host user ID:

      services:
        whatsapp-bot:
          user: "${UID:-1000}:${GID:-1000}"

  Then start with: `env UID=$(id -u) GID=$(id -g) docker-compose up` (bash treats `UID` as read-only, so a plain `UID=...` prefix fails there; `env` sets it for the child process only)
- Option 2: Ensure host directory has proper permissions: `sudo chown -R 1000:1000 /path/to/data`
- Option 3: Use relaxed permissions (less secure): `chmod -R 777 /path/to/data`
If you sent a message just after midnight (00:00-01:00) but the bot still sent birthday wishes:
Cause: The bot uses the configured BOT_TIMEZONE to determine "today". A message at 00:03 in your local time might be in a different day in the bot's timezone.
Solution: Ensure BOT_TIMEZONE matches your local timezone:
BOT_TIMEZONE=Europe/Zurich
Check the timezone at startup in the logs:
Timezone: Europe/Zurich
(configured via BOT_TIMEZONE environment variable)
GPT-4o-transcribe pricing: $0.006 per minute of audio
Examples:
- 1 minute voice message: $0.006
- 10 minutes/day = ~$1.80/month
- 100 voice messages/day (avg 30 sec each) = 50 minutes/day = ~$9/month
Very affordable for personal/small team use!
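The arithmetic behind these estimates is minutes per day times the per-minute price times days per month. The helper below is a hypothetical illustration that assumes a 30-day month and the $0.006 per minute price quoted above.

```javascript
const PRICE_PER_MINUTE_USD = 0.006; // price quoted above

// Estimated monthly cost for a given daily audio volume (30-day month assumed).
function monthlyCostUsd(minutesPerDay, daysPerMonth = 30) {
  return minutesPerDay * PRICE_PER_MINUTE_USD * daysPerMonth;
}

console.log(monthlyCostUsd(10).toFixed(2)); // 1.80 (10 minutes/day)
console.log(monthlyCostUsd(50).toFixed(2)); // 9.00 (100 messages/day at 30 sec each)
```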
New Features:
- ✅ Plugin Manager: Enable/disable plugins directly from WhatsApp with `!plugins`
- ✅ Interactive Menu: Text-based menu to view status and toggle plugins
- ✅ Global Message Filter: Automatically ignores messages received before bot startup (prevents history flood)
- ✅ Persistence: Plugin states are saved and restored across restarts
New Features:
- ✅ Timezone configuration via `BOT_TIMEZONE` environment variable
- ✅ Auto-detection of system timezone when `BOT_TIMEZONE` not set
- ✅ Timezone information displayed at startup with source
- ✅ Docker entrypoint script ensures data directories exist
Improvements:
- Fixed container permission issues with root-owned volumes
- Removed non-root user requirement for better volume compatibility
- Clarified birthday message tracking logic
- Updated documentation with timezone and Docker deployment details
Configuration:
- Added `BOT_TIMEZONE` to `.env.example` and `docker-compose.yml`
- Consistent timezone handling across all scheduler plugins
Features:
- ✅ Voice message transcription with GPT-4o-transcribe
- ✅ AI-generated birthday wishes with GPT-4o
- ✅ Birthday message testing with `!birthdays-test`
- ✅ Multi-language support (EN/DE/IT)
- ✅ Comprehensive test suite
- ✅ Docker deployment
Improvements:
- Fixed `!chatid` command to return correct chat ID (not sender ID)
- Added numbered birthday list for easy testing
- Birthday wishes now use GPT-4o for unique, personalized messages
- Proper ordinal number handling (21st, 22nd, 31st, etc.)
- No dashes in generated birthday messages
- First/last name split for proper formal addressing
Testing:
- 50+ unit tests covering core functionality
- CSV validation tests
- Race condition handling tests
- Authorization tests
- Multi-language tests
MIT