Born from real field research frustration: automatically rename thousands of sample photos using AI.
I built this app specifically for the team at leadresearch.org, who were collecting thousands of photos of object samples in Malawi and Kenya for lead contamination research. They photographed each sample with a handwritten label like "MWI.1.2.15.7B.12.8" and needed a quicker way to rename the files to match the right codes (even for photos with no visible code).
Before: Field researchers have thousands of photos named IMG_1234.jpg, IMG_1235.jpg, etc. Each one has to be opened manually, its handwritten sample code read (if there is one), and the file renamed, which takes 30-60 seconds per photo. For 2,000 samples, that's roughly 17-33 hours of mind-numbing work. If a photo showed no code, it had to be matched by hand to another photo of the same object that did.
After: Drop your camera folder into the app, grab coffee, come back to perfectly renamed files like MWI.1.2.15.7B.12.8.jpg. What used to take days now takes minutes.
- Drag your photo folder (or ZIP file) into the web app
- Google's AI reads each handwritten label and extracts the sample code
- Smart grouping finds photos without clear labels and groups them with similar ones
- Quick review interface lets you fix any mistakes with Excel-like keyboard shortcuts
- Export your perfectly organized collection
Node.js is like the "engine" that runs the app on your computer.
For Mac/Linux users:
# Copy and paste this into Terminal
curl -fsSL https://fnm.vercel.app/install | bash
fnm install 18
fnm use 18
For Windows users:
- Go to nodejs.org
- Download the "LTS" version (the green button)
- Run the installer and click "Next" through everything
# Copy and paste this into Terminal (Mac/Linux) or Command Prompt (Windows)
git clone https://github.com/your-org/ocr-auto-label.git
cd ocr-auto-label
Don't have git? Download the repository as a ZIP file from its GitHub page, then unzip it.
# This downloads all the code libraries the app needs
npm run install:all
The app uses Google's AI to read the handwritten codes. You need a free API key:
- Go to Google AI Studio
- Click "Get API Key" in the top right
- Click "Create API Key", then "Create API key in new project"
- Copy the long string that appears (it starts with AIza...)
Create a file called .env in the backend folder with your key:
Mac/Linux:
echo "GEMINI_API_KEY=YOUR_KEY_HERE" > backend/.env
Windows (Command Prompt):
echo GEMINI_API_KEY=YOUR_KEY_HERE > backend\.env
Or manually: Create a file called .env in the backend folder and put this inside:
GEMINI_API_KEY=YOUR_KEY_HERE
npm start
That's it! The app will open in your browser at http://localhost:3000
- Drag and drop your photo folder or ZIP file into the web app
- Supported formats: JPEG, PNG, HEIC, ZIP archives (up to 5GB)
- Photos appear instantly in the table, sorted by when they were taken
Each photo has status indicators that show processing progress:
- Extracting: AI is reading the handwritten code
- Grouping: App is finding similar photos to group together
- Complete: Ready for export
- Needs Attention: Couldn't read the code clearly
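One way to picture these statuses is as a small state machine. The TypeScript below is a hypothetical sketch, not the app's code: the PhotoStatus names, the ExtractionResult shape, and the rule that photos with a readable code skip straight to Complete are assumptions made for illustration.

```typescript
// Hypothetical model of the per-photo statuses (names are illustrative).
type PhotoStatus = "extracting" | "grouping" | "complete" | "needs_attention";

// Assumed shape of one AI extraction result.
interface ExtractionResult {
  code: string | null; // sample code read from the label, or null if unreadable
}

// Every photo starts in "extracting" while the AI reads its label.
const initialStatus: PhotoStatus = "extracting";

// Assumption: a photo with a readable code is finished; one without a code
// moves on to grouping so it can inherit a code from a similar photo.
function afterExtraction(result: ExtractionResult): PhotoStatus {
  return result.code !== null ? "complete" : "grouping";
}

// If grouping found a match the photo is finished; otherwise a person
// needs to review it.
function afterGrouping(matchedCode: string | null): PhotoStatus {
  return matchedCode !== null ? "complete" : "needs_attention";
}
```

Keeping the statuses as a closed union means the review table can only ever show one of these four states.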
- Arrow keys to navigate like Excel
- Enter to edit the selected cell
- G to quickly edit the group name
- N to quickly edit the new filename
- F1 to see all keyboard shortcuts
- Click "Export" when you're happy with the results
- Choose "Download ZIP" to get a compressed file
- Or "Save to Folder" to create an organized folder on your computer
If the node command isn't recognized:
- Mac/Linux: Restart Terminal and try node --version
- Windows: Restart Command Prompt and try node --version
- If it still doesn't work, reinstall Node.js from nodejs.org
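If npm install fails with permission errors (Mac/Linux):
sudo chown -R $(whoami) ~/.npm
If npm install fails with build errors (Windows):
# Install Python and Visual Studio Build Tools
npm install --global windows-build-tools
If the app won't start or shows errors:
- Check your API key in backend/.env; it should start with AIza
- Restart the app: press Ctrl+C to stop it, then run npm start again
- Reinstall dependencies: delete the node_modules folders and run npm run install:all again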
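The API-key check in this list can also be scripted. This TypeScript sketch is hypothetical (checkGeminiKey is not part of the app): it takes the text of backend/.env, assumes one KEY=VALUE pair per line as created by the echo commands during setup, and reports whether GEMINI_API_KEY is missing, still the placeholder, or lacking the AIza prefix.

```typescript
// Hypothetical helper (not part of the app): checks the GEMINI_API_KEY line
// in the text of backend/.env. Assumes one KEY=VALUE pair per line.
function checkGeminiKey(envText: string): string {
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines, comments, and anything without "=".
    if (line === "" || line.startsWith("#") || !line.includes("=")) continue;

    const eq = line.indexOf("=");
    if (line.slice(0, eq).trim() !== "GEMINI_API_KEY") continue;

    // Trim stray spaces (Windows echo leaves one) and surrounding quotes.
    const value = line.slice(eq + 1).trim().replace(/^["']|["']$/g, "");
    if (value === "YOUR_KEY_HERE") return "Replace YOUR_KEY_HERE with your real key";
    if (!value.startsWith("AIza")) return "Key should start with AIza";
    return "ok";
  }
  return "GEMINI_API_KEY not found";
}

// Example: read the file with Node's fs module and print the result.
// import { readFileSync } from "node:fs";
// console.log(checkGeminiKey(readFileSync("backend/.env", "utf8")));
```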
If photos aren't being processed:
- Check your internet connection - the app needs internet to use Google's AI
- Verify your API key is working at Google AI Studio
- Check the console for error messages (press F12 in your browser)
If you see rate-limit errors:
- Wait a few minutes - Google limits how many requests you can make per minute
- Reduce batch size - process fewer photos at once
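If you keep hitting the per-minute limit, spacing requests out and splitting work into smaller batches helps. This is a hypothetical sketch, assuming one AI request per photo and a requestsPerMinute limit you look up for your own key; the app's own scheduling may work differently.

```typescript
// Hypothetical helpers for staying under a per-minute request limit.
// Assumes one AI request per photo; look up the real limit for your key.

// Split work into smaller batches (e.g. the 50-100 photos suggested below).
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("batch size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Even spacing between requests that keeps the average rate at or under
// requestsPerMinute.
function minDelayMs(requestsPerMinute: number): number {
  return Math.ceil(60_000 / requestsPerMinute);
}

// Lower bound on total minutes at that rate, ignoring response time.
function estimatedMinutes(photoCount: number, requestsPerMinute: number): number {
  return Math.ceil(photoCount / requestsPerMinute);
}
```

For example, at a hypothetical limit of 15 requests per minute, minDelayMs(15) is 4,000 ms and 2,000 photos need at least 134 minutes.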
For the best AI accuracy:
- Well-lit photos work best
- Avoid blurry images - the AI needs to read the handwriting clearly
- Straight angles help - try to avoid tilted or angled shots
For large batches:
- Start small - try 50-100 photos first to test your setup
- Group similar photos - photos taken at the same time/location work better
- Check periodically - review results every few hundred photos
Keyboard shortcuts:
- Arrow keys: Navigate table
- Enter: Edit selected cell
- Escape: Cancel editing
- Ctrl+A: Select all
- G: Edit group name
- N: Edit new filename
- Delete: Remove selected photos
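Spreadsheet-style shortcuts like these usually come down to a key-to-action table that is switched off while a cell is being edited. The sketch below is hypothetical, not the app's handler: the Action names are invented, and the key strings are standard KeyboardEvent.key values.

```typescript
// Hypothetical key-to-action mapping for the shortcuts listed above.
type Action =
  | "move"
  | "edit"
  | "cancel"
  | "selectAll"
  | "editGroup"
  | "editFilename"
  | "delete"
  | "help"
  | "none";

interface KeyInput {
  key: string; // a KeyboardEvent.key value such as "ArrowUp", "Enter", "g"
  ctrl: boolean; // whether Ctrl was held
  editing: boolean; // true while a cell editor is open
}

function shortcutAction({ key, ctrl, editing }: KeyInput): Action {
  // While editing, only Escape acts as a shortcut, so typing "g" or "n"
  // goes into the cell instead of jumping to another field.
  if (editing) return key === "Escape" ? "cancel" : "none";
  if (ctrl) return key.toLowerCase() === "a" ? "selectAll" : "none";
  if (key.startsWith("Arrow")) return "move";
  switch (key) {
    case "Enter":
      return "edit";
    case "g":
    case "G":
      return "editGroup";
    case "n":
    case "N":
      return "editFilename";
    case "Delete":
      return "delete";
    case "F1":
      return "help";
    default:
      return "none";
  }
}
```

In a React component this would typically be called from an onKeyDown handler, with the returned action dispatched to the table's state store.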
Based on analysis of the app's implementation with Gemini 2.0 Flash:
Per Photo Breakdown:
- Input tokens: ~2,790 tokens (1,290 for image + 1,500 for comprehensive prompt)
- Output tokens: ~175 tokens (structured JSON response with code, colors, description)
- Cost per photo: ~$0.000349 (about $0.35 per 1,000 photos)
Real-World Cost Examples:
- 100 photos: ~$0.035 (3.5 cents)
- 500 photos: ~$0.175 (17.5 cents)
- 1,000 photos: ~$0.35 (35 cents)
- 2,000 photos: ~$0.70 (70 cents)
- 5,000 photos: ~$1.75
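The per-photo figure follows from the token counts above. This sketch reproduces the arithmetic in TypeScript; the per-million-token prices (USD 0.10 input, USD 0.40 output) are assumed Gemini 2.0 Flash list prices that reproduce the numbers above, so check Google's current pricing before relying on them.

```typescript
// Assumed Gemini 2.0 Flash list prices in USD per 1M tokens; these reproduce
// the figures above but should be checked against Google's current pricing.
const INPUT_USD_PER_MILLION = 0.1;
const OUTPUT_USD_PER_MILLION = 0.4;

// Per-photo token counts from the breakdown above.
const INPUT_TOKENS = 1290 + 1500; // image + prompt = 2,790
const OUTPUT_TOKENS = 175; // structured JSON response

function costPerPhoto(): number {
  return (
    (INPUT_TOKENS * INPUT_USD_PER_MILLION + OUTPUT_TOKENS * OUTPUT_USD_PER_MILLION) /
    1_000_000
  );
}

function estimateCost(photoCount: number): number {
  return photoCount * costPerPhoto();
}

// How many photos a given budget (in USD) covers.
function photosForBudget(budgetUsd: number): number {
  return Math.floor(budgetUsd / costPerPhoto());
}

console.log(estimateCost(2000).toFixed(2)); // prints 0.70
```

The same function gives photosForBudget(300) of about 860,000, matching the credit estimate below.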
Why It's So Affordable:
- Uses efficient Gemini 2.0 Flash (not the more expensive Pro model)
- Only processes each photo once - no retries unless you specifically request them
- Optimized prompt design minimizes token usage while maintaining accuracy
- $300 Google Cloud credit for new users covers ~860,000 photos
- No hidden costs - only pay for successful AI processing
- Transparent billing - see exact usage in Google Cloud Console
No Hidden Costs
- App is free - open source, no subscription fees
- Runs locally - no cloud storage or hosting fees
- One-time setup - no recurring payments
- Only pay Google for AI processing (and only when you use it)
Privacy and storage:
- Photos are processed locally on your computer
- Only tiny previews (100KB) sent to Google for AI processing
- Original photos never leave your computer
- No cloud storage - everything stays on your device
- Temporary files stored in your system's temp folder
- Automatically cleaned up when you restart your computer
- SQLite database keeps track of your work (stored locally)
- Check this README - most issues are covered above
- Look at the browser console - press F12 to see error messages
- Restart everything - close the app, then run npm start again
- Create an issue on GitHub with your error message
When you create an issue, please include:
- Your operating system (Windows 10, macOS Monterey, etc.)
- Node.js version (run node --version)
- Error message (copy and paste the exact text)
- What you were doing when the error occurred
- Screenshots if there's a visual problem
How it works:
- Upload: Your photos are copied to a secure temp folder
- Analysis: AI reads each photo and extracts the handwritten code
- Grouping: Smart algorithm finds similar photos based on colors, descriptions, and timing
- Review: You can fix any mistakes using the spreadsheet-like interface
- Export: Renamed photos are packaged for download
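The grouping step is described only at a high level. As a hypothetical, simplified sketch (not the app's algorithm), here is timing-based grouping alone: each photo without a readable code borrows the code of the closest-in-time photo that has one, provided the gap is within a chosen window. The Photo shape and the maxGapMs parameter are assumptions; the real app also weighs colors and descriptions.

```typescript
// Hypothetical photo record (field names are illustrative).
interface Photo {
  file: string;
  takenAt: number; // capture time in ms since epoch, e.g. from EXIF data
  code: string | null; // code the AI read, or null if none was legible
}

// Simplified, timing-only grouping: a photo without a code takes the code of
// the nearest-in-time coded photo, if that photo is within maxGapMs.
// Returns file -> code; null means no match (a "Needs Attention" case).
function assignCodes(photos: Photo[], maxGapMs: number): Map<string, string | null> {
  const coded = photos.filter((p) => p.code !== null);
  const result = new Map<string, string | null>();

  for (const photo of photos) {
    if (photo.code !== null) {
      result.set(photo.file, photo.code);
      continue;
    }
    let bestCode: string | null = null;
    let bestGap = Infinity;
    for (const candidate of coded) {
      const gap = Math.abs(candidate.takenAt - photo.takenAt);
      if (gap <= maxGapMs && gap < bestGap) {
        bestGap = gap;
        bestCode = candidate.code;
      }
    }
    result.set(photo.file, bestCode);
  }
  return result;
}
```

Each uncoded photo is compared with every coded one, so this is O(n × m); sorting by takenAt and checking only neighbors would scale better for very large folders.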
The app's prompt and validation rules are tailored to leadresearch.org's sample coding system:
- Malawi samples: MWI.1.2.15.7B.12.8 or MWI.0.1.4.10.15.7
- Kenya samples: KEN.0.2.3.5.8.11
- Strict validation: Prevents false matches and catches common handwriting mistakes (like "D" vs "0")
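The code format can be checked with a regular expression. This sketch assumes, from the three examples above, that a code is MWI or KEN followed by exactly six dot-separated segments of one or two digits with an optional uppercase letter, and it applies one assumed correction (a segment starting with "O" or "D" followed only by digits is read as starting with "0"). The app's actual validation rules may differ.

```typescript
// Assumed format, inferred from the examples above: MWI or KEN, then exactly
// six dot-separated segments of 1-2 digits with an optional uppercase letter.
const SAMPLE_CODE = /^(MWI|KEN)(\.\d{1,2}[A-Z]?){6}$/;

// Assumed correction for a common misreading: a segment that starts with
// "O" or "D" followed only by digits is treated as starting with "0".
function normalizeSegment(segment: string): string {
  return /^[OD]\d*$/.test(segment) ? "0" + segment.slice(1) : segment;
}

// Returns the cleaned-up code, or null if it does not match the format.
function validateSampleCode(raw: string): string | null {
  const [prefix, ...segments] = raw.trim().toUpperCase().split(".");
  const candidate = [prefix, ...segments.map(normalizeSegment)].join(".");
  return SAMPLE_CODE.test(candidate) ? candidate : null;
}
```

Codes that fail this check are the kind a reviewer would want flagged rather than silently accepted.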
- Uploads: ~/AppData/Local/Temp/ocr-auto-label/ (Windows) or /tmp/ocr-auto-label/ (Mac/Linux)
- Database: backend/prisma/dev.db (SQLite file)
- Thumbnails: Auto-generated for fast preview
Minimum requirements:
- Operating System: Windows 10, macOS 10.14, or Ubuntu 18.04+
- RAM: 4GB (8GB recommended for large batches)
- Storage: 2GB free space (plus space for your photos)
- Internet: Required for AI processing
Recommended for large batches:
- RAM: 8GB+ for processing 1,000+ photos
- CPU: Multi-core processor for faster processing
- SSD: Faster file operations
- Stable internet: For reliable AI processing
ocr-auto-label/
├── frontend/ # React + TypeScript + Vite
│   ├── src/components/ # UI components
│   ├── src/stores/ # Zustand state management
│   └── src/types/ # TypeScript definitions
├── backend/ # Node.js + Express + Prisma
│   ├── src/routes/ # API endpoints
│   ├── src/services/ # Business logic
│   └── prisma/ # Database schema
└── package.json # Workspace configuration
- Frontend: React 18, Vite, TypeScript, Zustand, Radix UI, Tailwind CSS
- Backend: Node.js, Express, Prisma, SQLite, Sharp (image processing)
- AI: Google Generative AI (Gemini 2.0 Flash)
- Deployment: Local development with production build support
API endpoints:
- POST /api/upload - Upload photos and ZIP files
- GET /api/images - List all processed images
- PUT /api/images/:id - Update image metadata
- POST /api/export - Generate export ZIP
- GET /api/gemini-updates - Server-sent events for real-time updates
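A client for these endpoints can build each request separately from sending it. The helper below is a hypothetical sketch: the base URL is a parameter because the backend's port isn't stated here, and the ImagePatch field names are illustrative guesses rather than the app's actual schema.

```typescript
// Hypothetical request builders for the endpoints listed above. The body
// fields in ImagePatch are illustrative guesses, not the app's schema.
type Method = "GET" | "POST" | "PUT";

interface RequestSpec {
  method: Method;
  url: string;
  body?: string; // JSON text, when the request has a body
}

interface ImagePatch {
  newName?: string;
  groupName?: string;
}

function listImages(baseUrl: string): RequestSpec {
  return { method: "GET", url: `${baseUrl}/api/images` };
}

function updateImage(baseUrl: string, id: string, patch: ImagePatch): RequestSpec {
  return {
    method: "PUT",
    // Encode the id so unusual characters cannot break the path.
    url: `${baseUrl}/api/images/${encodeURIComponent(id)}`,
    body: JSON.stringify(patch),
  };
}
```

To send a request, pass the spec to fetch, e.g. fetch(spec.url, { method: spec.method, headers: { "Content-Type": "application/json" }, body: spec.body }). Because GET /api/gemini-updates streams server-sent events, a browser would read it with new EventSource(baseUrl + "/api/gemini-updates") rather than fetch.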
MIT License - Free for personal and commercial use (just keep the copyright and license notice in copies of the code). Credit is appreciated!
Need help? Create an issue on GitHub or check the troubleshooting section above.