Advanced OCR Application for Bank Invoices and Consumer Receipts
ReceiptVision is a Python-based OCR application that transforms receipts and invoices into structured data using image preprocessing and machine learning techniques. It pairs a modern, Apple-inspired web UI with a Flask and PostgreSQL backend.
- PDF Documents: Extract text and convert pages to images
- Image Formats: PNG, JPG, JPEG, BMP, TIFF support
- Automatic Detection: Smart file type recognition and processing
- Smart Data Extraction: Merchant names, dates, amounts, and itemized purchases
- Confidence Scoring: Field-level and overall processing confidence metrics
- Multi-Language Support: Configurable OCR language models
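To make the extraction and confidence-scoring features above concrete, here is a minimal sketch of how fields might be pulled from raw OCR text with a per-field confidence. The regular expressions, the `Field` class, the `extract_fields` function, and the fixed confidence values are illustrative assumptions, not ReceiptVision's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Field:
    value: Optional[str]
    confidence: float  # 0.0 = not found, 1.0 = certain


# Hypothetical patterns for illustration; real receipts vary widely.
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{2,4})\b")
TOTAL_RE = re.compile(r"\btotal\b[^\d\n]*(\d+[.,]\d{2})", re.IGNORECASE)
AMOUNT_RE = re.compile(r"\d+[.,]\d{2}")


def extract_fields(ocr_text: str) -> Dict[str, object]:
    """Extract merchant, date, and total from OCR text with rough confidences."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]

    # Heuristic: the merchant name is often the first non-empty line.
    merchant = Field(lines[0], 0.5) if lines else Field(None, 0.0)

    date_match = DATE_RE.search(ocr_text)
    date = Field(date_match.group(1), 0.9) if date_match else Field(None, 0.0)

    total_match = TOTAL_RE.search(ocr_text)
    if total_match:
        total = Field(total_match.group(1).replace(",", "."), 0.9)
    else:
        # Fallback: assume the largest amount on the receipt is the total.
        amounts = [a.replace(",", ".") for a in AMOUNT_RE.findall(ocr_text)]
        total = Field(max(amounts, key=float), 0.4) if amounts else Field(None, 0.0)

    fields = {"merchant": merchant, "date": date, "total": total}
    # Overall confidence here is simply the mean of the field confidences.
    overall = sum(f.confidence for f in fields.values()) / len(fields)
    return {"fields": fields, "overall_confidence": overall}
```

Calling `extract_fields(text)` on the string returned by the OCR step yields a dictionary with one `Field` per key plus an `overall_confidence` value, which mirrors the field-level and overall metrics described in the feature list.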
- Denoising: Multiple denoising algorithms for cleaner text extraction
- Adaptive Thresholding: Gaussian and mean adaptive thresholding
- Morphological Operations: Text cleanup and enhancement
- Skew Correction: Automatic image rotation and alignment
- Contrast Enhancement: CLAHE and custom enhancement algorithms
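As an illustration of the mean adaptive thresholding mentioned above, the following pure-Python sketch binarizes a small grayscale image by comparing each pixel with the mean of its neighborhood. ReceiptVision itself relies on OpenCV (which provides `cv2.adaptiveThreshold`); the function name `mean_adaptive_threshold` and the edge handling, which clips the window at image borders rather than replicating border pixels, are assumptions made to keep the example dependency-free.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale values in the range 0-255


def mean_adaptive_threshold(img: Image, block_size: int = 3, c: float = 2.0) -> Image:
    """Binarize img: a pixel becomes 255 if it exceeds (local mean - c), else 0."""
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    height, width = len(img), len(img[0])
    radius = block_size // 2
    out: Image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Clip the neighborhood window at the image borders.
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window = [img[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            threshold = sum(window) / len(window) - c
            row.append(255 if img[y][x] > threshold else 0)
        out.append(row)
    return out
```

Because the threshold is computed per neighborhood, a dark character on an unevenly lit receipt is still separated from its local background, which a single global threshold often fails to do.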
- Multiple File Upload: Process dozens of files simultaneously
- Progress Tracking: Real-time processing status and progress bars
- Job Management: Named batch jobs with detailed statistics
- Error Handling: Individual file error tracking and reporting
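The batch features above (named jobs, progress tracking, per-file error reporting) can be sketched with the standard library's thread pool. This is a simplified illustration under stated assumptions: `BatchResult`, `run_batch`, and the `process_file` callback are hypothetical names, file paths are assumed to be unique within a job, and the real batch processor may be organized differently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class BatchResult:
    job_name: str
    total: int
    succeeded: List[str] = field(default_factory=list)
    errors: Dict[str, str] = field(default_factory=dict)  # path -> error message


def run_batch(
    job_name: str,
    paths: List[str],
    process_file: Callable[[str], object],
    on_progress: Optional[Callable[[int, int], None]] = None,
    max_workers: int = 4,
) -> BatchResult:
    """Process files concurrently; one failing file does not abort the job."""
    result = BatchResult(job_name=job_name, total=len(paths))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_file, path): path for path in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                future.result()
                result.succeeded.append(path)
            except Exception as exc:  # record the failure and keep going
                result.errors[path] = str(exc)
            if on_progress is not None:
                on_progress(done, result.total)
    return result
```

The `on_progress` callback receives the number of finished files and the job total, which is enough to drive a progress bar; the `errors` mapping keeps each failure attached to the file that caused it.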
- Apple-Inspired Design: Clean, modern UI following Apple's design principles
- Responsive Layout: Adapts to desktop, tablet, and mobile screens
- Real-Time Updates: Live progress tracking and notifications
- Intuitive Navigation: Easy-to-use interface for all skill levels
- PostgreSQL Storage: Robust database with full ACID compliance
- Search & Filter: Advanced search capabilities across all receipts
- Data Export: Multiple export formats for extracted data
- Audit Trail: Complete processing history and metadata
- Python 3.8 or higher
- PostgreSQL 12 or higher
- Tesseract OCR engine
- Git
- Clone the Repository
  git clone https://github.com/encoreshao/ReceiptVision.git
  cd ReceiptVision
- Set Up Virtual Environment
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install Dependencies
  pip install -r requirements.txt
- Install System Dependencies
  macOS (using Homebrew):
  brew install tesseract
  brew install poppler  # For PDF processing
  Ubuntu/Debian:
  sudo apt-get update
  sudo apt-get install tesseract-ocr
  sudo apt-get install poppler-utils
  sudo apt-get install libpq-dev  # For PostgreSQL
  Windows:
  - Download and install Tesseract OCR
  - Download and install Poppler
  - Add both to your system PATH
- Set Up PostgreSQL Database
  # Create database
  createdb receiptvision
  # Create user (optional)
  psql -c "CREATE USER receiptvision_user WITH PASSWORD 'your_password';"
  psql -c "GRANT ALL PRIVILEGES ON DATABASE receiptvision TO receiptvision_user;"
- Configure Environment Variables
  cp env.example .env
  # Edit .env with your database credentials and settings
- Initialize Database
  python migrations/init_db.py
- Run the Application
  python app.py
- Access the Application
  Open your browser and navigate to http://localhost:5001
- Navigate to Upload Page: Click "Upload" in the navigation menu
- Select File: Drag and drop or click to browse for your receipt/invoice
- Process: Click "Process File" to start OCR processing
- Review Results: View extracted data with confidence scores
- View Details: Click "View Full Details" for complete information
- Navigate to Batch Page: Click "Batch" in the navigation menu
- Add Files: Drag and drop multiple files or browse to select
- Name Your Job: Optionally provide a descriptive name for the batch
- Start Processing: Click "Start Batch Processing"
- Monitor Progress: Watch real-time progress updates
- Review Results: View completion statistics and individual file results
- View All Receipts: Navigate to the "Receipts" page
- Search & Filter: Use the search bar and filters to find specific receipts
- View Details: Click on any receipt to see full extracted data
- Export Data: Use export options to download data in various formats
ReceiptVision/
├── app.py                      # Flask application factory
├── models.py                   # SQLAlchemy database models
├── api/
│   ├── routes.py               # Main API blueprint registration
│   └── blueprints/             # Resource-specific route blueprints
│       ├── __init__.py         # Blueprint package initialization
│       ├── upload_routes.py    # File upload endpoints
│       ├── receipt_routes.py   # Receipt management endpoints
│       ├── batch_routes.py     # Batch processing endpoints
│       ├── system_routes.py    # Health/statistics endpoints
│       └── utils.py            # Shared API utilities
├── web/
│   └── routes.py               # Web interface routes
├── services/
│   ├── file_processor.py       # File processing service
│   └── batch_processor.py      # Batch processing service
├── ocr/
│   ├── ocr_engine.py           # Main OCR processing engine
│   ├── image_processor.py      # Advanced image preprocessing
│   └── pdf_processor.py        # PDF handling and conversion
├── tests/
│   ├── conftest.py             # Pytest configuration
│   ├── test_api.py             # API endpoint tests
│   ├── test_models.py          # Database model tests
│   ├── test_ocr.py             # OCR processing tests
│   └── test_services.py        # Service layer tests
└── migrations/
    └── init_db.py              # Database initialization
static/
├── css/
│   └── style.css               # Apple-inspired CSS styles
├── js/
│   └── main.js                 # Core JavaScript functionality
templates/
├── base.html                   # Base template with navigation
├── index.html                  # Homepage with features showcase
├── upload.html                 # Single file upload interface
├── batch.html                  # Batch processing interface
├── receipts.html               # Receipt management interface
├── receipt_detail.html         # Individual receipt details
└── statistics.html             # Application statistics
- receipts: File metadata and processing status
- extracted_data: OCR results and structured data
- batch_jobs: Batch processing job tracking
The API is organized using Flask blueprints for better maintainability:
- Upload Routes (upload_routes.py): File upload and processing endpoints
- Receipt Routes (receipt_routes.py): Receipt management and retrieval
- Batch Routes (batch_routes.py): Batch job management and status
- System Routes (system_routes.py): Health checks and statistics
- Utils (utils.py): Shared utilities and helper functions
Each blueprint is registered under the /api/v1 prefix and handles specific resource domains, making the codebase more modular and easier to maintain.
| Variable | Description | Default |
|---|---|---|
| DATABASE_URL | PostgreSQL connection string | postgresql://... |
| SECRET_KEY | Flask secret key | dev-secret-key |
| UPLOAD_FOLDER | File upload directory | uploads |
| MAX_CONTENT_LENGTH | Maximum file size (bytes) | 16777216 (16MB) |
| TESSERACT_CMD | Tesseract executable path | /usr/local/bin/tesseract |
| CORS_ORIGINS | Allowed CORS origins | http://localhost:3000 |
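As a rough sketch of how these variables could be read at startup, the snippet below loads them from the environment and falls back to the defaults listed in the table. The `Settings` dataclass and `load_settings` function are hypothetical names, and treating `CORS_ORIGINS` as a comma-separated list is an assumption; the application's real configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: Optional[str]
    secret_key: str
    upload_folder: str
    max_content_length: int
    tesseract_cmd: str
    cors_origins: List[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, using the documented defaults."""
    origins = env.get("CORS_ORIGINS", "http://localhost:3000")
    return Settings(
        # The table elides the default connection string, so none is assumed here.
        database_url=env.get("DATABASE_URL"),
        secret_key=env.get("SECRET_KEY", "dev-secret-key"),
        upload_folder=env.get("UPLOAD_FOLDER", "uploads"),
        max_content_length=int(env.get("MAX_CONTENT_LENGTH", "16777216")),
        tesseract_cmd=env.get("TESSERACT_CMD", "/usr/local/bin/tesseract"),
        # Assumption: multiple origins are given as a comma-separated list.
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"MAX_CONTENT_LENGTH": "1024"})`.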
The OCR engine can be configured for different languages and processing modes:
# In ocr/ocr_engine.py
custom_config = r'--oem 3 --psm 6 -l eng+fra+deu'  # Multiple languages
Fine-tune image processing in ocr/image_processor.py:
# Contrast enhancement
alpha = 1.2 # Contrast control (1.0-3.0)
beta = 10 # Brightness control (0-100)
# Denoising parameters
cv2.fastNlMeansDenoising(image, None, 10, 7, 21)
# Run all tests
pytest tests/ -v
# Run specific test files
pytest tests/test_api.py -v
pytest tests/test_models.py -v
pytest tests/test_ocr.py -v
# Run with coverage report
pytest --cov=. --cov-report=html --cov-report=term
# Run tests in parallel (faster; requires the pytest-xdist plugin)
pytest -n auto
- test_api.py: API endpoint testing
- test_models.py: Database model testing
- test_ocr.py: OCR processing testing
- test_services.py: Service layer testing
- conftest.py: Shared pytest fixtures and configuration
- Indexes on frequently queried columns
- Connection pooling for high-traffic scenarios
- Query optimization for large datasets
- Multi-threading for batch processing
- Image caching for repeated processing
- Memory-efficient processing for large files
- Response caching for static data
- Pagination for large result sets
- Asynchronous processing for long-running tasks
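The pagination item above corresponds to the `page` and `per_page` query parameters used by the list endpoints. The helper below is a minimal in-memory sketch of that slicing logic; `paginate` and its `max_per_page` cap are illustrative assumptions, and a database-backed implementation would normally apply the same arithmetic as a LIMIT/OFFSET in the query rather than slicing a Python list.

```python
import math
from typing import Any, Dict, Sequence


def paginate(
    items: Sequence[Any],
    page: int = 1,
    per_page: int = 10,
    max_per_page: int = 100,
) -> Dict[str, Any]:
    """Return one page of items plus the metadata a client needs to page further."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    per_page = min(per_page, max_per_page)  # cap to keep responses bounded
    total = len(items)
    pages = math.ceil(total / per_page) if total else 0
    start = (page - 1) * per_page
    return {
        "items": list(items[start:start + per_page]),
        "page": page,
        "per_page": per_page,
        "total": total,
        "pages": pages,
    }
```

Requesting a page beyond the last one returns an empty `items` list with the same metadata, so clients can detect the end of the result set without a separate count request.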
- File type validation and sanitization
- Size limits to prevent DoS attacks
- Temporary file cleanup after processing
- Parameterized queries to prevent SQL injection
- User input validation and sanitization
- Database connection encryption
- Rate limiting for API endpoints
- CORS configuration for cross-origin requests
- Input validation for all endpoints
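To illustrate the file-upload checks listed above (type validation, filename sanitization, size limits), here is a small standard-library sketch. The allowed extensions and the 16 MB limit come from this README; the function names, the signature table, and the sanitization rules are assumptions for illustration, not the project's actual validation code.

```python
import os
import re

ALLOWED_EXTENSIONS = {"pdf", "png", "jpg", "jpeg", "bmp", "tiff"}
MAX_CONTENT_LENGTH = 16 * 1024 * 1024  # 16 MB, the documented default

# Leading bytes ("magic numbers") of each supported format.
MAGIC_BYTES = {
    "pdf": (b"%PDF-",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "jpg": (b"\xff\xd8\xff",),
    "jpeg": (b"\xff\xd8\xff",),
    "bmp": (b"BM",),
    "tiff": (b"II*\x00", b"MM\x00*"),
}


def sanitize_filename(filename: str) -> str:
    """Drop directory components and replace unusual characters."""
    base = os.path.basename(filename.replace("\\", "/"))
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".")
    return cleaned or "upload"


def validate_upload(filename: str, data: bytes, max_bytes: int = MAX_CONTENT_LENGTH) -> str:
    """Return a safe filename, or raise ValueError if the upload is rejected."""
    safe_name = sanitize_filename(filename)
    ext = safe_name.rsplit(".", 1)[-1].lower() if "." in safe_name else ""
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or 'none'}")
    if len(data) > max_bytes:
        raise ValueError("file exceeds the size limit")
    # Check content against the claimed extension, not just the name.
    if not data.startswith(MAGIC_BYTES[ext]):
        raise ValueError("file content does not match its extension")
    return safe_name
```

Checking the leading bytes as well as the extension means a renamed executable is rejected even though its name looks like an image.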
The project includes production-ready Docker configuration:
- Using Docker Compose
  # Start all services
  docker-compose up -d
  # View logs
  docker-compose logs -f
  # Stop services
  docker-compose down
- Services Configuration
  - Web Application: Flask app with Gunicorn server on port 5000
  - Database: PostgreSQL 13 with persistent data storage
  - Reverse Proxy: Nginx for static files and SSL termination
  - Health Checks: Built-in health monitoring for all services
- Environment Variables
  Update the docker-compose.yml with your production settings:
  environment:
    - DATABASE_URL=postgresql://your_user:your_pass@db:5432/receiptvision
    - SECRET_KEY=your-production-secret-key
    - FLASK_ENV=production
- AWS: EC2 + RDS + S3 for file storage
- Google Cloud: App Engine + Cloud SQL + Cloud Storage
- Azure: App Service + Azure Database + Blob Storage
- Heroku: Web dyno + Heroku Postgres + Cloudinary
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes and add tests
- Run the test suite: pytest
- Submit a pull request
The project includes comprehensive development guidelines in .cursor/rules for:
- Architecture Patterns: Flask application factory, blueprint organization, service layer
- Code Standards: PEP 8 compliance, type hints, Google-style docstrings
- API Development: RESTful conventions, error handling, response formats
- Database Patterns: SQLAlchemy best practices, relationship management
- Testing Guidelines: pytest patterns, fixture usage, coverage requirements
- Security Considerations: Input validation, file upload security, SQL injection prevention
- Follow PEP 8 for Python code
- Use type hints for all function parameters and return values
- Add Google-style docstrings to all functions and classes
- Organize routes by resource using blueprints
- Implement business logic in service layer, not route handlers
- Write comprehensive tests for new features
- Use meaningful variable and function names
- Implement proper error handling and logging
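As a small example of what these standards look like in practice, the function below uses type hints, a Google-style docstring, explicit error handling, and keeps its logic independent of any route handler. `LineItem` and `calculate_total` are hypothetical names chosen for illustration and are not taken from the codebase.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LineItem:
    description: str
    amount: float


def calculate_total(items: Iterable[LineItem], tax_rate: float = 0.0) -> float:
    """Compute a receipt total including tax.

    Args:
        items: Line items extracted from the receipt.
        tax_rate: Tax as a fraction of the subtotal (0.08 means 8%).

    Returns:
        The subtotal plus tax, rounded to two decimal places.

    Raises:
        ValueError: If tax_rate is negative.
    """
    if tax_rate < 0:
        raise ValueError("tax_rate must not be negative")
    subtotal = sum(item.amount for item in items)
    return round(subtotal * (1 + tax_rate), 2)
```

Because the function takes plain data and returns a plain value, a route handler only needs to parse the request, call it, and format the response, and the logic can be unit-tested without a running Flask app.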
# Upload single file
POST /api/v1/upload
Content-Type: multipart/form-data
file: [binary file data]
# Batch upload multiple files
POST /api/v1/batch-upload
Content-Type: multipart/form-data
files: [multiple binary files]
job_name: "Optional job name"
# Get specific receipt
GET /api/v1/receipt/{receipt_id}
# List all receipts with pagination
GET /api/v1/receipts?page=1&per_page=10&status=completed
# Get detailed receipt information
GET /api/v1/receipts/{receipt_id}
# Download original receipt file
GET /api/v1/receipts/{receipt_id}/file
# Get batch job status
GET /api/v1/batch-job/{job_id}
# List all batch jobs
GET /api/v1/batch-jobs?page=1&per_page=10
# Health check
GET /api/v1/health
# Application statistics
GET /api/v1/statistics
Successful API responses follow a consistent JSON structure:
{
"success": true,
"data": {...},
"message": "Operation completed successfully"
}
Error responses:
{
"error": "Description of the error",
"code": "ERROR_CODE"
}
Tesseract not found:
# macOS
brew install tesseract
export TESSERACT_CMD=/usr/local/bin/tesseract  # Homebrew on Apple Silicon installs to /opt/homebrew/bin/tesseract
# Ubuntu
sudo apt-get install tesseract-ocr
PDF processing fails:
# Install poppler-utils
sudo apt-get install poppler-utils # Ubuntu
brew install poppler # macOS
Database connection errors:
- Verify PostgreSQL is running
- Check database credentials in .env
- Ensure database exists and user has permissions
Low OCR accuracy:
- Ensure images are high resolution (300+ DPI)
- Check image quality and contrast
- Try different OCR language models
- Adjust image preprocessing parameters
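For the "Tesseract not found" and "PDF processing fails" cases above, a quick way to confirm what the application can see is to look the executables up from Python. This is a diagnostic sketch only: `find_tool` and `check_dependencies` are hypothetical helpers, and `pdftoppm` is used as the probe for Poppler because it is one of the command-line tools Poppler ships.

```python
import os
import shutil
from typing import Dict, Optional


def find_tool(name: str, env_var: Optional[str] = None) -> Optional[str]:
    """Return the path to an executable, preferring an env-var override."""
    if env_var:
        override = os.environ.get(env_var)
        if override and os.path.isfile(override) and os.access(override, os.X_OK):
            return override
    # Fall back to searching the directories on PATH.
    return shutil.which(name)


def check_dependencies() -> Dict[str, Optional[str]]:
    """Map each required system tool to its resolved path, or None if missing."""
    return {
        "tesseract": find_tool("tesseract", "TESSERACT_CMD"),
        "pdftoppm": find_tool("pdftoppm"),  # part of Poppler
    }


if __name__ == "__main__":
    for tool, path in check_dependencies().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

A `NOT FOUND` line points to a missing install or a PATH problem; if Tesseract is installed in a non-standard location, setting `TESSERACT_CMD` to its full path should make the first entry resolve.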
This project is licensed under the MIT License - see the LICENSE file for details.
- Tesseract OCR for optical character recognition
- OpenCV for image processing capabilities
- Flask for the web framework
- PostgreSQL for robust data storage
- Apple Design Guidelines for UI inspiration
- Documentation: Wiki
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: support@receiptvision.com
Built with ❤️ for accurate receipt processing