
Medical prescription OCR powered by fine-tuned Donut model. Extract text from handwritten prescriptions using AI. Healthcare + Computer Vision.


SOHAM-3T/Medical-Prescription-Analyzer-


🩺 Medical Prescription Analyzer

An end-to-end OCR pipeline for medical prescriptions, powered by a fine-tuned Donut (Document Understanding Transformer) model.
This project extracts text from handwritten/typed prescriptions and makes it available for further analysis (e.g., medicine recognition, dosage extraction).


🚀 Features

  • 📄 OCR with Donut: Donut model fine-tuned on a prescription dataset
  • 🧩 Pipeline Integration: Plugs into existing medical text-processing workflows
  • 📊 Evaluation: Tracks WER (Word Error Rate) and training loss
  • 🔧 Customizable: Extendable to structured parsing (medicine, dosage, duration)
  • 💾 Reusable Model: Saved and reloadable with Hugging Face Transformers

📂 Project Structure

Medical-Prescription-Analyzer-/
├── model_fine_tuning/          # Kaggle notebook for fine-tuning Donut
│   └── fine_tuning_model_gamma.ipynb
├── pipeline/                   # Pipeline notebook for inference & analysis
│   └── prescription_pipeline.ipynb
├── README.md                   # Project overview
├── requirements.txt            # Dependencies
├── LICENSE                     # MIT License
└── .gitignore                  # Ignored files

⚙️ Installation

  1. Clone the repo:

    git clone https://github.com/SOHAM-3T/Medical-Prescription-Analyzer-.git
    cd Medical-Prescription-Analyzer-
  2. Install dependencies:

    pip install -r requirements.txt

🏋️ Fine-Tuning (Optional)

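To fine-tune Donut on your own dataset, start from the pre-trained base checkpoint. The snippet below only loads it; the training loop itself lives in the notebook: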

from transformers import DonutProcessor, VisionEncoderDecoderModel

# Load pre-trained Donut
processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")

# Fine-tuning is implemented in model_fine_tuning/fine_tuning_model_gamma.ipynb
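
One detail that commonly comes up when fine-tuning a sequence-to-sequence model such as Donut is label padding. PyTorch's cross-entropy loss ignores targets equal to -100 by default, so padded positions in the label sequence are usually replaced with that value. The sketch below shows the idea on plain lists of token ids; `mask_padding` is a hypothetical helper, and the notebook may handle this differently.

```python
from typing import List

# Default ignore_index of torch.nn.CrossEntropyLoss
IGNORE_INDEX = -100


def mask_padding(label_ids: List[int], pad_token_id: int) -> List[int]:
    """Replace padding token ids so the loss skips those positions.

    Hypothetical helper for illustration; not taken from the notebook.
    """
    return [IGNORE_INDEX if token == pad_token_id else token for token in label_ids]
```

In a real training loop the pad id would come from `processor.tokenizer.pad_token_id`, and the masked ids would be passed to the model as `labels`.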

🔍 Inference Pipeline

Example usage in your pipeline:

from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch

MODEL_DIR = "path/to/donut-finetuned-prescription-ocr"

# Load the fine-tuned model and move it to the GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = DonutProcessor.from_pretrained(MODEL_DIR)
model = VisionEncoderDecoderModel.from_pretrained(MODEL_DIR).to(device)

# OCR function
def ocr_prescription(img_path: str) -> str:
    """Return the decoded text for a single prescription image."""
    image = Image.open(img_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(model.device)
    outputs = model.generate(pixel_values, max_length=512)
    return processor.batch_decode(outputs, skip_special_tokens=True)[0]

print(ocr_prescription("sample_prescription.jpg"))
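
To run the OCR function over a whole folder of scans, a small wrapper along these lines may help. This is a minimal sketch: `ocr_folder` and the `IMAGE_SUFFIXES` set are hypothetical names, and the OCR function is passed in as a parameter so the wrapper can be tried without loading the model.

```python
from pathlib import Path
from typing import Callable, Dict

# Assumed set of image extensions; adjust to match your data
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def ocr_folder(folder: str, ocr_fn: Callable[[str], str]) -> Dict[str, str]:
    """Apply ocr_fn to every image in folder and map file name to text."""
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-directories and non-image files
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = ocr_fn(str(path))
    return results
```

With the model loaded as above, a call such as `ocr_folder("prescriptions/", ocr_prescription)` would return one text entry per image (the folder name here is only an example).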

📊 Results

| Metric | Value (10 Epochs) |
|---|---|
| Training Loss | ~4.4 |
| Validation Loss | ~4.4 |
| WER | ~0.87 → 1.15 (small dataset, unstable) |

Note: These results come from a small dataset. A larger dataset and more training epochs are expected to improve performance.
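
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, which is why it can exceed 1.0 when the model inserts many spurious words. The sketch below is a plain-Python illustration of that definition; `word_error_rate` is a hypothetical helper, not the evaluation code used in the notebook.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, a one-word reference transcribed as three wrong words costs one substitution plus two insertions, giving a WER of 3.0.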


📦 Model Access

The fine-tuned model is available in:

  • Kaggle notebook outputs: soham3ripathy/fine-tuning-model-gamma
  • The saved model can also be uploaded as a Kaggle dataset or attached to a GitHub release

🔮 Future Work

  • Expand dataset for better generalization
  • Add structured JSON parsing (medicine, dosage, duration)
  • Deploy as an API (Flask/FastAPI)
  • Improve model performance with more training data
  • Add support for multiple languages
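
As a starting point for the structured JSON parsing item, a rule-based pass over the OCR text might look like the sketch below. Everything here is an assumption for illustration: the `parse_line` and `parse_prescription` helpers, the regular expressions, and the idea that each line starts with the medicine name followed by a dosage. Real prescriptions vary widely and will likely need a more robust approach.

```python
import json
import re
from typing import Dict, List, Optional

# Assumed patterns: a number followed by a unit, and a number followed by a period
DOSAGE = re.compile(r"\d+(?:\.\d+)?\s*(?:mg|ml|mcg|g)\b", re.IGNORECASE)
DURATION = re.compile(r"\d+\s*(?:day|week|month)s?\b", re.IGNORECASE)


def parse_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Split one OCR line into medicine, dosage and duration, if possible."""
    dosage = DOSAGE.search(line)
    if dosage is None:
        return None
    # Treat everything before the dosage as the medicine name
    medicine = line[: dosage.start()].strip()
    if not medicine:
        return None
    # Only look for a duration after the dosage
    duration = DURATION.search(line, dosage.end())
    return {
        "medicine": medicine,
        "dosage": dosage.group(0),
        "duration": duration.group(0) if duration else None,
    }


def parse_prescription(text: str) -> List[Dict[str, Optional[str]]]:
    """Parse every line of OCR text, keeping only lines that contain a dosage."""
    rows = (parse_line(line) for line in text.splitlines())
    return [row for row in rows if row is not None]


if __name__ == "__main__":
    sample = "Paracetamol 500 mg twice daily for 5 days\nAmoxicillin 250mg after food"
    print(json.dumps(parse_prescription(sample), indent=2))
```

The sample lines are invented for the example. Lines without a recognizable dosage are dropped rather than guessed at, which keeps the output conservative for noisy OCR text.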

🤝 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to improve.

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


👥 Contributors

Thanks to these amazing people who have contributed to this project:

  • Soham Tripathy (SOHAM-3T)
  • Ajay Kumar Prasad
  • Sai Prithvi


🙏 Acknowledgments

  • Naver Clova IX for the original Donut model
  • Hugging Face for the Transformers library
  • The medical community for inspiring this healthcare technology solution
