An end-to-end OCR pipeline for medical prescriptions, powered by a fine-tuned Donut (Document Understanding Transformer) model.
This project extracts text from handwritten/typed prescriptions and makes it available for further analysis (e.g., medicine recognition, dosage extraction).
- OCR with Donut: Donut model fine-tuned on a prescription dataset
- Pipeline Integration: Plugs into existing medical text-processing workflows
- Evaluation: Tracks WER (Word Error Rate) and training loss
- Customizable: Extendable to structured parsing (medicine, dosage, duration)
- Reusable Model: Saved and reloadable with Hugging Face Transformers
Medical-Prescription-Analyzer-/
├── model_fine_tuning/                  # Kaggle notebook for fine-tuning Donut
│   └── fine_tuning_model_gamma.ipynb
├── pipeline/                           # Pipeline notebook for inference & analysis
│   └── prescription_pipeline.ipynb
├── README.md                           # Project overview
├── requirements.txt                    # Dependencies
├── LICENSE                             # MIT License
└── .gitignore                          # Ignored files
- Clone the repo:
  git clone https://github.com/SOHAM-3T/Medical-Prescription-Analyzer-.git
  cd Medical-Prescription-Analyzer-
- Install dependencies:
  pip install -r requirements.txt
To fine-tune Donut on your own dataset:
from transformers import DonutProcessor, VisionEncoderDecoderModel
# Load pre-trained Donut
processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")
# Fine-tuning is implemented in model_fine_tuning/fine_tuning_model_gamma.ipynb
Example usage in your pipeline:
from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch
MODEL_DIR = "path/to/donut-finetuned-prescription-ocr"
# Load fine-tuned model
processor = DonutProcessor.from_pretrained(MODEL_DIR)
model = VisionEncoderDecoderModel.from_pretrained(MODEL_DIR).to("cuda" if torch.cuda.is_available() else "cpu")
# OCR function
def ocr_prescription(img_path: str):
    image = Image.open(img_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(model.device)
    outputs = model.generate(pixel_values, max_length=512)
    return processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(ocr_prescription("sample_prescription.jpg"))

| Metric | Value (10 Epochs) |
|---|---|
| Training Loss | ~4.4 |
| Validation Loss | ~4.4 |
| WER | ~0.87 to 1.15 (small dataset, unstable) |
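Note: These results come from a small dataset and only 10 epochs. A WER at or above 1.0 means the output contains at least as many word errors as the reference has words, so the current checkpoint should be treated as a baseline. A larger dataset and more training epochs are expected to improve performance.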
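For readers unfamiliar with the metric, WER is the word-level edit distance between the model output and the reference transcription, divided by the number of reference words. Because insertions count as errors, it can exceed 1.0, as in the table above. The following is a minimal sketch of that definition, not the evaluation code used in the notebooks; the function name `word_error_rate` is an illustrative assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("take one tablet daily", "take two tablet"))
    print(word_error_rate("paracetamol 500mg", "para cetamol 500 mg twice"))
```

The second call illustrates how an output that splits or adds words produces a WER above 1.0 even though the text looks superficially close.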
The fine-tuned model is available in:
- Kaggle notebook outputs: soham3ripathy/fine-tuning-model-gamma
- Can be uploaded as a Kaggle dataset or GitHub release
- Expand dataset for better generalization
- Add structured JSON parsing (medicine, dosage, duration)
- Deploy as an API (Flask/FastAPI)
- Improve model performance with more training data
- Add support for multiple languages
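As a rough illustration of the structured JSON parsing item, the sketch below turns OCR text into a list of records with medicine, dosage, frequency, and duration. It is not part of this repository. The line format (`<medicine> <strength> <frequency> x <days> days`), the regular expression, and the name `parse_prescription_text` are all assumptions made for the example, and real OCR output at the current WER would likely need far more tolerant matching.

```python
import json
import re
from typing import Dict, List, Optional, Union

# Hypothetical line format: "<medicine> <strength> <frequency> x <days> days",
# e.g. "Paracetamol 500mg 1-0-1 x 5 days". The duration part is optional.
LINE_PATTERN = re.compile(
    r"^(?P<medicine>[A-Za-z][A-Za-z\- ]*?)\s+"
    r"(?P<dosage>\d+(?:\.\d+)?\s?(?:mg|ml|g|mcg))\s+"
    r"(?P<frequency>\d-\d-\d)"
    r"(?:\s+x\s+(?P<duration_days>\d+)\s+days?)?$",
    re.IGNORECASE,
)

Record = Dict[str, Optional[Union[str, int]]]


def parse_prescription_text(text: str) -> List[Record]:
    """Extract one record per line that matches the assumed format."""
    items: List[Record] = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # lines that do not fit the pattern are skipped, not guessed at
        item: Record = dict(match.groupdict())
        if item["duration_days"] is not None:
            item["duration_days"] = int(item["duration_days"])
        items.append(item)
    return items


if __name__ == "__main__":
    sample = "Paracetamol 500mg 1-0-1 x 5 days\nDr. signature\nAmoxicillin 250 mg 1-1-1"
    print(json.dumps(parse_prescription_text(sample), indent=2))
```

Skipping unmatched lines rather than guessing keeps the output conservative, which matters for medical text where a wrong dosage is worse than a missing one.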
Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to improve.
- Fork the project
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Thanks to these amazing people who have contributed to this project:
- Soham Tripathy
- Ajay Kumar Prasad
- Sai Prithvi
- Naver Clova IX for the original Donut model
- Hugging Face for the Transformers library
- The medical community for inspiring this healthcare technology solution