True_inference_with_LayoutLMv2ForTokenClassification, add support for custom OCR #128

amtam0 · 2022-06-19T16:08:23Z

Hi @NielsRogge,
Thanks for these great tutorials.
This PR adds support for custom OCR alternatives to the built-in one used in LayoutLMv2Processor.
In real-world use cases, image quality is often poor and Tesseract is not always the best tool. Alternatives with better text detectors can improve inference performance (and can also be used for fine-tuning). I added modularity so that Tesseract can still be used for the recognition part if needed.
Popular OCR alternatives added to this notebook under the Inference chapter:

docTR
EasyOCR
Using your own Tesseract config
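
For readers who want to see the idea outside the notebook, here is a minimal sketch of how results from an external OCR engine could be fed to a processor that has its built-in OCR disabled. It is not the notebook's code. It assumes EasyOCR-style results, i.e. a list of `(quad, text, confidence)` tuples where `quad` holds four `(x, y)` corner points, and it assumes the `no_ocr` revision of `microsoft/layoutlmv2-base-uncased`. The helper names `normalize_box`, `quads_to_words_and_boxes` and `encode_with_custom_ocr` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed EasyOCR-style result: (four corner points, text, confidence).
OcrResult = Tuple[Sequence[Sequence[float]], str, float]


def normalize_box(box: Sequence[float], width: int, height: int) -> List[int]:
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range LayoutLMv2 expects."""
    x0, y0, x1, y1 = box

    def scale(value: float, size: int) -> int:
        # Clamp so slightly out-of-image detections stay in range.
        return max(0, min(1000, int(1000 * value / size)))

    return [scale(x0, width), scale(y0, height), scale(x1, width), scale(y1, height)]


def quads_to_words_and_boxes(
    results: Sequence[OcrResult], width: int, height: int
) -> Tuple[List[str], List[List[int]]]:
    """Convert quadrilateral detections to words plus axis-aligned, normalized boxes."""
    words: List[str] = []
    boxes: List[List[int]] = []
    for quad, text, _confidence in results:
        text = text.strip()
        if not text:
            continue  # skip empty detections
        xs = [point[0] for point in quad]
        ys = [point[1] for point in quad]
        words.append(text)
        boxes.append(normalize_box((min(xs), min(ys), max(xs), max(ys)), width, height))
    return words, boxes


def encode_with_custom_ocr(image, words: List[str], boxes: List[List[int]]):
    """Encode an image with externally supplied words and boxes (requires transformers)."""
    from transformers import LayoutLMv2Processor  # imported lazily; heavy dependency

    # Assumption: the "no_ocr" revision ships a processor with apply_ocr=False.
    processor = LayoutLMv2Processor.from_pretrained(
        "microsoft/layoutlmv2-base-uncased", revision="no_ocr"
    )
    return processor(image, words, boxes=boxes, truncation=True, return_tensors="pt")
```

Note that some engines return whole text lines rather than single words. In this sketch each detection is passed through as one entry, so all of its tokens share the same box.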

Let me know if any changes are needed.
Thanks

review-notebook-app · 2022-06-19T16:08:26Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

add support for custom OCR

dd37431
