Optical Character Recognition

Optical character recognition (OCR) is a process that converts scanned images of text documents into machine-readable text. The OCR process involves pre-processing the image, recognizing characters using pattern matching or feature extraction, and post-processing the text using spelling and grammar checks. OCR has applications in scanning documents like passports, bills, and business cards so they can be electronically edited. It is used across industries like customized OCR for business cards and invoices, and server-based OCR for large volumes of documents.

Uploaded by

Gopal Savaliya
Copyright
© All Rights Reserved

Tutorial 5 Optical Character Recognition

CE406 - Computer Peripherals Workshop

Introduction
Optical character recognition (OCR) is the mechanical or electronic
conversion of images of text into machine-readable text. It is widely used
to scan passports, bank statements, business cards, bills, etc., so that
their text can be edited electronically. OCR falls within the fields of
computer vision, artificial intelligence, and pattern recognition.

Components
The architecture of an OCR system requires the following components:
• Scanner
• OCR software/hardware
• Output interface

Recognition process
The OCR process basically involves three steps:

1. Pre-Processing:
• Optimizing the image for character recognition - this involves
converting the image to grayscale, followed by deskewing,
de-speckling, binarization, normalization, and then sampling.
• Denoising - removing noise present in the image and other noise that
may have been introduced during the scanning process.
• Thinning - used in handwriting recognition; the strokes of the letters
are thinned to a width of one pixel to ease recognition.
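
A few of the pre-processing operations above can be sketched in plain
Python. This is only an illustration, not the method of any particular OCR
package: the function names, the fixed threshold of 128, and the tiny test
image are assumptions made for the example. The grayscale weights are the
common ITU-R BT.601 luminance weights.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray levels."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]


def binarize(gray_image, threshold=128):
    """Mark a pixel as ink (1) if it is darker than the threshold, else paper (0).

    The threshold of 128 is an assumed default for this sketch.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray_image]


def despeckle(binary_image):
    """Remove ink pixels that have no ink pixel among their 8 neighbours."""
    height, width = len(binary_image), len(binary_image[0])
    cleaned = [row[:] for row in binary_image]
    for y in range(height):
        for x in range(width):
            if binary_image[y][x] != 1:
                continue
            neighbours = sum(
                binary_image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                cleaned[y][x] = 0  # isolated speck, treat it as noise
    return cleaned


# Hypothetical 4x5 scan: a 2x2 block of ink plus one isolated speck.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
page = [
    [WHITE, WHITE, WHITE, WHITE, WHITE],
    [WHITE, BLACK, BLACK, WHITE, WHITE],
    [WHITE, BLACK, BLACK, WHITE, WHITE],
    [WHITE, WHITE, WHITE, WHITE, BLACK],
]
binary = binarize(to_grayscale(page))
cleaned = despeckle(binary)
```

Here the 2x2 block survives, while the single pixel in the bottom-right
corner is dropped as a speck. Real scans usually need an adaptive threshold
rather than a fixed one.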
2. Character Recognition:
The core of OCR is carried out in this step. There are two methods:
• Matrix Matching - each symbol is compared with a database of
character matrices; the stored matrix that most closely matches the
symbol is chosen as the character, and its ASCII value is output.
This is also known as pattern matching.
• Feature Extraction - this method is generally better than matrix
matching because it looks for curves, closed loops, and other
general features of a character in order to recognize it.
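
Both methods can be illustrated with a small sketch. It assumes glyphs are
already segmented and scaled to 3x5 bitmaps written as strings of "0"
(paper) and "1" (ink); the three templates and all function names are made
up for this example. Matrix matching picks the template with the fewest
differing pixels, and the feature-extraction part counts closed loops, one
of the features mentioned above.

```python
# Hypothetical template database: 3x5 bitmaps for three characters.
TEMPLATES = {
    "I": ["010", "010", "010", "010", "010"],
    "L": ["100", "100", "100", "100", "111"],
    "O": ["111", "101", "101", "101", "111"],
}


def hamming_distance(glyph, template):
    """Count the pixels in which two equally sized bitmaps differ."""
    return sum(g != t
               for g_row, t_row in zip(glyph, template)
               for g, t in zip(g_row, t_row))


def match_glyph(glyph, templates=TEMPLATES):
    """Matrix matching: return the closest character and its ASCII value."""
    best = min(templates, key=lambda ch: hamming_distance(glyph, templates[ch]))
    return best, ord(best)


def count_closed_loops(glyph):
    """Feature extraction: count background regions fully enclosed by ink."""
    height, width = len(glyph), len(glyph[0])
    seen = set()

    def flood(start):
        # Mark every background pixel 4-connected to the start pixel.
        seen.add(start)
        stack = [start]
        while stack:
            y, x = stack.pop()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < height and 0 <= nx < width
                        and glyph[ny][nx] == "0" and (ny, nx) not in seen):
                    seen.add((ny, nx))
                    stack.append((ny, nx))

    # Background touching the border is outside the character, not a loop.
    for y in range(height):
        for x in range(width):
            on_border = y in (0, height - 1) or x in (0, width - 1)
            if on_border and glyph[y][x] == "0" and (y, x) not in seen:
                flood((y, x))

    # Every remaining background region is enclosed by ink.
    loops = 0
    for y in range(height):
        for x in range(width):
            if glyph[y][x] == "0" and (y, x) not in seen:
                loops += 1
                flood((y, x))
    return loops


# A noisy "L" with one extra ink pixel in the fourth row.
noisy_l = ["100", "100", "100", "110", "111"]
matched = match_glyph(noisy_l)                  # ("L", 76)
loops_in_o = count_closed_loops(TEMPLATES["O"])  # 1
loops_in_l = count_closed_loops(TEMPLATES["L"])  # 0
```

The noisy glyph differs from the "L" template in only one pixel, so matrix
matching still selects "L". The loop count does not depend on exact pixel
positions, which hints at why feature-based recognition copes better with
variations in font and size.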

3. Post-Processing:
This step involves making spelling-, context-, and grammar-based
corrections to the output text so that its accuracy increases.
• Manual Correction - errors are removed by hand, but some mistakes may
remain because of human error.
• Dictionary-Based Correction - words are looked up in a dictionary
and corrected automatically.
• Context-Based Correction - advanced language models are applied to
understand and correct the text.
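
Dictionary-based correction can be sketched with Python's standard difflib
module. The five-word dictionary, the similarity cutoff of 0.8, and the
function names are assumptions chosen for this example; a real system
would use a full word list and tune the cutoff.

```python
import difflib

# Hypothetical, deliberately tiny dictionary.
DICTIONARY = ["optical", "character", "recognition", "scanner", "document"]


def correct_word(word, dictionary=DICTIONARY, cutoff=0.8):
    """Replace a word with its closest dictionary entry, if one is close enough."""
    lowered = word.lower()
    if lowered in dictionary:
        return word
    matches = difflib.get_close_matches(lowered, dictionary, n=1, cutoff=cutoff)
    # Leave the word unchanged when nothing in the dictionary is similar.
    return matches[0] if matches else word


def correct_text(text):
    """Apply correct_word to every whitespace-separated word."""
    return " ".join(correct_word(word) for word in text.split())


corrected = correct_text("0ptical charactcr recogniti0n")
# corrected == "optical character recognition"
```

Words with no sufficiently similar dictionary entry are left as they are,
which is where manual or context-based correction would take over.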

Applications
OCR applications are basically classified by the type of platform they
run on, into the following types:
• Application-oriented OCR
• Server-based OCR
• Desktop-based OCR

As OCR systems have grown in popularity, they have started to face a
variety of problems related to the original format of documents, such as
corrupted images, paper skew, interference from grids and lines, unusual
text, etc. All of these affect OCR accuracy. To improve recognition
accuracy, various techniques tailored to the specific kind of image are
combined, such as standard expressions, dictionaries, and the rich
information contained in color images. This is called application-oriented
OCR or customized OCR. It is used for business card OCR, invoice OCR,
licence OCR, etc.
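
As an illustration of how knowledge about a specific document type can
raise accuracy, the sketch below checks an invoice number against an
expected pattern. The format "INV-" followed by six digits, the table of
look-alike characters, and the function name are all assumptions invented
for this example, not part of any real invoice standard.

```python
import re

# Letters that OCR commonly confuses with digits (assumed for this sketch).
DIGIT_LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)

# Hypothetical invoice number format: "INV-" followed by six digits.
INVOICE_PATTERN = re.compile(r"INV-\d{6}")


def correct_invoice_number(raw):
    """Fix digit look-alikes after the dash, then validate against the pattern.

    Returns the corrected string, or None if it still does not fit the format.
    """
    prefix, dash, tail = raw.partition("-")
    candidate = prefix + dash + tail.translate(DIGIT_LOOKALIKES)
    return candidate if INVOICE_PATTERN.fullmatch(candidate) else None


fixed = correct_invoice_number("INV-2O19S4")     # "INV-201954"
rejected = correct_invoice_number("INV-20X954")  # None
```

Because the field is known to contain only digits, a letter "O" can safely
be read as zero here, a correction that would be wrong in ordinary text.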

Server-based OCR is used for larger volumes of documents and for large
groups of users. Here, further processing of the scanned documents is
handled by the server-based OCR software. Higher accuracy can be achieved
through its better features and functionality.
