OPTICAL CHARACTER RECOGNITION
Article · May 2022
JOURNAL OF ENGINEERING, COMPUTING & ARCHITECTURE                                                                                        ISSN NO:1934-7197
                                   OPTICAL CHARACTER RECOGNITION
                                     Dr. V. Geetha1, Ch V V Sudheer2, A V Saikumar3
                                                             Dr C K Gomathy4
                      1, 4 Assistant Professor, Dept. of CSE, SCSVMV (Deemed to be University), Kancheepuram, TamilNadu, India
                             2, 3 Student, Dept. of CSE, SCSVMV (Deemed to be University), Kancheepuram, TamilNadu, India
             ABSTRACT:
             This project concerns optical character recognition (OCR), a technology that recognizes
             text within a digital image. An OCR system scans printed text optically, character by
             character, converts the characters into machine-readable code, and stores the result in
             system memory as an editable document. OCR is a helpful and popular approach in a
             variety of applications and is commonly used to recognize text in scanned documents and
             images, converting a physical paper document or an image into an accessible electronic
             version with text. Text preprocessing and segmentation techniques can influence OCR
             accuracy. Because the text in an image varies in size, style, and orientation and may
             sit on an intricate background, retrieving it can be challenging.
             Keywords: OCR, Tesseract, OpenCV, Python.
             I. INTRODUCTION
             Text recognition is one of the most prominent applications of computer vision and is
             used by several multinational technology companies such as Apple and Google. Apple
             recently announced the "Live Text" feature in iOS 15. This functionality is similar to
             how Google Lens works on Android phones and in the Google Search and Photos apps on
             iOS. The basic procedure is that a person points the camera at an image or at text on a
             signboard or a sheet of paper, and the Live Text feature recognizes the text present in
             the image, be it a contact number or an email ID. These features are built on a
             technology called OCR (Optical Character Recognition). For decades, OCR was the sole
             means of transforming printouts into computer-processable data, and it is still the
             preferred method for turning paper invoices into extractable data that can be linked
             into financial systems, for example.
Volume 12, Issue 3, MARCH - 2022                                                                                   http://www.journaleca.com/   Page No: 211
             However, electronic document submission now provides organizations with a
             significantly improved approach to areas like invoicing and sales processing, lowering
             costs and allowing employees to focus on higher-value activities.

             II. PROBLEM STATEMENT
             The problem is for software systems to recognize characters when information is scanned
             from paper documents. A large number of newspapers and books on different subjects
             exist only in printed form. Whenever such documents are scanned, they are stored in the
             computer system as images in formats such as JPEG or GIF. The text in these images
             cannot be read by the computer as text or edited by the user, and reusing the
             information is very difficult because the contents must be read and searched line by
             line and word by word. There is therefore a huge demand for storing the information
             available in paper documents on a computer storage disk and later editing or reusing it
             through search.

             III. LITERATURE SURVEY
             The first commercialized OCR system of the first generation was the IBM 1418, which was
             designed to read a special IBM font, 407. The recognition method was template matching,
             which compares the character image with a library of prototype images for each
             character of each font. Second-generation machines were able to recognize regular
             machine-printed and hand-printed characters. The character set was limited to numerals
             and a few letters and symbols. Such machines appeared from the mid-1960s to the early
             1970s. For the third generation of OCR systems, the challenges were documents of poor
             quality and large printed and handwritten character sets. Low cost and high performance
             were also important objectives. Commercial OCR systems with such capabilities appeared
             during the decade 1975 to 1985. The fourth generation can be characterized by the OCR
             of complex documents that intermix text, graphics, tables and mathematical symbols,
             unconstrained handwritten characters, color documents, low-quality noisy documents,
             etc. Among the commercial products, postal address readers and reading aids for the
             blind are available in the market.

             IV. EXISTING SYSTEM
             Today there is a growing demand from users to convert printed documents into electronic
             form in order to keep their data secure. Transcribing the text in an image manually,
             without software, is a time-consuming process. The basic OCR system was therefore
             invented to convert the data available on paper and in images into
             computer-processable documents.

             V. PROPOSED METHOD
             The proposed OCR system recognizes the text in scanned documents and images and
             converts it into an accessible electronic version with text. We demonstrate it with a
             real-time example using a webcam, so that the characters in the captured images can be
             recognized. OCR technology also allows us to search the text for words found within the
             document.
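
             The webcam-based recognition and word search described in the proposed method can be
             sketched as follows. This is a minimal illustration, not the authors' implementation:
             it assumes the OpenCV (cv2) and pytesseract packages named in the keywords are
             installed together with the Tesseract binary, and the function names preprocess,
             recognize_from_webcam and find_word are hypothetical. Otsu thresholding is used here
             as one common preprocessing choice; the paper does not state which preprocessing was
             applied.

```python
import re


def preprocess(frame):
    """Convert a BGR frame to a binarized grayscale image (Otsu threshold)."""
    import cv2  # imported lazily so find_word below works without OpenCV

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def recognize_from_webcam(camera_index=0):
    """Grab one frame from the webcam and return the text Tesseract finds in it."""
    import cv2
    import pytesseract  # thin wrapper; needs the Tesseract binary installed

    capture = cv2.VideoCapture(camera_index)
    try:
        ok, frame = capture.read()
        if not ok:
            raise RuntimeError("could not read a frame from the webcam")
        return pytesseract.image_to_string(preprocess(frame))
    finally:
        capture.release()


def find_word(text, word):
    """Return the 1-based line numbers of text that contain word as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if pattern.search(line)]
```

             Once text has been recognized, for example with text = recognize_from_webcam(), a
             call such as find_word(text, "invoice") lists the lines on which the word occurs,
             which is the kind of search that an image alone does not allow.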
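
             Template matching, the recognition method attributed to the IBM 1418 in Section III,
             can be illustrated with a small sketch. The 3x5 bitmaps in TEMPLATES and the function
             names are hypothetical and chosen only for illustration; a real system would hold one
             prototype image per character and font, and would normalize the size and position of
             the glyph first.

```python
# Hypothetical 3x5 prototype bitmaps ('#' = ink, '.' = background).
TEMPLATES = {
    "0": ("###", "#.#", "#.#", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "7": ("###", "..#", ".#.", ".#.", ".#."),
}


def match_score(glyph, template):
    """Fraction of pixels on which the glyph and the template agree."""
    pairs = [(g, t)
             for g_row, t_row in zip(glyph, template)
             for g, t in zip(g_row, t_row)]
    return sum(g == t for g, t in pairs) / len(pairs)


def classify(glyph, templates=TEMPLATES):
    """Return the label of the best-matching template and its score."""
    label = max(templates, key=lambda name: match_score(glyph, templates[name]))
    return label, match_score(glyph, templates[label])
```

             Because the score counts agreeing pixels, a glyph with a single noisy pixel is still
             assigned to its nearest prototype, while a glyph in an unseen font may match no
             template well. This limitation is why the method required a prototype library for
             each font.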
             VI. ARCHITECTURE
             Optical Character Recognition, or OCR, is a technology that enables us to convert
             different types of documents, such as scanned paper documents, PDF files or images
             captured by a digital camera or phone, into editable and searchable data.

                                   Fig 1: OCR Process

             VII. CONCLUSION
             Optical character recognition is a necessary first step for all applications that take
             an image as input. Recognition of printed text gives good results: almost all the data
             read was correct. Only a few recognized fields contained mistakes, and these were
             fields that were unreadable or had been damaged during the scanning process. Our
             evaluation shows that LBP (local binary pattern) features with an SVM (support vector
             machine) classifier give the best results, with an accuracy of 96.5%. Our survey has
             shown that data manually rewritten from the form by an experienced user contains fewer
             mistakes than the data recognized by the OCR system.
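
             The LBP features mentioned in the conclusion can be sketched as follows. The paper
             does not describe its LBP configuration, so this assumes the basic 3x3, 8-neighbour
             variant on a grayscale image stored as a list of rows; the function names are
             hypothetical, and the SVM classifier that would be trained on the resulting
             histograms is omitted.

```python
def lbp_code(image, row, col):
    """8-neighbour LBP code of the pixel at (row, col).

    Each neighbour whose value is >= the centre value sets one bit.
    """
    centre = image[row][col]
    # Neighbours visited clockwise, starting from the top-left one.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over interior pixels, normalized to sum to 1."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    total = sum(hist)
    return [count / total for count in hist] if total else hist
```

             The histogram serves as a fixed-length feature vector for a character image,
             regardless of the image size, which is what makes it usable as input to a classifier
             such as an SVM.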