
Simple Python script that does OCR on images for Russian text, and parses it for peripheral uses.

4rim/ocrus

ocrus

Aka OC-R-US, aka (OCR)US, aka OC(RUS).

A simple Python script that takes in images (screenshots of text) and runs Tesseract OCR on them to extract the text. The recognized text is then parsed into a corpus that can be used for several things, including but not limited to:

  • Creating a word frequency list
  • Matching words against Anki flashcard files
  • Quick dictionary look-ups
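
As a rough illustration of the first two uses, here is a minimal sketch using only the standard library. It is not the script's actual code: the function names (`tokenize`, `word_frequencies`, `load_anki_words`, `split_known_unknown`) are hypothetical, and it assumes the Anki deck was exported as tab-separated plain text with the Russian word in the first field.

```python
import csv
import io
import re
from collections import Counter

# Runs of Cyrillic letters (including ё/Ё); anything else separates words.
WORD_RE = re.compile(r"[а-яё]+", re.IGNORECASE)


def tokenize(text):
    """Return the lowercased Cyrillic words found in text, in order."""
    return [word.lower() for word in WORD_RE.findall(text)]


def word_frequencies(text):
    """Count how often each word occurs in the OCR output."""
    return Counter(tokenize(text))


def load_anki_words(tsv_file, field=0):
    """Collect one field from an Anki plain-text (tab-separated) export.

    Assumption: the word to match is in column `field` (default: first).
    Lines starting with '#' are treated as export headers and skipped.
    """
    words = set()
    for row in csv.reader(tsv_file, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        if len(row) > field:
            words.add(row[field].strip().lower())
    return words


def split_known_unknown(freqs, anki_words):
    """Split a frequency table into words already in Anki and new words."""
    known = {w: n for w, n in freqs.items() if w in anki_words}
    unknown = {w: n for w, n in freqs.items() if w not in anki_words}
    return known, unknown


if __name__ == "__main__":
    sample_text = "Кошка и собака. Кошка спит, а собака ест."
    sample_deck = io.StringIO("#separator:tab\nкошка\tcat\nдом\thouse\n")

    freqs = word_frequencies(sample_text)
    known, unknown = split_known_unknown(freqs, load_anki_words(sample_deck))
    print("most common:", freqs.most_common(3))
    print("already in Anki:", known)
    print("new words:", unknown)
```

This matches exact surface forms only. Russian is heavily inflected, so a real pipeline would probably want to lemmatize words before counting and matching; that step is left out here.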

Currently this focuses on Russian, but it could be used for any language in the tessdata repo.
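
Switching languages mostly comes down to the language code passed to Tesseract. The sketch below assumes the `pytesseract` wrapper and Pillow are installed, along with the Tesseract binary and the matching traineddata file (`rus` for Russian). The README does not say which binding the script uses, so treat `pytesseract` and the helper names `find_images`, `ocr_image`, and `ocr_folder` as assumptions.

```python
from pathlib import Path

# Image types to pick up from the screenshot folder (an assumed list).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def find_images(folder):
    """Return the image files directly inside folder, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def ocr_image(path, lang="rus"):
    """Run Tesseract on one image and return the recognized text.

    `lang` is a tessdata language code, e.g. "rus", "deu", or "rus+eng".
    The OCR libraries are imported here so the file helpers above work
    even when pytesseract and Pillow are not installed.
    """
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=lang)


def ocr_folder(folder, lang="rus"):
    """OCR every image in folder and join the results into one corpus."""
    return "\n".join(ocr_image(p, lang=lang) for p in find_images(folder))
```

Using `pathlib` rather than hand-built path strings also keeps the folder handling the same on Windows, macOS, and Linux.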

Future plans:

  • Probably use a better OCR service than Tesseract. I'm thinking of using this model, though that would require me to study up a bit on PyTorch - I don't mind!
  • Make script compatible with Windows/Mac/Linux filesystems - should be a minor fix.
  • (distant-er future) Add a GUI front-end that works with ShareX/other screenshot apps. Could do this with Qt, could do this with Gooey, which looks quite intuitive to use :)
