ocrus

Aka OC-R-US, aka (OCR)US, aka OC(RUS).

A simple Python script that takes in images (screenshots of text) and uses Tesseract OCR to extract the text. The extracted text is then parsed into a corpus that can be used for several things, including but not limited to:

Creating a word frequency list
Matching words against Anki flashcard files
Allowing quick dictionary look-ups
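
The first two uses above can be sketched roughly as follows. This is a minimal illustration and not the code in main.py: the function names, the Cyrillic-only word pattern, and the assumption that the Anki file is a tab-separated text export with the word in the first field are all hypothetical.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of Cyrillic letters (ё/Ё sit outside the а-я range).
WORD_RE = re.compile(r"[а-яёА-ЯЁ]+")


def word_frequencies(text):
    """Count how often each lowercased Cyrillic word occurs in the OCR output."""
    return Counter(word.lower() for word in WORD_RE.findall(text))


def known_words_from_anki_export(lines):
    """Collect the first field of each note from a tab-separated Anki text export.

    Assumption: lines starting with '#' are export headers and are skipped.
    """
    known = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        known.add(line.split("\t")[0].lower())
    return known


def unknown_words(freqs, known):
    """Return (word, count) pairs not yet in the deck, most frequent first."""
    return [(word, count) for word, count in freqs.most_common() if word not in known]
```

Feeding the OCR text through `word_frequencies` and then `unknown_words` would give a list of frequent words that do not have a flashcard yet, which is one way the corpus could drive what to study next.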

Currently this focuses on Russian, but it could be used for any language in the tessdata repo.
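
Switching languages could look something like the sketch below. It assumes the pytesseract wrapper and Pillow are installed and that the matching traineddata files (for example rus) are available to Tesseract; whether main.py uses pytesseract is not stated here, and `build_lang_arg` and `ocr_image` are hypothetical names.

```python
def build_lang_arg(codes):
    """Join tessdata language codes the way Tesseract expects, e.g. 'rus+eng'."""
    return "+".join(codes)


def ocr_image(image_path, codes=("rus",)):
    """Run Tesseract on one image and return the recognized text."""
    # Imported here so build_lang_arg stays usable without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=build_lang_arg(codes))
```

Calling `ocr_image("screenshot.png", codes=("rus", "eng"))` would ask Tesseract to recognize both Russian and English in the same image; the file name is only a placeholder.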

Future plans:

Probably use a better OCR service than Tesseract. I'm thinking of using this model, though that would require me to study up a bit on PyTorch - I don't mind!
Make script compatible with Windows/Mac/Linux filesystems - should be a minor fix.
(distant-er future) Add a GUI front-end that works with ShareX/other screenshot apps. Could do this with Qt, could do this with Gooey, which looks quite intuitive to use :)
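
For the cross-platform item above, one possible approach is to build every path with pathlib instead of hard-coded separators. This is only a sketch: the function names, the set of image suffixes, and the default output file name are assumptions, not the script's current behavior.

```python
from pathlib import Path

# Assumption: screenshots are saved as PNG or JPEG files.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def find_images(folder):
    """Return the image files directly inside `folder`, sorted by path."""
    folder = Path(folder)
    return sorted(
        path
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def output_path_for(base_dir, name="output.txt"):
    """Build the output file path using the separator of the current OS."""
    return Path(base_dir) / name
```

Because `Path` handles separators itself, the same calls should work unchanged on Windows, macOS, and Linux.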
