Sorbian OCR Training Data and Models

This repository provides text recognition (OCR) models for the Sorbian languages, covering both Latin and Fraktur script. Training data will be added in the near future.

Current Status

✅ Models for Tesseract are available
⏳ Models for Calamari and Kraken will be added soon

Upper Sorbian Tesseract Models

This repository provides Tesseract OCR models for Upper Sorbian, trained and fine-tuned at the Sorbian Institute.
Both Latin and Fraktur script variants are available.

📦 Models

hsb2.traineddata
Upper Sorbian (Latin script), v2
Fine-tuned from the official Latin.traineddata (tessdata).
Recommended for modern Sorbian texts printed in Latin script.

hsb_frak2.traineddata
Upper Sorbian (Fraktur script), v2
Fine-tuned from the official Fraktur.traineddata (tessdata).
Recommended for historical Sorbian Fraktur prints.
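Tesseract selects a model by its file name without the .traineddata suffix, so the two models are addressed as hsb2 and hsb_frak2 with the -l option. The sketch below wraps that choice in a small helper; the function names pick_model and ocr_sorbian and the keywords latin/fraktur are hypothetical and not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a script keyword to the matching model name.
# Tesseract expects the model name without the ".traineddata" suffix.
pick_model() {
  case "$1" in
    latin)   printf '%s\n' hsb2 ;;       # hsb2.traineddata
    fraktur) printf '%s\n' hsb_frak2 ;;  # hsb_frak2.traineddata
    *)       echo "unknown script: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: run Tesseract on one image with the chosen model.
# Arguments: <latin|fraktur> <image> <output base name>
# Tesseract writes the recognized text to "<output base name>.txt".
ocr_sorbian() {
  local model
  model="$(pick_model "$1")" || return 1
  tesseract "$2" "$3" -l "$model"
}
```

For example, ocr_sorbian fraktur scan.png scan runs Tesseract with -l hsb_frak2 and writes the recognized text to scan.txt.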

🔧 Installation

System tessdata folder

Copy the models into your Tesseract tessdata directory.

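Linux (Debian/Ubuntu), Tesseract 4.x:

sudo cp traineddata/*.traineddata /usr/share/tesseract-ocr/4.00/tessdata/

With Tesseract 5.x packages, the tessdata directory is usually /usr/share/tesseract-ocr/5/tessdata/ instead.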
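Because the tessdata path depends on the Tesseract version, a small script can look up the directory before copying. The sketch below assumes the Debian/Ubuntu layouts /usr/share/tesseract-ocr/5/tessdata and /usr/share/tesseract-ocr/4.00/tessdata; other distributions may use different paths, and the function names find_tessdata and install_models are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first existing tessdata directory under
# the given root (default: the Debian/Ubuntu location), checking the
# Tesseract 5 path before the Tesseract 4 path.
find_tessdata() {
  local root="${1:-/usr/share/tesseract-ocr}"
  local dir
  for dir in "$root/5/tessdata" "$root/4.00/tessdata"; do
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  echo "no tessdata directory found under $root" >&2
  return 1
}

# Hypothetical helper: copy all models from a source folder (e.g. the
# repository's traineddata folder) into the detected tessdata directory.
# Arguments: <source folder> [root]
# Writing below /usr/share usually requires root privileges.
install_models() {
  local dest
  dest="$(find_tessdata "$2")" || return 1
  cp "$1"/*.traineddata "$dest"/
}
```

Alternatively, the models can be used without installing them by pointing Tesseract at the repository folder, for example tesseract scan.png scan --tessdata-dir traineddata -l hsb2.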

License & Origin

The hsb2 and hsb_frak2 OCR models are © 2025 Sorbian Institute and licensed under CC BY 4.0.
They are based on the official Tesseract tessdata models Latin and Fraktur,
© Google et al., licensed under the Apache License 2.0.
