A Python package for processing Beowulf text data from Heorot.dk.
The heorot.py module can parse the dual-language edition of Beowulf and render the complete text in several formats:
- as a single combined JSON file
- as a single combined CSV
- as separate .ASS (Advanced SubStation Alpha subtitle format) files, one file per fitt
The module follows the Heorot.dk line numbering and fitt numbering system.
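As a rough illustration of the combined JSON and CSV outputs, the sketch below serializes a list of per-line records both ways using only the standard library. The field names (`fitt`, `line`, `old_english`, `modern_english`) and the helpers `to_json` and `to_csv` are assumptions made for this example; they are not taken from heorot.py, whose actual schema may differ.

```python
import csv
import io
import json

# Hypothetical records: these field names are illustrative assumptions,
# not the package's documented schema. Text values are placeholders.
records = [
    {"fitt": 0, "line": 1, "old_english": "(Old English text)", "modern_english": "(translation)"},
    {"fitt": 0, "line": 2, "old_english": "(Old English text)", "modern_english": "(translation)"},
]


def to_json(rows: list[dict]) -> str:
    """Serialize all rows as one JSON document, keeping non-ASCII characters readable."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def to_csv(rows: list[dict]) -> str:
    """Serialize all rows as one CSV document with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(records)
csv_text = to_csv(records)
```

Setting `ensure_ascii=False` matters for Old English text, since characters such as æ, þ, and ð would otherwise be written as escape sequences.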
Install Python 3.13 and Poetry, then install the project dependencies and the pre-commit hooks:
curl -sSL https://install.python-poetry.org | python3 -
poetry install
pre-commit install
To process the Beowulf text and generate all output formats:
poetry run heorot
# or more directly:
poetry run python -m beodata.parse.heorot
This will:
- Fetch the HTML content from heorot.dk (if not already cached)
- Parse the dual-language text
- Generate JSON and CSV files in tests/data/fitts/
- Create ASS subtitle files for each fitt in tests/data/subtitles/
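The "fetch if not already cached" step can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name `load_html` and the injected `fetch` callable are hypothetical, chosen so the caching logic can be shown without a network call.

```python
from pathlib import Path
from typing import Callable


def load_html(cache_path: Path, fetch: Callable[[], str]) -> str:
    """Return cached HTML if present; otherwise call fetch() and cache the result."""
    if cache_path.exists():
        return cache_path.read_text(encoding="utf-8")
    html = fetch()
    # Create the output directory on first run so the write cannot fail on a missing folder.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(html, encoding="utf-8")
    return html
```

Passing the fetcher in as an argument keeps the download logic separate from the cache check, which also makes the behavior easy to test offline.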
The script requires a blank ASS template file:
- tests/data/blank.ass - Template file for ASS subtitle formatting
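For readers unfamiliar with the ASS format, here is a hedged sketch of how a single subtitle event could be formatted for appending to a template's `[Events]` section. It assumes the template uses the common field order (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text) and defines a style named `Default`; neither is confirmed by this README, and `ass_timestamp` and `dialogue_line` are hypothetical helpers rather than functions from this package.

```python
def ass_timestamp(seconds: float) -> str:
    """Format seconds as an ASS timestamp, H:MM:SS.cc (centisecond precision)."""
    centis = round(seconds * 100)
    hours, rem = divmod(centis, 360_000)
    minutes, rem = divmod(rem, 6_000)
    secs, cs = divmod(rem, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{cs:02d}"


def dialogue_line(start: float, end: float, text: str, style: str = "Default") -> str:
    """Build one Dialogue event, assuming the common ASS field order."""
    # Layer 0, empty Name, zero margins, empty Effect; Text is always the last field.
    return f"Dialogue: 0,{ass_timestamp(start)},{ass_timestamp(end)},{style},,0,0,0,,{text}"
```

Because Text is the final field, commas inside the subtitle text do not need escaping.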
After running the script, you'll find:
In tests/data/fitts/:
- maintext.json - Complete text data in JSON format
- maintext.csv - Complete text data in CSV format
- maintext.html - Cached HTML from heorot.dk
In tests/data/subtitles/:
- fitt_0.ass through fitt_43.ass - ASS subtitle files for each fitt (except fitt 24, which doesn't exist)
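The expected set of subtitle files follows directly from the numbering described above. The sketch below lists them; the names `MISSING_FITTS` and `expected_subtitle_files` are illustrative and not taken from the package, which keeps its own fitt boundaries in beodata/text/numbering.py.

```python
# Fitt 24 does not exist in this numbering, so no file is generated for it.
MISSING_FITTS = {24}


def expected_subtitle_files(first: int = 0, last: int = 43) -> list[str]:
    """List the .ass file names expected in tests/data/subtitles/."""
    return [
        f"fitt_{n}.ass"
        for n in range(first, last + 1)
        if n not in MISSING_FITTS
    ]


files = expected_subtitle_files()
```

A list like this could be compared against the directory contents after a run to confirm that every fitt was rendered.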
- beodata/parse/heorot.py - Main parsing and processing logic for Heorot.dk HTML text
- beodata/text/numbering.py - Fitt boundaries and text structure constants
- beodata/subtitle/constants.py - Subtitle generation constants
- tests/data/fitts/ - Output directory for generated files
- tests/data/subtitles/ - Output directory for ASS subtitle files
This work is copyright 2025 by Callie Tweney, and licensed under CC BY 4.0.
The Heorot source text is copyright Benjamin Slade 2002-2020.