fix: preserve element text content during MEI3→MEI5 conversion #25

Open: kyrieb-ekat wants to merge 1 commit into main from fix/preserve-element-text

@kyrieb-ekat

Summary

  • ET.Element(tag, attribs) creates a new element with only the tag and attributes; it does not carry over .text or .tail. As a result, all element text content (including mei:l OCR text lines) was lost during conversion, and elements were written as self-closing with no content
  • The root element was also constructed without its attributes, dropping meiversion and others
  • Output filename changed from *NEW2.mei to * - mei5.mei to match the existing MEI5 naming convention
  • liberbatch.py rewritten to process all three subdirectories and write output to the correct Liber Usualis - mei5/ directories (previously it only ran on the current directory and had a filename mismatch with the checker)
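
The text-loss behaviour described in the first two bullets can be reproduced in isolation. The sketch below is not the converter's actual code: `copy_element` is a hypothetical helper, and the sample XML (including the `meiversion` value) is made up for illustration. It only shows that building an element from a tag and attributes leaves `.text` and `.tail` unset, and one way to carry them over.

```python
import xml.etree.ElementTree as ET


def copy_element(src):
    """Recursively copy an element, keeping tag, attributes, text, and tail.

    Hypothetical helper for illustration; not taken from the converter.
    """
    dst = ET.Element(src.tag, dict(src.attrib))
    dst.text = src.text   # text before the first child
    dst.tail = src.tail   # text after the closing tag
    for child in src:
        dst.append(copy_element(child))
    return dst


# Made-up sample input standing in for an MEI3 file.
src = ET.fromstring('<mei meiversion="3.0.0"><l n="1">we lux odds</l></mei>')

# Building an element from tag and attributes alone leaves .text as None,
# so it serializes as an empty (self-closing) element.
lossy = ET.Element(src[0].tag, dict(src[0].attrib))

# The recursive copy keeps the root attributes and the line text.
kept = copy_element(src)

print(ET.tostring(lossy, encoding="unicode"))
print(ET.tostring(kept, encoding="unicode"))
```

The same helper covers the root element, since it passes the source attributes through rather than constructing the root from the tag alone.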

Test plan

  • Converted 0001_corr.mei and verified mei:l elements contain text (e.g. <mei:l ...>we lux odds</mei:l>)
  • Run python3 liberbatch.py from repo root to convert all ~2300 files
  • Index converted MEI5 files with MEIText2Solr.py and confirm search returns results
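
The first check in the test plan could be automated along these lines. This is a rough sketch under stated assumptions: `count_text_lines` is a hypothetical helper, it assumes the converted files use the standard MEI namespace `http://www.music-encoding.org/ns/mei`, and it assumes OCR text lines are stored in `l` elements as described in the summary.

```python
import xml.etree.ElementTree as ET

# Assumption: converted files use the standard MEI namespace.
MEI_NS = "http://www.music-encoding.org/ns/mei"


def count_text_lines(xml_text):
    """Return (filled, empty) counts of mei:l elements in an MEI document.

    Hypothetical helper; a line counts as filled when it has
    non-whitespace text.
    """
    root = ET.fromstring(xml_text)
    lines = root.findall(f".//{{{MEI_NS}}}l")
    filled = sum(1 for line in lines if (line.text or "").strip())
    return filled, len(lines) - filled


# Made-up sample: one line with text, one empty line.
sample = (
    f'<mei xmlns="{MEI_NS}" meiversion="5.0">'
    '<l n="1">we lux odds</l><l n="2"/>'
    "</mei>"
)
print(count_text_lines(sample))
```

Running a check like this over every converted file would flag any file where the empty count is unexpectedly high before the files are indexed.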

🤖 Generated with Claude Code

ET.Element() copies tag and attributes but not .text or .tail, so all
text content (including mei:l OCR lines) was silently dropped. Also fix
root element losing its attributes (meiversion etc.), update output
filename to match the existing mei5 naming convention, and rewrite
liberbatch.py to process all three subdirectories and write output to
the correct Liber Usualis - mei5/ directories.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>