Metrical position in Greek hexameter.
See https://sasansom.github.io/sedes/ for web-based visualizations produced by this system.
See the tag tapa-version to reproduce the figures from the TAPA article, "Sedes as Style in Greek Hexameter: A Computational Approach."
See support here for the DHQ article, "SEDES: Metrical Position in Greek Hexameter."
See support here for the GRBS article, "Epic Rhythm: Metrical Shapes in Greek Hexameter."
See resources here for the CQ article, "Breaking Hermann's Bridge from Homer to Nonnus: Towards a Stylometry of Caesurae."
Sedes depends on The Classical Language Toolkit for lemmatization. You first need to install CLTK in a virtual environment:
python3 -m venv venv
source venv/bin/activate
pip3 install -U pip setuptools wheel
pip3 install cltk
Next, install the grc_models_cltk corpus:
python3 -c 'from cltk.data.fetch import FetchCorpus; FetchCorpus("grc").import_corpus("grc_models_cltk")'
The corpus is stored in a cltk_data subdirectory of your home directory.
The authors have used commit 94c04ac of the grc_models_cltk corpus.
You only need to do the steps above once. Thereafter, every time you start a new shell, you need to run only the single command
source venv/bin/activate
The "src" subdirectory contains a tei2csv program
that processes a TEI-encoded XML document as downloaded from Perseus
and produces a CSV file that annotates every word with
its line number and sedes. For example:
src/tei2csv "Il." corpus/iliad.xml > corpus/iliad.csv
The expectancy program annotates one or more CSV files
as produced by tei2csv with statistics about expectancy for each word.
src/expectancy corpus/*.csv > expectancy.all.csv
The tei2html program produces an HTML representation of
a TEI-encoded XML document, with visual highlighting of word expectancy.
If you put the HTML file in the sedes-web directory,
it will have access to locally installed web fonts for Greek.
src/tei2html "Il." corpus/iliad.xml expectancy.all.csv > sedes-web/iliad.html
The join-expectancy program takes a work-specific CSV file (as
produced by tei2csv) and augments it with lemma/sedes expectancy
numbers.
src/join-expectancy corpus/iliad.csv expectancy.all.csv > iliad-expectancy.csv
The "src/hexameter" subdirectory contains a Python module that we use for metrical analysis. It is by Hope Ranker and comes from https://github.com/epilanthanomai/hexameter.
The "corpus" subdirectory contains selected TEI-encoded XML texts downloaded from
Perseus.
These are suitable for input to tei2csv and tei2html.
If you have GNU Make installed, you can analyze all the texts in the corpus using the command
make -j4
The above command will run tei2csv, expectancy, and tei2html to produce HTML visualizations in the sedes-web directory, as well as intermediary files.
If you do not have GNU Make, the script make.sh runs the
same commands as make would:
./make.sh
One of the main functions of the programs is to divide lines into "word" units.
By default, division into words is what you expect: separating on whitespace and punctuation.
You can control the definition of "word" using the --word-unit option
of the tei2csv and tei2html programs.
You must use the same --word-unit value both
when you invoke tei2csv to define the corpus
and when you invoke tei2html to generate a visualization.
No matter what value of the --word-unit option you use,
the column name in CSV output is still word.
The default is --word-unit word, in which words are delimited
straightforwardly by whitespace and punctuation.
src/tei2csv --word-unit word "Il." corpus/iliad.xml > corpus/iliad.csv
src/tei2html --word-unit word "Il." corpus/iliad.xml expectancy.csv > iliad.html
The line Il. 1.4, ἡρώων, αὐτοὺς δὲ ἑλώρια τεῦχε κύνεσσιν, gets divided into 6 "words":
| work | book_n | line_n | word_n | word | lemma | sedes | metrical_shape |
|---|---|---|---|---|---|---|---|
| Il. | 1 | 4 | 1 | ἡρώων | ἥρως | 1 | ––– |
| Il. | 1 | 4 | 2 | αὐτοὺς | αὐτός | 4 | –– |
| Il. | 1 | 4 | 3 | δὲ | δέ | 6 | ⏑ |
| Il. | 1 | 4 | 4 | ἑλώρια | ἑλώριον | 6.5 | ⏑–⏑⏑ |
| Il. | 1 | 4 | 5 | τεῦχε | τεύχω | 9 | –⏑ |
| Il. | 1 | 4 | 6 | κύνεσσιν | κύων | 10.5 | ⏑–– |
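For further processing, rows like those in the table above can be read with Python's standard csv module. The following is a minimal sketch, assuming the CSV file begins with a header row carrying the column names shown in the table; the helper name read_rows and the choice to convert only word_n and sedes to numbers are illustrative assumptions, not part of the Sedes programs.

```python
import csv
import io

# Sample data copied from the table above. A real file would be the
# output of tei2csv; the header row here is an assumption.
SAMPLE = """\
work,book_n,line_n,word_n,word,lemma,sedes,metrical_shape
Il.,1,4,1,ἡρώων,ἥρως,1,–––
Il.,1,4,2,αὐτοὺς,αὐτός,4,––
Il.,1,4,3,δὲ,δέ,6,⏑
Il.,1,4,4,ἑλώρια,ἑλώριον,6.5,⏑–⏑⏑
Il.,1,4,5,τεῦχε,τεύχω,9,–⏑
Il.,1,4,6,κύνεσσιν,κύων,10.5,⏑––
"""


def read_rows(f):
    """Yield one dict per word unit (hypothetical helper).

    word_n becomes an int and sedes a float (or None when empty).
    book_n and line_n stay strings, in case a text uses
    non-numeric labels.
    """
    for row in csv.DictReader(f):
        row["word_n"] = int(row["word_n"])
        row["sedes"] = float(row["sedes"]) if row["sedes"] else None
        yield row


rows = list(read_rows(io.StringIO(SAMPLE)))
for row in rows:
    print(row["word_n"], row["word"], row["sedes"], row["metrical_shape"])
```

To read a file on disk instead, open it with encoding="utf-8" and newline="" and pass the file object to read_rows.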
The other option is --word-unit appositive-group.
The line is split on whitespace and punctuation as before,
but then consecutive words may be grouped together into a single word unit
according to whether they are considered appositive.
src/tei2csv --word-unit appositive-group "Il." corpus/iliad.xml > corpus/iliad.csv
src/tei2html --word-unit appositive-group "Il." corpus/iliad.xml expectancy.csv > iliad.html
With this option, Il. 1.4
is divided into 5 groups,
the adjacent words αὐτοὺς and δὲ
having been grouped together.
Notice also that the lemma column contains the lemmata of the words in the group,
separated by spaces,
and the metrical_shape column contains the concatenated metrical shapes of the constituent words.
| work | book_n | line_n | word_n | word | lemma | sedes | metrical_shape |
|---|---|---|---|---|---|---|---|
| Il. | 1 | 4 | 1 | ἡρώων | ἥρως | 1 | ––– |
| Il. | 1 | 4 | 2 | αὐτοὺς δὲ | αὐτός δέ | 4 | ––⏑ |
| Il. | 1 | 4 | 4 | ἑλώρια | ἑλώριον | 6.5 | ⏑–⏑⏑ |
| Il. | 1 | 4 | 5 | τεῦχε | τεύχω | 9 | –⏑ |
| Il. | 1 | 4 | 6 | κύνεσσιν | κύων | 10.5 | ⏑–– |
The definitions that control how words are joined into appositive groups
are the tables
ALWAYS_PREPOSITIVE_WORDS and ALWAYS_POSTPOSITIVE_WORDS in
src/appositive.py,
and the word-by-word override table in
src/exceptional-appositives.csv.
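The general idea of joining words into appositive groups can be sketched as follows. This is a simplified illustration, not the logic of src/appositive.py: the two small word sets are hypothetical stand-ins for the real tables, lookups ignore diacritics, and the word-by-word overrides are not modeled at all. The sketch attaches a postpositive word to the word before it and a prepositive word to the word after it.

```python
import unicodedata

# Hypothetical stand-ins, NOT the contents of ALWAYS_PREPOSITIVE_WORDS
# and ALWAYS_POSTPOSITIVE_WORDS in src/appositive.py.
PREPOSITIVE = {"και", "εν"}
POSTPOSITIVE = {"δε", "τε", "γαρ"}


def key(word):
    """Lowercase a word and strip combining diacritics for table lookup."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def group_appositives(words):
    """Join each postpositive to the preceding word and each
    prepositive to the following word (simplified assumption)."""
    groups = []
    attach_next = False  # True when the previous word was prepositive
    for w in words:
        k = key(w)
        if groups and (attach_next or k in POSTPOSITIVE):
            groups[-1].append(w)
        else:
            groups.append([w])
        attach_next = k in PREPOSITIVE
    return [" ".join(g) for g in groups]


words = ["ἡρώων", "αὐτοὺς", "δὲ", "ἑλώρια", "τεῦχε", "κύνεσσιν"]
print(group_appositives(words))
```

With these toy tables, the six words of Il. 1.4 come out as five groups, with αὐτοὺς and δὲ joined, as in the table above.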
You can control what variables are used to group the input
for expectancy calculation using the
--by command-line option of the expectancy and tei2html programs.
By default the programs compute the sedes expectancy of each distinct lemma,
as if you had used --by sedes/lemma.
The syntax of the --by option is
dist_vars/cond_vars,
with "distribution variables" on the left of the slash and
"condition variables" on the right.
On either side of the slash, there may be zero or more variable names,
separated by commas.
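The syntax described above can be illustrated with a small parser. This is a sketch of the dist_vars/cond_vars format only; the function name parse_by is hypothetical and the code is not taken from the expectancy or tei2html programs.

```python
def parse_by(spec):
    """Split a --by specification into (dist_vars, cond_vars) tuples.

    Hypothetical helper: either side of the slash may list zero or
    more comma-separated variable names.
    """
    dist, sep, cond = spec.partition("/")
    if not sep or "/" in cond:
        raise ValueError(f"expected exactly one '/' in {spec!r}")

    def names(side):
        return tuple(name for name in side.split(",") if name)

    return names(dist), names(cond)


print(parse_by("sedes/lemma"))           # the default grouping
print(parse_by("sedes/metrical_shape"))  # condition on metrical shape
print(parse_by("word/"))                 # no condition variables
```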
To calculate expectancy, the programs first partition the input
into groups according to distinct values of the condition variables.
Then, they find the expectancy of each of the distinct values
of the distribution variables.
For example, by default, the programs first divide the input into groups, where each group represents one distinct lemma. Then, within each lemma group, they find what sedes are more and less expected.
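The partitioning step can be sketched in a few lines of Python. Note the assumptions: the function names are hypothetical, the rows are invented toy data with placeholder lemmata X and Y, and the raw counts and relative frequencies computed here are only stand-ins for illustration. The expectancy statistic that the programs actually report is not reproduced.

```python
from collections import Counter, defaultdict


def count_by(rows, dist_vars, cond_vars):
    """Partition rows by the condition variables, then count each
    distinct combination of distribution-variable values per group."""
    groups = defaultdict(Counter)
    for row in rows:
        cond = tuple(row[v] for v in cond_vars)
        dist = tuple(row[v] for v in dist_vars)
        groups[cond][dist] += 1
    return groups


def relative_frequencies(groups):
    """Turn per-group counts into per-group proportions."""
    result = {}
    for cond, counter in groups.items():
        total = sum(counter.values())
        result[cond] = {dist: n / total for dist, n in counter.items()}
    return result


# Invented toy rows (placeholder lemmata, not corpus data).
toy_rows = [
    {"lemma": "X", "sedes": "1"},
    {"lemma": "X", "sedes": "1"},
    {"lemma": "X", "sedes": "9"},
    {"lemma": "Y", "sedes": "9"},
]

by_lemma = count_by(toy_rows, ("sedes",), ("lemma",))  # like --by sedes/lemma
overall = count_by(toy_rows, ("sedes",), ())           # like --by sedes/
print(relative_frequencies(by_lemma))
print(relative_frequencies(overall))
```

When the list of condition variables is empty, every row falls into a single group keyed by the empty tuple, which corresponds to summarizing a variable across the entire input.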
To use variables other than the default,
you will have to run the component programs manually,
rather than using make.
This is an example of calculating sedes expectancy after first grouping by metrical shape, rather than lemma:
src/expectancy --by sedes/metrical_shape corpus/*.csv > expectancy.sedes-metrical_shape.csv
src/tei2html --by sedes/metrical_shape "Phaen." corpus/aratus.xml expectancy.sedes-metrical_shape.csv > aratus.sedes-metrical_shape.html
If you want a summary of the expectancy of a single variable across the entire input, without any grouping at all, you can do that too. For example, to count the number of occurrences of each unique word:
src/expectancy --by word/ corpus/*.csv
Or to calculate the overall sedes expectancy, without first grouping by lemma or any other variable:
src/expectancy --by sedes/ corpus/*.csv
The output of tei2csv is CSV
that may be imported into a spreadsheet or further processed
by another program.
Greek text is represented as UTF-8 Unicode text. Characters are stored in decomposed form using Normalization Form D (NFD); this means that diacritics are separate combining characters. For example, the word ἀοιδή is stored as the sequence of characters
U+03B1 GREEK SMALL LETTER ALPHA
U+0313 COMBINING COMMA ABOVE
U+03BF GREEK SMALL LETTER OMICRON
U+03B9 GREEK SMALL LETTER IOTA
U+03B4 GREEK SMALL LETTER DELTA
U+03B7 GREEK SMALL LETTER ETA
U+0301 COMBINING ACUTE ACCENT
After UTF-8 encoding, this sequence is
\xce\xb1\xcc\x93\xce\xbf\xce\xb9\xce\xb4\xce\xb7\xcc\x81.
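The decomposition above can be checked with Python's standard unicodedata module. The sketch below starts from a precomposed spelling of ἀοιδή (written with escapes so that the starting form is unambiguous), normalizes it to NFD, and prints the resulting code points and UTF-8 bytes.

```python
import unicodedata

# ἀοιδή in precomposed form: U+1F00 (alpha with psili), omicron,
# iota, delta, U+03AE (eta with tonos).
composed = "\u1f00\u03bf\u03b9\u03b4\u03ae"

decomposed = unicodedata.normalize("NFD", composed)
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

encoded = decomposed.encode("utf-8")
print(encoded)
```

If you search the CSV output for a Greek string that you have typed or pasted, it is safest to normalize the search string to NFD first, because precomposed and decomposed forms look identical on screen but do not compare equal.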
The characters that mark long and short metrical values
are respectively – U+2013 EN DASH
and ⏑ U+23D1 METRICAL BREVE.
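Because these two characters are easy to confuse with look-alikes such as the ASCII hyphen, it can help to validate metrical_shape strings before processing them. The following is a minimal sketch; the function name count_syllables is hypothetical and not part of the Sedes programs.

```python
LONG = "\u2013"   # EN DASH, marks a long syllable
SHORT = "\u23d1"  # METRICAL BREVE, marks a short syllable


def count_syllables(shape):
    """Return (number of longs, number of shorts) in a shape string.

    Raises ValueError if the string contains any other character,
    for example an ASCII hyphen typed in place of the en dash.
    """
    for ch in shape:
        if ch not in (LONG, SHORT):
            raise ValueError(f"unexpected character U+{ord(ch):04X} in shape")
    return shape.count(LONG), shape.count(SHORT)


# The shape of ἑλώρια from the tables above: short, long, short, short.
print(count_syllables(SHORT + LONG + SHORT + SHORT))
```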