Articles
[en] Interlinking Text and Data with Semantic Annotation and Ontology Design Patterns to Analyse Historical Travelogues
Sandra Balck, Leibniz Institute for East and Southeast European Studies (IOS) Regensburg; Ingo Frank, Leibniz Institute for East and Southeast European Studies (IOS) Regensburg; Hermann Beyer-Thoma, Leibniz Institute for East and Southeast European Studies (IOS) Regensburg; Anna Ananieva, Leibniz Institute for East and Southeast European Studies (IOS) Regensburg
Abstract
[en]
This paper presents a first draft of the ongoing work on the edition of Franz Xaver Bronner’s travelogues, in which we apply a semantic model that combines widely accepted standards (such as CIDOC CRM, Dublin Core, SKOS etc.) with project-specific elements and TEI. We address the question of category development for time, space and events through text encoding and semantic annotation, and their value for research, while focusing on the current DFG-funded project “Digital Edition of Historical Travelogues” (DEHisRe).
[en] Category Development at the Interface of Interpretive Pragmalinguistic Annotation and Machine Learning: Annotation, detection and classification of linguistic routines of discourse referencing in political debates
Michael Bender, Technical University of Darmstadt; Maria Becker, University of Heidelberg; Carina Kiemes, Technical University of Darmstadt; Marcus Müller, Technical University of Darmstadt
Abstract
[en]
In this paper, we present a case study on quality criteria for the robustness of categories in pragmalinguistic tagset development. We model a number of classification tasks for linguistic routines of discourse referencing in the plenary minutes of the German Bundestag. In the process, we focus and reflect on three fundamental quality criteria: 1. segmentation, i.e. the size of the annotated segments (e.g. words, phrases or sentences), 2. granularity, i.e. degrees of content differentiation, and 3. interpretation depth, i.e. the degree of inclusion of linguistic knowledge, co-textual knowledge and extra-linguistic, context-sensitive knowledge. With the machine learnability of categories in mind, our focus is on principles and conditions of category development in collaborative annotation. Our experiments and tests on pilot corpora aim to investigate to what extent statistical measures indicate whether interpretative classifications are machine-reproducible and reliable. To this end, we compare gold-standard datasets annotated with different segment sizes (phrases, sentences) and categories of different granularity, respectively. We conduct experiments with different machine learning frameworks to automatically predict labels from our tagset. We apply BERT, a pre-trained neural transformer language model, which we fine-tune and constrain for our labelling and classification tasks, and compare it against Naive Bayes as a probabilistic knowledge-agnostic baseline model. The results from these experiments contribute to the development and reflection of our category systems.
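As a hedged illustration of the comparison described in this abstract, the sketch below pairs a bag-of-words Naive Bayes baseline with a fine-tuned BERT classifier. The example segments, label indices, model choice and hyperparameters are placeholders, not the project's actual tagset, corpus or configuration.

```python
# Sketch: knowledge-agnostic baseline vs. fine-tuned transformer classifier.
# Segments, labels and hyperparameters are placeholders, not project data.
import torch
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

segments = ["Ich komme auf den Punkt zurück.", "Wie bereits erwähnt wurde."]
labels = [0, 1]  # hypothetical indices into a discourse-referencing tagset

# 1. Probabilistic, knowledge-agnostic baseline (Naive Bayes over tf-idf).
baseline = make_pipeline(TfidfVectorizer(), MultinomialNB())
baseline.fit(segments, labels)

# 2. Pre-trained BERT, fine-tuned for the same labelling task.
name = "bert-base-german-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

class SegmentDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(model=model,
                  args=TrainingArguments(output_dir="out", num_train_epochs=3),
                  train_dataset=SegmentDataset(segments, labels))
trainer.train()
```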
[en] Making the Whole Greater than the Sum of its Parts: Taxonomy development as a site of negotiation and compromise in an interdisciplinary software development project
Jennifer C. Edmond, Trinity College Dublin; Alejandro Benito Santos, University of Salamanca; Michelle Doran, Digital Repository of Ireland; Roberto Therón, University of Salamanca; Michał Kozak, Backbase; Cezary Mazurek, Institute of Bioorganic Chemistry of the Polish Academy of Sciences; Eveline Wandl-Vogt, Austrian Academy of Sciences; Aleyda Rocha Sepulveda, Austrian Academy of Sciences
Abstract
[en]
This paper describes the experience of a group of interdisciplinary researchers and
research professionals involved in the PROgressive VIsual DEcision-making in Digital
Humanities (PROVIDEDH) project, a four-year project funded within the CHIST-ERA call
2016 for the topic “Visual Analytics for Decision Making under
Uncertainty — VADMU”. It contributes to the academic literature on how
digital methods can enhance interdisciplinary cooperative work by exploring the
collaboration involved in developing visualisations to lead decision-making in
historical research in a specific interdisciplinary research setting. More
specifically, we discuss how the cross-disciplinary design of a taxonomy of sources
of uncertainty in Digital Humanities (DH), a “profoundly
collaborative enterprise” built at the intersection of computer science and
humanities research, became not just an instrument to organise data, but also a tool
to negotiate and build compromises between different communities of practice.
[en] Categorising Legal Records – Deductive, Pragmatic, and Computational Strategies
Marlene Ernst, University of Passau; Sebastian Gassner, University of Passau; Markus Gerstmeier, University of Passau; Malte Rehbein, University of Passau
Abstract
[en]
Reprocessing printed source material and facilitating large-scale qualitative as well
as quantitative analyses with digital methods poses many challenges. A case study on
approximately 10,000 inventory entries for legal cases from the Special Court Munich
(1933–1945) highlights these challenges and offers a glimpse into a digitisation workflow that
allows for in-depth computer-aided analysis. For this paper, different methods and
procedures for developing categorisation systems for legal charges are discussed.
[en] Made to Be a Woman: A case study on the categorization of gender using an individuation-based approach in the analysis of literary texts
Mareike Schumacher, University of Regensburg; Marie Flüh, Hamburg University
Abstract
[en]
In this article we analyze Simone de Beauvoir’s The Second Sex in order to develop a category system whose gender categories not only represent a binary notion but also include non-binary forms of gender. By manually annotating and then visualizing features of gender roles as a networked graph and analyzing the resulting “gender sphere”, we arrive at two category systems, one containing five and the other four gender categories, each of which contains a number of gender roles. We compare the gender sphere Beauvoir creates in her text with one resulting from the analysis of narrative texts that she used as references for her analysis of gender roles. In this fiction corpus, we annotated character features, clothes and gender roles, evaluated them quantitatively and then matched them with the gender sphere. This makes visible that the literary corpus creates a scalar notion of gender ranging from a female to a male pole with a neutral center. Roles relevant for, and characters showing traits of, gender diversity are set at the margins. Unlike this scalar notion, Beauvoir works towards a more spherical understanding of gender in which roles relevant for gender diversity are well integrated into the center of the gender sphere. Although Beauvoir’s gender sphere cannot be mapped onto the graph data of the fiction corpus, it does offer a good starting point for macro- and microanalysis of literary characters, the features, clothes and roles that are used to describe them, and the resulting gender-specific profiling.
[en] Categorial Relations in (Re)constructing Topoi and in (Re)modeling Topology as a Methodology: Vertical, horizontal, heuristic and epistemological interdependencies
Maria Hinzmann, University of Trier
Abstract
[en]
The article reflects on the process of category development and category application that was relevant for the analysis of topoi in a specific corpus of travelogues. In this process, the investigation of the corpus and the (re)modeling of topology as a methodology were closely intertwined. This includes the interconnected modeling of the category topos and the (re)construction of concrete topos concepts. The paper proposes a methodological framework that represents categorial relations understood as relations between different types of categories, different concepts as instantiations of the same category as well as relations between categories and different heuristic levels. In addition to the definition and spatial representation of these relations, the iterative dimension of research processes in the digital humanities is discussed with regard to categorial interdependencies. Against the background of this (meta)model and the discussion of concrete examples from the study of travelogues, perspectives on possible follow-up differentiations and conceivable future work are touched upon.
[en] Systems of Intertextuality: Towards a formalization of text relations for manual annotation and automated reasoning
Jan Horstmann, University of Münster; Christian Lück, University of Münster; Immanuel Normann, University of Münster
Abstract
[en]
How can intertextual relations be formalized and annotated? What would be a coherent category system of intertextuality, and which formalization is suitable to make it computable without losing its expressiveness? Against the backdrop of the most influential classical theories of intertextuality, the article does not aim for an automatic detection of intertextual relations, as many other digital humanities approaches have done before, but suggests a formal and expandable model of the core of intertextuality by means of description logic, i.e. it models relations and the types of entities related by them in a machine-readable RDF format. The utilization of this theory-driven model is demonstrated by several examples of intertextual relations as discussed in literary studies.
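To make the kind of formalization gestured at here concrete, a minimal sketch with rdflib follows. The namespace, class and property names (IntertextualRelation, Allusion, hasSource, hasTarget) and text identifiers are invented for illustration and do not reproduce the article's actual model.

```python
from rdflib import Graph, Namespace, RDF, RDFS, URIRef

ITX = Namespace("http://example.org/intertextuality#")
g = Graph()
g.bind("itx", ITX)

# Hypothetical class hierarchy: allusion as one type of intertextual relation.
g.add((ITX.Allusion, RDFS.subClassOf, ITX.IntertextualRelation))

# Two related text segments (invented identifiers).
source = URIRef("http://example.org/texts/ulysses#episode1")
target = URIRef("http://example.org/texts/odyssey#book1")

# The relation itself is an individual, so it can be typed and annotated.
rel = URIRef("http://example.org/relations/r1")
g.add((rel, RDF.type, ITX.Allusion))
g.add((rel, ITX.hasSource, source))
g.add((rel, ITX.hasTarget, target))

# Query all intertextual links, regardless of their subtype.
q = """SELECT ?s ?t WHERE {
  ?r a/rdfs:subClassOf* itx:IntertextualRelation ;
     itx:hasSource ?s ; itx:hasTarget ?t . }"""
for row in g.query(q, initNs={"itx": ITX, "rdfs": RDFS}):
    print(row.s, "->", row.t)
```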
[en] Are Ontologies Trees or Lattices?
C. M. Sperberg-McQueen, Black Mesa Technologies LLC; Claus Huitfeldt, University of Bergen
Abstract
[en]
Ontologies, it is sometimes said, take the form of a hierarchy or tree: each class is subdivided into distinct subclasses with no cross-classifications. But if the purpose of an ontology is to make possible useful inferences and to guide software users and developers, it is better to allow a more flexible structure. Using text annotation as an example (with concrete reference to the CATMA annotation tool), we argue that it will be more useful to structure ontologies as lattices, not trees.
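The contrast is easy to make concrete in code. In the minimal sketch below (invented class names; Python's class system merely stands in for an ontology language), multiple inheritance produces exactly the cross-classification a strict tree forbids:

```python
# In a tree, every class has at most one parent; a lattice allows a class to
# specialize several superclasses at once (cross-classification).
class Annotation: pass
class StructuralAnnotation(Annotation): pass   # where a span sits in the text
class SemanticAnnotation(Annotation): pass     # what a span means

# Cross-classified subclass: ill-formed in a strict tree, natural in a lattice.
class QuotedSpeech(StructuralAnnotation, SemanticAnnotation): pass

# Inferences can now follow both branches upward.
assert issubclass(QuotedSpeech, StructuralAnnotation)
assert issubclass(QuotedSpeech, SemanticAnnotation)
```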
[en] Visualization of Categorization: How to see the wood and the trees
Ophir Münz-Manor, The Open University of Israel; Itay Marienberg-Milikowsky, Ben-Gurion University of the Negev
Abstract
[en]
In the article, we present, theorize and contextualize an investigation of
figurative language in a corpus of Hebrew liturgical poetry from late
antiquity, from both a manual and a computational point of view. The study
touches upon questions of distribution and patterns of usage of figures of
speech as well as their literary-historical meanings. Focusing on figures
of speech such as metaphors and similes, the corpus was first annotated
manually with markers on paper, and a few years later it was annotated
manually again, this time in a computer-assisted way, following a strictly
categorized approach, using CATMA (an online literary annotation tool). The
data was then transferred into ViS-À-ViS (an online visualization tool,
developed by Münz-Manor and his team) that enables scholars to “see the
wood” via various visualizations that single out, inter alia,
repetitive patterns either at the level of the text or the annotations. The
tool also enables one to visualize aggregated results concerning more than
one text, allowing one to “zoom out” and see the “forest aspect”
of the entire corpus or parts thereof. Interestingly, after visualizing the
material in this way, it often turns out that the categories themselves
need to be re-assessed. In other words, the categorization and
visualization in themselves create a sort of hermeneutical circle in which
both parts influence one another reciprocally.
Through the case study, we seek to demonstrate that, by using the right methods
and tools (not only ViS-À-ViS but others as well), one can ultimately use
visualization of categorization as the basis for what might be called
established speculation, or non-trivial
generalization: an interpretative act that seeks
to be grounded in clear findings while at the same time enjoying the
advantages of “over-interpretation”. This approach, we argue, enables
one to see the trees without losing sight of the wood, and vice versa; or
“to give definition”
– at least tentatively – “to the microcosms and
macrocosms which describe the world around us”, be they factual or fictional.
[en] Annotating German in Austria: A Case-study of manual annotation in and for digital variationist linguistics
Markus Pluschkovits, University of Vienna
Abstract
[en]
The following is a case study of manual annotation as used in a large-scale variationist linguistics project focusing on spoken varieties of German in Austria. It showcases the technical architecture used for annotation, which unifies different granularities of tokenization throughout the corpus in a distinct entity designed for annotation. It furthermore lays out the hierarchical stand-off annotation used in the project Deutsch in Österreich (“German in Austria”) and demonstrates how a semantic model for the hierarchical organization of annotations can bring transparency to the annotation process while also providing a sounder epistemological basis than monodimensional annotation.
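A minimal sketch of the general idea of hierarchical stand-off annotation follows; the tokens, layer names and values are invented and do not reproduce the project's actual schema.

```python
# Stand-off: the text layer stays untouched; annotations reference token IDs.
tokens = {"t1": "des", "t2": "is", "t3": "gaunz", "t4": "wuaschd"}

annotations = {
    # First-order annotations target tokens (possibly at different granularities).
    "a1": {"targets": ["t3"], "layer": "lexis", "value": "dialect intensifier"},
    "a2": {"targets": ["t1", "t2", "t3", "t4"], "layer": "syntax",
           "value": "copula clause"},
    # Higher-order annotation targets other annotations, making the
    # interpretive hierarchy explicit instead of flattening it.
    "a3": {"targets": ["a1", "a2"], "layer": "variation",
           "value": "dialectal predication"},
}

def resolve(ann_id):
    """Recursively expand an annotation to the tokens it ultimately covers."""
    covered = []
    for t in annotations.get(ann_id, {}).get("targets", []):
        covered += resolve(t) if t in annotations else [tokens[t]]
    return covered

print(resolve("a3"))  # ['gaunz', 'des', 'is', 'gaunz', 'wuaschd']
```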
[en] From Semi-structured Text to Tangible Categories: Analysing and annotating death lists in 18th century newspaper issues
Claudia Resch, Austrian Academy of Sciences; Nina C. Rastinger, Austrian Academy of Sciences; Thomas Kirchmair, Austrian Academy of Sciences
Abstract
[en]
Annotating – understood here as the process in which segments of a text are marked as belonging to a defined
category – can be seen as a key technique in many disciplines, especially for working with text in the Humanities [e.g. Unsworth 2000], the
Computational Sciences, and the Digital Humanities. In the field of Digital Humanities, annotations of text are
utilized, among other purposes, for the enrichment of a corpus or digital edition with (linguistic) information,
for close and distant reading methods, or for machine learning techniques. Defining categories to shape data has been used in different text analysis contexts, including the study of
toponyms and biographical data.
The paper at hand showcases the use of annotations within the Vienna Time Machine project (2020-2022, PI: Claudia
Resch), which aims to connect different knowledge resources about historical Vienna via Named Entity Recognition
(NER). More specifically, it discusses the challenges and potentials of annotating 18th century death lists found
in the Wien[n]erisches Diarium or Wiener Zeitung, an early modern newspaper which was first published in 1703 and has already been (partly)
digitized in the form of the so-called DIGITARIUM. Here, users can access over
330 high-quality full-text issues of the newspaper which contain a number of different text types, including
articles, advertisements, and more structured texts such as arrival or death lists. The focus of this article lies
on the semi-structured death lists, which not only appear in almost every issue of the historical Wiener Zeitung, but are also relatively consistent in their structure and display a high
semantic density: Each entry contains detailed information about a deceased person, such as their name, occupation,
place of death, and age.
Annotating these semi-structured list items opens up multiple possibilities: The resulting classified data can be
used for efficient distant or scalable reading, quantitative analyses,
and as a gold standard for both rule-based and machine learning NER approaches. To reach this goal, and as a first step of the annotation process, the project
team conducted a close reading of various death lists from multiple decades to identify recurrent linguistic
patterns and, based on these, to develop a first expandable set of categories. This bottom-up approach resulted in
five preliminary categories, namely person, occupation, place, cause-of-death, and age, which were color-coded and, accompanied by annotated examples, documented in annotation
guidelines designed to be as intersubjectively applicable and concise as possible. These guidelines were then used by two
researchers familiar with the historic material to annotate a randomly drawn and temporally distributed sample of
500 death list entries in the browser-based environment Prodigy (https://prodi.gy). Particular emphasis was placed on “challenging” cases, i.e. items
where annotators were in doubt about their choice of category, the exact positioning of annotations or the
necessity to annotate certain text segments at all. Whenever annotators encountered such ambiguous items, these
were collected, grouped and – as a third step in the annotation process – discussed with an interdisciplinary group
of linguists, historians and prosopographers. Within this collective, a solution for each group of issues was
agreed on and incorporated into the annotation guidelines; existing categories were also revised where necessary.
The new, more stable category system was then again used for a new sequence of annotation and discussion of
ambiguities, resulting in an iterative process in which annotation and category development became intertwined. This
approach, explained in more detail in the article, demonstrates that tagsets are never entirely final, but always
depend on particular knowledge interests and data material, and that even the annotation of inherently
semi-structured lists requires continuous critical reflection and considerable historical and linguistic
knowledge.
At the same time, this work exemplifies that it is precisely these “challenging” cases which
carry a great potential for gaining knowledge and can be considered central to the development of a valid
annotation system.
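As a hedged sketch of how such annotated entries can feed machine-learning NER (the entry text, character offsets and label names below are invented; the project's actual data and guidelines are not reproduced), category annotations can be converted into spaCy training data:

```python
# Turn annotated death-list entries into spaCy training examples for the
# five categories named above (cause-of-death omitted in this invented entry).
import spacy
from spacy.tokens import DocBin

entry = ("Dem Joseph Huber, bürgerl. Schneidermeister, in der Leopoldstadt, "
         "alt 62 Jahr.")
spans = [(4, 16, "PERSON"), (18, 43, "OCCUPATION"),
         (52, 64, "PLACE"), (70, 77, "AGE")]  # (start, end, label) offsets

nlp = spacy.blank("de")
doc = nlp.make_doc(entry)
ents = [doc.char_span(s, e, label=l) for s, e, l in spans]
doc.ents = [s for s in ents if s is not None]  # drop misaligned spans

db = DocBin(docs=[doc])
db.to_disk("train.spacy")  # input for `python -m spacy train`
```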
[en] Developing Computational Models for Formalizing Concepts in the British Colonial India Corpus
Shanmugapriya T, University of Toronto Scarborough
Abstract
[en]
The concepts embedded in humanities materials are unstructured and possess multifaceted attributes. New insights from these materials are derived through systematic qualitative study. However, for the purpose of quantitative analysis using digital humanities methods and tools, formalizing these concepts becomes imperative. The functionality of digital humanities relies on the deployment of formalized concepts and models. Formalization converts unstructured data into a more structured form. Concurrently, models function as representations created to closely examine the modeled subject, while metamodels define the structure and properties of these models. In this case, the absence of formalized concepts and models for studying the British colonial India corpus hampers the application of computational methods to humanities research questions. The texts are intricate, and the format is non-standard, as colonial officials documented extensive information to govern and control the colonized people and land. In this scenario, the British colonial India corpus cannot be effectively utilized for topic-specific research questions employing advanced text mining without formalizing the concepts within it. This article addresses the questions of what the most effective approach is for identifying multifaceted concepts within the non-standard British colonial India corpus through models, and how these concepts can be formalized using formal models. It also explores how metamodels can be developed from this experiment for a similar corpus.
[en] Case Study: Annotating the ambiguous modality of "must" in Jane Austen’s Emma
Angelika Zirker, University of Tübingen; Michael Göggelmann, University of Tübingen / University of Cologne
Abstract
[en]
The case study is based on student annotations from a class on “Digital Methods in Literary Studies” taught in the English Studies / English Literatures and Cultures programme at Tübingen University. The annotation task consisted of tagging the ambiguous modality of must in Jane Austen’s novel Emma (1816). The article, in a first step, presents how the criteria for the annotation task were developed on the basis of a close reading of the novel; these evolved into annotation guidelines which were then translated into tag sets for two annotation tools: CATMA and CorefAnnotator. The overall results of the annotation process are discussed, with a particular focus on the difficulties that emerged as well as (patterns of) mistakes and misconceptions across the groups and individual annotators. This approach yields insights into the challenges of annotating with a group in a teaching context and foregrounds conceptual difficulties when it comes to annotating complex phenomena in literary texts.
[en] Automated Transcription of Gə'əz Manuscripts Using Deep Learning
Samuel Grieggs, University of Notre Dame; Jessica Lockhart, University of Toronto; Alexandra Atiya, University of Toronto; Gelila Tilahun, University of Toronto; Suzanne Akbari, Institute for Advanced Study, Princeton, NJ; Eyob Derillo, SOAS, University of London; Jarod Jacobs, Warner Pacific College; Christine Kwon, University of Notre Dame; Michael Gervers, University of Toronto; Steve Delamarter, George Fox University; Alexandra Gillespie, University of Toronto; Walter Scheirer, University of Notre Dame
Abstract
[en]
This paper describes a collaborative project designed to meet the needs of
communities interested in Gə'əz language texts – and other under-resourced
manuscript traditions – by developing an easy-to-use open-source tool that
converts images of manuscript pages into a transcription using optical character
recognition (OCR). Our computational tool incorporates a custom data curation
process to address the language-specific facets of Gə'əz coupled with a
Convolutional Recurrent Neural Network to perform the transcription. An
open-source OCR transcription tool for digitized Gə'əz manuscripts can be used
by students and scholars of Ethiopian manuscripts to create a substantial and
computer-searchable corpus of transcribed and digitized Gə'əz texts, opening
access to vital resources for sustaining the history and living culture of
Ethiopia and its people. With suitable ground truth, our open-source OCR
transcription tool can also be retrained to read other under-resourced scripts.
The tool we developed can be run without a graphics processing unit (GPU),
meaning that it requires much less computing power than most other modern AI
systems. It can be run offline from a personal computer, or accessed via a web
client and potentially in the web browser of a smartphone. The paper describes
our team’s collaborative development of this first open-source tool for Gə'əz
manuscript transcription that is both highly accurate and accessible to
communities interested in Gə'əz books and the texts they contain.
ጥልቅ እውቀትን ለረቂቅ ጽሁፎች ስለመጠቀም
ሳሙኤል ግሪግስ፡ ኖተርዳም ዩኒቨርሲቲ፤ ጀሲካ ሎክሀርት፡ቶሮንቶ ዩኒቨርሲቲ፤ አሌክሳንደራ አትያ፡ ቶሮንቶ ዩኒቨርሲቲ፤ ገሊላ
ጥላሁን፡ ቶሮንቶ ዩኒቨርሲቲ፤ ሱዛን ኮንክሊን አክባሪ፡ አድቫንስድ ጥናት ኢንስቲትዩት፡ ፕሪንስተን ኒው ጀርሲ፤ ኢዮብ ደሪሎ
ሶ.አ.ስ. ለንደን ዩኒቨርሲቲ፤ ጃሮድ ጃኮብስ፡ ዋርነር ፓሲፊክ ኮሌጅ፤ ክሪስቲን ኮን፡ ኖተርዳም ዩኒቨርሲቲ፤ ሚካኤል ጀርቨርስ፡
ቶሮንቶ ዩኒቨርሲቲ፤ ስቲቭ ደላማርተር፡ ጆርጅ ፎክስ ዩኒቨርሲቲ፤ አሌክሳንድራ ግለስፒ፡ ቶሮንቶ ዩኒቨርሲቲ፤ ዋልተር ሸሪር፡
ኖተርዳም ዩኒቨርሲቲ።
መግለጫ
ይህ ጥናት የሚገልፀው የግዕዝ ቋንቋ ፅሁፍን እና ሌሎች መሰል ትኩረት ያልተሰጣቸውን፣ ባህላዊና እና ጥንታዊ ሥሁፎችን ለመማር
ወይም ለጥናት የሚፈልጉ ማህበረሰቦችን ፍላጎት ለማርካት የጥምር የጥናት ቡድናችን ስለቀረፀው ቀላል እና ሁሉም ሊጠቀምበት
ስለሚችል መሣሪያ(ዘዴ) ነው።፡ይህ መሣሪያ የብራና ፅሁፍን የመሰሉ ረቂቅ ፅሁፎች የተፃፉባቸውን ገፆች ምሥል በማንሳት እና
ፊደላትን ለይቶ በሚገነዘብ ጨረር (optical character recognition (OCR)) በመጠቀም ምሥሉን ወደ መደበኛ
ወይም ሁለተኛ ፅሁፍነት የመቀየር ችሎታ ያለው ነው። ይህ ኮምፒዩተር ላይ የተመሰረተ ዘዴ ወይም መሣሪያ የግዕዝ ቋንቋን ልዩ
ባህርዮች ለይቶ እንዲያውቅ ሲባል ስለቋንቋው ያገኘውን መረጃ ወይም ዳታ የመንከባከብ እና የማከም ሂደቶችን አልፎ እንደ አንጎል
ነርቮች መረብ እሽክርክሪት የሚመስል ኮንቮሉሽናል ሪከረንት ነውራል ኔትዎርክ (Convolutional Recurrent Neural
Network) በመያዙ ገጽታዎችን እና ምሥሎችን ወደ ፅሁፍ ይቀይራል። ይህ ለሁሉም ተጠቃሚዎች ክፍት የሆነው ጽሁፍ ለተማሪዎች
እንዲሁም ለኢትዮጵያ ጽሁፍ ጥናት ተመራማሪዎች የሚጠቅም ብቃት ያለው እና በቀላሉ በኮምፒዩተር ተፈልጎ ሊገኝ የሚችል ከመሆኑም
በተጨማሪ የግዕዝ ጽሁፎቹ የኢትዮጵያን እና የኢትዮጵያን ህዝብ ታሪክና ባህል ግዕዝን በዲጂታል/በኮምፑተር ቀርፆ በማስቀመጥ
በቀጣይነት እንዲኖር ያስችላል። አመቺ የሆነ ተጨባጭ ሁኔታ ሲኖር ደግሞ ይህ ለሁሉም ክፍት የሆነ የ OCR የግዕዝን ምስልን ወደ
ፅሁፍ የሚቀይር መሣሪያ ወይም ዘዴ ሌሎች ትኩረት ያላገኙ ረቂቅ ፅሁፎችንም እንዲያነብ ተደርጎ ሊሰለጥን ወይም ዲዛይን ሊደረግ
ይችላል። ይህ የፈጠርነው መሣሪያ/ዘዴ የተለመደውን ግራፊክስ ፕሮሰሲንግ ዩኒት (GPU) የተባለውን በኮምፕዩተር ምሥሎችን
የማንበቢያ እና ማሳለጫ ዘዴ መጠቀም አያስፈልገውም። በዚህም ምክንያት ከሌሎች ዘመናዊ የአርቲፊሻል ኢንተሊጀንስ (AI
systems ) ዘዴዎች አንፃር ሲታይ ሃይለኛ የኮምፒዩተር አቅም አይፈልግም። ይህንን መሣሪያ/ዘዴ ያለ ኢንተርኔት ወይም
በይነ-መረብ ከግል ኮምፒዩተር፣ በኢንተርኔት እንዲሁም ወደፊት ኢንተርኔት ባለው የእጅ ሥልክን በመጠቀም ማስኬድ ይቻላል። ይህ
ጥናት የሚገልጸው በአይነቱ የመጀመሪያ የሆነው እና ለሁሉም ክፍት የሆነ እንዲሁም በተገቢ ሁኔታ ጥራቱን ጠብቆ በጥምር
ተመራማሪዎቻችን የበለፀገው መሣሪያ/ዘዴ ለማናቸውም በግዕዝ መጽሀፍቶች እና ውስጣቸው በያዙት ፅሁፎች ላይ ጥናት ለማድረግ
ለሚፈልጉ ግለሰቦችም ሆኑ ማህበረሰቦች ሁሉ ጠቃሚ መሆኑን ለማስገንዘብ ነው።
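For readers curious how the architecture described in this abstract looks in code, the following is a minimal PyTorch sketch of a Convolutional Recurrent Neural Network trained with CTC loss; the hyperparameters and alphabet size are placeholders, not the project's released model. Like the tool itself, it runs on a CPU.

```python
# Sketch of a CRNN for line-level OCR: a convolutional feature extractor
# feeding a recurrent layer, trained with CTC loss. All sizes are invented.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_classes, img_h=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        feat_h = img_h // 4
        self.rnn = nn.LSTM(128 * feat_h, 256, bidirectional=True,
                           batch_first=True)
        self.head = nn.Linear(512, n_classes)  # classes include the CTC blank

    def forward(self, x):            # x: (batch, 1, img_h, width)
        f = self.cnn(x)              # (batch, 128, img_h/4, width/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one step per column
        out, _ = self.rnn(f)
        return self.head(out)        # (batch, width/4, n_classes)

# One training step with CTC loss (dummy shapes; the Gə'əz syllabary has a
# few hundred signs, so 300 output classes is a placeholder).
model = CRNN(n_classes=300)
logits = model(torch.randn(2, 1, 32, 128)).log_softmax(-1).permute(1, 0, 2)
targets = torch.randint(1, 300, (2, 10))
loss = nn.CTCLoss(blank=0)(logits, targets,
                           torch.full((2,), logits.size(0), dtype=torch.long),
                           torch.full((2,), 10, dtype=torch.long))
loss.backward()
```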
[en] Reconstructing historical texts from fragmentary sources: Charles S. Parnell and the Irish crisis, 1880-86
Eugenio Biagini, University of Cambridge; Patrick Geoghegan, Trinity College Dublin; Hugh Hanley, University of Cambridge; Aneirin Jones, University of Cambridge; Huw Jones, University of Cambridge
Abstract
[en]Charles Stewart Parnell was one of the most controversial and effective leaders in the
United Kingdom in the second half of the nineteenth century. Almost single-handedly, he transformed
the proposal of Home Rule for Ireland from a languishing irrelevance to a mass-supported cause.
Though the historiography on Parnell is substantial, his speeches – the main primary sources for
accessing both his thinking and strategies – have never been collected or edited. One of the core
questions in working towards an edition of his speeches was whether it would be possible
to use automated methods on these fragmentary sources to reconstruct what Parnell actually said in them.
We were also interested in how the reports varied, and what that variation might tell us about
the practices and biases of the journalists who wrote them and the newspapers which published them.
This article discusses the use of two digital tools in our attempts to answer these
research questions: CollateX, which was designed by Digital Humanities practitioners for the comparison
of textual variants, and SBERT Sentence Transformers, which establishes levels of similarity between texts.
In this article we talk about how the application of digital methods to the corpus led us away
from the idea of producing definitive reconstructions of the speeches, and towards a deeper
understanding of the corpus and the journalistic practices which went into its creation.
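A minimal sketch of the two tools named in this abstract, with invented newspaper snippets; the model choice and preprocessing are assumptions rather than the project's published pipeline.

```python
# CollateX aligns the variant reports token by token; SBERT sentence
# embeddings score how semantically similar the reports are.
from collatex import Collation, collate
from sentence_transformers import SentenceTransformer, util

reports = {  # invented stand-ins for two newspaper reports of one speech
    "Freeman's Journal": "The land of Ireland belongs to the people of Ireland.",
    "The Times": "Ireland's land, he declared, belongs to the Irish people.",
}

# Word-level collation: shared and divergent readings in an alignment table.
collation = Collation()
for witness, text in reports.items():
    collation.add_plain_witness(witness, text)
print(collate(collation))

# Semantic similarity between the reports, independent of exact wording.
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(list(reports.values()), convert_to_tensor=True)
print(util.cos_sim(emb[0], emb[1]))
```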
[en] Discourse cohesion in Xenophon’s On Horsemanship through Sketch Engine
Victoria Beatrix Fendel, University of Oxford; Matthew T. Ireland, Sidney Sussex College, University of Cambridge
Abstract
[en]
We build a Sketch Engine corpus for Xenophon’s classical Greek scientific treatise
On Horsemanship. Sketch Engine is a web-based
corpus-analysis tool that allows the user to inspect the lexical makeup of a text
(cf. keyword lists), explore the surroundings of select items (cf. concordances) and
identify fixed expressions in a text (cf. n-grams). We make available our
corpus-preparation tool and our corpus configuration file for Sketch Engine. We use
the Sketch Engine corpus to detect discontinuous verbal multi-word expressions,
specifically support-verb constructions (e.g. to take a
decision). We examine how support-verb constructions – through their
structural and lexical properties – aid discourse coherence and cohesion throughout
Xenophon’s treatise. We furthermore examine how the recurring support-verb
constructions in the treatise reflect the scientific register of the text. The
article shows how an understudied category of lexico-syntactic device (support-verb
constructions) in classical Greek substantially aids discourse cohesion, structurally and
contextually speaking. It also shows how an understudied text in the form of a
technical treatise (On Horsemanship) substantially furthers
insight into the scientific literacy of the classical period. Finally, by making
available our corpus-preparation tool and code, we hope to further collaboration and
adaptation and thus improvement of existing tools and counteract the multiplication
of tools.
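Since the article releases a corpus-preparation tool for Sketch Engine, it may help to picture the target format: Sketch Engine ingests corpora as "vertical" files, one token per line with tab-separated attributes inside structural tags. A minimal sketch with illustrative tokens and placeholder tags:

```python
# Write a tiny corpus in Sketch Engine's vertical format: one token per line,
# attributes (word, lemma, tag) separated by tabs, wrapped in <doc> tags.
rows = [
    ("λόγον", "λόγος", "NOUN"),
    ("ποιεῖται", "ποιέω", "VERB"),  # a support-verb construction candidate
]
with open("horsemanship.vert", "w", encoding="utf-8") as f:
    f.write('<doc id="xen-eq" title="On Horsemanship">\n')
    for word, lemma, tag in rows:
        f.write(f"{word}\t{lemma}\t{tag}\n")
    f.write("</doc>\n")
```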
[en] History Harvesting: A Case Study in Documenting Local History
Kimberly Woodring, Department of History, East Tennessee State University; Julie Fox-Horton, Cross-Disciplinary Studies, East Tennessee State University
Abstract
[en]
As a case study for the practice and application of digital history in a mid-size
university history department, this paper analyzes two History Harvest events
undertaken in a split-level digital history course. By examining the results of
two local History Harvests, specifically through participation of the greater
community, outside the university, and the preservation and digitization of the
local historical items, we discuss the impact History Harvests can have on a
community, as well as on history students. The primary goal of both History
Harvests outlined in this paper was to work with the local community
surrounding the university to preserve pieces of local history. This article
provides guidelines for conducting a History Harvest including suggestions for
community outreach, local university involvement with the greater community, and
digitizing issues that might occur while conducting the Harvest.
[en] Cluster Analysis in Tracing Textual Dependencies – a Case of Psalm 6 in 16th-century English Devotional Manuals
Jerzy Wójcik, The John Paul II Catholic University of Lublin
Abstract
[en]
This article uses cluster analysis to track textual affinities and identify the sources of different versions of historical texts, based on the text of Psalm 6 found in 16th-century English manuals of devotion. The article offers a brief overview of the manuals of prayer examined, describes the methods of cluster analysis used within the present work, and shows how cluster analysis can enrich and guide traditional philological knowledge.
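To give a flavour of the method (the version texts below are invented stand-ins, and the character n-gram features and cosine distance are assumptions, not the article's own choices), hierarchical cluster analysis of text versions can be sketched as follows:

```python
# Hierarchical clustering of text versions from character n-gram profiles;
# versions that share readings end up in the same subtree of the dendrogram.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import CountVectorizer

versions = {  # invented stand-ins for Psalm 6 openings in different manuals
    "Manual A": "O lorde rebuke me not in thyne anger",
    "Manual B": "O lord rebuke me not in thine indignacion",
    "Manual C": "Lorde rebuke me not in thy wrathe",
}
X = CountVectorizer(analyzer="char_wb", ngram_range=(2, 3)).fit_transform(
    list(versions.values())).toarray()
Z = linkage(pdist(X, metric="cosine"), method="average")
dendrogram(Z, labels=list(versions.keys()))
plt.show()
```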
[en] Project Quintessence: Examining Textual Dimensionality with a Dynamic Corpus Explorer
Samuel Pizelo, UC Davis; Arthur Koehl, UC Davis; Chandni Nagda, University of Illinois at Urbana-Champaign; Carl Stahmer, UC Davis
Abstract
[en]
In this paper, we present a free and open-access web tool for exploring the
EEBO-TCP early modern English corpus. Our tool combines several unsupervised
computational techniques into a coherent exploratory framework that allows for
textual analysis at a variety of scales. Through this tool, we hope to integrate
close-reading and corpus-wide analysis with the wider scope that computational
analysis affords. This integration, we argue, allows for an augmentation of both
methods: contextualizing close reading practices within historically- and
regionally-specific word usage and semantics, on the one hand, and concretizing
thematic and statistical trends by locating them at the textual level, on the
other. We articulate a design principle of textual dimensionality,
that is, approximating through visualization the abstract relationships between words
in any text. We argue that Project Quintessence
represents a method for researchers to navigate archives at a variety of scales
by helping to visualize the many latent dimensions present in texts.
[en] The Digital Environmental Humanities (DEH) in the Anthropocene: Challenges and Opportunities in an Era of Ecological Precarity
John Ryan, Southern Cross University; Lydia Hearn, Edith Cowan University; Paul Longley Arthur, Edith Cowan University
Abstract
[en]
Researchers in the complementary fields of the digital humanities and the
environmental humanities have begun to collaborate under the auspices of
the digital environmental humanities (DEH). The overarching
aim of this emerging field is to leverage digital technologies in
understanding and addressing the urgencies of the Anthropocene. Emphasizing
DEH’s focus on natural and cultural vitality, this article begins with a
historical overview of the field. Crafting an account of the field’s
emergence, we argue that the present momentum toward DEH exhibits four
broad thematic strains including perennial eco-archiving; Anthropocene
narratives of loss; citizen ecohumanities; and human-plant-environment
relations. Within each of the four areas, the article identifies how DEH
ideas have been implemented in significant projects that engage with,
envision, re-imagine, and devise communities for environmental action and
transformation. We conclude with suggestions for further bolstering DEH by
democratizing environmental knowledge through open, community-engaged
methods.
[en] DH as Data: Establishing Greater Access through Sustainability
Alex Kinnaman, Virginia Tech; Corinne Guimont, Virginia Tech
Abstract
[en]
This paper presents methodology and findings from a multi-case study exploring the use of
preservation and sustainability measures to increase access to digital humanities (DH)
content. Specifically, we seek to develop a workflow that both prepares DH content for
preservation and enhances the accessibility of the project. This work is based on the
idea of treating DH as traditional data by applying data curation and digital preservation
methods to DH content. Our outcomes are an evaluation of the process and output using
qualitative methods, publicly accessible and described project components on two Virginia
Tech projects, and a potential workflow that can be applied to future work. By breaking
down individual projects into their respective components of content, code, metadata, and
documentation and examining each component individually for access and preservation, we
can begin migrating our digital scholarship to a sustainable, portable, and accessible
existence.
[en] Visualizing a Series: Aggregate Compositional Analysis of Botticelli's Commedia
Nathaniel Corley, Amherst College
Abstract
[en]
Applying digital methods as inputs to an interpretive process, I expose compositional motifs within Sandro Botticelli's momentous
Divina Commedia codex that depart from canonical manuscript illustrations. I then situate these
visual findings within Quattrocento literary and artistic theory, arguing that Botticelli manipulated his compositional structures
to harmonize with the humanist Cristoforo Landino's interpretation of the Commedia as an allegory for the
soul's ascension from “disorder” to “order”. By leafing through the pages of Botticelli's manuscript and perceiving
the striking structure and style of the illustrations, the observer could experience the incremental progress of Dante
the Pilgrim’s soul — and perhaps the viewer’s own — through the different stages of hell to paradise. Ultimately, I reflect on
the implications of digital methodologies within art history, and how these techniques may enrich or even challenge traditional
modes of “seeing” works of art.
[en] Starting and Sustaining Digital Humanities/Digital Scholarship Centers: Lessons from the Trenches
Lynne Siemens, University of Victoria
Abstract
[en]
Along with the growth of Digital Humanities (DH) and Digital Scholarship (DS) as
digital methods, resources, and tools for research, teaching and dissemination,
interest in starting DH/DS centers as a means to support and sustain researchers and
projects is increasing rapidly. For those leading these initiatives, this raises
questions about how to engage possible stakeholders to develop support for a
center, apply existing models, secure funding sources, and much else. This article
contributes to this discussion by examining the experiences of ten DH/DS centers in
North America and discerning smart practices for those wishing to start a similar
center. Often started by faculty or administrative champions, the interviewed
centers have a long history of operations. They offer a suite of activities and
services, ranging from consulting and training to access to technology and project
support, with staff drawn from libraries, faculties, student ranks, and other
locations. These efforts support teachers, researchers, and students in their
efforts to undertake DH/DS projects. The centers are often funded through a
combination of base budgets and soft money and may be based in a library or faculty.
The paper concludes with implications for practice for those wishing to start their
own DH/DS center.