Articles
[en] Interlinking Text and Data with Semantic Annotation and Ontology Design Patterns to Analyse Historical Travelogues
Sandra Balck, Leibniz Institute for East and Southeast European Studies (IOS) Regensburg; Ingo Frank, Leibniz Institute for East and Southeast European Studies (IOS) Regensburg; Hermann Beyer-Thoma, Leibniz Institute for East and Southeast European Studies (IOS) Regensburg; Anna Ananieva, Leibniz Institute for East and Southeast European Studies (IOS) Regensburg
Abstract
[en]
This paper presents a first draft of the ongoing work on the edition of Franz Xaver Bronner’s travelogues, in which we apply a semantic model that combines widely accepted standards (such as CIDOC CRM, Dublin Core, SKOS etc.) with project-specific elements and TEI. We address the question of category development for time, space and events through text encoding and semantic annotation, and their value for research, while focusing on the current DFG-funded project “Digital Edition of Historical Travelogues” (DEHisRe).
[en] Category Development at the Interface of Interpretive Pragmalinguistic Annotation and Machine Learning: Annotation, detection and classification of linguistic routines of discourse referencing in political debates
Michael Bender, Technical University of Darmstadt; Maria Becker, University of Heidelberg; Carina Kiemes, Technical University of Darmstadt; Marcus Müller, Technical University of Darmstadt
Abstract
[en]
In this paper, we present a case study on quality criteria for the robustness of categories in pragmalinguistic tagset development. We model a number of classification tasks for linguistic routines of discourse referencing in the plenary minutes of the German Bundestag. In the process, we focus and reflect on three fundamental quality criteria: 1. segmentation, i.e. the size of the annotated segments (e.g. words, phrases or sentences), 2. granularity, i.e. degrees of content differentiation, and 3. interpretation depth, i.e. the degree of inclusion of linguistic knowledge, co-textual knowledge and extra-linguistic, context-sensitive knowledge. With the machine learnability of categories in mind, our focus is on principles and conditions of category development in collaborative annotation. Our experiments and tests on pilot corpora aim to investigate to what extent statistical measures indicate whether interpretative classifications are machine-reproducible and reliable. To this end, we compare gold-standard datasets annotated with different segment sizes (phrases, sentences) and categories of different granularity, respectively. We conduct experiments with different machine learning frameworks to automatically predict labels from our tagset. We apply BERT, a pre-trained neural transformer language model, which we fine-tune and constrain for our labelling and classification tasks, and compare it against Naive Bayes as a probabilistic knowledge-agnostic baseline model. The results from these experiments contribute to the development and reflection of our category systems.
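As a hedged illustration of the comparison described in this abstract, the sketch below pairs a bag-of-words Naive Bayes baseline with a fine-tuned BERT classifier. The example segments, label indices, model choice and hyperparameters are placeholders, not the project's actual tagset, corpus or configuration.

```python
# Sketch: knowledge-agnostic baseline vs. fine-tuned transformer classifier.
# Segments, labels and hyperparameters are placeholders, not project data.
import torch
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

segments = ["Ich komme auf den Punkt zurück.", "Wie bereits erwähnt wurde."]
labels = [0, 1]  # hypothetical indices into a discourse-referencing tagset

# 1. Probabilistic, knowledge-agnostic baseline (Naive Bayes over tf-idf).
baseline = make_pipeline(TfidfVectorizer(), MultinomialNB())
baseline.fit(segments, labels)

# 2. Pre-trained BERT, fine-tuned for the same labelling task.
name = "bert-base-german-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

class SegmentDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(model=model,
                  args=TrainingArguments(output_dir="out", num_train_epochs=3),
                  train_dataset=SegmentDataset(segments, labels))
trainer.train()
```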
[en] Making the Whole Greater than the Sum of its Parts: Taxonomy development as a site of negotiation and compromise in an interdisciplinary software development project
Jennifer C. Edmond, Trinity College Dublin; Alejandro Benito Santos, University of Salamanca; Michelle Doran, Digital Repository of Ireland; Roberto Therón, University of Salamanca; Michał Kozak, Backbase; Cezary Mazurek, Institute of Bioorganic Chemistry of the Polish Academy of Sciences; Eveline Wandl-Vogt, Austrian Academy of Sciences; Aleyda Rocha Sepulveda, Austrian Academy of Sciences
Abstract
[en]
This paper describes the experience of a group of interdisciplinary researchers and
research professionals involved in the PROgressive VIsual DEcision-making in Digital
Humanities (PROVIDEDH) project, a four-year project funded within the CHIST-ERA call
2016 for the topic “Visual Analytics for Decision Making under
Uncertainty — VADMU”. It contributes to the academic literature on how
digital methods can enhance interdisciplinary cooperative work by exploring the
collaboration involved in developing visualisations to lead decision-making in
historical research in a specific interdisciplinary research setting. More
specifically, we discuss how the cross-disciplinary design of a taxonomy of sources
of uncertainty in Digital Humanities (DH), a “profoundly
collaborative enterprise” built at the intersection of computer science and
humanities research, became not just an instrument to organise data, but also a tool
to negotiate and build compromises between different communities of practice.
[en] Categorising Legal Records – Deductive, Pragmatic, and Computational Strategies
Marlene Ernst, University of Passau; Sebastian Gassner, University of Passau; Markus Gerstmeier, University of Passau; Malte Rehbein, University of Passau
Abstract
[en]
Reprocessing printed source material and facilitating large-scale qualitative as well
as quantitative analyses with digital methods poses many challenges. A case study on
approximately 10,000 inventory entries for legal cases from the Special Court Munich
(1933–1945) highlights these challenges and offers a glimpse into a digitisation workflow that
allows for in-depth computer-aided analysis. For this paper, different methods and
procedures for developing categorisation systems for legal charges are discussed.
[en] Made to Be a Woman: A case study on the categorization of gender using an individuation-based approach in the analysis of literary texts
Mareike Schumacher, University of Regensburg; Marie Flüh, Hamburg University
Abstract
[en]
In this article we analyze Simone de Beauvoir’s The Second Sex in order to develop a category system whose gender categories not only represent a binary notion but also include non-binary forms of gender. By manually annotating and then visualizing features of gender roles as a networked graph and analyzing the resulting “gender sphere”, we arrive at two category systems, one containing five and the other four gender categories, each of which contains a number of gender roles. We compare the gender sphere Beauvoir creates in her text with one resulting from the analysis of narrative texts that she used as references for her analysis of gender roles. In this fiction corpus, we annotated character features, clothes and gender roles, evaluated them quantitatively and then matched them with the gender sphere. This makes visible that the literary corpus creates a scalar notion of gender ranging from a female to a male pole with a neutral center. Roles relevant for, and characters showing traits of, gender diversity are set at the margins. Unlike this scalar notion, Beauvoir works towards a more spherical understanding of gender in which roles relevant for gender diversity are well integrated into the center of the gender sphere. Although Beauvoir’s gender sphere cannot be mapped onto the graph data of the fiction corpus, it does offer a good starting point for macro- and microanalysis of literary characters, the features, clothes and roles that are used to describe them, and the resulting gender-specific profiling.
[en] Categorial Relations in (Re)constructing Topoi and in (Re)modeling Topology as a Methodology: Vertical, horizontal, heuristic and epistemological interdependencies
Maria Hinzmann, University of Trier
Abstract
[en]
The article reflects on the process of category development and category application that was relevant for the analysis of topoi in a specific corpus of travelogues. In this process, the investigation of the corpus and the (re)modeling of topology as a methodology were closely intertwined. This includes the interconnected modeling of the category topos and the (re)construction of concrete topos concepts. The paper proposes a methodological framework that represents categorial relations understood as relations between different types of categories, different concepts as instantiations of the same category as well as relations between categories and different heuristic levels. In addition to the definition and spatial representation of these relations, the iterative dimension of research processes in the digital humanities is discussed with regard to categorial interdependencies. Against the background of this (meta)model and the discussion of concrete examples from the study of travelogues, perspectives on possible follow-up differentiations and conceivable future work are touched upon.
[en] Systems of Intertextuality: Towards a formalization of text relations for manual annotation and automated reasoning
Jan Horstmann, University of Münster; Christian Lück, University of Münster; Immanuel Normann, University of Münster
Abstract
[en]
How can intertextual relations be formalized and annotated? What would be a coherent category system of intertextuality, and which formalization is suitable to make it computable without losing its expressiveness? Against the backdrop of the most influential classical theories of intertextuality, the article does not aim for an automatic detection of intertextual relations, as many other digital humanities approaches have done before, but suggests a formal and expandable model of the core of intertextuality by means of description logic, i.e. it models relations and the types of entities related by them in a machine-readable RDF format. The utilization of this theory-driven model is demonstrated by several examples of intertextual relations as discussed in literary studies.
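To make the kind of formalization gestured at here concrete, a minimal sketch with rdflib follows. The namespace, class and property names (IntertextualRelation, Allusion, hasSource, hasTarget) and text identifiers are invented for illustration and do not reproduce the article's actual model.

```python
from rdflib import Graph, Namespace, RDF, RDFS, URIRef

ITX = Namespace("http://example.org/intertextuality#")
g = Graph()
g.bind("itx", ITX)

# Hypothetical class hierarchy: allusion as one type of intertextual relation.
g.add((ITX.Allusion, RDFS.subClassOf, ITX.IntertextualRelation))

# Two related text segments (invented identifiers).
source = URIRef("http://example.org/texts/ulysses#episode1")
target = URIRef("http://example.org/texts/odyssey#book1")

# The relation itself is an individual, so it can be typed and annotated.
rel = URIRef("http://example.org/relations/r1")
g.add((rel, RDF.type, ITX.Allusion))
g.add((rel, ITX.hasSource, source))
g.add((rel, ITX.hasTarget, target))

# Query all intertextual links, regardless of their subtype.
q = """SELECT ?s ?t WHERE {
  ?r a/rdfs:subClassOf* itx:IntertextualRelation ;
     itx:hasSource ?s ; itx:hasTarget ?t . }"""
for row in g.query(q, initNs={"itx": ITX, "rdfs": RDFS}):
    print(row.s, "->", row.t)
```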
[en] Are Ontologies Trees or Lattices?
C. M. Sperberg-McQueen, Black Mesa Technologies LLC; Claus Huitfeldt, University of Bergen
Abstract
[en]
Ontologies, it is sometimes said, take the form of a hierarchy or tree: each class is subdivided into distinct subclasses with no cross-classifications. But if the purpose of an ontology is to make possible useful inferences and to guide software users and developers, it is better to allow a more flexible structure. Using text annotation as an example (with concrete reference to the CATMA annotation tool), we argue that it will be more useful to structure ontologies as lattices, not trees.
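The contrast is easy to make concrete in code. In the minimal sketch below (invented class names; Python's class system merely stands in for an ontology language), multiple inheritance produces exactly the cross-classification a strict tree forbids:

```python
# In a tree, every class has at most one parent; a lattice allows a class to
# specialize several superclasses at once (cross-classification).
class Annotation: pass
class StructuralAnnotation(Annotation): pass   # where a span sits in the text
class SemanticAnnotation(Annotation): pass     # what a span means

# Cross-classified subclass: ill-formed in a strict tree, natural in a lattice.
class QuotedSpeech(StructuralAnnotation, SemanticAnnotation): pass

# Inferences can now follow both branches upward.
assert issubclass(QuotedSpeech, StructuralAnnotation)
assert issubclass(QuotedSpeech, SemanticAnnotation)
```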
[en] Visualization of Categorization: How to see the wood and the trees
Ophir Münz-Manor, The Open University of Israel; Itay Marienberg-Milikowsky, Ben-Gurion University of the Negev
Abstract
[en]
In the article, we present, theorize and contextualize an investigation of
figurative language in a corpus of Hebrew liturgical poetry from late
antiquity, from both a manual and a computational point of view. The study
touches upon questions of distribution and patterns of usage of figures of
speech as well as their literary-historical meanings. Focusing on figures
of speech such as metaphors and similes, the corpus was first annotated
manually with markers on paper, and a few years later it was annotated
manually again, this time in a computer-assisted way, following a strictly
categorized approach, using CATMA (an online literary annotation tool). The
data was then transferred into ViS-À-ViS (an online visualization tool,
developed by Münz-Manor and his team) that enables scholars to “see the
wood” via various visualizations that single out, inter alia,
repetitive patterns either at the level of the text or the annotations. The
tool also enables one to visualize aggregated results concerning more than
one text, allowing one to “zoom out” and see the “forest aspect”
of the entire corpus or parts thereof. Interestingly, after visualizing the
material in this way, it often turns out that the categories themselves
need to be re-assessed. In other words, the categorization and
visualization in themselves create a sort of hermeneutical circle in which
both parts influence one another reciprocally.
Through the case study, we seek to demonstrate that, by using the right methods
and tools (not only ViS-À-ViS but others as well), one can ultimately use
visualization of categorization as the basis for what might be called
established speculation, or non-trivial
generalization: an interpretative act that seeks
to be grounded in clear findings while at the same time enjoying the
advantages of “over-interpretation”. This approach, we argue, enables
one to see the trees without losing sight of the wood, and vice versa; or
“to give definition”
– at least tentatively – “to the microcosms and
macrocosms which describe the world around us”, be they factual or fictional.
[en] Annotating German in Austria: A Case-study of manual annotation in and for digital variationist linguistics
Markus Pluschkovits, University of Vienna
Abstract
[en]
The following is a case study of manual annotation as used in a large-scale variationist linguistics project focusing on spoken varieties of German in Austria. It showcases the technical architecture used for annotation, which unifies different granularities of tokenization throughout the corpus in a distinct entity designed for annotation. It furthermore lays out the hierarchical stand-off annotation used in the project Deutsch in Österreich (“German in Austria”) and demonstrates how a semantic model for the hierarchical organization of annotations can bring transparency to the annotation process while also providing a sounder epistemological basis than monodimensional annotation.
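A minimal sketch of the general idea of hierarchical stand-off annotation follows; the tokens, layer names and values are invented and do not reproduce the project's actual schema.

```python
# Stand-off: the text layer stays untouched; annotations reference token IDs.
tokens = {"t1": "des", "t2": "is", "t3": "gaunz", "t4": "wuaschd"}

annotations = {
    # First-order annotations target tokens (possibly at different granularities).
    "a1": {"targets": ["t3"], "layer": "lexis", "value": "dialect intensifier"},
    "a2": {"targets": ["t1", "t2", "t3", "t4"], "layer": "syntax",
           "value": "copula clause"},
    # Higher-order annotation targets other annotations, making the
    # interpretive hierarchy explicit instead of flattening it.
    "a3": {"targets": ["a1", "a2"], "layer": "variation",
           "value": "dialectal predication"},
}

def resolve(ann_id):
    """Recursively expand an annotation to the tokens it ultimately covers."""
    covered = []
    for t in annotations.get(ann_id, {}).get("targets", []):
        covered += resolve(t) if t in annotations else [tokens[t]]
    return covered

print(resolve("a3"))  # ['gaunz', 'des', 'is', 'gaunz', 'wuaschd']
```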
[en] From Semi-structured Text to Tangible Categories: Analysing and annotating death lists in 18th century newspaper issues
Claudia Resch, Austrian Academy of Sciences; Nina C. Rastinger, Austrian Academy of Sciences; Thomas Kirchmair, Austrian Academy of Sciences
Abstract
[en]
Annotating – understood here as the process in which segments of a text are marked as belonging to a defined
category – can be seen as a key technique in many disciplines, especially for working with text in the Humanities [e.g. Unsworth 2000], the
Computational Sciences, and the Digital Humanities. In the field of Digital Humanities, annotations of text are
utilized, among other purposes, for the enrichment of a corpus or digital edition with (linguistic) information,
for close and distant reading methods, or for machine learning techniques. Defining categories to shape data has been used in different text analysis contexts, including the study of
toponyms and biographical data.
The paper at hand showcases the use of annotations within the Vienna Time Machine project (2020-2022, PI: Claudia
Resch), which aims to connect different knowledge resources about historical Vienna via Named Entity Recognition
(NER). More specifically, it discusses the challenges and potentials of annotating 18th century death lists found
in the Wien[n]erisches Diarium or Wiener Zeitung, an early modern newspaper which was first published in 1703 and has already been (partly)
digitized in the form of the so-called DIGITARIUM. Here, users can access over
330 high-quality full-text issues of the newspaper which contain a number of different text types, including
articles, advertisements, and more structured texts such as arrival or death lists. The focus of this article lies
on the semi-structured death lists, which not only appear in almost every issue of the historical Wiener Zeitung, but are also relatively consistent in their structure and display a high
semantic density: Each entry contains detailed information about a deceased person, such as their name, occupation,
place of death, and age.
Annotating these semi-structured list items opens up multiple possibilities: The resulting classified data can be
used for efficient distant or scalable reading, quantitative analyses,
and as a gold standard for both rule-based and machine learning NER approaches. To reach this goal, and as a first step of the annotation process, the project
team conducted a close reading of various death lists from multiple decades to identify recurrent linguistic
patterns and, based on these, to develop a first expandable set of categories. This bottom-up approach resulted in
five preliminary categories, namely person, occupation, place, cause-of-death, and age, which were color-coded and, accompanied by annotated examples, documented in annotation
guidelines designed to be as intersubjectively applicable and concise as possible. These guidelines were then used by two
researchers familiar with the historic material to annotate a randomly drawn and temporally distributed sample of
500 death list entries in the browser-based environment Prodigy (https://prodi.gy). Particular emphasis was placed on “challenging” cases, i.e. items
where annotators were in doubt about their choice of category, the exact positioning of annotations or the
necessity to annotate certain text segments at all. Whenever annotators encountered such ambiguous items, these
were collected, grouped and – as a third step in the annotation process – discussed with an interdisciplinary group
of linguists, historians and prosopographers. Within this collective, a solution for each group of issues was
agreed on and incorporated into the annotation guidelines; existing categories were also revised where necessary.
The new, more stable category system was then again used for a new sequence of annotation and discussion of
ambiguities, resulting in an iterative process in which annotation and category development became intertwined. This
approach, explained in more detail in the article, demonstrates that tagsets are never entirely final, but always
depend on particular knowledge interests and data material, and that even the annotation of inherently
semi-structured lists requires continuous critical reflection and considerable historical and linguistic
knowledge.
At the same time, this work exemplifies that it is precisely these “challenging” cases which
carry a great potential for gaining knowledge and can be considered central to the development of a valid
annotation system.
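As a hedged sketch of how such annotated entries can feed machine-learning NER (the entry text, character offsets and label names below are invented; the project's actual data and guidelines are not reproduced), category annotations can be converted into spaCy training data:

```python
# Turn annotated death-list entries into spaCy training examples for the
# five categories named above (cause-of-death omitted in this invented entry).
import spacy
from spacy.tokens import DocBin

entry = ("Dem Joseph Huber, bürgerl. Schneidermeister, in der Leopoldstadt, "
         "alt 62 Jahr.")
spans = [(4, 16, "PERSON"), (18, 43, "OCCUPATION"),
         (52, 64, "PLACE"), (70, 77, "AGE")]  # (start, end, label) offsets

nlp = spacy.blank("de")
doc = nlp.make_doc(entry)
ents = [doc.char_span(s, e, label=l) for s, e, l in spans]
doc.ents = [s for s in ents if s is not None]  # drop misaligned spans

db = DocBin(docs=[doc])
db.to_disk("train.spacy")  # input for `python -m spacy train`
```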
[en] Developing Computational Models for Formalizing Concepts in the British Colonial India Corpus
Shanmugapriya T, University of Toronto Scarborough
Abstract
[en]
The concepts embedded in humanities materials are unstructured and possess multifaceted attributes. New insights from these materials are derived through systematic qualitative study. However, for the purpose of quantitative analysis using digital humanities methods and tools, formalizing these concepts becomes imperative. The functionality of digital humanities relies on the deployment of formalized concepts and models. Formalization converts unstructured data into a more structured form. Concurrently, models function as representations created to closely examine the modeled subject, while metamodels define the structure and properties of these models. In this case, the absence of formalized concepts and models for studying the British colonial India corpus hampers the application of computational methods to humanities research questions. The texts are intricate, and the format is non-standard, as colonial officials documented extensive information to govern and control the colonized people and land. In this scenario, the British colonial India corpus cannot be effectively utilized for topic-specific research questions employing advanced text mining without formalizing the concepts within it. This article addresses the questions of what the most effective approach is for identifying multifaceted concepts within the non-standard British colonial India corpus through models, and how these concepts can be formalized using formal models. It also explores how metamodels can be developed from this experiment for a similar corpus.
[en] Case Study: Annotating the ambiguous modality of "must" in Jane Austen’s Emma
Angelika Zirker, University of Tübingen; Michael Göggelmann, University of Tübingen / University of Cologne
Abstract
[en]
The case study is based on student annotations from a class on “Digital Methods in Literary Studies” taught in the English Studies / English Literatures and Cultures programme at Tübingen University. The annotation task consisted of tagging the ambiguous modality of must in Jane Austen’s novel Emma (1816). The article, in a first step, presents how the criteria for the annotation task were developed on the basis of a close reading of the novel; these evolved into annotation guidelines which were then translated into tag sets for two annotation tools: CATMA and CorefAnnotator. The overall results of the annotation process are discussed, with a particular focus on the difficulties that emerged as well as (patterns of) mistakes and misconceptions across the groups and individual annotators. This approach yields insights into the challenges of annotating with a group in a teaching context and foregrounds conceptual difficulties when it comes to annotating complex phenomena in literary texts.
[en] Automated Transcription of Gə'əz Manuscripts Using Deep Learning
Samuel Grieggs, University of Notre Dame; Jessica Lockhart, University of Toronto; Alexandra Atiya, University of Toronto; Gelila Tilahun, University of Toronto; Suzanne Akbari, Institute for Advanced Study, Princeton, NJ; Eyob Derillo, SOAS, University of London; Jarod Jacobs, Warner Pacific College; Christine Kwon, University of Notre Dame; Michael Gervers, University of Toronto; Steve Delamarter, George Fox University; Alexandra Gillespie, University of Toronto; Walter Scheirer, University of Notre Dame
Abstract
[en]
This paper describes a collaborative project designed to meet the needs of
communities interested in Gə'əz language texts – and other under-resourced
manuscript traditions – by developing an easy-to-use open-source tool that
converts images of manuscript pages into a transcription using optical character
recognition (OCR). Our computational tool incorporates a custom data curation
process to address the language-specific facets of Gə'əz coupled with a
Convolutional Recurrent Neural Network to perform the transcription. An
open-source OCR transcription tool for digitized Gə'əz manuscripts can be used
by students and scholars of Ethiopian manuscripts to create a substantial and
computer-searchable corpus of transcribed and digitized Gə'əz texts, opening
access to vital resources for sustaining the history and living culture of
Ethiopia and its people. With suitable ground truth, our open-source OCR
transcription tool can also be retrained to read other under-resourced scripts.
The tool we developed can be run without a graphics processing unit (GPU),
meaning that it requires much less computing power than most other modern AI
systems. It can be run offline from a personal computer, or accessed via a web
client and potentially in the web browser of a smartphone. The paper describes
our team’s collaborative development of this first open-source tool for Gə'əz
manuscript transcription that is both highly accurate and accessible to
communities interested in Gə'əz books and the texts they contain.
ጥልቅ እውቀትን ለረቂቅ ጽሁፎች ስለመጠቀም
ሳሙኤል ግሪግስ፡ ኖተርዳም ዩኒቨርሲቲ፤ ጀሲካ ሎክሀርት፡ቶሮንቶ ዩኒቨርሲቲ፤ አሌክሳንደራ አትያ፡ ቶሮንቶ ዩኒቨርሲቲ፤ ገሊላ
ጥላሁን፡ ቶሮንቶ ዩኒቨርሲቲ፤ ሱዛን ኮንክሊን አክባሪ፡ አድቫንስድ ጥናት ኢንስቲትዩት፡ ፕሪንስተን ኒው ጀርሲ፤ ኢዮብ ደሪሎ
ሶ.አ.ስ. ለንደን ዩኒቨርሲቲ፤ ጃሮድ ጃኮብስ፡ ዋርነር ፓሲፊክ ኮሌጅ፤ ክሪስቲን ኮን፡ ኖተርዳም ዩኒቨርሲቲ፤ ሚካኤል ጀርቨርስ፡
ቶሮንቶ ዩኒቨርሲቲ፤ ስቲቭ ደላማርተር፡ ጆርጅ ፎክስ ዩኒቨርሲቲ፤ አሌክሳንድራ ግለስፒ፡ ቶሮንቶ ዩኒቨርሲቲ፤ ዋልተር ሸሪር፡
ኖተርዳም ዩኒቨርሲቲ።
መግለጫ
ይህ ጥናት የሚገልፀው የግዕዝ ቋንቋ ፅሁፍን እና ሌሎች መሰል ትኩረት ያልተሰጣቸውን፣ ባህላዊና እና ጥንታዊ ሥሁፎችን ለመማር
ወይም ለጥናት የሚፈልጉ ማህበረሰቦችን ፍላጎት ለማርካት የጥምር የጥናት ቡድናችን ስለቀረፀው ቀላል እና ሁሉም ሊጠቀምበት
ስለሚችል መሣሪያ(ዘዴ) ነው።፡ይህ መሣሪያ የብራና ፅሁፍን የመሰሉ ረቂቅ ፅሁፎች የተፃፉባቸውን ገፆች ምሥል በማንሳት እና
ፊደላትን ለይቶ በሚገነዘብ ጨረር (optical character recognition (OCR)) በመጠቀም ምሥሉን ወደ መደበኛ
ወይም ሁለተኛ ፅሁፍነት የመቀየር ችሎታ ያለው ነው። ይህ ኮምፒዩተር ላይ የተመሰረተ ዘዴ ወይም መሣሪያ የግዕዝ ቋንቋን ልዩ
ባህርዮች ለይቶ እንዲያውቅ ሲባል ስለቋንቋው ያገኘውን መረጃ ወይም ዳታ የመንከባከብ እና የማከም ሂደቶችን አልፎ እንደ አንጎል
ነርቮች መረብ እሽክርክሪት የሚመስል ኮንቮሉሽናል ሪከረንት ነውራል ኔትዎርክ (Convolutional Recurrent Neural
Network) በመያዙ ገጽታዎችን እና ምሥሎችን ወደ ፅሁፍ ይቀይራል። ይህ ለሁሉም ተጠቃሚዎች ክፍት የሆነው ጽሁፍ ለተማሪዎች
እንዲሁም ለኢትዮጵያ ጽሁፍ ጥናት ተመራማሪዎች የሚጠቅም ብቃት ያለው እና በቀላሉ በኮምፒዩተር ተፈልጎ ሊገኝ የሚችል ከመሆኑም
በተጨማሪ የግዕዝ ጽሁፎቹ የኢትዮጵያን እና የኢትዮጵያን ህዝብ ታሪክና ባህል ግዕዝን በዲጂታል/በኮምፑተር ቀርፆ በማስቀመጥ
በቀጣይነት እንዲኖር ያስችላል። አመቺ የሆነ ተጨባጭ ሁኔታ ሲኖር ደግሞ ይህ ለሁሉም ክፍት የሆነ የ OCR የግዕዝን ምስልን ወደ
ፅሁፍ የሚቀይር መሣሪያ ወይም ዘዴ ሌሎች ትኩረት ያላገኙ ረቂቅ ፅሁፎችንም እንዲያነብ ተደርጎ ሊሰለጥን ወይም ዲዛይን ሊደረግ
ይችላል። ይህ የፈጠርነው መሣሪያ/ዘዴ የተለመደውን ግራፊክስ ፕሮሰሲንግ ዩኒት (GPU) የተባለውን በኮምፕዩተር ምሥሎችን
የማንበቢያ እና ማሳለጫ ዘዴ መጠቀም አያስፈልገውም። በዚህም ምክንያት ከሌሎች ዘመናዊ የአርቲፊሻል ኢንተሊጀንስ (AI
systems ) ዘዴዎች አንፃር ሲታይ ሃይለኛ የኮምፒዩተር አቅም አይፈልግም። ይህንን መሣሪያ/ዘዴ ያለ ኢንተርኔት ወይም
በይነ-መረብ ከግል ኮምፒዩተር፣ በኢንተርኔት እንዲሁም ወደፊት ኢንተርኔት ባለው የእጅ ሥልክን በመጠቀም ማስኬድ ይቻላል። ይህ
ጥናት የሚገልጸው በአይነቱ የመጀመሪያ የሆነው እና ለሁሉም ክፍት የሆነ እንዲሁም በተገቢ ሁኔታ ጥራቱን ጠብቆ በጥምር
ተመራማሪዎቻችን የበለፀገው መሣሪያ/ዘዴ ለማናቸውም በግዕዝ መጽሀፍቶች እና ውስጣቸው በያዙት ፅሁፎች ላይ ጥናት ለማድረግ
ለሚፈልጉ ግለሰቦችም ሆኑ ማህበረሰቦች ሁሉ ጠቃሚ መሆኑን ለማስገንዘብ ነው።
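For readers curious how the architecture described in this abstract looks in code, the following is a minimal PyTorch sketch of a Convolutional Recurrent Neural Network trained with CTC loss; the hyperparameters and alphabet size are placeholders, not the project's released model. Like the tool itself, it runs on a CPU.

```python
# Sketch of a CRNN for line-level OCR: a convolutional feature extractor
# feeding a recurrent layer, trained with CTC loss. All sizes are invented.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_classes, img_h=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        feat_h = img_h // 4
        self.rnn = nn.LSTM(128 * feat_h, 256, bidirectional=True,
                           batch_first=True)
        self.head = nn.Linear(512, n_classes)  # classes include the CTC blank

    def forward(self, x):            # x: (batch, 1, img_h, width)
        f = self.cnn(x)              # (batch, 128, img_h/4, width/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one step per column
        out, _ = self.rnn(f)
        return self.head(out)        # (batch, width/4, n_classes)

# One training step with CTC loss (dummy shapes; the Gə'əz syllabary has a
# few hundred signs, so 300 output classes is a placeholder).
model = CRNN(n_classes=300)
logits = model(torch.randn(2, 1, 32, 128)).log_softmax(-1).permute(1, 0, 2)
targets = torch.randint(1, 300, (2, 10))
loss = nn.CTCLoss(blank=0)(logits, targets,
                           torch.full((2,), logits.size(0), dtype=torch.long),
                           torch.full((2,), 10, dtype=torch.long))
loss.backward()
```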
[en] Reconstructing historical texts from fragmentary sources: Charles S. Parnell and the Irish crisis, 1880-86
Eugenio Biagini, University of Cambridge; Patrick Geoghegan, Trinity College Dublin; Hugh Hanley, University of Cambridge; Aneirin Jones, University of Cambridge; Huw Jones, University of Cambridge
Abstract
[en]Charles Stewart Parnell was one of the most controversial and effective leaders in the
United Kingdom in the second half of the nineteenth century. Almost single-handedly, he transformed
the proposal of Home Rule for Ireland from a languishing irrelevance to a mass-supported cause.
Though the historiography on Parnell is substantial, his speeches – the main primary sources for
accessing both his thinking and strategies – have never been collected or edited. One of the core
questions in working towards an edition of his speeches was whether it would be possible
to use automated methods on these fragmentary sources to reconstruct what Parnell actually said in them.
We were also interested in how the reports varied, and what that variation might tell us about
the practices and biases of the journalists who wrote them and the newspapers which published them.
This article discusses the use of two digital tools in our attempts to answer these
research questions: CollateX, which was designed by Digital Humanities practitioners for the comparison
of textual variants, and SBERT Sentence Transformers, which establishes levels of similarity between texts.
In this article we talk about how the application of digital methods to the corpus led us away
from the idea of producing definitive reconstructions of the speeches, and towards a deeper
understanding of the corpus and the journalistic practices which went into its creation.
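A minimal sketch of the two tools named in this abstract, with invented newspaper snippets; the model choice and preprocessing are assumptions rather than the project's published pipeline.

```python
# CollateX aligns the variant reports token by token; SBERT sentence
# embeddings score how semantically similar the reports are.
from collatex import Collation, collate
from sentence_transformers import SentenceTransformer, util

reports = {  # invented stand-ins for two newspaper reports of one speech
    "Freeman's Journal": "The land of Ireland belongs to the people of Ireland.",
    "The Times": "Ireland's land, he declared, belongs to the Irish people.",
}

# Word-level collation: shared and divergent readings in an alignment table.
collation = Collation()
for witness, text in reports.items():
    collation.add_plain_witness(witness, text)
print(collate(collation))

# Semantic similarity between the reports, independent of exact wording.
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(list(reports.values()), convert_to_tensor=True)
print(util.cos_sim(emb[0], emb[1]))
```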
[en] Discourse cohesion in Xenophon’s On Horsemanship through Sketch Engine
Victoria Beatrix Fendel, University of Oxford; Matthew T. Ireland, Sidney Sussex College, University of Cambridge
Abstract
[en]
We build a Sketch Engine corpus for Xenophon’s classical Greek scientific treatise
On Horsemanship. Sketch Engine is a web-based
corpus-analysis tool that allows the user to inspect the lexical makeup of a text
(cf. keyword lists), explore the surroundings of select items (cf. concordances) and
identify fixed expressions in a text (cf. n-grams). We make available our
corpus-preparation tool and our corpus configuration file for Sketch Engine. We use
the Sketch Engine corpus to detect discontinuous verbal multi-word expressions,
specifically support-verb constructions (e.g. to take a
decision). We examine how support-verb constructions – through their
structural and lexical properties – aid discourse coherence and cohesion throughout
Xenophon’s treatise. We furthermore examine how the recurring support-verb
constructions in the treatise reflect the scientific register of the text. The
article shows how an understudied category of lexico-syntactic device (support-verb
constructions) in classical Greek substantially aids discourse cohesion, structurally and
contextually speaking. It also shows how an understudied text in the form of a
technical treatise (On Horsemanship) substantially furthers
insight into the scientific literacy of the classical period. Finally, by making
available our corpus-preparation tool and code, we hope to further collaboration and
adaptation and thus improvement of existing tools and counteract the multiplication
of tools.
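Since the article releases a corpus-preparation tool for Sketch Engine, it may help to picture the target format: Sketch Engine ingests corpora as "vertical" files, one token per line with tab-separated attributes inside structural tags. A minimal sketch with illustrative tokens and placeholder tags:

```python
# Write a tiny corpus in Sketch Engine's vertical format: one token per line,
# attributes (word, lemma, tag) separated by tabs, wrapped in <doc> tags.
rows = [
    ("λόγον", "λόγος", "NOUN"),
    ("ποιεῖται", "ποιέω", "VERB"),  # a support-verb construction candidate
]
with open("horsemanship.vert", "w", encoding="utf-8") as f:
    f.write('<doc id="xen-eq" title="On Horsemanship">\n')
    for word, lemma, tag in rows:
        f.write(f"{word}\t{lemma}\t{tag}\n")
    f.write("</doc>\n")
```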
[en] History Harvesting: A Case Study in Documenting Local History
Kimberly Woodring, Department of History, East Tennessee State University; Julie Fox-Horton, Cross-Disciplinary Studies, East Tennessee State University
Abstract
[en]
As a case study for the practice and application of digital history in a mid-size
university history department, this paper analyzes two History Harvest events
undertaken in a split-level digital history course. By examining the results of
two local History Harvests, specifically through participation of the greater
community, outside the university, and the preservation and digitization of the
local historical items, we discuss the impact History Harvests can have on a
community, as well as on history students. The primary goal of both History
Harvests outlined in this paper was to work with the local community
surrounding the university to preserve pieces of local history. This article
provides guidelines for conducting a History Harvest including suggestions for
community outreach, local university involvement with the greater community, and
digitizing issues that might occur while conducting the Harvest.
[en] Cluster Analysis in Tracing Textual Dependencies – a Case of Psalm 6 in 16th-century English Devotional Manuals
Jerzy Wójcik, The John Paul II Catholic University of Lublin
Abstract
[en]
This article uses cluster analysis to track textual affinities and identify the sources of different versions of historical texts, based on the text of Psalm 6 found in 16th-century English manuals of devotion. The article offers a brief overview of the manuals of prayer examined, describes the methods of cluster analysis used within the present work, and shows how cluster analysis can enrich and guide traditional philological knowledge.
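To give a flavour of the method (the version texts below are invented stand-ins, and the character n-gram features and cosine distance are assumptions, not the article's own choices), hierarchical cluster analysis of text versions can be sketched as follows:

```python
# Hierarchical clustering of text versions from character n-gram profiles;
# versions that share readings end up in the same subtree of the dendrogram.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import CountVectorizer

versions = {  # invented stand-ins for Psalm 6 openings in different manuals
    "Manual A": "O lorde rebuke me not in thyne anger",
    "Manual B": "O lord rebuke me not in thine indignacion",
    "Manual C": "Lorde rebuke me not in thy wrathe",
}
X = CountVectorizer(analyzer="char_wb", ngram_range=(2, 3)).fit_transform(
    list(versions.values())).toarray()
Z = linkage(pdist(X, metric="cosine"), method="average")
dendrogram(Z, labels=list(versions.keys()))
plt.show()
```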
[en] Project Quintessence: Examining Textual Dimensionality with a Dynamic Corpus Explorer
Samuel Pizelo, UC Davis; Arthur Koehl, UC Davis; Chandni Nagda, University of Illinois at Urbana-Champaign; Carl Stahmer, UC Davis
Abstract
[en]
In this paper, we present a free and open-access web tool for exploring the
EEBO-TCP early modern English corpus. Our tool combines several unsupervised
computational techniques into a coherent exploratory framework that allows for
textual analysis at a variety of scales. Through this tool, we hope to integrate
close-reading and corpus-wide analysis with the wider scope that computational
analysis affords. This integration, we argue, allows for an augmentation of both
methods: contextualizing close reading practices within historically- and
regionally-specific word usage and semantics, on the one hand, and concretizing
thematic and statistical trends by locating them at the textual level, on the
other. We articulate a design principle of textual dimensionality,
that is, approximating through visualization the abstract relationships between words
in any text. We argue that Project Quintessence
represents a method for researchers to navigate archives at a variety of scales
by helping to visualize the many latent dimensions present in texts.
[en] The Digital Environmental Humanities (DEH) in the Anthropocene: Challenges and Opportunities in an Era of Ecological Precarity
John Ryan, Southern Cross University; Lydia Hearn, Edith Cowan University; Paul Longley Arthur, Edith Cowan University
Abstract
[en]
Researchers in the complementary fields of the digital humanities and the
environmental humanities have begun to collaborate under the auspices of
the digital environmental humanities (DEH). The overarching
aim of this emerging field is to leverage digital technologies in
understanding and addressing the urgencies of the Anthropocene. Emphasizing
DEH’s focus on natural and cultural vitality, this article begins with a
historical overview of the field. Crafting an account of the field’s
emergence, we argue that the present momentum toward DEH exhibits four
broad thematic strains including perennial eco-archiving; Anthropocene
narratives of loss; citizen ecohumanities; and human-plant-environment
relations. Within each of the four areas, the article identifies how DEH
ideas have been implemented in significant projects that engage with,
envision, re-imagine, and devise communities for environmental action and
transformation. We conclude with suggestions for further bolstering DEH by
democratizing environmental knowledge through open, community-engaged
methods.
[en] DH as Data: Establishing Greater Access through Sustainability
Alex Kinnaman, Virginia Tech; Corinne Guimont, Virginia Tech
Abstract
[en]
This paper presents methodology and findings from a multi-case study exploring the use of
preservation and sustainability measures to increase access to digital humanities (DH)
content. Specifically, we seek to develop a workflow that both prepares DH content for
preservation and enhances the accessibility of the project. This work is based on the
idea of treating DH as traditional data by applying data curation and digital preservation
methods to DH content. Our outcomes are an evaluation of the process and output using
qualitative methods, publicly accessible and described project components on two Virginia
Tech projects, and a potential workflow that can be applied to future work. By breaking
down individual projects into their respective components of content, code, metadata, and
documentation and examining each component individually for access and preservation, we
can begin migrating our digital scholarship to a sustainable, portable, and accessible
existence.
[en] Visualizing a Series: Aggregate Compositional Analysis of Botticelli's Commedia
Nathaniel Corley, Amherst College
Abstract
[en]
Applying digital methods as inputs to an interpretive process, I expose compositional motifs within Sandro Botticelli's momentous
Divina Commedia codex that depart from canonical manuscript illustrations. I then situate these
visual findings within Quattrocento literary and artistic theory, arguing that Botticelli manipulated his compositional structures
to harmonize with the humanist Cristoforo Landino's interpretation of the Commedia as an allegory for the
soul's ascension from “disorder” to “order”. By leafing through the pages of Botticelli's manuscript and perceiving
the striking structure and style of the illustrations, the observer could experience the incremental progress of Dante
the Pilgrim’s soul — and perhaps the viewer’s own — through the different stages of hell to paradise. Ultimately, I reflect on
the implications of digital methodologies within art history, and how these techniques may enrich or even challenge traditional
modes of “seeing” works of art.
[en] Starting and Sustaining Digital Humanities/Digital Scholarship Centers: Lessons from the Trenches
Lynne Siemens, University of Victoria
Abstract
[en]
Along with the growth of Digital Humanities (DH) and Digital Scholarship (DS) as
digital methods, resources, and tools for research, teaching and dissemination,
interest in starting DH/DS centers as a means to support and sustain researchers and
projects is increasing rapidly. For those leading these initiatives, this raises
questions about how to engage possible stakeholders to develop support for a
center, apply existing models, secure funding sources, and much else. This article
contributes to this discussion by examining the experiences of ten DH/DS centers in
North America and discerning smart practices for those wishing to start a similar
center. Often started by faculty or administrative champions, the interviewed
centers have a long history of operations. They offer a suite of activities and
services, ranging from consulting and training to access to technology and project
support, with staff drawn from libraries, faculties, student ranks, and other
locations. These efforts support teachers, researchers, and students in their
efforts to undertake DH/DS projects. The centers are often funded through a
combination of base budgets and soft money and may be based in a library or faculty.
The paper concludes with implications for practice for those wishing to start their
own DH/DS center.